Introduction
In a previous blog, we discussed how Paragon Pathfinder (formerly known as NorthStar Controller) greatly increases the level of automation in networks by employing Closed-Loop Automation. Streaming telemetry sent by network devices gives Pathfinder real-time visibility of what is happening in the network, enabling it to tune the paths of Label-Switched Paths (LSPs) according to the observed conditions. One of the most popular Closed-Loop Automation applications of Paragon Pathfinder is Automated Congestion Avoidance, which ensures that links in the network do not become overloaded with traffic. If the traffic on a link is trending toward congestion, Pathfinder automatically modifies the paths of LSPs to manage the load, without any human intervention. This removes the burden of manual traffic management, which is traditionally very labor-intensive and error-prone. Now, let’s take an example and see how Pathfinder delivers automated congestion avoidance.
Example Congestion Scenario
Figure 1 shows an example of a network topology. Traffic is traveling along various LSPs between different nodes, as shown by the colored arrows. In reality, there would be more LSPs in the network, but these are not shown for clarity. Pathfinder is receiving streaming telemetry data from each node in the network. This includes:
• The current traffic level traveling in each direction on each network link.
• The amount of traffic entering each RSVP Traffic-Engineered (RSVP-TE) LSP or Segment-Routed Traffic-Engineered (SR-TE) LSP at the ingress router of the LSP (bearing in mind that traffic only ever enters a TE-LSP at the ingress router).
The user can configure a threshold that they consider to be the maximum desirable traffic level on a link, expressed as a percentage of the link capacity. For example, the user might choose to set it at 90%. Let’s see what happens when the threshold is exceeded.
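As a minimal sketch of the threshold check, the function below flags every directed link whose reported traffic exceeds the configured fraction of its capacity. The function name, the dictionary layout keyed by (from, to) router pairs, and the capacity and traffic figures are hypothetical illustrations, not Pathfinder's internals or API.

```python
# Hypothetical sketch, not Pathfinder's API. Links are keyed by direction,
# because telemetry reports traffic separately for each direction of a link.
def congested_links(link_traffic, capacity, threshold=0.9):
    """Return the directed links whose traffic exceeds threshold * capacity."""
    return sorted(
        link
        for link, rate in link_traffic.items()
        if rate > threshold * capacity[link]
    )


# Example telemetry snapshot (Gbps) on 10 Gbps links, with a 90% threshold.
traffic = {("R2", "R3"): 9.5, ("R3", "R2"): 2.0, ("R1", "R2"): 7.0}
capacity = {link: 10.0 for link in traffic}
print(congested_links(traffic, capacity))  # [('R2', 'R3')]
```

Only the R2 to R3 direction is flagged; the reverse direction of the same physical link is well below the threshold.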
Figure 1: Link between R2 and R3 has too much traffic
In Figure 1, consider:
1. Pathfinder sees that the traffic on the link between R2 and R3, in the R2 to R3 direction, is exceeding the configured threshold. Pathfinder knows this because R2 is reporting the traffic level on that link (as well as its other links) to Pathfinder on a quasi-continuous basis via streaming telemetry.
2. Pathfinder knows which LSPs are using the congested link, and it also knows through streaming telemetry how much traffic is traveling along each of them. In the example, R1 reports on a quasi-continuous basis via streaming telemetry how much traffic is entering the blue and purple LSPs, while R4 reports how much traffic is entering the green LSP. So, Pathfinder has all of the information it needs to work out which LSPs to move to ease the congestion. Of course, when determining which LSPs to move, it needs to make sure not to cause congestion elsewhere!
3. Pathfinder uses the Path Computation Element Protocol (PCEP) to signal the new path to the ingress router of each LSP that is to be moved. The ingress router changes the path accordingly, in a make-before-break manner, so that no traffic is lost during the LSP path change. This is shown in Figure 2. In the example, Pathfinder tells R1 to move the blue LSP to the path {R1, R5, R6, R7}, which reduces the volume of traffic on the previously congested link.
Figure 2: Pathfinder moves the blue LSP in order to ease the congestion
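Steps 1 and 2 can be sketched as a greedy heuristic: take LSPs off the congested link one at a time, largest first, and for each one search for the fewest-hop alternative path on which every link stays under the threshold once that LSP's traffic is added. This is an illustrative stand-in only; the post does not describe Pathfinder's actual path-selection algorithm, and the topology, rates, helper names and largest-first ordering below are hypothetical, chosen to loosely mirror Figures 1 and 2.

```python
import heapq


def links_of(path):
    """Directed links traversed by a path such as ["R1", "R2", "R3"]."""
    return list(zip(path, path[1:]))


def constrained_path(capacity, load, src, dst, rate, threshold):
    """Fewest-hop path on which every link can absorb `rate` below threshold."""
    adj = {}
    for a, b in capacity:
        adj.setdefault(a, []).append(b)
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        hops, node = heapq.heappop(heap)
        if node == dst:  # walk predecessors back to the source
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if hops > dist[node]:
            continue  # stale heap entry
        for nxt in adj.get(node, []):
            link = (node, nxt)
            # Skip links that would be pushed over threshold (this also rules
            # out the congested link, which was over threshold with this LSP).
            if load[link] + rate > threshold * capacity[link]:
                continue
            if hops + 1 < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = hops + 1, node
                heapq.heappush(heap, (hops + 1, nxt))
    return None


def relieve(capacity, load, lsps, congested, threshold=0.9):
    """Greedily reroute LSPs off `congested` until it is back under threshold."""
    moves = {}
    users = [n for n, lsp in lsps.items() if congested in links_of(lsp["path"])]
    for name in sorted(users, key=lambda n: lsps[n]["rate"], reverse=True):
        if load[congested] <= threshold * capacity[congested]:
            break
        lsp = lsps[name]
        for link in links_of(lsp["path"]):  # take the LSP's own traffic off
            load[link] -= lsp["rate"]
        new_path = constrained_path(
            capacity, load, lsp["path"][0], lsp["path"][-1], lsp["rate"], threshold
        )
        if new_path:
            lsp["path"] = new_path
            moves[name] = new_path
        for link in links_of(lsp["path"]):  # put it back on its (new) path
            load[link] += lsp["rate"]
    return moves


# Hypothetical topology loosely modelled on Figures 1 and 2, 10 Gbps links.
edges = [("R1", "R2"), ("R2", "R3"), ("R3", "R7"), ("R4", "R2"),
         ("R1", "R5"), ("R5", "R6"), ("R6", "R7")]
capacity = {}
for a, b in edges:
    capacity[(a, b)] = capacity[(b, a)] = 10.0
lsps = {
    "blue": {"path": ["R1", "R2", "R3", "R7"], "rate": 4.0},
    "purple": {"path": ["R1", "R2", "R3"], "rate": 3.0},
    "green": {"path": ["R4", "R2", "R3"], "rate": 3.0},
}
load = {link: 0.0 for link in capacity}
for lsp in lsps.values():
    for link in links_of(lsp["path"]):
        load[link] += lsp["rate"]

print(load[("R2", "R3")])  # 10.0, above 90% of 10 Gbps
moves = relieve(capacity, load, lsps, ("R2", "R3"))
print(moves)  # {'blue': ['R1', 'R5', 'R6', 'R7']}
print(load[("R2", "R3")])  # 6.0
```

Moving the blue LSP alone brings the R2 to R3 direction from 10 Gbps down to 6 Gbps, so the purple and green LSPs stay where they are. A real controller would likely weigh more factors than this sketch does, such as how many LSPs get disturbed.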
If the above process were carried out by a human operator instead of Pathfinder, the operator would first need to notice the congestion on the link between R2 and R3. Then, they would need to identify which LSP(s) to move and which new path(s) would be feasible without causing congestion elsewhere. Finally, the operator would need to reconfigure the ingress router of each LSP with details of the new path. This process is very time-consuming and error-prone. With Pathfinder, these manual steps are eliminated.
The Traffic Matrix and TE-LSPs
For Automated Congestion Avoidance to be effective, a reasonable proportion of the traffic in the network should be carried by TE-LSPs, as the controller can move only TE-LSPs via PCEP. The operator could choose to fully mesh all of the edge routers with TE-LSPs, so that all of the traffic in the network uses TE-LSPs. However, given the traffic patterns in many networks, this can be overkill. This is illustrated in Figure 3. The 3D chart shows the traffic matrix, i.e., how much traffic travels from each router to each of the other routers in the network.
Figure 3: Traffic matrix
If there are N edge routers in the network, then there are N(N-1) cells, called demands, in the traffic matrix. As seen in the figure, the traffic matrix is far from uniform. For example, there is a lot of traffic going from router K to most of the other routers, but there is very little traffic going from router F to any other router. In this situation, the operator could choose to use TE-LSPs only for the demands in the multi-colored “mountain ranges”, and not for the blue-colored demands in the “foothills”. This can reduce the number of TE-LSPs needed by an order of magnitude, yet those TE-LSPs still carry a large proportion of the total network traffic volume. The remainder of the traffic can use shortest-path MPLS forwarding, using either LDP or SR node SIDs.
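The arithmetic behind this choice can be sketched by counting the N(N-1) demands and then keeping only the largest ones until they carry a chosen share of the total traffic. The function name, the 80% coverage target, the router set and the traffic values below are hypothetical and are not taken from Figure 3.

```python
def select_te_demands(matrix, coverage=0.8):
    """Largest demands that together carry at least `coverage` of all traffic.

    `matrix` maps (source, destination) router pairs to traffic rates.
    Returns the chosen pairs and the share of total traffic they carry.
    """
    demands = sorted(matrix.items(), key=lambda item: item[1], reverse=True)
    total = sum(matrix.values())
    chosen, carried = [], 0.0
    for pair, rate in demands:
        if carried >= coverage * total:
            break
        chosen.append(pair)
        carried += rate
    return chosen, carried / total


# Hypothetical 4-router matrix: K sends 40 Gbps to each router, all else 2 Gbps.
routers = ["F", "G", "H", "K"]
matrix = {
    (src, dst): 40.0 if src == "K" else 2.0
    for src in routers
    for dst in routers
    if src != dst
}
print(len(matrix))  # 12, i.e. N(N-1) with N = 4
chosen, share = select_te_demands(matrix)
print(len(chosen), round(share, 2))  # 3 0.87
```

Here, TE-LSPs for 3 of the 12 demands would carry about 87% of the traffic, while the other 9 demands use shortest-path forwarding.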
In such a scenario, Pathfinder’s Automated Congestion Avoidance still works very well. Streaming telemetry reports the total traffic using each network link, regardless of whether it’s TE traffic, LDP traffic, shortest path SR node SID traffic or plain IP traffic. The total traffic level on a link is what triggers Automated Congestion Avoidance when the pre-configured traffic threshold is reached. As seen in Figure 4, although more demands use shortest path routing, the demands that use TE-LSPs comprise the majority of the traffic. This makes it easy for Pathfinder to ease congestion by diverting some of the TE-LSPs away from the congested link. Note that in reality, there could be dozens or hundreds of demands underpinned by TE-LSPs on a given link and thousands of demands underpinned by shortest-path routing.
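Because the trigger is the total link load, a quick feasibility check is how much traffic must come off the link and whether its TE-LSPs carry at least that much, since only they can be moved. The sketch below assumes, purely for illustration, a 100 Gbps link, a 90% threshold and the rates shown; the function name is hypothetical.

```python
def te_traffic_to_divert(capacity, te_rates, non_te_rate, threshold=0.9):
    """Return (excess, divertible) for one direction of one link.

    `excess` is how far the total load is above threshold * capacity;
    `divertible` says whether moving TE-LSPs alone could remove it.
    """
    # Telemetry reports the total, whether it is TE, LDP, SR node SID or IP.
    total = sum(te_rates) + non_te_rate
    excess = total - threshold * capacity
    if excess <= 0:
        return 0.0, True  # below threshold: nothing to do
    return excess, sum(te_rates) >= excess


# Most of the load is on TE-LSPs (75 of 95 Gbps): easy to fix.
print(te_traffic_to_divert(100.0, [30.0, 25.0, 20.0], 20.0))  # (5.0, True)
# Almost all of the load is shortest-path traffic: moving TE-LSPs is not enough.
print(te_traffic_to_divert(100.0, [3.0], 95.0))  # (8.0, False)
```

The second case leaves Automated Congestion Avoidance little room to act, which is why a reasonable proportion of the traffic should be carried by TE-LSPs.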
Figure 4: Traffic demands on a link
This is a strong example of Pathfinder’s ability to make effective use of Closed-Loop Automation: it obtains visibility of traffic levels across the entire network via streaming telemetry and tunes the paths of TE-LSPs accordingly. As such, it is a key stepping stone toward the Self-Driving Network™. Live deployments have shown that Pathfinder’s Automated Congestion Avoidance functionality [1] is effective in the real world. Pathfinder removes the manual labor associated with traffic management, which is traditionally very time-consuming and error-prone.