Onwards with Apstra AOS 3.2 and the Industry’s First and Only Intent Time Voyager!

This blog was originally published to the Apstra website – in 2021, Juniper Networks acquired Apstra. Learn more about the acquisition here.

We recently announced the availability of Apstra AOS® 3.2, our latest release of the Intent-Based Networking system that transforms the way you build and manage data center networks. While this is indeed a “dot release” (3.2), it includes a huge number of major system enhancements. In fact, when we shared details on these new features, many experts commented that this should be considered our 4.0 release. Our feature velocity is incredible, and this announcement deserves further explanation.

In Apstra AOS 3.2, there are four major themes:

Network Recovery
Scale-Out Enhancements
Design Flexibility
Operational Improvements

Let’s start by discussing what I consider to be one of the coolest features we’ve EVER built.

Network Recovery

Network devices typically support multiple saved versions of the running configuration, allowing the operator to switch between different behaviors for any device. However, the way in which vendors have implemented this feature varies widely. More importantly, this functionality is limited to an individual node. Rarely do we make changes to a single device. Infrastructure services are provided through a combination of configurations across many devices, as well as continuous validation tests that ensure that the network is delivering on these services as intended. Intent is not just about device configurations; it also includes the related expectations for what those configs will do.

Changing services requires coordinated changes across tens or even hundreds of nodes at once, as well as initiating new telemetry collection and continuous validation probes to test that the network is delivering on those services. Imagine trying to do this across multiple vendors’ operating systems all at the same time. Impossible?

SPOILER ALERT – No.

With Apstra AOS 3.2, network operators can save snapshots of the configuration for the entire network, as well as the entire state of the network including all telemetry and continuous validation tests, and easily move between versions based on the demands of the business. For example, creating a new virtual overlay network with EVPN requires adding the various custom configurations to all devices participating in the EVPN routing topology, as well as a large number of telemetry collectors that feed IBA probes that validate network overlay operation. If we deploy a new set of VXLANs, we can store all of these configuration changes and state as a single checkpoint, so we can easily roll back if this change is no longer needed, or worse, if this change causes a network problem.

The operator can record a name and description for every group of changes (AOS supports batch changes), and with three clicks we can immediately move forward or backward to a given network intent, including all services and related configurations, telemetry, and continuous validation. This is like a time machine for your network, and it is drop-dead simple to use. AOS also creates snapshots for all recent changes, so you can simply say “hey, let’s go back to how the network was yesterday.”

Once again, this works across all supported vendors, regardless of how they implement their configuration management. So if you have a topology that combines multiple vendor NOSs, AOS will automatically manage the process for changing the given state of the network. We call this feature Intent Time Voyager.

To emphasize a key point, Apstra AOS doesn’t just manage the device configurations; it also tracks expectations within our Intent, so changing from one snapshot to another also changes how we manage and monitor the network. If we deploy a new external router, the connection to that router is automatically added to monitoring, and if we roll back to an earlier snapshot, the monitoring for those links is also removed. So when you change to a new snapshot, you don’t end up with a bunch of red alarms. The entire desired state of the network is changed. This is the benefit of managing the system state through intent.
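To make the idea concrete, here is a minimal Python sketch of network-wide snapshots in which monitoring expectations are derived from intent rather than stored separately. It is a toy model under stated assumptions, not the AOS implementation: the `Intent` and `TimeVoyager` classes, the method names, and the link names are all hypothetical.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Toy network-wide intent: the set of links we want to exist."""
    links: set = field(default_factory=set)

    def expectations(self):
        # Monitoring is derived from intent: every intended link must be up.
        return {f"{link}:up" for link in self.links}


class TimeVoyager:
    """Hypothetical snapshot store (illustration only, not the AOS API)."""

    def __init__(self):
        self.intent = Intent()
        self.history = []  # list of (name, description, copy of intent)

    def commit(self, name, description=""):
        # Store a full copy so later edits cannot alter the snapshot.
        self.history.append((name, description, deepcopy(self.intent)))

    def jump_to(self, name):
        # Move forward or backward to a named snapshot.
        for snap_name, _, snap in self.history:
            if snap_name == name:
                self.intent = deepcopy(snap)
                return
        raise KeyError(name)


tv = TimeVoyager()
tv.intent.links.add("leaf1-spine1")
tv.commit("baseline", "fabric links only")

tv.intent.links.add("leaf1-ext-router1")
tv.commit("add-router", "new external router")

# Rolling back also drops the expectation for the router link,
# so nothing is left behind to raise an alarm.
tv.jump_to("baseline")
```

Because the expectations are computed from whichever intent is active, jumping between snapshots changes the monitoring together with the configuration, which is the behavior described above.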

Raise your hand if you wish you had this feature before.

Scale-Out Enhancements

As customers deploy more and more devices with AOS, they frequently want to extend an existing topology or bridge the overlay network from one network to another. AOS now supports two major features to enable larger systems.

Data Center Interconnect (DCI)

The most widely deployed overlay protocol supporting multi-tenancy is VXLAN, but we need to use EVPN to advertise reachability for individual networks and hosts to scale and minimize traffic replication. Since EVPN utilizes BGP as a routing protocol, there is a need to extend this routing plane between two networks, or even to extend it between two data centers. This gives architects the ability to move workloads between networks and even to run Active/Active application setups. With Apstra AOS 3.2, network operators can now design, build, and deploy a Data Center Interconnect service with a simple yet powerful intent-based approach. AOS takes care of all the details, including advertising the necessary EVPN route types outside of the topology so this reachability information can be shared. Apstra AOS 3.2 can share the EVPN control plane information with a non-AOS managed system. And as with everything Apstra, the appropriate telemetry is collected and continuous validation probes initiated to validate that the Data Center Interconnect service is indeed operating as intended.

5 Stage Clos Expansion

Large networks require an additional set of devices to interconnect multiple pods of compute. We refer to these devices as Superspines. AOS previously supported the creation of Superspines to connect multiple pods. Now in Apstra AOS 3.2, you can easily add new racks to existing 5 Stage networks, as well as add entire pods to the fabric. Adding a rack or pod is managed using an intent-based approach and utilizes a simple workflow: you select what you want to add from the existing catalog of network templates and then select how many of these objects you want to add. The Graph model is immediately updated, telemetry and continuous validation are automatically initiated, and once the devices are online and connected, you can easily put the new network into service with no service disruption for the pre-existing traffic flows.
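The “pick a template, pick a count” workflow can be sketched in a few lines of Python. This is only an illustration under assumptions: the `RACK_TEMPLATES` catalog, the `add_racks` function, and the device naming scheme are hypothetical, and the sketch assumes every new leaf connects to every spine in its pod.

```python
# Hypothetical catalog of rack templates (not the AOS catalog format).
RACK_TEMPLATES = {
    "single-leaf": {"leafs": 1},
    "dual-leaf": {"leafs": 2},
}


def add_racks(pod, template_name, count):
    """Add `count` racks to `pod` and return the new leaf-to-spine links."""
    template = RACK_TEMPLATES[template_name]
    new_links = []
    for _ in range(count):
        for _ in range(template["leafs"]):
            leaf = f"{pod['name']}-leaf{len(pod['leafs']) + 1}"
            pod["leafs"].append(leaf)
            # Assumption: each new leaf uplinks to every spine in the pod.
            new_links.extend((leaf, spine) for spine in pod["spines"])
    return new_links


pod = {
    "name": "pod1",
    "spines": ["pod1-spine1", "pod1-spine2"],
    "leafs": ["pod1-leaf1", "pod1-leaf2"],
}
links = add_racks(pod, "dual-leaf", 2)
```

Note that the function only ever appends new leafs and new links. Existing devices and links are left alone, which mirrors the point that pre-existing traffic flows are not disrupted.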

This is what we like to call Just-In-Time Infrastructure.

Design Flexibility

At Apstra we always emphasize the goal of not creating “snowflakes”: one-off changes to networks that cause the design to deviate from our standard templates. But there are often good reasons to modify a system in small ways to meet our business needs. For example, imagine that a single port on one of our leaf switches has failed. We obviously want to replace the device at some point, but it may not be easy to perform this maintenance in the near term.

So in AOS 3.2 we introduce the concept of Flexible Fabric, which permits the operator to adjust the design of the network while still maintaining our intent. Flexible Fabric enables the following service operations through a simple change in the UI (or API for my DevOps friends):

Change the port speed on a single link
Change the role of a single link
Add a new external router port
Add a new server port
Modify existing ports
Move connections within a fabric (e.g., my MLAG peer link is flaky, so I want to switch the ports used)

All of these changes modify the Intent Graph Model, so our monitoring and telemetry are automatically updated. This is HUGE! If I change the location of my MLAG peer link, all of my MLAG system checks are automatically updated. If I add a new external router, the interface is immediately monitored, so my telemetry for external traffic stays up to date. Literally nothing needs to be done to the complex system checks that I already have deployed.
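Here is a small Python sketch of why this works: if system checks are computed from the graph instead of being configured by hand, then moving a link changes the checks for free. Everything here is an assumption made for illustration, including the `Link` type, the role names, the check strings, and the port names; it does not reflect the actual AOS graph schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    a: str           # endpoint as "device:port"
    b: str
    role: str        # hypothetical roles: "peer_link", "external", "server"
    speed_gbps: int


def derive_checks(links):
    """Compute the system checks implied by the current set of links."""
    checks = []
    for link in links:
        checks.append(f"link_up {link.a} <-> {link.b}")
        if link.role == "peer_link":
            checks.append(f"mlag_peer_healthy {link.a} <-> {link.b}")
        if link.role == "external":
            checks.append(f"collect_traffic_counters {link.a}")
    return checks


def move_link(links, old, new):
    """Return a new link list with `old` replaced by `new`."""
    return [new if link == old else link for link in links]


old = Link("leaf1:eth48", "leaf2:eth48", "peer_link", 100)
new = Link("leaf1:eth47", "leaf2:eth47", "peer_link", 100)
graph = [old, Link("leaf1:eth1", "router1:eth0", "external", 10)]

before = derive_checks(graph)
after = derive_checks(move_link(graph, old, new))
```

After the move, the MLAG checks reference the new ports and nothing refers to the old ones, even though no check was edited directly.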

Operational Improvements

AOS 3.2 adds a huge number of new tools for managing your network. The biggest enhancement is what we call Service-Oriented Dashboards. These are pre-built system checks and visualizations that are automatically activated when new services are deployed in the network. As an example, suppose you decide to deploy VMware ESX in an existing pod for the first time. When you tell AOS that you have a vCenter Server or NSX-T controller, AOS immediately enables system checks, widgets, and a simple dashboard. You don’t need to do anything; AOS recognizes the changes to intent and activates the necessary telemetry. You can choose to use the pre-built dashboard or simply add the new visualizations to an existing view within the system. The same goes for EVPN: as soon as you turn on this new service, the dashboards are created, and then you are free to customize the views for your specific needs.

And More…

I said that AOS 3.2 has four major themes, but that wasn’t exactly the whole story. In fact, we’ve also added a number of new features that are quite large in their own right, including: