This blog was originally published to the Apstra website – in 2021, Juniper Networks acquired Apstra. Learn more about the acquisition here.
In my 25 years of network engineering, one common thread has been the pain of upgrading Network Operating Systems (NOS) on all kinds of network devices: routers, switches, and so on. NOS upgrades almost always interrupt the network, which usually means the Network Engineer ends up doing this work in the middle of the night.
I don’t like to date myself, but my first NOS upgrades were on a Cisco AGS router and involved replacing a large number of PROM chips in the router. The biggest risk of the upgrade was getting to the last chip and having a pin break off. After a long series of expletives, you had to manually “back out” the upgrade by carefully pulling out the new chips and re-installing the old chips. The upgrade process was hard, but NOS software was simpler and these types of upgrades were not needed that often.
Network devices have come a long way since then. They now have flash memory that can hold several NOS images, making it much easier to “back out” a NOS upgrade if the new NOS version does not work as expected. For many Network Engineers, though, the process is still manual: SSH to the device, copy the NOS image to the device via SCP, change and save the boot configuration, reload, watch the console and cross your fingers. With the increased complexity of NOS software, upgrades can still be the most difficult activity a Network Engineer has to endure.
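To make the “back out” idea concrete, here is a minimal sketch in Python that simulates the manual steps above against an in-memory model of a switch. Nothing here talks to a real device: the `Switch` class, the `upgrade` function, the `verify` callback and the image file names are all hypothetical, chosen only to show why keeping the old image in flash makes a rollback cheap.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    """Hypothetical in-memory model of a switch with multi-image flash."""
    flash: dict            # image file name -> NOS version it contains
    boot_image: str        # image the device will load on the next reload
    running_version: str = ""

    def reload(self):
        # A reload boots whatever image the boot configuration points at.
        self.running_version = self.flash[self.boot_image]


def upgrade(switch, image_name, version, verify):
    """Copy, set boot, reload, verify; back out if verification fails."""
    previous_image = switch.boot_image

    switch.flash[image_name] = version   # stands in for the SCP copy
    switch.boot_image = image_name       # change and save the boot config
    switch.reload()                      # reload and watch the console

    if verify(switch):
        return True

    # The old image is still in flash, so backing out is just
    # pointing the boot configuration at it and reloading again.
    switch.boot_image = previous_image
    switch.reload()
    return False


if __name__ == "__main__":
    sw = Switch(flash={"nos-1.0.bin": "1.0"}, boot_image="nos-1.0.bin")
    sw.reload()
    ok = upgrade(sw, "nos-2.0.bin", "2.0",
                 verify=lambda s: s.running_version == "2.0")
    print(ok, sw.running_version)
```

In practice the `verify` step is the hard part: it is whatever you check on the console and in the network after the reload, and it is the part that is easiest to skip at three in the morning.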
Why Upgrade the Network Operating System?
There are many reasons for a Network Engineer to upgrade a NOS on the network. The first reason is to load a new NOS version that fixes bugs, security vulnerabilities or other issues in the NOS. The second is to enable new features. Too frequently, loading a new NOS for features also introduces bugs, necessitating another upgrade.
Most vendors have recommended NOS versions, but it’s unlikely that the recommended version has been tested against your network architecture and the features you need. If you run a multi-vendor network, not only do you have twice the work selecting and testing the different vendor NOS versions, you also have the additional work of making sure there are no compatibility issues between the vendors.
Upgrading Network Operating Systems is Made Easy with Apstra AOS®
Apstra AOS provides pain relief for NOS headaches. Apstra pre-qualifies selected vendor NOS versions for supported network device vendors such as Cisco, Cumulus and Arista against Apstra’s Reference Design for data center networks.
Apstra Engineering Testing and Global Support Teams use extensive, automated testing to verify the quality of the Apstra AOS software platform. In conjunction with this, Apstra tests the software with all supported network vendors. For each new version of AOS, Apstra selects one to three recommended versions of each NOS and fully certifies that each version works against Apstra’s Reference Design for spine-leaf data center networks. Based on this testing, customers can be confident using AOS and the network vendor’s hardware with the NOS version recommended by Apstra.
If Apstra finds bugs in a vendor’s NOS, it opens and tracks them with the vendor. Apstra adjusts its recommended NOS once a vendor delivers a fixed version.
Apstra AOS can also automatically implement vendor- and NOS-specific workarounds. A good example is an issue with Cumulus Linux where, on certain network hardware ASIC models (e.g. Broadcom Trident II+) and in certain network architectures using VXLAN encapsulation, the network switch does not properly rewrite network addresses and their time-to-live (TTL) values. Cumulus Networks developed a workaround involving an alternate configuration that customers would need to implement. For Apstra AOS users who use Cumulus Linux devices, AOS automatically determines whether the workaround is needed based on the version of Cumulus Linux and the hardware device being used. AOS then automatically implements the correct “workaround” configuration without any extra effort by the user. If the user replaces the device with a different device that does not require the workaround, AOS automatically removes the “workaround” configuration from the new device.
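The decision described above can be sketched as a small rule lookup. This is only an illustration of the idea, not how AOS is implemented: the rule table, the `Device` fields, the ASIC label, the version threshold and the placeholder configuration line are all assumptions made up for the example, and they are not real Cumulus Linux release data.

```python
from dataclasses import dataclass

# Hypothetical rule table: (NOS name, ASIC) -> versions below this need the
# workaround. The threshold is an illustrative placeholder, not real data.
WORKAROUND_RULES = {
    ("cumulus-linux", "trident2plus"): (99, 0, 0),
}

# Placeholder for the vendor's alternate configuration.
WORKAROUND_LINE = "# workaround: alternate VXLAN routing configuration"


@dataclass(frozen=True)
class Device:
    nos: str
    nos_version: tuple   # e.g. (3, 7, 0)
    asic: str


def needs_workaround(device):
    """True when the device's NOS, version and ASIC match a known rule."""
    affected_below = WORKAROUND_RULES.get((device.nos, device.asic))
    return affected_below is not None and device.nos_version < affected_below


def render_config(device, base_config):
    """Render config from intent; the workaround is derived, never stored."""
    lines = list(base_config)
    if needs_workaround(device):
        lines.append(WORKAROUND_LINE)
    return lines


if __name__ == "__main__":
    base = ["interface swp1", "vxlan enabled"]
    old_leaf = Device("cumulus-linux", (3, 7, 0), "trident2plus")
    new_leaf = Device("cumulus-linux", (3, 7, 0), "other-asic")
    print(render_config(old_leaf, base))
    print(render_config(new_leaf, base))
```

Because the workaround is recomputed from the device's properties every time the configuration is rendered, replacing the device with unaffected hardware drops the extra line without anyone having to remember it was there.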
If a customer’s network vendor recommends that the customer use a different NOS version, the Apstra customer can contact Apstra Global Support to ask for testing and support for the new NOS version. Apstra Global Support and Engineering will do initial testing and limited certification of the NOS so the customer can do their own lab testing. Apstra Engineering and Product groups will work towards full certification of the new NOS in a future version of AOS.
Once the customer has selected a new NOS version to move to, Apstra AOS supports NOS upgrades for managed network devices, allowing the user to upgrade their network devices directly from AOS using AOS Device Operating System (DOS) Upgrade. AOS currently supports NOS upgrades for Cisco NX-OS, Cumulus Linux, and Arista EOS. The user can also use AOS Maintenance Mode to drain traffic from redundant network devices before executing an upgrade. After a successful upgrade, AOS Intent-Based Analytics (IBA) can let the user know when the device is completely online and ready for operation. The user can then confidently change the AOS Maintenance Mode to “undrain” traffic, restoring traffic to the device. In this white paper, Device Operating System (DOS) Upgrade, we walk through the upgrade process step by step.
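The ordering of those steps matters more than any single one of them, so here is a minimal sketch of the sequence in Python. It is a simulation only and does not use the AOS API: `ManagedDevice`, `drain`, `install_nos`, `undrain` and the `is_ready` callback are hypothetical stand-ins for Maintenance Mode, DOS Upgrade and the readiness signal from IBA.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedDevice:
    """Hypothetical stand-in for a device managed by a controller."""
    name: str
    nos_version: str
    drained: bool = False
    log: list = field(default_factory=list)


def drain(device):
    device.drained = True
    device.log.append("drain")


def undrain(device):
    device.drained = False
    device.log.append("undrain")


def install_nos(device, target_version):
    device.nos_version = target_version
    device.log.append("upgrade")


def upgrade_with_maintenance_mode(device, target_version, is_ready):
    """Drain, upgrade, confirm readiness, and only then restore traffic."""
    drain(device)
    install_nos(device, target_version)
    device.log.append("verify")
    if not is_ready(device):
        # Leave the device drained so it carries no production traffic
        # while someone investigates.
        return False
    undrain(device)
    return True


if __name__ == "__main__":
    leaf = ManagedDevice(name="leaf1", nos_version="1.0")
    ok = upgrade_with_maintenance_mode(
        leaf, "2.0", is_ready=lambda d: d.nos_version == "2.0")
    print(ok, leaf.log)
```

The design choice worth noting is the failure path: a device that does not pass the readiness check stays drained, so a bad upgrade costs redundancy rather than an outage.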
One of the most frustrating issues I’ve had with vendors is finger-pointing. Because Apstra AOS certifies a best-practices reference design against recommended vendor NOS versions, the finger-pointing can be avoided. Network Engineers avoid headaches and get their nights back, and their businesses run more efficiently, freeing them to add capabilities for the users they support.