What do SONiC’s progress, Nvidia’s acquisitions, and the emerging Red Hat model mean for networking?

This blog was originally published to the Apstra website – in 2021, Juniper Networks acquired Apstra. Learn more about the acquisition here.

The last few weeks have been quite eventful for the tech industry. First, we witnessed Nvidia’s acquisition of Cumulus. Together with Mellanox, Cumulus is likely to give Nvidia an industry-leading, optimized networking stack (silicon + software) that it can use to advance its deep computing goals and stay ahead, in Nvidia’s words, of “the exponential growth in AI and high-performance computing.”

Two weeks later, we witnessed and participated in many SONiC-related announcements at the OCP summit. Dell is taking SONiC to the next level with enterprise features, including EVPN/VXLAN support and multiple management features. This makes Dell/SONiC a viable offering for the enterprise.

We also witnessed Arista announcing support for SONiC on its hardware platforms. All in all, as I mentioned in my social media post, the progress that Microsoft, Dell, and other industry leaders have made with SONiC over the last year is nothing short of spectacular.

With these announcements, two new credible switch vendors are becoming viable choices for enterprise customers: Nvidia, with its Mellanox/Cumulus stack, and Dell, with SONiC. These vendors, in addition to Cisco, are now offering open source switch operating systems, in some cases built on the same core open source components. For example, Cumulus and SONiC share the same open source routing engine, FRR, which supports the majority of key routing protocols in deployment, including BGP, OSPF, and IS-IS.

The emergence of a Red Hat model for Switch Operating Systems?

This is a new emerging model – essentially, vendor offerings based on existing (open source) software, rather than on proprietary switch operating systems they’ve developed themselves. This is reminiscent of Red Hat, which emerged as an OS vendor by taking to market an existing open-source operating system, Linux. The value Red Hat provided was a number to call and release management. Enterprises would only trust their computing infrastructure to Linux if they worked with a vendor that delivered on these capabilities – and the primary vendor was Red Hat. Also, Oracle will only support your Oracle database implementation if it runs on Red Hat Enterprise Linux – so not only does the enterprise require a number to call, your most critical vendors need it too.

Similarly, before enterprises can entrust their networks to SONiC and Cumulus Linux, they need to trust that a vendor will provide them with support (i.e., a number to call) and release management. So in some sense, these vendors are adopting the Red Hat model for Switch Operating Systems.

But like all analogies, this one breaks down at some level; and indeed, it is at the management level that the Red Hat analogy breaks down. Let me explain why.

Networks versus Servers – very different beasts

For starters, networks are fundamentally different from servers. Why?

Network devices are part of a system

Network devices in a network are closely interconnected yet have different roles, and they have to interact very carefully for the network to operate correctly. The configuration of the various network devices needs to be highly consistent for protocols such as EVPN/VXLAN to work correctly, or for policies such as reachability or security to be adequately enforced. In contrast, servers execute largely independently. In fact, a key objective of “cloud-native” software is to allow portions of the software to run on any server and be restarted seamlessly on a different server upon failure.
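
To make the consistency requirement concrete, here is a minimal Python sketch of the kind of cross-device check it implies. It assumes a design in which every leaf is expected to map a given VLAN to the same VXLAN network identifier (VNI); the LeafConfig type and find_vni_mismatches function are hypothetical names invented for illustration, not part of any product or vendor API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LeafConfig:
    """Hypothetical, vendor-neutral view of one leaf switch's VXLAN settings."""
    name: str
    vlan_to_vni: dict  # VLAN ID -> VNI configured on this leaf


def find_vni_mismatches(leafs):
    """Return {vlan: {vni: [leaf names]}} for VLANs mapped to more than one VNI.

    Assumption: the design expects each VLAN to map to the same VNI on
    every leaf, so any VLAN with two or more distinct VNIs is flagged.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for leaf in leafs:
        for vlan, vni in leaf.vlan_to_vni.items():
            seen[vlan][vni].append(leaf.name)
    return {
        vlan: dict(by_vni)
        for vlan, by_vni in seen.items()
        if len(by_vni) > 1
    }


if __name__ == "__main__":
    leafs = [
        LeafConfig("leaf1", {10: 10010, 20: 10020}),
        LeafConfig("leaf2", {10: 10010, 20: 10099}),  # VLAN 20 disagrees
    ]
    print(find_vni_mismatches(leafs))
```

A check like this has no real counterpart on independent servers, which is the point: the unit that must be correct is the set of devices, not any single one.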

Network device configurations are unique and arcane

Related to the above, it is a complex and delicate undertaking to configure different network devices in the same network to operate with the correct behavior required to achieve your intent or outcome. Mixing switches from different vendors renders the task even more complicated, as different vendors often need different sets of options or commands, sometimes including workarounds or arcane configurations, to achieve the required functionality and interoperability.

Network failures can be catastrophic and hard to debug

A failure in a network device is not always transparent to the overall operation of the network. In many cases, network devices fail in subtle ways that confuse the rest of the network and create grey failures, which can bring down applications and be devastating to the business, yet are notoriously difficult to diagnose and trace to a root cause.

At Apstra, we know these issues firsthand because of our extensive experience in configuring and operating networks, including multi-vendor networks. The premise of delivering an intent-based interface to the network at the management level is to separate intent (the “what”) from the often complex specifics of the “how,” which usually involves a command-line interface (CLI) with arcane vendor-specific commands. An intent-based system takes on the responsibility of translating this intent into the particulars of vendor-specific commands and configuration peculiarities. And an intent-based approach is all the more necessary when the network includes devices from multiple vendors.
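
As a rough illustration of separating the “what” from the “how,” the Python sketch below renders one vendor-neutral intent into per-OS command lists. Everything in it is an assumption made for illustration: the VlanIntent type, the render function, the OS labels, and especially the command strings, which are placeholders and not real CLI syntax for any vendor. It is not a description of how Apstra implements translation.

```python
from dataclasses import dataclass


@dataclass
class VlanIntent:
    """The 'what': a VLAN that should exist, independent of any vendor."""
    vlan_id: int
    name: str


# Placeholder templates keyed by made-up OS labels. The command strings are
# illustrative only and are NOT real CLI syntax for any vendor.
TEMPLATES = {
    "os_a": ["create vlan {vlan_id}", "label vlan {vlan_id} {name}"],
    "os_b": ["vlan add id={vlan_id} description={name}"],
}


def render(intent, os_label):
    """The 'how': translate one intent into the commands for a given OS label."""
    try:
        templates = TEMPLATES[os_label]
    except KeyError:
        raise ValueError(f"no renderer for OS {os_label!r}") from None
    return [t.format(vlan_id=intent.vlan_id, name=intent.name) for t in templates]


if __name__ == "__main__":
    intent = VlanIntent(vlan_id=10, name="web")
    for os_label in TEMPLATES:
        print(os_label, render(intent, os_label))
```

The operator only ever edits the intent; supporting another switch OS means adding a renderer, not retraining the team on a new command set.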

The increased importance of multi-vendor networks

Digital transformation is forcing IT to change, requiring enterprises to be flexible enough to move quickly and meet the compressed timelines demanded by the business. So how do enterprises proceed? Traditionally, they’ve often mandated a single-vendor strategy. While that was likely acceptable in the past, it is no longer so. Indeed, this new category of switch vendors is heralding a new era of multi-vendor networking, and a single-vendor network is often no longer sufficient. Why?

Optimized stack for specific applications

Increasingly with these new Red Hat-like offerings, a specific vendor is becoming the standard choice offered in a vendor stack. For example, if you purchase a POD from Nvidia to handle AI or ML workloads, then you’re likely to get Nvidia’s networking stack, which includes Mellanox Silicon and Cumulus software.

In a similar vein, a VxRail solution from Dell is likely to come standard with a Dell Networking stack.

Lower CapEx or choose best of breed

With new options on the market, you could decide to cut costs. For example, you may determine that for some workloads your risk tolerance is higher, and lower-cost options are acceptable. However, for other workloads, having a best-of-breed networking stack is necessary to get the best performance, which can be critical to your business. For instance, you may want the lowest-latency stack for your High Performance Computing cluster.

As another example, traditional enterprises running sensitive workloads require network devices that have the security features, and any other features, required to pass regulatory compliance.

Leafs from new vendors, spines from established vendors

When you interconnect PODs and racks along with the rest of your existing infrastructure, you’ll likely be doing so across spines and super-spines from established vendors such as Juniper, Cisco, or Arista. Consequently, you’re likely to end up with different vendors for your leaves and your spines.

The management software that enterprises require

Given the fundamental differences between networks and servers, and with the rise of multi-vendor data center network infrastructures, what becomes clear is the necessity for a new breed of network management vendor to deliver on two key capabilities:

“Single Source of Support” for support and release management.
“Consistent Operating Model” for management and operations across the networking stack and the various vendors.

Let’s take a closer look at each of these.

Single Source of Support

Enterprises need a vendor to step in and perform testing to ensure all versions of all components in a system work well, not just individually, but together as a whole. In other words, the vendor would ensure that your NVIDIA rack/cluster interoperates well with your Juniper spine and Dell/VMware cluster.

At Apstra, we have developed a sophisticated automated testing infrastructure and run upwards of 10 million tests a day across all use cases that Apstra supports in the data center, and all vendor combinations, including Mellanox, Cumulus, Dell, SONiC, Juniper, Arista, and Cisco. When a vendor releases a new switch or a new version of a switch OS, we simply plug it into this testing infrastructure. We become aware of any bug or issue that presents itself, not just in the context of this version of the software, but also in how this new version interoperates with the other vendors’ switches and switch operating systems as part of the larger network and intended outcome.
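
To give a sense of why combination testing grows so quickly, the Python sketch below enumerates leaf, spine, and use-case combinations and picks out the ones a newly released switch OS would touch. It is an illustration under stated assumptions, not a description of Apstra’s test infrastructure: the function names, the lowercase labels, and the leaf/spine/use-case breakdown are all hypothetical.

```python
from itertools import product


def build_test_matrix(leaf_stacks, spine_stacks, use_cases):
    """Enumerate every (leaf, spine, use case) combination to be tested."""
    return [
        {"leaf": leaf, "spine": spine, "use_case": case}
        for leaf, spine, case in product(leaf_stacks, spine_stacks, use_cases)
    ]


def cases_touching(matrix, stack_label):
    """Select the combinations that involve a given (e.g. newly released) stack."""
    return [c for c in matrix if stack_label in (c["leaf"], c["spine"])]


if __name__ == "__main__":
    # Labels are hypothetical stand-ins for vendor/OS combinations.
    matrix = build_test_matrix(
        leaf_stacks=["sonic", "cumulus"],
        spine_stacks=["juniper", "arista"],
        use_cases=["evpn_vxlan"],
    )
    print(len(matrix), "combinations in total")
    print(len(cases_touching(matrix, "sonic")), "combinations involve sonic")
```

Each added vendor, OS version, or use case multiplies the matrix rather than adding to it, which is why this kind of testing is hard for an individual enterprise to do on its own.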

Because of this thorough network-wide testing, Apstra offers as part of its subscription a single source of support (a.k.a. one throat to choke) for the entire network, including all devices and releases from the various vendors. Many times, Apstra is the first to identify switch OS issues from third-party vendors; and over the years, we’ve become our customers’ trusted advisor when it comes to the choice of switch OS.

“Consistent Operating Model” across vendors

Enterprises will also increasingly need a vendor to deliver impactful and intelligent automation and abstraction software that empowers architecture and operation teams to manage the process of designing, building, deploying, and operating these multi-vendor network stacks using a consistent operating model. The network architects and operators only concern themselves with the “what,” and the software would take care of all aspects of the “how,” including the specific configuration commands or APIs. This includes protocol settings or vendor “knobs” to achieve a specific intent, for example, to allow for reachability between two endpoints, or to enforce a security policy, or to meet compliance.

This way, enterprises can ensure the operating model is consistent across all of the vendors in their infrastructure, simplifying both operational tasks and training. No longer do network engineers have to train for months before deploying a new switch from a new vendor. You’d like to bring in a new Nvidia rack? Seamless. You’d like to bring in a VxRail rack, but keep your Cisco spines? Seamless.

This is why we founded Apstra

So Apstra was founded to dramatically simplify network operations by delivering a “single source of support” model and a “single pane of glass,” resulting in a consistent operating model across multiple vendors.

A basic premise behind Apstra is that it is not feasible for an established networking hardware vendor to step in and deliver the type of multi-vendor network management layer the enterprise requires. And we don’t see this basic tenet changing with these new Red Hat-like offerings from switch vendors. Quite the contrary: taking the Red Hat analogy further, many of these vendors will seek to add value with proprietary management software, often multiple “bolt-on” solutions, provided on top of their devices, similar to Red Hat Satellite. As a matter of fact, most of these switch vendors already have. Our customers see a switch vendor-specific network automation solution as the ultimate vendor lock-in because it locks them into a single-vendor approach, which is inconsistent with the secular trend toward multi-vendor networks. Consequently, the need for Intent-Based Networking and this new breed of network management software becomes increasingly critical to enterprises.

In Summary

Open Source is accelerating the emergence of new viable switch vendors.
As a result, we’re seeing an increase in multi-vendor deployments, a trend we expect will continue to accelerate in the future.
Beyond vendors taking on the role of Red Hat for networking, multi-vendor management at the system level is the critical challenge that needs to be solved – consisting of a single source of support and a consistent operational model.

This is precisely what Apstra delivers. Six years in, we’re the most advanced multi-vendor intent-based network solution on the market. And we are continuously delivering to our customers, expanding our breadth of choices and intent-based capabilities.
