Juniper Networks developed the Junos OS® Evolved disaggregated network operating system (NOS), building on the strengths of the Junos operating system, to bring industry-leading routing and switching solutions to a native Linux environment. Junos OS Evolved provides a modern, programmable, highly available, and resilient platform while also delivering a secure execution environment. This blog offers “a peek under the hood” and is the first in a series in which we will discuss the Junos OS Evolved system, take a deep dive into its capabilities, and review the design choices that brought these capabilities to life.
Almost all important characteristics of a system can be traced to a handful of fundamental choices that form the foundation of the NOS, including the choice of the environment and the choice of the communications infrastructure. Some may argue that these choices ultimately play a larger role in the system characteristics than the actual implementation itself. At Juniper, we believe that the intent, as expressed in these choices, and the quality of the implementation play equal roles.
Choice of the environment – Picking the battlefield in which to wrestle with the problem space is one such choice. An argument can be made that the most efficient way to solve some of the core problems in a NOS is to do so within the operating system kernel, either by customizing and extending a standard kernel or by building a proprietary one. The Junos OS Evolved system chose to perform most operations in the Linux user space, recognizing that the kernel is a particularly anemic environment in which to develop rich features. The kernel is well suited to managing shared resources and the physical hardware, but any efficiency gains or optimizations obtained by building key communications infrastructure inside the kernel are easily offset by the lack of agility in that environment.
Choice of communications infrastructure – Junos OS Evolved, at its very core, uses a formally modelled, state-based, persistent, and self-compressing publish-subscribe mechanism for all inter-process interactions. The communications infrastructure handles all aspects of marshalling and demarshalling, addressing, delivery, and back pressure, enabling applications to focus exclusively on their business logic. The pub-sub system removes any space and time coupling between the sender and the receiver(s). This brings much-needed isolation between processes when there are failures, along with advanced capabilities such as transparently distributing processes across distinct compute nodes. Contrast this with a system based on multiple point-to-point connections, where every connected peer has to get involved when a component or service fails, and the advantages of a pub-sub system are clear.
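To make the idea of time decoupling concrete, here is a minimal in-process sketch in Python. It is an illustration only and not a Junos OS Evolved API: the `StateBus` class, its methods, the topic name, and the interface names are all hypothetical. The bus keeps the latest state per object, so a subscriber that joins late still sees the complete current state without the publisher being involved.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Callback = Callable[[str, Any], None]


class StateBus:
    """Toy state-based pub-sub bus (hypothetical; not a Junos OS Evolved API)."""

    def __init__(self) -> None:
        # topic -> object key -> latest published value
        self._state: Dict[str, Dict[str, Any]] = defaultdict(dict)
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def publish(self, topic: str, key: str, value: Any) -> None:
        """Record the latest state of an object and notify current subscribers."""
        self._state[topic][key] = value
        for callback in self._subscribers[topic]:
            callback(key, value)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Replay the current state to the new subscriber, then deliver live updates."""
        for key, value in self._state[topic].items():
            callback(key, value)
        self._subscribers[topic].append(callback)


bus = StateBus()
# Published before any subscriber exists.
bus.publish("interfaces", "et-0/0/0", {"oper": "up"})

seen: List[Tuple[str, Any]] = []
# A late joiner still receives the state that was published earlier.
bus.subscribe("interfaces", lambda key, value: seen.append((key, value)))
bus.publish("interfaces", "et-0/0/1", {"oper": "down"})
print(seen)
```

The publisher never learns who is listening or when they joined. A real system would add marshalling, delivery across processes and nodes, and back pressure, none of which this sketch attempts.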
State-based – All exchanges over the Junos OS Evolved pub-sub system are modelled as state updates that owners of state publish and to which consumers react. These exchanges are distinctly different from systems where message exchanges drive changes in state that is always internal to the application. Such systems suffer from message explosion under fast-changing external triggers, because the messages cannot be compressed: computing the final state relies on visibility into every intermediate state transition driven by an individual message. Modelling exchanges as state is therefore fundamental, and it is key to ensuring that the information seen on the pub-sub system is always complete and usable for reconstructing application state after a failure.
Self-compressing – The number of events is not bounded in a typical NOS, because events are the result of external triggers such as an unstable interface link that keeps flapping or a peer device gone bad. The total state in a system, on the other hand, is bounded, or at least its upper bound can be modelled far more accurately. Subscribers to the Junos OS Evolved pub-sub system rely only on the final state, allowing the system to compress intermediate state updates to slow consumers. This is critical in preventing unbounded queuing when the system is under network churn and a consumer is slow.
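The compression argument can be sketched as a per-subscriber queue keyed by object rather than by event. This is a simplified illustration under assumed names (`CoalescingQueue`, `push`, `drain`), not the actual Junos OS Evolved mechanism: an unread update for an object is overwritten by the next one, so queue depth is bounded by the number of objects instead of the number of events.

```python
from collections import OrderedDict
from typing import Any, List, Tuple


class CoalescingQueue:
    """Pending updates for one slow consumer, keyed by object (hypothetical sketch)."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[str, Any]" = OrderedDict()

    def push(self, key: str, value: Any) -> None:
        # Overwrite any unread update for the same object: only the final state matters.
        self._pending[key] = value

    def drain(self) -> List[Tuple[str, Any]]:
        """Hand the consumer the latest state of every object that changed."""
        items = list(self._pending.items())
        self._pending.clear()
        return items

    def __len__(self) -> int:
        return len(self._pending)


queue = CoalescingQueue()
# A flapping link produces 10,000 events while the consumer is busy.
for i in range(10_000):
    queue.push("et-0/0/0", "up" if i % 2 else "down")
queue.push("et-0/0/1", "up")

depth = len(queue)       # bounded by the number of objects, not the number of events
updates = queue.drain()  # one entry per object, carrying its final state
print(depth, updates)
```

This only works because consumers react to state rather than to individual messages; a message-driven consumer would need all 10,000 events to arrive at the right answer.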
Formal modelling – All state is formally modelled using Domain Specific Languages (DSLs) that describe every bit of information encapsulated by the object. This includes field information and relationship information, tying together individual state objects into a larger directed acyclic graph of information that makes up the operational state of a router or switch. Formal modelling elevates this metadata from inside the application code to a plane where it is visible to, and inspectable by, generic application-agnostic infrastructure layers. This opens up a whole host of new possibilities around automatic failure detection and modern tooling to view the system state in a more intuitive manner, some of which we will discuss in future blogs.
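As a rough illustration of what a formal model makes possible, the sketch below uses plain Python dataclasses as a stand-in for a DSL. The object types (`Interface`, `NextHop`, `Route`), their fields, and the helper functions are assumptions made up for this example. Because the fields and relationships live in data rather than inside application code, generic code can validate objects and walk the dependency graph without knowing anything about routing.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Dict, List


@dataclass
class ObjectType:
    """Schema for one kind of state object (stand-in for a DSL definition)."""
    name: str
    fields: Dict[str, type]
    depends_on: List[str] = field(default_factory=list)  # relationship information


# Hypothetical model: routes point at next hops, next hops point at interfaces.
MODEL: Dict[str, ObjectType] = {
    "Interface": ObjectType("Interface", {"name": str, "oper_up": bool}),
    "NextHop": ObjectType("NextHop", {"address": str}, depends_on=["Interface"]),
    "Route": ObjectType("Route", {"prefix": str}, depends_on=["NextHop"]),
}


def dependency_order(model: Dict[str, ObjectType]) -> List[str]:
    """Order types so dependencies come first; raises CycleError if not a DAG."""
    graph = {t.name: set(t.depends_on) for t in model.values()}
    return list(TopologicalSorter(graph).static_order())


def validate(model: Dict[str, ObjectType], type_name: str, obj: Dict[str, Any]) -> List[str]:
    """Generic, application-agnostic check of an object against its schema."""
    spec = model[type_name].fields
    problems = [f"missing field {n}" for n in spec if n not in obj]
    problems += [
        f"bad type for {n}"
        for n, t in spec.items()
        if n in obj and not isinstance(obj[n], t)
    ]
    return problems


print(dependency_order(MODEL))
print(validate(MODEL, "Interface", {"name": "et-0/0/0"}))
```

The same metadata could drive tooling such as a state browser or an automatic check for objects whose dependencies never appeared, which is the kind of possibility the paragraph above alludes to.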
Persistent – The pub-sub system guarantees persistence of state objects, allowing services to handle failures and upgrades with minimal impact. A service restart does not need to be coordinated with adjacent services due to the lack of time coupling in a pub-sub model. State persistence allows simple reconciliation schemes at the service level to minimize impact. Upgrades are restarts of a service into a new version of the software.
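A simple reconciliation scheme of the kind mentioned above might look like the following sketch. It assumes, purely for illustration, that a restarted service can read back the state it had published before the restart and compare it with the state it now wants to publish; the function name, the operation labels, and the route data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Op = Tuple[str, str, Any]


def reconcile(persisted: Dict[str, Any], desired: Dict[str, Any]) -> List[Op]:
    """Return the minimal set of changes that turns persisted state into desired state."""
    ops: List[Op] = []
    for key, value in desired.items():
        if key not in persisted:
            ops.append(("add", key, value))
        elif persisted[key] != value:
            ops.append(("update", key, value))
        # Unchanged objects produce no operation, so consumers never see them move.
    for key in persisted:
        if key not in desired:
            ops.append(("delete", key, None))
    return ops


# State the service had published before it restarted (prefix -> egress interface).
persisted = {
    "10.0.0.0/24": "et-0/0/0",
    "10.0.1.0/24": "et-0/0/1",
    "10.0.2.0/24": "et-0/0/2",
}
# State the restarted (or upgraded) service computes after coming back up.
desired = {
    "10.0.0.0/24": "et-0/0/0",
    "10.0.1.0/24": "et-0/0/3",
    "10.0.3.0/24": "et-0/0/1",
}

ops = reconcile(persisted, desired)
print(ops)
```

Only the differences are published after the restart. Downstream consumers keep working from the persisted objects in the meantime, which is why the restart does not need to be coordinated with them.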
Visibility – Visibility was a key design factor in the system. Data is exposed to the infrastructure not only because there is a need today, but because there may be a need tomorrow. Juniper built the system to eliminate the need to go back and add probes or logging. The extensive analytics and visibility design ensures that all interactions are modeled and inspectable, and that any new feature comes with visibility enabled by default. This is key both for managing the infrastructure and for integrating applications more deeply with Junos OS Evolved.
Security – There is an age-old battle in networking between security and flexibility. Junos OS Evolved provides a secure execution environment based on the Linux Integrity Measurement Architecture (IMA) stack. This design provides security while retaining the flexibility to modify software and to install your own agents and applications. All Juniper-provided software is signed, and if it has been tampered with, it will not run in the system. We also provide the ability to use your own keys to sign any modifications or custom components that your IT team has produced, tested, and approved. Signing them with your own keys establishes their authenticity and allows them to run in the system.
Software Upgrades – Software upgrades in a modern cloud environment need to be agile and reliable, with minimal impact. Achieving this requires a component-based architecture that views the system as a versioned collection of granular, independent pieces of software. This architecture enables the smart software upgrades required for feature velocity, quality, and a hitless upgrade process, thus increasing the availability and performance of your infrastructure, even during upgrades.
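Viewing the system as a versioned collection of components suggests a simple way to scope an upgrade, sketched below. The component names, version strings, and helper functions are invented for illustration and do not describe the actual Junos OS Evolved packaging: the point is only that comparing two manifests identifies the components that need to change, leaving the rest running.

```python
from typing import Dict, List


def changed_components(running: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Components that are new in the target release or have a different version."""
    return sorted(name for name, version in target.items() if running.get(name) != version)


def removed_components(running: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Components present today that the target release no longer ships."""
    return sorted(name for name in running if name not in target)


# Hypothetical manifests: component name -> version.
running = {"routing": "1.4.0", "interfaces": "2.1.0", "telemetry": "0.9.2"}
target = {"routing": "1.5.0", "interfaces": "2.1.0", "telemetry": "0.9.2", "analytics": "1.0.0"}

to_change = changed_components(running, target)
to_remove = removed_components(running, target)
print(to_change, to_remove)
```

In this example only two of the four components are touched, and the components whose versions are unchanged are left alone.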
With Junos OS Evolved, we have brought our industry-leading routing and switching solutions to a native Linux environment. In doing so, we have simplified network operations with a highly scalable, unified, end-to-end network OS that provides the reliability, quality, agility, visibility, open programmability, and simplicity cloud operators need to run a flexible and cost-effective network infrastructure at cloud scale.
This is the first in a series of blogs and associated vlogs covering our Junos OS Evolved network operating system. To learn more about the Junos OS Evolved architecture, please watch Juniper’s Cloud-Optimized Network Operating System (NOS) vlog.
Blog: Engineering Simplicity in Networks Requires Software Disaggregation
Vlog: Driving Network Simplicity with Software Disaggregation