From its very inception, the cloud-optimized Junos OS® Evolved network operating system (NOS) was intended to simplify network operations. Based on cloud principles, it was engineered with visibility, open programmability and simplicity to facilitate network infrastructure at cloud scale. For an overview of Junos OS Evolved, check out Junos OS® Evolved: Juniper’s Industry-Leading Network Operating System (NOS) for the Future. In this blog, we’ll dive deeper into the details of state distribution in Junos OS Evolved.
The state distribution infrastructure in Junos OS Evolved plays a significant role in enabling many important attributes of the NOS. The resiliency of applications, subsystems or even nodes in a cluster to failure events is tied to the state distribution architecture. The state distribution mechanism also has a deep influence on critical system convergence metrics and on the behavior of the NOS under heavy load. This state distribution system acts as the highway over which all information within the NOS is exchanged, and it plays a key role in providing system state visibility, another important focal point in its design.
The Pub-Sub Environment
Junos OS Evolved uses a state-based Publish-Subscribe (Pub-Sub) system, where all exchanges within the system are modeled as state. All applications exchange state with the rest of the NOS via the distributed pub-sub system, also called the distributed data store (DDS). The use of a pub-sub system automatically provides the right level of decoupling between producer and consumer applications. The failure and recovery of an application participating in the pub-sub system is transparent to the rest of the system, because there are no point-to-point side channels of communication among applications that would otherwise be torn down and reestablished on a failure/recovery cycle. This also means that producers and consumers do not need to be co-located, providing a great deal of flexibility in application placement in a modular system with multiple CPUs.
In Junos OS Evolved, the information exchanged is modeled as a state update, making each update a complete and independent unit of information that does not require knowledge of past updates. Unlike a message-passing system, the state update model allows the NOS to compress multiple updates of the same state into one, so fast and slow consumers of state can coexist without unbounded queues building up in the system. We believe this aspect of the design is critical on a router-class system, because these systems are subjected to a wide range of network and environmental events. Placing an upper bound on the number of events that could hit the system is hard, but placing an upper bound on the total state within the system is more tractable. In a state-based system, a fast-changing state, such as the UP/DOWN status of a bad physical link, does not stress every consumer in the system: only the final state matters, and all intermediate states can be compressed away. This ultimately lets the system handle bursty inputs gracefully, giving it the elasticity and cushioning it needs when dealing with unpredictable network events.
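To make the compression idea concrete, here is a minimal Python sketch of a per-consumer buffer that keeps only the latest value for each piece of state. It is purely illustrative: the class name, the method names and the interface-style keys are assumptions made for this example, not the actual DDS implementation or API.

```python
from collections import OrderedDict


class CoalescingStateQueue:
    """Per-consumer buffer of unread state updates (hypothetical, for illustration)."""

    def __init__(self):
        # Maps a state key to the most recent value the consumer has not yet read.
        self._pending = OrderedDict()

    def publish(self, key, value):
        # A newer update for the same key overwrites the older one,
        # so the buffer never holds more than one entry per key.
        self._pending[key] = value
        self._pending.move_to_end(key)

    def drain(self):
        # The consumer receives only the latest value of each changed key.
        updates = list(self._pending.items())
        self._pending.clear()
        return updates

    def __len__(self):
        return len(self._pending)


queue = CoalescingStateQueue()

# A bad link produces 1,000 UP/DOWN transitions before the slow consumer runs.
for i in range(1000):
    queue.publish("link/et-0/0/0", "UP" if i % 2 == 0 else "DOWN")
queue.publish("link/et-0/0/1", "UP")

# The backlog is bounded by the number of distinct keys, not the number of events.
pending_before_drain = len(queue)
latest = queue.drain()
```

In this sketch, the slow consumer's backlog grows with the amount of state that has changed rather than with the number of events, which is the property the paragraph above describes. A message queue in the same position would have had to hold all 1,001 entries.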
Persistence, another key attribute of the DDS, fulfills the reliability requirements of the system. The pub-sub system serves as a state store, allowing applications to use it as a database for state persistence across application failures. This, coupled with the location-agnostic nature of pub-sub systems, enables richer high availability models in Junos OS Evolved. For example, the router can snapshot and restore its control plane state, much as an operator would during a virtual machine upgrade.
System State Visibility
Because the NOS is designed so that all system state information passes through a common highway, the DDS was also developed with a focus on providing system state visibility at the infrastructure level. We built Junos OS Evolved to provide access to the system state by default, rather than in response to an immediate end-user need. Put simply, network operating systems of the future must provide an unparalleled level of visibility, and this served as our guiding principle in the design.
The Junos OS Evolved system enforces a formal model and structure on all state exchanged over the pub-sub system. This allows an application-unaware infrastructure to introspect all state in the system and present this information through multiple language bindings over multiple interfaces and formats. In-house and customer-written applications can interface with the pub-sub system either as on-box applications that use native bindings or as remote applications that access the same facilities through gRPC-based APIs.
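As a rough illustration of why a formal model matters, the Python sketch below uses dataclasses as a stand-in for a state schema. A single generic function can then describe any state object without knowing anything about the application that produced it. The type names, field names and the `describe` helper are hypothetical and are not taken from Junos OS Evolved.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class InterfaceState:
    # Hypothetical state type; field names are illustrative only.
    name: str
    oper_status: str
    mtu: int


@dataclass
class RouteState:
    # A second, unrelated hypothetical state type.
    prefix: str
    next_hop: str


def describe(obj):
    """Render any modeled state object without application-specific code."""
    schema = {}
    for field in fields(obj):
        # field.type is a class normally, or a string under postponed annotations.
        schema[field.name] = (
            field.type if isinstance(field.type, str) else field.type.__name__
        )
    return {"type": type(obj).__name__, "schema": schema, "value": asdict(obj)}


# The same generic code path handles both state types.
for state in (
    InterfaceState(name="et-0/0/0", oper_status="UP", mtu=9192),
    RouteState(prefix="10.0.0.0/24", next_hop="192.0.2.1"),
):
    print(json.dumps(describe(state), indent=2))
```

Because the structure travels with the state, the same generic rendering could in principle be exposed in other formats or over other interfaces without touching the producing application.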
Using these APIs, Junos OS Evolved offers a rich suite of tools that present a console view into the innumerable transactions within the system, along with the ability to browse the operational state of the system as it exists in its core true form.
State distribution is a key reason the Junos OS Evolved NOS delivers high availability, accelerated deployment, greater innovation and improved operational efficiencies.
More information on the impact of state distribution is available in the vlog: Benefits of State Distribution in Junos OS Evolved.