It’s tempting—especially for enterprise network owners—to apply the same sorts of lifecycle management tactics to wide area networks (WANs), campus networks and data center fabrics. But does applying the standard “create deep requirements, perform lots of qualification testing, wait for change control and take the network down” process often used for other areas of the network really make sense for a data center fabric?
Years ago, hyperscalers decided the answer to this question was "no". For instance, instead of building deep requirements lists, they built networks with maximum flexibility to support every possible application need. Spine-and-leaf topologies are widely used by hyperscalers because they come the closest to a universal topology. A single topology that can efficiently emulate a logical full mesh or ring provides deep flexibility across a single, simple design.
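The "universal topology" claim can be sketched in a few lines: in a spine-and-leaf fabric, every leaf connects to every spine, so any leaf reaches any other leaf in exactly two hops, which is what lets one physical design emulate a logical full mesh (or ring) efficiently. The switch names and counts below are illustrative, not drawn from any real fabric.

```python
# Minimal sketch: build a spine-and-leaf adjacency map and verify that
# every pair of leaves shares at least one spine, i.e. any leaf can
# reach any other leaf in two hops (leaf -> spine -> leaf).
from itertools import permutations

def build_spine_leaf(num_spines: int, num_leaves: int) -> dict[str, set[str]]:
    """Return an adjacency map where every leaf connects to every spine."""
    spines = [f"spine{i}" for i in range(num_spines)]
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    adj: dict[str, set[str]] = {node: set() for node in spines + leaves}
    for s in spines:
        for l in leaves:
            adj[s].add(l)
            adj[l].add(s)
    return adj

adj = build_spine_leaf(num_spines=4, num_leaves=8)
leaves = [n for n in adj if n.startswith("leaf")]

# Every ordered pair of distinct leaves shares a spine neighbor,
# so the fabric behaves like a full mesh at two hops.
assert all(adj[a] & adj[b] for a, b in permutations(leaves, 2))
```

Because any leaf-to-leaf path is the same length, the same physical build can host whatever logical overlay an application needs.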
Another question many operators ask is: How and when do we qualify network changes for production? This might not be as obvious an area to consider when moving from a WAN to a DC fabric, but building a change pipeline, rather than simple change control, can be important in upping the flexibility of a design.
Hyperscalers have, once again, designed a lifecycle management system that avoids many of the pitfalls of traditional change management. Instead of deep testing based on a fixed set of requirements, they conduct testing and then use a canary process to push changes into production over time. Using real systems to generate real traffic is a much better test. This also helps operations understand what the fabric should look like and what it looks like when it’s broken.
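One way to picture a canary process like this is as a staged rollout loop: push the change to a small batch of switches, check health against the known-good baseline, and only widen the blast radius when each batch looks clean. The sketch below is a simplified illustration, not any vendor's pipeline; `deploy` and `health_check` are hypothetical placeholders for a real push mechanism and real telemetry checks, and the batch sizes are arbitrary.

```python
# Hedged sketch of a canary change pipeline for a fabric.
# deploy() and health_check() are stand-ins: a real pipeline would push
# config via an automation system and compare streaming telemetry
# against a pre-change baseline.

def deploy(switch: str) -> None:
    print(f"pushing change to {switch}")

def health_check(switch: str) -> bool:
    # Placeholder: always healthy in this sketch.
    return True

def canary_rollout(switches: list[str], batch_sizes=(1, 4, 16)) -> bool:
    """Roll a change out in widening batches; halt on the first bad batch."""
    remaining = list(switches)
    for size in batch_sizes:
        batch, remaining = remaining[:size], remaining[size:]
        for sw in batch:
            deploy(sw)
        if not all(health_check(sw) for sw in batch):
            return False  # stop here and roll back rather than spread a bad change
    # Canary batches passed: finish the fleet.
    for sw in remaining:
        deploy(sw)
    return True

fabric = [f"leaf{i}" for i in range(24)]
canary_rollout(fabric)
```

The key property is that a bad change is caught while it touches one or a few devices carrying real traffic, rather than after a fleet-wide push.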
Canary testing is often combined with intentionally breaking things in a limited way. This not only proves out the design and operations of the network, it (once again) helps operations understand what the network looks like when it breaks in specific ways. More understanding leads to faster troubleshooting and hence, lower mean-time-to-repair.
This way of looking at change in the data center also means a change in mindset toward failure. Instead of insisting on “no mistakes,” accept the risk of mistakes and plan to work around them. Prevent what you can and mitigate the rest.
All of these ideas—flexible requirements, accepting and mitigating risk rather than demanding perfection and iterative change processes—represent a new way of thinking about the lifecycle of the data center fabric. It means accepting that the fabric is not a static, "built once and done" thing. Security, operations and automation must be shifted to the left so these things can be designed into the network, rather than bolted on at the end. Only by doing so will network architects gain a new perspective on managing the lifecycle of a data center fabric.
Join our next Juniper Data Center Masterclass on Wednesday, May 12th at 9 am PST for a discussion on data center lifecycle management, where we'll look at some of these questions in depth.