Specifics of the future are uncertain, yet many things remain predictable. Network engineering is a role that is both growing and evolving. Today’s compute workloads and workflows are growing in scale and variety, and customers have more choices than ever. Underlying networks and fabrics can either curtail or unlock that growth. Data center decisions once revolved around the somewhat simpler ‘speeds and feeds,’ but how much are elements like complexity and security being factored into modern cost-benefit analysis?
We recently spoke to some of our Juniper Networks Ambassadors to better understand the dynamics, drivers and concerns they’ve encountered while designing and running data center networks. Our panel discussion focused on what matters now to their organizations and clients, and although no one size fits all, we uncovered some surprising insights.
Shared Situation
Common Bonds and Building Blocks
Data centers come in many different shapes and sizes, yet one truth prevails – connectivity is required for workloads and workflows. There is no utility without access, and the glue that binds everything together is the network. Though the network is sometimes an afterthought, considered only once compute and storage requirements are understood, it shouldn’t be. In many engineers’ minds, network requirements are of equal if not greater importance, given the network’s risk profile compared to the traditional stubs of compute and storage. Even in a distributed microservices world, risk accumulates and aggregates most in the network.
Daniel highlights that “A lot of the problems that we see are down to the workload architectures which are typically well out of the remit of the teams that we are dealing with.” His recommendation is to “Encourage application developers to use modern principles in their designs to stop the network from trying to solve those problems.”
So, if the network engineer’s role is changing and the network is a key focus area for risk, surely it demands the most attention? Unfortunately for many, the underlying risk is difficult to enumerate, demonstrate and communicate to the right stakeholders. Application developers may want a simple “flat earth” with ubiquitous reachability and infinite capacity, but many don’t realize that even fabrics are shared and finite.
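To make “shared and finite” concrete, here is a back-of-the-envelope sketch. The port counts and speeds below are hypothetical, not drawn from the discussion, but they show why a leaf cannot carry all of its server traffic toward the spines if everything bursts at once:

```python
# Back-of-the-envelope check that a fabric is finite: a hypothetical leaf
# with 48 x 25GbE server-facing ports and 6 x 100GbE uplinks.
server_ports, server_speed_gbps = 48, 25
uplink_ports, uplink_speed_gbps = 6, 100

downlink_capacity = server_ports * server_speed_gbps  # 1200 Gbps toward servers
uplink_capacity = uplink_ports * uplink_speed_gbps    # 600 Gbps toward the spines

ratio = downlink_capacity / uplink_capacity
print(f"Oversubscription ratio: {ratio:.0f}:1")       # 2:1 -- capacity is shared
```

A 2:1 ratio is often a perfectly reasonable design choice, but it is still a budget, not the infinite capacity a “flat earth” view assumes.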
Christian, an Ambassador from a Juniper partner, points out that although some customers have relatively large automated fabrics, “Some of them are still afraid of doing stuff with Ansible,” concerned that “they don’t know how to automate yet.” This concern echoes the findings of the 2020 SoNAR (State of Network Automation Report). For smaller customer deployments, Christian states that “They need a solution that starts relatively small and can grow with them.” He also notes, “They see the success of EVPN-VXLAN,” but sometimes struggle with aspects of visibility and orchestration, highlighting that “The only thing we need to get stronger at is the management part.” He also observes, as many of our Ambassadors did, that “The role of the network engineer in the DC is changing… it’s constantly changing,” and “it feels like you’re becoming a developer more and more.” We dive deeper into this topic of training and the talent gap later. However, it is an important issue that also stands out in the SoNAR report.
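For teams that “don’t know how to automate yet,” a read-only first step is a common way to start small and grow, echoing Christian’s point. As a minimal sketch, the snippet below gathers device facts with Juniper’s PyEZ library (junos-eznc) rather than Ansible; the hostnames and credentials are placeholders, not a prescribed approach:

```python
# Minimal, read-only starting point for fabric automation using Juniper's
# PyEZ library (pip install junos-eznc). Hostnames and credentials are
# placeholders -- adapt them to your own inventory.
from jnpr.junos import Device

LEAVES = ["leaf1.example.net", "leaf2.example.net"]  # hypothetical inventory

for host in LEAVES:
    with Device(host=host, user="automation", passwd="secret") as dev:
        facts = dev.facts
        print(f"{facts['hostname']}: {facts['model']} running Junos {facts['version']}")
```

Because it changes nothing on the devices, a script like this carries little risk while building the confidence and inventory data that larger automation efforts depend on.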
A Simultaneous Exodus and Influx
Predictability, Cost, Compliance and Gravity
Early in the conversation with the Ambassadors, we asked the million-dollar question: Will private data centers even be around in a few years? While many unpredictable and large workloads may be moving to public clouds, the Ambassadors are seeing more predictable workloads moving back from public clouds to on-premises data centers. This return from cloud takes advantage of better cost structures tied to performance and licensing requirements. Even with the standard overheads of power, space and management (and irrespective of whether their existing assets or data centers have spare capacity), some customers are awakening to the benefits of select workloads “coming home.”
Paul, who works with a data center services provider, mentioned that some locations were “emptying quite rapidly” toward public clouds; however, his company’s more risk-averse government customers were staying put. Although most customers don’t generate revenue from their footprints, the main reasons for remaining were greater compliance requirements and slower cycles.
Jonas observed the exact opposite, especially across the mid-market, where many customers found their cloud spend problematic. Instead, these customers are now opting to run workloads on their own physical assets once again. He described an influx of returnees from the cloud who wanted to run more predictable workloads on bare metal across their own fabrics: “They feel that things that need elasticity, that’s kept in the cloud, and then everything else is on-prem.” Those returning found renewed value in migrating back to existing spare capacity while also investing further in their own footprints. He further remarked that this could also provide more favorable licensing costs depending upon the platforms involved.
Another Ambassador from a high-growth environment saw massive investment in data centers and fabrics to meet increasing demand. One of their main drivers was that investors wanted assets retained on the balance sheet rather than a pure OpEx (Operational Expenditure) spend going to the public cloud. Additionally, the public clouds didn’t serve parts of the region well. There were also concerns around geopolitical risk and data residency due to legislation that mandates local data storage.
So, whether you’re running your own data centers (and workloads) or someone else’s, it’s not a one-size-fits-all scenario but rather a re-focusing of spend that results from hard-won lessons, number crunching and in-depth consideration of requirements and options. With a range of verticals and differing experiences from service providers to healthcare and private enterprises to system integrators, there’s a mix of trajectories, approaches, constraints and opportunities.
Dancing with Security and Legacy
Advanced Persistent Debt
Bad actors and greedy or misbehaving entities represent a threat to the whole stack’s stability and functioning. But how does this manifest in data center networks, where elasticity, assurance and security are more critical than ever before? As with most things, “it depends,” and comes down to numerous factors, quantification of risk and cost-benefit analysis. It’s not just government, healthcare or the extremely security conscious that opt for on-premises infrastructure, and doing so means that control and responsibility sit with the operator for everything from physical access, through the network, to the top of the application stack. Public cloud is a leveraged model with many benefits, not least of which are the large, specialized teams and expertise that exceed many organizations’ capabilities. Fundamentally though, security is a process reflecting the quality of the whole system, and the more variables and complexity in play, the greater the potential attack surface and risk.
Steve highlights that many organizations are “Very much committed to automating network operations” and not just for the “speed to deploy,” but also for security and regulatory purposes. In his experience, it’s not just the primary compute workloads but sometimes “Physical devices running old operating systems that are not allowed to be patched and are spread out all over the organization,” which become “A security professional’s nightmare.” He goes on to say, “A large part of our automation process was knowing what those devices are, what the individual communication requirements of those devices were,” and “How we get them configured and deployed.”
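As an illustration of the inventory-driven approach Steve describes, the sketch below records what each legacy device is and which flows it genuinely needs, then derives a default-deny style policy from that record. The device names, versions and flows are entirely hypothetical:

```python
# Hypothetical sketch of inventory-driven policy: record what each legacy
# device is and the flows it needs, then emit default-deny style rules.
from dataclasses import dataclass

@dataclass
class LegacyDevice:
    name: str
    os_version: str
    patchable: bool
    allowed_flows: list  # (destination, port) pairs the device genuinely needs

inventory = [
    LegacyDevice("badge-reader-07", "WinCE 6.0", False, [("acs.example.net", 443)]),
    LegacyDevice("hvac-ctrl-12", "VxWorks 5.5", False, [("bms.example.net", 47808)]),
]

for device in inventory:
    if not device.patchable:
        print(f"# {device.name} ({device.os_version}) -- unpatchable, default-deny")
    for dst, port in device.allowed_flows:
        print(f"permit {device.name} -> {dst}:{port}")
```

The point is not the tooling but the discipline: once the inventory and communication requirements are captured as data, configuration and deployment can be generated from them rather than maintained by hand.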
For some, like Steve, “Microsegmentation is already a big play,” with many hundreds of firewalls deployed in large footprints. Yet, Jonas also sees a desire to consolidate security controls to simplify security operations further. Almost flying in the face of defense-in-depth, he notes that some groups “Just don’t want to do it in the network,” due to the complexity and troubleshooting overheads. They prefer to implement policy and virtual network controls closer to the hypervisors and utilize that associated management tooling.
In the next blogs in this series, we’ll share more insights from the Ambassadors on topics such as hiring and training for the talent gap, automation and troubleshooting.