AI and machine learning (ML) have been in development for decades, but even amid today’s rapid advances, the technology remains fragmented, bespoke, and poorly understood. At Juniper Networks, we believe we can unlock the next stage of AI adoption through powerful collaboration that democratizes AI infrastructure by driving down costs and accelerating innovation. That’s why we recently joined MLCommons.
MLCommons is an AI industry consortium built on a philosophy of open collaboration to improve AI systems. With a mission to accelerate innovation for social advancement, it brings together industry leaders and academics to measure and improve the accuracy, safety, speed, and efficiency of AI technologies. By providing public datasets, AI best practices, benchmark test suites, metrics, and industry standards, MLCommons helps organizations build and deploy AI systems that can meet complex AI/ML workload requirements for training and inference.
Developing and deploying AI applications requires highly tuned infrastructure: purpose-built GPU servers, a robust AI/ML software stack, and a lossless, low-latency network fabric. Training an AI/ML model and using that model to draw conclusions from new data (a process called inference) are two distinct workloads, each with its own performance metrics that are valuable for building and tuning AI clusters. To quantify performance and response times, MLCommons’ most recent testing round, MLPerf Inference v4.0, was designed to measure how quickly AI clusters can run AI/ML models across a variety of inference scenarios. MLPerf delivers industry-standard ML system performance benchmarking in an architecture-neutral, representative, and reproducible manner for data center and edge systems.
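MLPerf measurements are driven by MLCommons’ open-source LoadGen harness, which generates queries according to the chosen scenario (for example, Offline for maximum throughput, or Server for latency-bounded request streams). The minimal Python sketch below shows how a system under test hooks into LoadGen; the no-op callbacks are our own illustrative stubs standing in for a real model and dataset, not Juniper’s submission code.

```python
# Minimal sketch of an MLPerf Inference system-under-test (SUT) wired to
# MLCommons' LoadGen harness (pip install mlcommons-loadgen). The empty
# "inference" below is a placeholder for a real model call.
import mlperf_loadgen as lg

def issue_query(query_samples):
    # LoadGen hands us a batch of queries; a real SUT would run the model here.
    # We complete each query immediately with an empty response.
    responses = [lg.QuerySampleResponse(qs.id, 0, 0) for qs in query_samples]
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass  # Nothing is buffered in this stub.

def load_samples(sample_indices):
    pass  # A real query sample library would stage these samples into memory.

def unload_samples(sample_indices):
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline   # Throughput-oriented scenario.
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_query, flush_queries)
qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```

Because LoadGen controls query timing and measures completions, the same harness can score very different systems in a reproducible way, which is what makes cross-vendor comparisons meaningful.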
Rigorous LLM testing
Juniper is the first company to submit a multi-node Llama 2 inference benchmark, demonstrating our commitment to open architectures that use Ethernet to interconnect GPUs in a data center network fabric. Testing multiple nodes (in this case, multiple GPU servers) is essential to meet the growing scale and complexity of large language models (LLMs). Many AI models, particularly those used for tasks like image or speech recognition, are extremely computationally expensive, and splitting the inference work across multiple nodes improves efficiency. Multiple nodes become crucial when a single machine can’t effectively handle the data volumes, model complexity, or real-time decision-making involved, as in e-commerce fraud detection systems that analyze transactions in real time. Multi-node inference allows Juniper to simulate real-world network traffic patterns and accurately assess whether infrastructure can handle the demands of processing data across large networks.
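To illustrate why the fabric matters, here is a minimal sketch (our own illustration, not Juniper’s benchmark code) of row-parallel tensor parallelism with PyTorch: each rank computes a partial matrix product, and an all-reduce over the interconnect sums the partials. That collective is exactly the kind of traffic that crosses the Ethernet fabric between GPU servers during multi-node inference.

```python
# Illustrative sketch of tensor (row) parallelism across ranks, assuming
# torch.distributed; launch with e.g. `torchrun --nproc_per_node=2 tp_demo.py`.
# In a real multi-node deployment the "nccl" backend would run over the GPU
# fabric (e.g., RoCE over Ethernet); "gloo" lets this sketch run on CPUs.
import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo")
rank, world = dist.get_rank(), dist.get_world_size()

torch.manual_seed(0)  # Same seed everywhere so all ranks build identical tensors.
features = 4096       # Must be divisible by the number of ranks.
x = torch.randn(8, features)            # A batch of activations.
w = torch.randn(features, features)     # A weight matrix to shard.

# Each rank keeps one slice of the input features and the matching weight rows.
shard = features // world
lo, hi = rank * shard, (rank + 1) * shard
y_partial = x[:, lo:hi] @ w[lo:hi, :]   # Local partial product.

# Summing the partials is a network-bound collective: this all-reduce is the
# traffic pattern the data center fabric must carry at low latency.
dist.all_reduce(y_partial, op=dist.ReduceOp.SUM)

if rank == 0:
    err = (y_partial - x @ w).abs().max()
    print(f"max error vs. single-node result: {err:.3e}")
```

Per token generated, a sharded LLM performs collectives like this at every layer, so the fabric’s latency and congestion behavior show up directly in end-to-end inference throughput.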
Juniper leveraged and built upon NVIDIA’s TensorRT-LLM, a framework for optimized LLM inference, to benchmark Llama 2 inference. The benchmark uses the 70-billion-parameter Llama 2 model, the largest model in the MLPerf Inference benchmark suite and an order of magnitude larger than the GPT-J model introduced in MLPerf Inference v3.1. It’s more accurate, but also more complex and more demanding of the AI cluster infrastructure. This is the first-ever multi-node submission to MLCommons for AI inference based on Ethernet and therefore, we believe, the first to truly test the networking aspects of an AI inference cluster.
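For flavor, the sketch below shows offline Llama 2 generation through TensorRT-LLM’s high-level Python LLM API. This is not Juniper’s submission harness; the model name, parallelism degree, and sampling arguments are illustrative, follow recent tensorrt_llm releases, and may differ by version. Multi-node runs additionally require a cluster launcher such as mpirun or Slurm.

```python
# Minimal sketch (not Juniper's benchmark code) of Llama 2 inference with
# TensorRT-LLM's high-level LLM API. Model name, parallelism degree, and
# sampling settings are illustrative; check your tensorrt_llm version's docs.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # 70B checkpoint from Hugging Face.
    tensor_parallel_size=8,                  # Shard the weights across 8 GPUs.
)

prompts = ["Explain why multi-node inference stresses the network fabric."]
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```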
“Submitting to MLPerf is quite challenging and a real accomplishment,” according to David Kanter, Executive Director of MLCommons. “Due to the complex nature of machine learning workloads, each vendor had to ensure that both their hardware and software stacks are capable, stable, and performant for running these types of ML workloads. This is a significant undertaking, and we celebrate the hard work and dedication of Juniper Networks, which is the first company to submit a multi-node large language model for the MLPerf Inference benchmark.”

Juniper AI Lab
We conducted these tests in the AI Innovation Lab at our headquarters in Sunnyvale. Juniper commissioned the lab in 2023 to analyze various AI/ML workloads and their associated network traffic patterns. That work has deepened our expertise in operationalizing the entire AI infrastructure, optimizing and tuning Ethernet fabrics for AI/ML workloads, and ultimately publishing new Juniper Validated Designs (JVDs). The AI Innovation Lab is built with NVIDIA H100 and A100 GPU servers, distributed storage systems, and Juniper QFX and PTX switches, and is operated via Juniper Apstra™, our market-leading, multivendor data center fabric management and automation solution. We tune our AI clusters and data center fabrics for optimal congestion management and load balancing, simplifying customer deployments and delivering stability from day one.
Juniper’s AI Data Center solution is a quick way to deploy high-performing AI training and inference networks that we believe are the most flexible to design and the easiest to manage with limited enterprise IT resources. Our solution features:
- Validated solution performance: End-to-end validated designs build confidence in product choices and expedite deployment times
- Simplified operations: Save time and money with an operations-first approach to design, deployment, and troubleshooting, using fewer resources
- Open flexibility: Design your network using proven technologies and products while avoiding vendor lock-in
Learn more
Attend the upcoming webinar, Building Cost-Effective, High-Performing AI/ML Data Centers, to learn more about how Juniper can help you navigate the challenges of building and operating data center infrastructure to unleash the power of AI for your organization.
And if you want to learn the basics of network fabrics and how they can optimize performance in your data center—whether you’re running AI/ML workloads or not—RSVP to our upcoming webinar with Redmond, Coffee Talk: Data Center Fabrics 101: What They Are, Why You Need Them & How To Implement Them.