In a recent episode of Bob Friday Talks, we sat down with Juniper’s Senior Director of Software Engineering, Shirley Wu, to discuss the evolution of the Marvis Virtual Network Assistant (VNA) and how it uses AI to help IT teams do more.
What are the key features of Marvis?
First of all, Marvis runs in the cloud and continuously sifts through the stats and events streamed from Juniper cloud-managed devices, client devices, cloud applications, and partners to detect problems on customers’ networks. That means customers don’t need a network administrator to continuously monitor the network. It’s Juniper’s fundamental difference compared to traditional networking vendors.
Secondly, Marvis not only detects and proactively prevents problems, but it also identifies the root cause of the problem. If the problem is fixable, Marvis will fix it. If it’s a problem Marvis can’t fix, it will create a recommendation and send a webhook or email notification to the customer explaining how to address the issue.
With our newest release, we’ve introduced Marvis Minis, the Zoom/Teams Large Experience Model, and CI with GenAI search.
Marvis Minis automates pre-connect and post-connect self-diagnostic analysis, ensuring the network will provide a great user experience when your business opens or when your students arrive for class.
How has Marvis evolved over time?
In the early days, Marvis was only able to answer simple questions. As a virtual network assistant, it would collect stats and events, build time series, and answer very limited user queries.
With the second phase of Marvis, we committed a significant chunk of resources to further develop the Marvis platform, which introduced streaming live data to identify problems with the network. The Marvis event action framework we developed allows it to identify the events and generate the root causes.
Now, with the third phase, building on the detection of events and actions, we developed a self-remediation framework, a self-optimization framework, and Marvis Minis, a self-diagnostic framework.
What sits at the heart of Marvis’ success?
The data scientists and machine learning (ML) engineers on the Marvis team contribute to Marvis’ success, but we also have a lot of engineers working on the cloud infrastructure. Most important are our firmware, hardware, and customer support teams. They are the ones who make sure our devices are able to generate meaningful, accurate data. That data is the source of Marvis’ success.
Marvis is a lot of software infrastructure that we build in the cloud. On top of that are the people: data scientists, ML engineers, and cloud infrastructure engineers. Collectively, it’s the domain experts in AP hardware, switch hardware, and SD-WAN hardware, together with the software engineers, who create and develop the intelligence for Marvis.
How does Marvis work from an AI perspective?
When a customer opens a ticket, we go through it with our sys QA team and identify whether Marvis already has the answer. If Marvis doesn’t have an answer, we try to identify the issue. We’ll ask questions like: Do we have the data in the cloud? Can we capture that type of behavior or problem? If we can, then our data science team and ML engineers have multiple working sessions with our domain experts. We share our data analysis results with them, and the basis of our recommendation comes from fine-tuning those results. To validate our results, we apply that analysis to our Mist Universe, which includes multiple cloud production environments. Based on the results, we decide, from an ML perspective, whether it’s a classification problem or a prediction problem. From there, we can decide what type of ML algorithm we want to use.
We also have to evaluate how much latency we can tolerate and what the cloud computation costs are. And most importantly, we determine how difficult it will be to debug the issue if we deploy it in certain environments. For example, we may not have access, such as in the case of government cloud environments.
Based on all these factors, we finalize our implementation and deploy the solution to the cloud. We then continue to validate the results with our domain experts. Next, we validate our cases with friendly customers until we are confident that the solution can be applied across the Mist Universe. Whether on AWS, with a retail customer, or across a university campus, we make sure our solution is able to adapt to different environments. If it is, then we will make it a customer-facing feature.
How does Marvis use ML algorithms for troubleshooting?
Marvis utilizes multiple ML algorithms, including regression, decision trees, LSTM, XGBoost, and Shapley.
For example, we use LSTM (long short-term memory), a type of recurrent neural network, for anomaly detection because we want to capture seasonality. Think about it: on a university campus, the number of Wi-Fi devices accessing the network is different on the weekend versus during the week, or during the summer break versus the regular school year. In order to identify anomalies, we need a machine learning model that is sophisticated enough to learn the regular pattern for a weekend versus a normal school day. The LSTM model is trained on more than two months of data, which provides enough patterns to distinguish weekends from weekdays.
Then, we use that model to continuously predict the events and errors we expect on the network over the next hour. If the actual numbers deviate from our prediction, Marvis identifies it as an anomaly.
When we detect an anomaly, we share it with the customer with supporting evidence based on the last six weeks of data. We can show them their behavior pattern timeline as validation that something is not right on their network.
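To make the predict-then-compare idea concrete, here is a minimal sketch in Python. It is not Marvis code. It swaps the LSTM for a much simpler stand-in, a per-hour-of-week average, so that it runs with only the standard library. The function names, the synthetic campus data, and the threshold of three standard deviations are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

HOURS_PER_WEEK = 24 * 7


def synthetic_campus_history(weeks=8):
    """Hypothetical hourly client counts: busy on weekdays 08:00-18:00, quiet otherwise."""
    history = []
    for week in range(weeks):
        for hour in range(HOURS_PER_WEEK):
            day, hour_of_day = divmod(hour, 24)  # day 0 is Monday
            busy = day < 5 and 8 <= hour_of_day < 18
            base = 100 if busy else 10
            history.append(base + (week % 3) - 1)  # small deterministic jitter
    return history


def fit_seasonal_baseline(history):
    """Stand-in for the trained model: mean and spread for each hour-of-week slot."""
    by_slot = defaultdict(list)
    for i, count in enumerate(history):
        by_slot[i % HOURS_PER_WEEK].append(count)
    return {
        slot: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for slot, values in by_slot.items()
    }


def is_anomaly(baseline, hour_index, observed, threshold=3.0, min_std=1.0):
    """Flag the observation if it deviates too far from the prediction for that slot."""
    expected, spread = baseline[hour_index % HOURS_PER_WEEK]
    return abs(observed - expected) > threshold * max(spread, min_std)


if __name__ == "__main__":
    baseline = fit_seasonal_baseline(synthetic_campus_history(weeks=8))
    monday_10am = 10           # hour-of-week slot for Monday 10:00
    saturday_10am = 5 * 24 + 10  # hour-of-week slot for Saturday 10:00
    # The same count of 10 clients is unusual on Monday morning but normal on Saturday.
    print(is_anomaly(baseline, monday_10am, 10))
    print(is_anomaly(baseline, saturday_10am, 10))
```

The point of the sketch is the seasonality: the same observation is judged against a different expectation depending on where it falls in the week. A real LSTM can learn richer patterns than a fixed weekly average, such as summer break versus the school year.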
How would you describe the Marvis Actions UI?
First, our Marvis Actions page provides a highly focused view of a customer’s top issues. Administrators can simply look at the Marvis Actions page and identify the issues they need to fix. The same page provides recommended actions and supporting evidence. This means network admins don’t need to do the debugging, collect the PCAPs, or slice and dice the data to find the problem. We deliver a single point of view to give them all the answers they need.
Second, we reduce the noise. We don’t show admins tons of graphs, tables, or line charts and expect them to compare notes. We make sure the answers we give them through Marvis Actions are kept to a manageable number. So, even if you’re managing thousands of stores in a retail chain or thousands of access points on a university campus, we make sure Marvis Actions stays manageable for the network administrator.