The Problem of Traditional Troubleshooting
Sometimes problems occur unexpectedly. Other times problems grow in the shadows, lurking for months or years unnoticed. They slowly chip away at users’ service expectations and confidence in IT’s ability to deliver on their service promise.
IT issues continue to become more complex and interdependent, with underlying networks also continuing to morph and change. IT services are also expanding, with new service surfaces emerging and growing. Layers upon layers of protocols and interactions lead to a range of known and unknown states resulting in a mixture of expected and unexpected outcomes.
Troubleshooting issues relies on understanding these different states and interactions to resolve and remediate the inevitable problems that arise. The right amount of observability enables teams to reason about and then restore systems to their correct state. For networks, which are themselves distributed systems, this observability hinges on many things, such as propagated signals, logs and interactions between multiple components. Simply put, networks are complex and debugging issues becomes an arduous undertaking.
Traditional monitoring and troubleshooting practices are lagging. They’re starved of the fidelity and granularity of data needed to identify not just intermittent issues, but root causes. In this blog, we dive into why AIOps is the need of the hour, and how Juniper Mist’s AI-powered troubleshooting can radically alter experiences with Marvis Virtual Network Assistant (VNA) and Marvis Actions.
Removing Fear, Uncertainty and Doubt with Juniper Mist AIOps
As automation practices evolve, blended machine learning models are being leveraged to deliver artificially intelligent solutions that help network operators with common problems. AIOps (AI for IT Operations) can reach deeper, further and faster than a human operator can when scouring application flows, network logs and system states for clues and unexpected dependencies. AIOps makes sense of connected and sometimes seemingly unconnected events, surfacing not just problems but, with the right training, their fixes. Additionally, AIOps offers automated remediation, which can be gated to require permission before actions are taken.
AI solutions can learn from previous experiences. They can rationalize dependencies and then apply those learnings to practical effect. This applied learning provides new tools that deliver novel and faster ways for teams to work.
Troubleshooting is non-trivial and especially so for complex systems or interactions. It requires context and history, including an understanding of how system states influence one another. What can take a human many hours or days can now be accelerated and performed in milliseconds by AI-powered platforms. With the right data at the correct resolution, AI models can achieve a depth of protocol and network dependency understanding that matches, and in many cases, far exceeds human capabilities per unit of time.
Marvis VNA (Virtual Network Assistant)
Simplifying and democratizing access to AI is a game changer for IT teams. Access is crucial for accelerating and realizing better outcomes across the complex playing field that is IT. One of the easiest ways to interact with AI and complex systems is simply to use natural language to inquire about the task or issue at hand. This is what Marvis delivers: a simple real-time interface to your complex network using conversational AI.
Marvis’ chat interface allows for either simple or complex questions to be both asked and answered. Irrespective of whether questions are posed by front-line helpdesk support or network engineering, Marvis can back up any observations or assertions made with the relevant graphs and data.
More recently, Marvis has been augmented with new capabilities over and above existing real-time troubleshooting. Marvis can now provide better contextual responses regarding technical documentation-related questions and tasks rather than just surfacing relevant links. This is achieved using Large Language Models (LLMs) such as ChatGPT and will ensure that Marvis’ efficacy and training continues to improve as more data sources are added and integrated. Marvis can also be added to a Microsoft Teams group for easy interaction with the users and operators of your choice.
One of the newest data sources now provides Zoom insights by ingesting data from Zoom clients and the Zoom cloud. This enables Marvis to not just understand the quality of video and audio streams but also predict when bad video and audio experiences will occur based on peripheral network conditions.
Marvis Actions
Instant Root Cause and Fix
Indeed, highlighting a problem is not fixing the problem. Marvis, however, can actually give the exact answer on how to remediate an issue and then validate a successful fix. In certain scenarios, Marvis can even go ahead and fix the issue for users! This is what a self-driving action framework means and delivers.
Bad Cables
Ask any network engineer about bad cables, and their face will sour. War stories ensue about hours spent trying to identify the root cause of an intermittent issue, only for it to be rectified by replacing a damaged or badly terminated cable with a new one.
Layer 1 issues still present themselves regularly, and they can cause havoc before being finally identified and fixed.
Missing VLANs
Across the wireless user access edge, there are often VLANs missing on trunks to Access Points (APs) or sometimes on other infrastructure trunk ports. Marvis can detect these missing VLANs and highlight where and how to address such issues. Missing VLANs can cause everything from basic connectivity issues to a range of service dependency issues, but Marvis has the IT teams’ back and can even validate when fixes are rolled out.
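To make the idea concrete, a missing-VLAN check can be reduced to a set comparison between the VLANs an AP needs and the VLANs its uplink trunk carries. The sketch below is purely illustrative and is not how Marvis implements detection; the function name and the sample VLAN IDs are assumptions made up for this example.

```python
def missing_vlans(required, trunk_allowed):
    """Return the VLAN IDs that are required but not carried by the trunk."""
    return sorted(set(required) - set(trunk_allowed))


# Hypothetical data: VLANs used by an AP's WLANs vs. VLANs allowed on its uplink trunk.
ap_required = {10, 20, 30}
switch_trunk_allowed = {10, 30, 99}

print(missing_vlans(ap_required, switch_trunk_allowed))  # [20]
```

Running the same comparison again after a change is rolled out is one simple way to confirm the fix: an empty list means every required VLAN is now present on the trunk.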
Negotiation Mismatches
Another insidious issue is that of Ethernet speed and duplex mismatches. Even with auto-negotiation configured, a common problem is that one end or the other starts flapping because it cannot negotiate speed or duplex properly. This issue can rapidly become a problem for users and machine agents alike, where the de facto expectation is gigabit (i.e., 1000 Mbps, or 1 Gbps) at full duplex. Even with the right logging level and monitoring, chasing down negotiation mismatches can be tricky, but not with Marvis Actions.
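As a rough illustration of what such a check involves, the sketch below compares the speed and duplex reported by both ends of a link against the gigabit, full-duplex expectation described above. It is a minimal example under stated assumptions, not Marvis's detection logic: the `LinkEnd` type, the `negotiation_issues` function and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkEnd:
    """Negotiated state reported by one end of a link (hypothetical shape)."""
    speed_mbps: int
    duplex: str  # "full" or "half"


def negotiation_issues(local, remote, expected_speed_mbps=1000):
    """List the negotiation problems visible when comparing both link ends."""
    issues = []
    if local.speed_mbps != remote.speed_mbps:
        issues.append("speed mismatch")
    if local.duplex != remote.duplex:
        issues.append("duplex mismatch")
    if min(local.speed_mbps, remote.speed_mbps) < expected_speed_mbps:
        issues.append("below expected speed")
    return issues


# Hypothetical example: a switch port at 1 Gbps full duplex facing a device at 100 Mbps half duplex.
switch_port = LinkEnd(speed_mbps=1000, duplex="full")
device_port = LinkEnd(speed_mbps=100, duplex="half")
print(negotiation_issues(switch_port, device_port))
```

The hard part in practice is collecting the state from both ends and correlating it over time, which is exactly where manual troubleshooting tends to stall.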
Loop Detection
Although loops are less frequent, they can be catastrophic to production traffic resulting in localized or widespread outages. Detecting loops early is critical. Although loops should not happen with defensive configurations, ports are oftentimes misconfigured, and physical changes or logical changes can cause loops that bring the whole network to a halt. Using flexible templates (mentioned in Simplified Planning: Optimizing for Success and Assurance) helps to reduce the likelihood of loops forming, but rapid and reliable loop detection is still a must-have.
Port Flap
Flapping ports can be a result of different scenarios at the local, intermediate or remote portions of a connection. Unless a device or endpoint goes to sleep frequently, ports should neither flap repeatedly nor continually. This is especially true for infrastructure ports or WAN ports. Port flaps can be due to physical, environmental or logical issues (including sometimes negotiation mismatches), and determining what’s anomalous is crucial. The wired user access edge may see ports go up and down frequently, but not continually. Marvis Actions detects flapping and suggests when to change or update port profiles for offending ports.
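The distinction between "frequently" and "continually" can be sketched as a sliding-window count of link state transitions. The example below is an illustration only and does not represent how Marvis Actions detects flapping; the function name, the five-minute window and the transition threshold are assumptions chosen for the example.

```python
from collections import deque


def is_flapping(transition_times, window_s=300, max_transitions=5):
    """Return True if more than max_transitions link up/down events
    fall within any window of window_s seconds.

    transition_times must be sorted timestamps in seconds.
    """
    recent = deque()
    for t in transition_times:
        recent.append(t)
        # Drop events that have aged out of the window ending at t.
        while t - recent[0] > window_s:
            recent.popleft()
        if len(recent) > max_transitions:
            return True
    return False


# Hypothetical event logs: a port bouncing every 10 seconds vs. a laptop sleeping now and then.
print(is_flapping([0, 10, 20, 30, 40, 50]))             # True
print(is_flapping([0, 1000, 2000, 3000, 4000, 5000]))   # False
```

A fixed threshold like this would need different values for user access ports and for infrastructure or WAN ports, which is why determining what counts as anomalous for each port type matters.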
High CPU
Infrastructure devices with continuous or bursty high CPU can slow device operations to a crawl. Depending upon how the management, control and data planes operate, traditional monitoring can show high CPU, but when looking for anomalous or intermittent issues, more intelligent monitoring and detection are required.
Port Stuck
Another frustrating and functionality-impacting problem is that of a port (or ports) stuck in the “down” state. Previously they may have been working fine and showing as “up” correctly, but for some reason, they are now stuck in a down state even with some layer 1 connectivity and signalling still occurring on the remote end. Marvis can spot this and recommend when bouncing the port may help to restore connectivity.
Traffic Anomaly
Anomalies can be difficult to detect. They require an understanding of what normal traffic looks like to be able to subsequently show deviations over time. With Marvis Actions, we can rapidly see when there are elevated or even storm levels of multicast or broadcast traffic. This includes an estimation of whether adjacent local ports or the rest of the site may have been impacted.
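One simple way to express "deviation from normal" is to compare a new measurement against the mean and standard deviation of a baseline. The sketch below uses that basic statistical approach as a stand-in; it is not the model behind Marvis Actions, and the function name, threshold and packet-rate figures are assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomalous(baseline, sample, threshold=3.0):
    """Return True if sample exceeds the baseline mean by more than
    threshold standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any increase counts as a deviation.
        return sample > mu
    return (sample - mu) / sigma > threshold


# Hypothetical broadcast packets per second observed on a port during normal operation.
normal_broadcast_pps = [100, 110, 95, 105, 90, 100]

print(is_anomalous(normal_broadcast_pps, 108))   # False
print(is_anomalous(normal_broadcast_pps, 5000))  # True
```

The quality of the result depends entirely on the baseline, which is why understanding what normal traffic looks like comes first.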
Leveraging AIOps (AI for IT Operations)
As you can see, Marvis Actions are extremely useful and powerful, and save a huge amount of troubleshooting time. Marvis Actions surface issues early before they become bigger unwieldy problems. AI-powered troubleshooting means faster, richer and deeper diving for IT teams. Take advantage of rapid answers to your questions, proactive remediation and the confidence to face complexity.
AIOps Customer Story Example: University of Texas at Arlington
The University of Texas at Arlington (UTA), located in the heart of Dallas-Fort Worth and the second-largest institution in The University of Texas System, has been widely recognized as a best value in education by Forbes and others. When the COVID-19 pandemic first swept Texas, UTA quickly pivoted to e-learning and remote work. That agility was enabled by AI-driven Juniper networking from the classrooms and research labs to the data center and cloud apps. Learn more here.
Next, check out how our AI-Driven full stack for switching and security assures everyone from users to stakeholders, across LAN and WAN, that whatever your next steps, they’re optimal, explainable and secure.
- Try Wired Assurance today! Click here for a free 90-day trial of Wired Assurance.
- Join our weekly demos to see how you can drive better user experiences across the wired, wireless and WAN through Mist AI!