Large cloud providers and ISPs offer service level agreements (SLAs) that guarantee uptime, which helps them seal the deal with enterprises that depend on it.
These same enterprises often ask IT to make the same guarantees for the performance and uptime of the internal network, its many varied connections and even the applications.
At the same time, IT must manage myriad SLAs from all kinds of vendors, including the aforementioned ISPs and cloud providers. After all, what good is an SLA if you don’t know when it is violated? How can you judge a vendor if you can’t measure it?
SLAs have become more important and commonplace since COVID led to a radical decentralization of IT, making measurement more difficult. “This new COVID-induced reality increases an organization’s dependency on reliable data to identify how well (or poorly) operations are performing,” David Borowski, director with business and technology consultancy West Monroe, told CIO.
1. Get in Tune with Network Performance Monitoring and Management
Most SLAs are about performance: how fast something works or whether it works at all. Rather than rely on the vendor to track and report their own performance, network performance monitoring lets you do it yourself. Aside from comparing performance to SLA promises, network performance monitoring lets you set your own thresholds, measure performance in a detailed way and report on the current state and history of performance.
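To make the idea of setting your own thresholds concrete, here is a minimal Python sketch. It assumes you already collect latency samples from some probe. The sample values, the 100 ms SLA limit and the stricter 80 ms in-house threshold are all made up for illustration, and the function names are hypothetical rather than taken from any monitoring product.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def check_latency(samples_ms, sla_ms, internal_ms, pct=95):
    """Compare the observed percentile with the vendor's SLA limit and a stricter in-house threshold."""
    observed = percentile(samples_ms, pct)
    if observed > sla_ms:
        status = "sla_breach"   # the vendor promise is broken
    elif observed > internal_ms:
        status = "warning"      # still within the SLA, but past our own limit
    else:
        status = "ok"
    return observed, status

# Hypothetical round-trip times (ms) collected by a monitoring probe
samples = [20, 22, 25, 30, 31, 35, 40, 48, 95, 120]
print(check_latency(samples, sla_ms=100, internal_ms=80))
```

Keeping the in-house threshold tighter than the SLA limit means you get a warning before the contract is actually violated.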
2. Acknowledge the Increasingly Common IT SLAs
The explosion of applications and the criticality of an always-on network brought with them a growing abundance of SLAs. The success of vendor SLAs led leading-edge shops to both demand SLAs from vendors and apply SLAs to the IT department itself. This is serious business. Not only is IT judged on SLA compliance, but that compliance is also often tied to the compensation plan.
3. Conquer SLA Management by Measuring
Too many IT shops happily accept SLAs, even negotiate their contents, but have no way to measure actual SLA compliance. Fortunately, there are tools, including network monitoring, performance monitoring and application performance monitoring (APM), that support SLA management.
“A shift we’re beginning to see is an increased use of data and process discovery tools to measure SLAs,” Borowski of West Monroe said to CIO. “While not pervasive yet, these tools represent an opportunity to identify the most meaningful metrics and objectively measure performance (e.g., cycle time, quality, compliance). When provided by the client, it also eliminates the dependency on provider tools as the source-of-truth for performance data.”
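As a rough illustration of measuring compliance yourself, the Python sketch below turns a list of outage windows into a measured availability figure for a reporting period and compares it with an SLA target. The outage times and the 99.9% target are invented for the example, and the code assumes the outage windows do not overlap.

```python
from datetime import datetime

def measured_availability(outages, period_start, period_end):
    """Return availability (%) for the period, given (start, end) outage windows."""
    total = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting period before counting it
        s = max(start, period_start)
        e = min(end, period_end)
        if e > s:
            down += (e - s).total_seconds()
    return 100.0 * (1 - down / total)

# Hypothetical outage log for a 30-day month (43,200 minutes)
period_start = datetime(2024, 4, 1)
period_end = datetime(2024, 5, 1)
outages = [
    (datetime(2024, 4, 3, 9, 0), datetime(2024, 4, 3, 9, 30)),     # 30 minutes
    (datetime(2024, 4, 17, 22, 0), datetime(2024, 4, 17, 22, 15)), # 15 minutes
]

availability = measured_availability(outages, period_start, period_end)
sla_target = 99.9  # assumed contractual target, in percent
print(f"{availability:.3f}% measured, target met: {availability >= sla_target}")
```

With 45 minutes of downtime in the month, the measured figure lands just under 99.9%, the kind of near miss that is easy to overlook if you only see the provider's own report.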
4. Craft the Perfect In-house SLA and Negotiate Your Vendor Needs
If you are building an internal IT SLA or crafting one for a vendor, you should consider your performance, security and uptime requirements. For uptime, how many 9s do you need? Five 9s, or 99.999%, is a gold standard (meaning some 5.26 minutes of downtime a year), but is it always achievable? Meanwhile, four 9s, or 99.99%, means about 52 minutes of downtime every year.
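The arithmetic behind the 9s is easy to check. This short Python sketch converts an availability percentage into the downtime it allows per year, assuming a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

def allowed_downtime_minutes(availability_pct):
    """Minutes of downtime per year permitted by a given availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR

for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.2f} minutes per year")
```

Each extra 9 cuts the allowance by a factor of ten: roughly 525.6 minutes at three 9s, 52.56 at four and 5.26 at five.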
Are there apps or services where even 99.999% is not enough? In this case, demand more but be willing to pay for high availability or failover. Here are three questions for IT to ask:
- What do your business leaders say about these issues?
- What are the costs of downtime or the risks of losing customers if the application or network service performs poorly?
- How much of your technology stack should be covered by SLAs?
5. Know the Cost of Downtime
The cost of downtime varies for every organization and depends on what actually goes down. In almost every case, the costs will make even a stoic CFO choke. According to a TechChannel article, “Enterprise downtime is now more expensive than ever: Some 44% of firms indicate that hourly downtime costs exceed $1 million to over $5 million, exclusive of any legal fees, fines or penalties.”
“Additionally, 91% of organizations said a single hour of downtime that takes mission-critical server hardware and applications offline, averages over $300,000 due to lost business, productivity disruptions and remediation efforts. Meanwhile, only 1% of organizations—mainly very small businesses with 50 or fewer employees—estimate that hourly downtime costs less than $100,000.”
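To put those hourly figures next to an uptime target, the Python sketch below estimates the yearly cost if a service uses its full downtime allowance. It is only a back-of-the-envelope model: it borrows the $300,000 hourly figure quoted above as an example input, assumes a 365-day year and treats every hour of downtime as equally expensive.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760, ignoring leap years

def annual_downtime_cost(availability_pct, hourly_cost):
    """Yearly cost if the service is down for the full time its availability target allows."""
    downtime_hours = (1 - availability_pct / 100) * HOURS_PER_YEAR
    return downtime_hours * hourly_cost

hourly_cost = 300_000  # example input, in dollars per hour
for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> ${annual_downtime_cost(pct, hourly_cost):,.0f} per year")
```

Under these assumptions, moving from three 9s to four 9s reduces the yearly exposure from about $2.6 million to about $263,000, which gives you a number to weigh against the price of high availability or failover.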
6. Prevent Downtime and Performance Ills
IT can control many of the issues that cause downtime or performance kerfuffles. High availability can be achieved by quickly discovering and recovering from both end-user mistakes and malware attacks. The key to both is full network visibility through network monitoring.
7. Find the Time to Meet SLAs
Keeping your network running, and running well, is time-intensive. Having a network monitoring solution with total visibility, problem identification, alerting, root cause analysis and a dash of automation vastly reduces the time it takes to maintain the network and resolve problems.
Most IT teams have experienced dreaded intermittent performance problems that stem from a network issue and hamper the efficient use of business applications. In some cases, IT teams can spend weeks or even months identifying and fixing these elusive problems. Wouldn’t it be nice if you could disrupt the space/time continuum and create a wormhole to travel back and forth in time? Just imagine being able to apply unlimited resources to keep up with your workload.
8. Adopt Unified Infrastructure Monitoring
One way to get your day back is a low-cost alternative in the form of unified infrastructure monitoring (UIM). A UIM solution can save valuable IT time spent on many manual and repetitive IT tasks. It also reduces the amount of time IT teams spend finding and fixing performance problems.
With a UIM, IT gains visibility into application, server, network and storage performance. This dramatically reduces your MTTR (Mean Time to Resolution), helping to meet SLAs and providing a great user experience.
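If you want to track MTTR yourself, it is simply the average time between detecting an incident and resolving it. The Python sketch below computes it from a hypothetical incident log. The timestamps are invented, and a real log would come from your ticketing or monitoring system.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Average of (resolved - detected) across a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incidents: (detected, resolved)
incidents = [
    (datetime(2024, 4, 2, 10, 0), datetime(2024, 4, 2, 10, 30)),  # 30 minutes
    (datetime(2024, 4, 9, 14, 0), datetime(2024, 4, 9, 15, 30)),  # 90 minutes
    (datetime(2024, 4, 21, 8, 0), datetime(2024, 4, 21, 9, 0)),   # 60 minutes
]

mttr = mean_time_to_resolution(incidents)
print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")
```

Tracking this number month over month shows whether better visibility is actually shortening the time to fix problems.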
Want to learn how a UIM like WhatsUp Gold quickly solves network performance problems? Read further in our How To Meet Your SLA And Get 30 Minutes Back A Day post.