Author: Kit Merker
Excuses! I’ve heard them all. When it comes to why people “hate” Service Level Objectives (SLOs), I have heard so many explanations that I’ve been able to create a persona-based list of the most common:
- I’m an application developer. I hate SLOs because they are just a way of getting me or my team to carry the pager. Before, we were plenty busy enough just writing all the code. Now, we’re expected to run the service too. No, thanks. I’d rather throw this whole SLO business over the wall and not bother with it. Let’s go back to a proper division of labor.
Response: First of all, the code you’re writing is experienced by the user as a service. That means your job isn’t just about writing code; it’s also about creating happy customers. You need SLOs to do that effectively. Secondly, when you adopt SLOs, whoever carries the pager is going to have a better time, because they’ll be working with realistic goals that match business priorities. Start by setting clear objectives for each service that are achievable now. Then determine the aspiration you have for each service. You can organize your team, your risk management, and your incident response around these goals.
- I’m a senior SRE. I hate SLOs because I really don’t like the interdepartmental approach. I mean, let’s be real about this idea of everyone working together to create SLOs that serve the business. After all, business people and tech people aren’t known for communicating on the same wavelength. Frankly, business folks don’t have the patience or precision to be involved in defining SLOs. When we get down to the nitty-gritty of SLOs, you can see the business people’s eyes glaze over. I just don’t think this team approach is realistic. Leave reliability in the hands of the engineers, where it belongs.
Response: It’s tempting to just create the SLOs yourself based on your assumptions and knowledge of how the business works. But for SLOs to have meaning, they need to fit the business’s priorities. You don’t necessarily need your product manager or CEO to define SLOs for you, but you do need a clear understanding of relative priorities, critical business timeframes, and use cases, and you need to create a feedback loop of performance vs. plan over time. Understanding the tradeoffs and taking stock of your infrastructure is a critical step in gaining alignment between business and engineering stakeholders.
- I’m an enterprise executive. I hate SLOs because I would rather dictate reliability from the top down. I was schooled in the philosophy of “Culture eats strategy for lunch,” and I can lead this whole reliability effort myself. I’ll just explain in the next company-wide meeting that reliability is our priority and instruct all of our operational managers to make it so. We’ll all talk about reliability a lot in our meetings and as we walk the halls. Boom. Done.
Response: Talking about reliability doesn’t fix a broken engineering culture. If you are setting unrealistic expectations (100% reliability, anyone?), you are hurting the culture and losing credibility. What is critical is giving your engineering team clear priorities and defining, as precisely as possible, the goals and what matters to the business. Don’t call out or punish teams for missing reliability goals; instead, encourage them to learn and raise issues early. Create a list of all the business-critical situations that you want them to de-risk, and let everything else go.
- I’m an entrepreneur. I hate SLOs because they aren’t on my priority list. We’re a startup, running fast with scissors. Our focus is 100% on getting features out as quickly as possible. Time to market is king. Sure, that comes with some risk, but that’s a risk we have to take at this stage of the game. If we’re lucky, we’ll survive long enough to concern ourselves with reliability, then we’ll know we’ve made it.
Response: It’s true your first job is to get the product to market and build something people want and will pay for. But even the earliest software product has critical user experiences as well as less critical ones. Defining goals and measuring against them gives your team the focus and clarity to engineer to the requirements at this stage. As the business succeeds and scales, you can ratchet up the reliability guidelines appropriately and manage the infrastructure cost of delivering on them.
- I work in IT Ops and I hate SLOs because that SRE stuff is just another attempt to automate me out of a job. I know what’s up here. It’s the same old song, just another cover—at the end of the day, ‘SLO’ is after my ‘FTE.’ You can count me out.
Response: The move from internally operated IT applications to SaaS is one of the biggest migrations going on in business and technology today, so this worry is not unfounded. However, SLOs can help IT operations in a few ways. SLOs help organizations understand the real needs and expectations of applications and infrastructure to optimize costs, manage upgrades and migrations, and deliver value that is aligned to business goals. Whether you are developing an application in house, running it on your own infrastructure, or receiving it via SaaS, SLOs connect reliability metrics to business KPIs.
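Several of the responses above lean on the same arithmetic: an objective that is achievable now, a more aspirational one, and the trouble with demanding 100%. Here is a minimal sketch of that arithmetic in Python, assuming an availability-style SLO measured over a 30-day window. The function name, the window, and the three targets are hypothetical and chosen only for illustration; they are not taken from any Nobl9 tooling.

```python
# Illustrative only: the window and the targets are assumptions, not recommendations.
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes in a 30-day window


def error_budget_minutes(target_percent: float,
                         window_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Return how many minutes of unreliability a target allows over the window."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be greater than 0 and at most 100")
    return window_minutes * (100 - target_percent) / 100


# Hypothetical targets for a single service.
targets = {
    "achievable now": 99.0,
    "aspirational": 99.9,
    "top-down mandate": 100.0,
}

for label, target in targets.items():
    budget = error_budget_minutes(target)
    print(f"{label}: {target}% leaves about {budget:.1f} minutes of error budget")
```

Under these assumptions, a 99% target leaves roughly 432 minutes of budget per 30 days, 99.9% leaves roughly 43 minutes, and 100% leaves none at all, which is one way to see why a mandate of perfect reliability gives a team no room for releases, migrations, or honest mistakes.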
Now, admittedly, I’ve exaggerated a bit here. As science fiction writer John Brunner quipped, “Don’t bother explaining—I’ve heard all the excuses and the trouble is most of them are true.” That said, the examples I’ve offered are instructive, as they reflect a larger, more powerful truth:
Reliability is a team sport. Business and technology stakeholders must come together to achieve the right level of reliability.
No matter what your role in the company, your investment in the process of defining SLOs is like sharpening the saw: it’s time well spent that will later save you inordinate amounts of time, resources, money, and customers. And, as with most challenges we face, taking a “best practices” approach and having the right tools for the job make the task so much EASIER and the results remarkably BETTER.
That’s what Nobl9 is here for—to equip you with the knowledge and tools you need to ensure that your investment in SLOs pays dividends beyond your expectations.