Author: Kit Merker
Differentiating your services from the competition can grow revenue. Adopting Service Level Objectives (SLOs) can get you there efficiently and benefit the whole organization.
IT leaders across industries are under pressure to deliver beautiful, delightful, and elegant solutions for their customers. Additionally, businesses must differentiate their services and grow revenue while meeting customer expectations. A new trend I’m seeing is the need to provide fast, responsive service and to give B2B customers direct access to reliability data to build trust. Implementing a differentiated service tier with performance transparency for your top customers can set you apart from others who obfuscate their unreliability.
Transparency for your Customer
When building world-class customer-facing digital systems, you must be transparent about the reliability and performance of the system. You need to share contextual information with customers both when your system is working and when it is not. You need to account for the network, your third-party vendors, and other factors that might impact local user experience. Simply stating that “everything’s working on our side” is not good enough.
Top internet companies like Google developed Service Level Objectives (SLOs) to provide transparency with context both inside and outside the organization. Now, SLOs are catching on across many companies to improve reliability and transparency. The risk with transparency is that the other party makes assumptions without context. SLOs add essential context (clearly defined goals, tuned over time) that lets you share more candidly with customers, suppliers, and employees. This level of accountability is a huge differentiator from service providers that have an SLA “just in case.” Think about it: when customers choose your API, they need to trust it and plan for risks. They can't engineer a solution if they have no idea of a realistic level of reliability and performance. It doesn’t mean they expect it to be perfect; they can add retries, caches, queues, and failovers. But they need to know!
Counterintuitively, openly sharing when your system is impacted and not fully operational increases your end users' trust in you.
Finding the Sweet Spot
Adopting SLOs lets you balance competing interests to deliver top-shelf technology products by measuring reliability, performance, support tickets, and any other goals you can set. By closely tracking these metrics, you can decide what is most important at the moment. Tuning goals and reacting to new information gives you an OODA (observe, orient, decide, act) loop that lets you adapt quickly and build new capabilities that customers need and engineering can feasibly produce. Finding the “sweet spot” between delivering innovation and maintaining stability allows you to optimize your technology investments.
Managing Interdependence
A complex web of dependencies exists between different teams and systems within modern organizations. Third-party vendors, the cloud, and microservices have accelerated this trend. Interdependence is an asset because it allows reuse (for example, not reinventing the database for every app). But interdependence is also a liability when it impedes independent progress or cascades into a global outage. How can we map and manage the interdependence in our services to our advantage while minimizing the risks associated with tightly coupled interlocking pieces?
With SLOs, you can demystify and leverage interdependence in your systems. Defining clear service boundaries, dependencies, and expectations and seeing historical performance vs. goals helps teams better plan, implement, and run services for their customers. Instead of avoiding dependencies and building expensive silos, teams can quickly reuse other internal and 3rd party services to drive faster innovation and time to market. This approach is a considerable advantage, especially as specialized technologies like AI/ML become available through reusable APIs. Without reuse, your user experience will quickly fall behind as you try to do too much yourself.
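One piece of arithmetic shows why dependency expectations matter. If a service needs every one of its dependencies to succeed on each request, and those dependencies fail independently (both are simplifying assumptions), the best availability it can offer is roughly the product of theirs. A minimal sketch in Python, with a hypothetical function name and made-up numbers:

```python
from math import prod
from typing import Iterable


def serial_availability(dependency_availabilities: Iterable[float]) -> float:
    """Upper bound on availability when every dependency must succeed.

    Assumes hard (serial) dependencies whose failures are independent.
    Each value is a fraction between 0 and 1, e.g. 0.999 for 99.9%.
    """
    values = list(dependency_availabilities)
    for availability in values:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
    # With independent failures, the chance that all succeed is the product.
    return prod(values)


if __name__ == "__main__":
    # Illustrative numbers only: two dependencies at 99.9%, one at 99.95%.
    ceiling = serial_availability([0.999, 0.999, 0.9995])
    print(f"Best-case availability: {ceiling:.4%}")
```

Three dependencies at 99.9%, 99.9%, and 99.95% cap the service at about 99.75%, so promising 99.9% on top of them would not be realistic without retries, caches, or failovers.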
Justifying The Move
Many organizations expect engineering and operations teams to stare at charts and metrics to understand their systems' performance. SLOs remove the need for continuous chart-gazing and instead enable proactive “reliability burn down” alerts based on the concept of error budgets (the small, tolerable amount of error in any system that built-in resiliency and customer retries will overcome).
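To make the error-budget idea concrete, here is a minimal sketch in Python. It assumes a request-based SLO (a target fraction of successful requests over a window); the function names and example numbers are illustrative and not taken from any particular tool.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Requests allowed to fail in the window without missing the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        raise ValueError("no requests in the window, so no budget to spend")
    return 1.0 - failed_requests / budget


def burn_rate(slo_target: float, total_requests: int,
              failed_requests: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    1.0 means errors arrive exactly as fast as the SLO tolerates;
    values above 1.0 mean the budget is being spent faster than that.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 0.0  # no traffic, nothing burned
    return (failed_requests / total_requests) / (1.0 - slo_target)


if __name__ == "__main__":
    # Illustrative numbers: a 99.9% target over one million requests.
    print(round(error_budget(0.999, 1_000_000)))
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
    print(round(burn_rate(0.999, 10_000, 50), 1))
```

With a 99.9% target and one million requests, the budget is about 1,000 failed requests, and 250 failures leave about 75% of it. A burn rate around 5 in a recent slice of traffic means errors are arriving about five times faster than the SLO can sustain.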
SLOs mean less staring at charts and waiting for something to go wrong. Instead, a fully automated system proactively grabs your attention. This doesn’t mean treating every blip as an emergency. Rather, escalation intensifies progressively: it starts with something as simple as an email or Slack message, moves on to filing a ticket, and ends in a full-scale escalation.
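This kind of progressive escalation could be sketched as a simple mapping from burn rate to action. Here, burn rate means the observed error rate divided by the error rate the SLO allows. The thresholds and action names below are hypothetical and would need tuning for each service.

```python
# Hypothetical tiers, ordered from most to least severe.
# Each entry is (minimum burn rate, action to take).
ESCALATION_TIERS = [
    (10.0, "page"),     # full-scale escalation
    (4.0, "ticket"),    # file a ticket for the owning team
    (1.5, "message"),   # email or chat message
]


def escalation_for(burn_rate: float) -> str:
    """Return the most severe action whose threshold the burn rate meets."""
    for threshold, action in ESCALATION_TIERS:
        if burn_rate >= threshold:
            return action
    return "none"  # a blip within tolerance: no one is interrupted


if __name__ == "__main__":
    for rate in (0.8, 2.0, 5.0, 12.0):
        print(rate, escalation_for(rate))
```

Keeping the tiers in a plain list makes the policy easy to review and adjust as goals are tuned over time.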
The Bottom Line
To differentiate your service and grow revenue, think like the tech giants. Balance competing interests, provide contextual transparency to customers, and manage your interdependence to your advantage. Get your team off the “chart staring contest” and back to work. Adopting SLOs might sound scary, but it's easier than you think, and there is a thriving community to support you. Waiting to adopt SLOs until your tech is more mature is like waiting to go on a diet until you’ve lost weight. New revenue, new customers, and differentiated experience are all in store for your organization, but you must set up the basic infrastructure to drive the business forward.