Author: Kit Merker
So many customer interactions now happen through digital channels that you'd think it would be easy to understand what our users are experiencing!
It seems archaic to force a customer to call you on the phone. And even if they do, they will have to struggle through an automated customer service menu. Today, even the most traditional brick-and-mortar retailers need to have an amazing digital experience.
Behind the scenes, you may have dozens of software systems that need to work together and orchestrate a seamless handoff in order to not annoy (or worse, drive away) your customers. When a customer wants to sign up, log in, reset their password, see what’s new, and purchase from you, this is the worst time for a broken handoff from one internal service to another, or for a small blip of an outage to cause a problem for the customer. If it happens enough times, they’ll simply leave you.
But how do you actually implement customer experience metrics in a meaningful way? And how do you avoid exposing your organizational structure to customers (see Conway's Law), so they can focus on the task at hand?
There are two parts to measuring this experience. First, you need to understand the reliability and latency of the underlying services that add up to the user experience. Second, you can track user happiness by benchmarking user journeys through your product against objective customer feedback. Once this is done, the various teams need to take these metrics seriously in their product roadmaps.
Let’s break this down.
Reliability Is the Most Important Feature
Reliability comes first because, without it, your features simply don't matter. You can add all the fancy graphics, design, and features you want, but if you can't load the damn page quickly, users are going to drop off. A powerful methodology is to define Service Level Objectives (SLOs) that set reliability goals for your software services from the user's perspective. The idea is to count the number of service requests from a user that succeed, and also to start a timer when a customer requests something and stop it when the task is completed. Don't worry about measuring every little step along the way; instead, create 2-3 service Key Performance Indicators (KPIs) that clearly tell you what percentage of your customers are able to complete important tasks, and how quickly.
Ideally, a very high percentage of requests from customers (clicks or swipes) would complete the roundtrip to your servers and back correctly and quickly. But the reality of software systems is that there are a lot of things slowing users down. Spotty wifi connections, heavy network load, restarts of servers when they are patched with new versions: these all compound to lower the reliability of your system. So you might set a goal for 99.9% of requests to succeed (1 failure that needs to be "re-tried" in 1,000) and for 99% of requests to complete within some time threshold, say 1 second. (You could also set a goal for the "average" time to be even faster.)
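To make this concrete, here is a minimal sketch in Python of computing those two SLIs from a batch of request records. The `Request` shape, field names, and targets are illustrative assumptions, not a real Nobl9 API:

```python
from dataclasses import dataclass

@dataclass
class Request:
    succeeded: bool    # did the roundtrip complete correctly?
    duration_s: float  # seconds from user action to completed task

# Targets from the example above (assumed; tune for your service):
AVAILABILITY_TARGET = 0.999   # 99.9% of requests succeed
LATENCY_TARGET = 0.99         # 99% of requests finish within...
LATENCY_THRESHOLD_S = 1.0     # ...1 second

def evaluate_slis(requests: list[Request]) -> None:
    total = len(requests)
    availability = sum(r.succeeded for r in requests) / total
    fast_enough = sum(r.duration_s <= LATENCY_THRESHOLD_S for r in requests) / total
    print(f"Availability SLI: {availability:.3%} (target {AVAILABILITY_TARGET:.1%})")
    print(f"Latency SLI:      {fast_enough:.3%} (target {LATENCY_TARGET:.0%})")

# A few synthetic requests to show the calculation:
evaluate_slis([Request(True, 0.2), Request(True, 1.4),
               Request(False, 0.3), Request(True, 0.5)])
```

In practice these counts come from your monitoring system over a rolling window, but the arithmetic is exactly this simple: good events divided by total events.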
Measuring the User’s Journey
Now that we have reliability metrics, let’s move on to measuring the user experience objectively. Start by defining some of the most important user flows through your system.
For our example, we will walk through a sign-up journey, right when a user is getting to know your product for the first time. The journey probably begins on a search engine, then heads to your marketing site, then on to a sign-up link. The user will provide some basic information and maybe some credit card details. Behind the scenes, you will need to validate them with an email or text verification and make sure it's a new user and not a returning one. Your system will provision an account for them and typically will block certain features until they verify who they are.
Even in this simple example, you can see there are plenty of steps and chances for something to break or delay, and the customer hasn’t even signed up yet!
So let's apply the same methodology from step one and set up user experience KPIs to track whether the user is able to complete the journey easily. First, we need to decide when to start and stop the KPI. Since we can't see them using the search engine, we begin when they land on our webpage. But that might give you too many window shoppers, so we might instead look at people with clear intent: they clicked the sign-up button! OK, so now we start the clock.
On the other end of the process, we can’t really call them “done” until they reach the end of the sign-up and we have a valid account. This is the most conservative view of customer completion of this process. So we wait until they are in the product with a verified account.
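As a rough sketch of how this measurement could work (the event names and timestamps here are invented for illustration), here's the start/stop logic in Python:

```python
# Hypothetical event stream: (user_id, event_name, unix_timestamp).
# The clock starts at "signup_clicked" and stops at "account_verified".
events = [
    ("u1", "signup_clicked",   1000.0),
    ("u1", "account_verified", 1045.0),
    ("u2", "signup_clicked",   1010.0),  # u2 never finishes: abandoned journey
    ("u3", "signup_clicked",   1020.0),
    ("u3", "account_verified", 1190.0),
]

starts: dict[str, float] = {}
durations: list[float] = []
for user, name, ts in events:
    if name == "signup_clicked":
        starts[user] = ts
    elif name == "account_verified" and user in starts:
        durations.append(ts - starts.pop(user))

started = len(durations) + len(starts)  # completed + still-open journeys
print(f"Completion rate: {len(durations) / started:.0%}")
print(f"Slowest completed journey: {max(durations):.0f}s")
```

In production you would source these events from your analytics pipeline and treat journeys still open past some cutoff as abandoned, but the KPI itself is just this pair of numbers: what fraction finish, and how long it takes.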
How long should this take? And what percent of customers would you expect to complete this process?
And even more importantly, how will you redesign your product to ensure these goals are met? Setting these objectives in terms of time to complete a task gives you a way to communicate clear business goals to each technology team. What you need is not another dashboard that no one looks at, but a set of rules for how these KPIs drive the roadmap. Essentially, have each team explain how they expect the work they are doing to improve these metrics.
Do These Metrics Tell Us Anything?
Once these metrics are implemented, the next step is to understand how they correlate with objective measures of user happiness and business outcomes. First, figure out which sources of data tell you when people aren't happy. This could be support tickets, cancellations or returns, declining trial conversions, or even negative Twitter sentiment. An increase in users failing to complete the sign-up experience is a pretty clear signal. But you might also have service KPIs and user experience KPIs that are more nuanced and "mid-experience" for a user.
The goal is to tune your service metrics so that they have a high correlation with user happiness. That correlation lets your teams trust the metrics, both when they are trying to optimize the reliability of a system (which takes significant effort and investment) and when the metrics are stable and the team can focus on improving the user experience and adding features. Note that the natural tendency for product managers and software teams is to push for new features and let reliability slip. But this gets it backwards: without adequate reliability, the user experience becomes meaningless.
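Checking that correlation doesn't need to be elaborate. Here is a minimal sketch, assuming you have weekly values of a sign-up SLI alongside counts of related support tickets (all numbers below are made up), using the statistics module from Python 3.10+:

```python
from statistics import correlation  # Python 3.10+

# Hypothetical weekly data: sign-up completion SLI alongside support
# tickets mentioning sign-up problems. A well-tuned metric should show
# a strong negative correlation: when the SLI dips, tickets spike.
weekly_sli     = [0.991, 0.986, 0.994, 0.972, 0.989, 0.995]
weekly_tickets = [12, 19, 9, 41, 14, 7]

r = correlation(weekly_sli, weekly_tickets)
print(f"Pearson r between SLI and ticket volume: {r:.2f}")
```

A strongly negative r is evidence the metric tracks real user pain; a weak correlation suggests the SLI is measuring the wrong thing and needs retuning.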
Aligning the Team around Service and Experience KPIs
Once you have these end-to-end metrics, you can go to each backend service that contributes to the user flow and ask its team how they are either hurting or helping the metric. Come up with action plans that focus on the critical user journeys, and the impact can be huge: an aligned team working together to delight users and grow the business, without guessing.
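One simple way to start that conversation is to attribute each abandoned journey to the step (and the owning service) where the user got stuck. The step names below are invented for illustration:

```python
from collections import Counter

# Hypothetical: each abandoned sign-up journey, tagged with the step
# where the user dropped off. Each step maps to an owning backend team.
abandoned_at = [
    "email_verification", "account_provisioning", "email_verification",
    "payment_validation", "email_verification", "account_provisioning",
]

for step, count in Counter(abandoned_at).most_common():
    print(f"{step:<22} {count} drop-offs ({count / len(abandoned_at):.0%})")
```

Now each team can see its share of the overall journey failures and prioritize accordingly.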
Read more about SLO basics in this blog post that lays it all out.
Image credit: Ahmed Zayan on Unsplash