Author: Kit Merker
I recently had the opportunity to sit down with Alina Anderson, Senior TPM of Site Reliability Engineering for Outreach.io, to talk about SLOconf, where she'll be speaking and attending. SLOconf, set for May 17-20, is the first conference dedicated to Service Level Objectives for Site Reliability Engineers.
The following interview has been lightly edited and condensed.
Hi Alina! Tell us about the talk you’re giving at SLOconf.
I'm excited to share some hard-fought lessons from my own experience leading on-call programs: a TL;DR survival guide for those of us tasked with keeping the wheels on the bus through rapid organizational and system change. I can't stop talking about the human side of service ownership. I hope listeners will discover at least one new idea they can apply immediately to improve their team's quality of life.
What inspired you to give this talk?
I'm excited to see an explosion of new observability and incident response tools in the market. This conference is pioneering a community conversation around SLOs and why they matter. Adopting this practice can have a huge impact within engineering teams and across an organization. I wanted to help increase the visibility of these ideas, and hopefully improve my presentation and speaking skills along the way!
What is the biggest challenge you’re facing in your day job when it comes to reliability?
Reliability is everyone’s job. You can’t just hire a super smart engineer to make your product reliable. It’s a shared responsibility across the organization, and maintaining alignment to achieve our most important goals is my biggest challenge.
How would you define reliability?
I would define reliability as "Do you meet your customers' expectations?" In other words, can they trust you the way they want to trust you? Every day, you're making a promise, explicit or implied, to every user. It's important to understand the distance between expectations and reality.
Do you think reliability has become more important over the past year?
I don't think reliability has become more important (it's always been important), but it has become more VISIBLE. In the last year, we have seen unprecedented scale and demand for SaaS products, and this can lead to outages while companies try to keep up. We saw Zoom, Slack, Google, Microsoft, AWS, and others experience service interruptions under the weight of it all. This means it's important to raise the bar within every organization, including for the reliability of your internal tools. You have to rely on your tools, and your customers rely on you.
What do you think about the format and what are you hoping to get out of SLOconf?
It can be difficult to take long chunks of time off during the work week, so I really like the while-you-work format. I also like the lightning-talk style: you get exposure to a broader range of topics you might be interested in learning about or exploring. With so many responsibilities competing during the day (working, parenting, caretaking), there's less time to invest, so you have to be picky. The #SLOconf format allows for that. I'm really hoping to discover some new concepts and innovative ways of thinking about reliability. I'll be joining the discussion in the Slack community too.
How have you adapted to the lack of travel? How are you getting new information, learning, and connecting with people?
Meetup.com is a great resource for events, and now that so many groups are meeting virtually, you can participate in a much wider range of DevOps communities and events.
If someone is on the fence about coming to the event, what would you tell them to help them decide?
It's free! And although it's a three-day event, you can reduce it to a bite-sized chunk by selecting one to three talks, which is well worth an hour of your time. It isn't an all-or-nothing event, and the dedicated Slack channel is a great way to meet other SREs and participate in the community. Meeting reliability nerds from all over is definitely one of my favorite parts!
Why is there so much interest in SLOs that it justifies a whole event?
SLO is shorthand for solving business problems, and that need is universal. It's a language and a set of behaviors within an organization that is gaining adoption, and the practice will drive effective outcomes if you're willing to learn more about it. SLOs are a shared language across organizations.
You’ve also been active at the Beyond Seattle SRE Meetup – what do you think of that event?
I enjoy the no-sales format; observability vendors can be relentless (laughs). I appreciate that, and I appreciate the time and effort put into selecting speakers. The conference and the meetup both welcome everyone interested in reliability; you don't need an SRE title.
Do you think there’s an opportunity for these virtual formats to stay around?
Absolutely, especially when I think about professional development communities. I no longer have to plan how to get across town in traffic and clear my calendar for the afternoon; I just log in. SLOconf is going to be an awesome learning conference, and I'm excited to discover some new ideas that I can bring with me into my role at Outreach.io.