Error Budget

Posted by Alex Nauda
SLOs are good; SLOs for defined customer segments are better

We in the SRE world often speak in generalities about “customer happiness” and how SLOs can help us find that ideal balance between software reliability and the velocity at which...

Posted by Alex Hidalgo
Today I Join the Noble Pursuit of User Happiness

Today I am joining Nobl9 as their principal site reliability engineer. My goal in life is to make people happy, and I think this company is following a noble pursuit...

Posted by Kit Merker
5 “Reasons” I Hate SLOs

Excuses! I’ve heard them all. When it comes to why people “hate” Service Level Objectives (SLOs), I have heard my share of explanations, so many, in fact, that I’ve been...

Posted by Brian Singer
Nobl9 and Datadog: Better Data Makes Better SLOs

At Nobl9, we’ve been working hard to build our service level objective (SLO) platform that takes your existing monitoring data and connects them to your user and business goals. That’s...

Posted by Kit Merker
Creeping Latency Metrics: Review of a Subtle Kubernetes Serverless Scalability Bug

TL;DR—To avoid scalability cliffs, you should pay attention to creeping latency. Think of it like a tire. If you have low pressure, you had better keep an eye on it...

Posted by Kit Merker
Measuring and Optimizing CPU Performance

Software engineers are under constant pressure to make their desktop, laptop, and client applications run as efficiently as possible. That’s easier said than done. Just ask Aaron Tyler, Principal Software...

Posted by Brian Singer
Intro to Error Budget Policies

This article will get a bit deeper into how to turn service level objectives (SLOs) into a tool for balancing investments in new features and in improving the reliability of...

Posted by Kit Merker
SRE 101: The SRE Toolset

What tools do you need to get started with SRE? Much has been written, especially by the founders of the Site Reliability Engineering (SRE) concept at Google, about the benefits...
