Posted by Kit Merker | October 22, 2020
Nobl9 & Adobe Systems: Let’s Talk SLOs for OpenStack

Delivering reliable software services is a challenge for any team running infrastructure, and OpenStack is no exception. Service Level Objectives (SLOs) help bring a data-driven approach to defining, measuring, and delivering the right level of reliability for a given use case while optimizing cost and pace of change. SLOs are an essential tool for any...

Posted by Brian Singer | October 12, 2020
Objective Launch Decision Making with SLOs

If you are a product manager or tech lead, you typically face three key decision points in the process of issuing a new software release: What’s in the release? Go or No-go? Who takes responsibility for ongoing operations? SLOs can play a helpful role in each of these decisions. Using SLOs to Decide What’s in the Release...

Posted by Marcin Kurc | October 6, 2020
A Year in the Life of Nobl9: Launching a Startup Amid a Global Pandemic

Some people say we should just write off 2020, the year when the global COVID-19 pandemic threw life as we know it off the rails and laughed at our carefully laid plans. Having launched a business mere months before the world unraveled, I can understand that sentiment. But I believe that in spite of the...

Posted by Alex Nauda | October 1, 2020
How to Use Service Level Objectives to Manage External Services

Originally published at The New Stack on September 14, 2020. Unless you’re operating an Astute-class submarine hundreds of meters beneath the surface of the sea, chances are good that your infrastructure and software development stack relies on third-party technologies. These technologies no doubt provide great value to your technology stack and deliver significant cost efficiencies (saving time, money,...

Posted by Alex Nauda | September 28, 2020
SLOs are good; SLOs for defined customer segments are better

We in the SRE world often speak in generalities about “customer happiness” and how SLOs can help us find that ideal balance between software reliability and the velocity at which we release new features. To be sure, for many of our conversations, it's expedient to refer to “the customer” as a homogeneous entity, as if...

Posted by Alex Hidalgo | September 22, 2020
Today I Join the Noble Pursuit of User Happiness

Today I am joining Nobl9 as their principal site reliability engineer. My goal in life is to make people happy, and I think this company is following a noble pursuit to make this happen. Let me tell you the story about how my journey has led me here. About a decade ago I ended up...

Posted by Kit Merker | September 9, 2020
5 “Reasons” I Hate SLOs

Excuses! I’ve heard them all. When it comes to why people “hate” Service Level Objectives (SLOs), I have heard my share of explanations, so many, in fact, that I’ve been able to create a persona-based list of the most common: I’m an application developer. I hate SLOs because they are just a way of getting...

Posted by Brian Singer | August 18, 2020
Nobl9 and Datadog: Better Data Makes Better SLOs

At Nobl9, we’ve been working hard to build our service level objective (SLO) platform that takes your existing monitoring data and connects them to your user and business goals. That’s why we’re pleased to announce that Nobl9 now supports Datadog as a monitoring metrics source.  SLOs, which are a small set of service KPIs and...

Posted by Krzysztof Wilczyński | August 2, 2020
SLO Many Talks About Reliability at KubeCon: Here Are Our Picks

The Ultimate Guide to SLOs at KubeCon Europe: If you’re into SLOs and you’re going to KubeCon + CloudNativeCon Europe August 17 through 20, this is the guide for you. We scoured the agenda for talks that we’re excited about, and we’ve picked out the ones we think will be most interesting and valuable. Don’t...

Posted by Kit Merker | July 28, 2020
Creeping Latency Metrics: Review of a Subtle Kubernetes Serverless Scalability Bug

TL;DR—To avoid scalability cliffs, you should pay attention to creeping latency. Think of it like a tire. If you have low pressure, you had better keep an eye on it to avoid a catastrophic blow out. Knative’s open source framework is useful in providing a serverless-like experience on top of Kubernetes, including features like an...

Posted by Jenn Ordonez | July 21, 2020
Takeaways from Enterprise DevOps Skills Report

The DevOps Institute recently released a report titled “2020 Upskilling: Enterprise DevOps Skills.” Several aspects of the report are worth amplifying, and I’ve summarized the elements I found most compelling. Global GDP Going Digital According to the World Economic Forum, by 2022, over 60% of global GDP will be digitized. An estimated 70% of the...

Posted by Kit Merker | July 10, 2020
Measuring and Optimizing CPU Performance

Software engineers are under constant pressure to make their desktop, laptop, and client applications run as efficiently as possible.  That’s easier said than done. Just ask Aaron Tyler, Principal Software Engineer at DocuSign. I spoke with Aaron recently about the relentless drive to improve infrastructure efficiency, and he shared some of his best practices for...

Posted by Marcin Kurc | July 6, 2020
Monitoring Tells Me Nothing, Says Your CEO

Do you need to justify your infrastructure spend to your CEO? In this article, we’ll equip you to communicate clearly and in CEO language about how the infrastructure investment you seek will benefit the business. How does a CEO know if the infrastructure team is delivering value or just spending money? First, let’s give your...

Posted by Marcin Kurc | June 29, 2020
Nobl9 Has Joined The Cloud Native Computing Foundation

Today we are proud to announce that Nobl9 has become a Silver member of the Cloud Native Computing Foundation (CNCF) and the Linux Foundation. We know that members of this community are trying to build reliable software, and we want to engage the community to help solve reliability challenges. We’re looking forward to sharing our...

Posted by Alex Nauda | June 23, 2020
Optimizing Cloud Costs through Service Level Objectives

Are you spending too much on cloud services? Is there a way to optimize your cloud spend and dramatically lower it? Here’s a hint: it has nothing to do with negotiating better deals (and bigger commitments) or reminding your developers to turn down unused infrastructure. This post outlines a different approach: by leveraging cloud-native infrastructure...

Posted by Kit Merker | June 15, 2020
Do You Really Need Five Nines?

Call it what you will: Always On, Six Sigma, High Availability, or Five Nines. Management wants you to deliver a service as close to perfection as possible. Little do they know about the downsides of high availability. In a world where applications are delivered over public networks, the physics of only five minutes of downtime...

Posted by Kit Merker | May 29, 2020
Creating Your First SLO: A Discussion Guide

Congratulations! You are ready to sit down with your team and establish your first Service Level Objective (SLO). You might be wondering where to start. Here’s an outline of how you could approach your first SLO-setting discussion with your developer and operations teams: Share a user story. Suppose you have an e-commerce user story that...

Posted by Brian Singer | May 28, 2020
Intro to Error Budget Policies

This article will get a bit deeper into how to turn service level objectives (SLOs) into a tool for balancing investments in new features and in improving the reliability of your software.  Error Budgets can be thought of as a conceptual model for understanding acceptable risk in your services. You want to provide a highly...

Posted by Marcin Kurc | May 28, 2020
How to Explain SRE to Your CEO

As we discussed in the previous post, Site Reliability Engineering (SRE) is an operating model that helps your organization grow and innovate with velocity while maintaining infrastructure reliability for service levels that keep customers happy. (It’s like DevOps on steroids.) The benefits of SRE are many. The end result is customer satisfaction, and the bottom...

Posted by Kit Merker | May 13, 2020
How do we measure the customer experience?

So many interactions with customers are going through digital channels that you’d think it would be easy to understand what our users are going through! It seems archaic to force a customer to call you on the phone. And even if they do, they will have to struggle through an automated customer service menu. Today,...

Posted by Alex Nauda | May 13, 2020
You’re Not Google. And, Yes, You Still Need SLOs

Much of the available literature about Site Reliability Engineering (SRE) and Service Level Objectives (SLOs) refers to pure, homogeneous infrastructure environments (think hyperscalers, including Google, where the concept of SRE originated). Row after row and data center after data center of similarly sourced and configured racks of gear, configured and operated the same way with...

Posted by Marcin Kurc | May 13, 2020
How the CEO Should Think About SRE

It’s 2020, and your customers expect a lot of you: they expect the most innovative services, up-to-date websites, easy-to-use software applications, and intuitive mobile interfaces. And they expect all of those things to be available and running efficiently on demand. If customers don’t get what they want when they want it, they...

Posted by Kit Merker | May 13, 2020
An Easy Way to Explain SLOs and SLAs to Business Executives

If you appreciate the irony of TLA (a recursive acronym for "three-letter acronym") this blog is for you. Even if you find the “alphabet soup” of business unappetizing, read on anyway, because we guarantee you’ll find some meaty morsels of value in the following discussion.  Nearly everyone, whether you’re in the C-suite or on the...

Posted by Brian Singer | May 13, 2020
A Product Development Approach to Creating Your First SLO

Product managers play a critical role in driving and supporting reliability objectives. This may seem daunting, but we can apply the same PM principles used during product development to building reliable products. The SRE approach is the same. Just follow the steps: 1. Listen to the customer. When it’s time to discuss what your product...

Posted by Kit Merker | May 13, 2020
SRE 101: The SRE Toolset

What tools do you need to get started with SRE? Much has been written, especially by the founders of the Site Reliability Engineering (SRE) concept at Google, about the benefits of all parties in an organization working together to balance features and reliability. We recognize that there are two opposing forces that each make perfectly...

Posted by Brian Singer | May 13, 2020
How Product Managers And SRE Together Can Delight Customers

If you are a software product manager, you probably spend a lot of time thinking about how to delight your customers with new features and high-quality services. After all, your objective is to attract and retain customers and generate new sources of revenue for the company. But what about reliability? Reliability should be just as...