Author: Kit Merker
If you’re into SLOs and you’re going to KubeCon + CloudNativeCon North America November 17 through 20, this is the guide for you. We scoured the agenda for talks that we’re excited about, and we’ve picked out the ones we think will be most interesting and valuable.
An SLO-Driven Approach to Enhance Kubernetes Cluster Reliability
Wednesday, November 18 • 2:45pm – 3:20pm PT
Summary: This talk first outlines the philosophy behind the SLO-driven approach to reliability engineering, followed by a deep dive into how SREs define SLOs for one of the world’s largest Kubernetes clusters at Ant Financial.
Comment: Looks like a great intro to SLOs, with hands-on examples for Kubernetes clusters.
Evolution of Metric Monitoring and Alerting: Upgrade Your Prometheus Today – Bartlomiej Płotka, Red Hat, Björn Rabenstein & Richard Hartmann, Grafana Labs, & Julius Volz, PromLabs
Wednesday, November 18 • 12:00pm – 12:35pm PT
Summary: Prometheus maintainers will introduce you to the universe of reliable monitoring and alerting with metrics via Prometheus, with specific and actionable examples. After that, we will make sure more experienced users can learn as well, by explaining advanced usage patterns of Prometheus and the useful new features available in the newest versions.
Comment: Learn about monitoring using Prometheus and Thanos, an essential tool for any Kubernetes user.
A High-Schooler’s Guide to Kubernetes Network Observability – Drew Ripberger, Nirmata
Wednesday, November 18 • 12:00pm – 12:35pm PT
Summary: Though Drew had not used Kubernetes before his internship, or even heard of eBPF or Prometheus before being assigned the project, this talk will take you through the creation of kube-netc and his journey from hacks and workarounds to utilizing everything that the CNCF ecosystem has to offer.
Comment: Look at Kubernetes and Prometheus through the fresh eyes of a high-school beginner.
Eating Your Vegetables: How to Manage 2.5 Million Lines of YAML – Daniel Thomson & Jesse Suen, Intuit
Wednesday, November 18 • 12:00pm – 12:35pm PT
Summary: This session will explain our journey, hard lessons faced for managing YAML at scale, and where Intuit thinks the future of Kubernetes configuration management needs to head.
Comment: All that YAML can be daunting. How do you keep track of it all?
Supercharged Analytics for Prometheus Metrics with Spark, Presto, & Superset – Rob Skillington & Gibbs Cullen, Chronosphere
Thursday, November 19 • 12:45pm – 1:20pm PT
Summary: We’ll walk through a working example to run Superset and Presto in Docker connected to a remote Prometheus to perform advanced SQL queries of arbitrary size reliably without timeout. We’ll also demo joining metrics data using the Kubernetes node name Prometheus label to detailed Kubernetes object metadata (events, pods, etc.) collected by Fluentd using a simple SQL join thanks to Presto’s query federation capabilities.
Comment: Advanced SQL in Prometheus helps with creating rich customer-focused SLOs.
How the OOM-Killer Deleted My Namespace, and Other Kubernetes Tales – Laurent Bernaille, Datadog
Thursday, November 19 • 1:50pm – 2:25pm PT
Summary: In this talk Laurent and Tabitha will share some of these stories, including a favorite: how a complex interaction between familiar Kubernetes components allowed an OOM-killer invocation to trigger the deletion of a namespace.
Comment: War stories of unreliability are fun, and a great way to learn.
Whatever Can Go Wrong, Will Go Wrong – Rook/Ceph and Storage Failures – Sagy Volkov, Red Hat
Thursday, November 19 • 1:50pm – 2:25pm PT
Summary: In this presentation we’ll go over the basics of storage demands (RPO/RTO), how different types of replication in Ceph impact our recovery time, and how component failures such as drive, node, or cluster determine how long we are at risk. We’ll include a live demo of a Rook/Ceph recovery process from a failed component. We’ll show which components of Rook are recreated, how Ceph behaves during component/pod recreation, and what the impact is on the application while these failures occur (in our case the application will be MariaDB).
Comment: Live demo of failure and recovery!
GitOps Is Likely More Than You Think It Is – Cornelia Davis, Weaveworks
Friday, November 20 • 12:10pm – 12:45pm PT
Summary: In this session Cornelia will cover the four key principles of GitOps, and she’ll demo those concepts with specific tools including Flux. She’ll also talk about use cases including cluster-api (CAPI).
Comment: Looks like a great intro to GitOps, starting with the 4 key principles.
Image Credit: Frank Eiffert on Unsplash