In their day-to-day work, our development team relies on our metrics platform to ship code to production with confidence and to debug problems. They measure and correlate behavior between services on live production workloads, use real-time data to reason and hypothesize about production problems, and add or modify metrics and instrumentation in production to test their assumptions. Our own success in using the metrics stream from production to close our engineering feedback loop has convinced us that this practice, which we describe as Metrics Driven Development (MDD), is a requirement for building web-scale systems. It is a discipline that development teams should adopt alongside other development paradigms such as Test-Driven Development (TDD) and Behavior-Driven Development (BDD).
Our talk will recount an episode in which we used MDD to diagnose an actual problem in our production system running at scale. The audience will follow along as the developer identifies an anomaly in a production KPI, develops a hypothesis about its cause, adds instrumentation to the code in question, and finally confirms the original hypothesis by observing real-time metrics. Along the way we’ll reference specific tools and best practices that developers can adopt in their own MDD efforts. We’ll also demonstrate that MDD does not replace traditional debugging approaches like request logging or code profiling, but can often narrow the focus of those efforts, which can be expensive or difficult to perform in web-scale systems.
This talk is a synthesis of cultural transformation, concrete engineering techniques, systems monitoring, scientific observation, and post-mortem. It should be valuable to anyone who writes and ships code to production systems, even those already following an MDD model. Attendees will learn what requirements a metrics platform must meet to support MDD, how to add lightweight instrumentation to code, and how to isolate problems using metrics derived from that instrumentation. They will also see how MDD can complement traditional production debugging practices, and will come away with an understanding of how to ship better software through the use of MDD.