In the previous post, we explored continuous development and quality, looking at development practices such as CI/CD, pull requests, and TDD, and at quality assurance practices such as dogfooding and automated testing. All these practices enable teams to build resilient software according to Continuous Software Engineering (CSE) principles (see paper).
Now, we will focus on two other key pillars of the CSE framework: Continuous Experimentation and Measurement. These areas ensure that what is built is informed by data, helping teams drive rational product decisions and measure team health objectively.
Continuous Experimentation
At the core of Continuous Experimentation is the notion of experiments in the context of product feature launches.
Experimentation is a process where we observe - in a data-driven manner - the behavior of a subset of users exposed to an experiment. The process starts with hypotheses, usually driven by Product Managers. Each hypothesis has a primary metric associated with it - most likely one used in defining a KR (see this post to learn more about OKRs) - and an initiative we want to launch, whose output will affect the primary and possibly secondary metrics.
One of the most common ways to test a hypothesis is A/B testing. A/B testing allows teams to test hypotheses by comparing one or more treatments - the variants embodying the hypothesis we want to test - against control - the current version of the product. For instance, if we want to optimize user engagement on a landing page, we might expose a subset of users to two new versions of the page (two treatments), compare them with the current version (control), and measure which one results in a higher engagement rate.
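To make this concrete, here is a minimal sketch of how users could be assigned to control or a treatment. The experiment name, variant names, and the 80/10/10 split are made up for illustration; real experimentation platforms typically handle this assignment for you.

import hashlib

# Hypothetical variant split: 80% control, 10% per treatment.
VARIANTS = [("control", 0.80), ("treatment_a", 0.10), ("treatment_b", 0.10)]

def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically bucket a user into a variant for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    cumulative = 0.0
    for name, share in VARIANTS:
        cumulative += share
        if bucket <= cumulative:
            return name
    return "control"  # fallback for floating-point edge cases

print(assign_variant("user-42", "landing_page_v2"))

Hashing the user ID together with the experiment name keeps assignments stable across sessions and independent across experiments.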
Engineers, product managers, and data scientists work together to define the test, implement it, and analyze the data. Usually, engineering takes the lead on the implementation, whereas product and data lead the definition and analysis. However, engineers and engineering leaders need to lean in at all stages, especially during the data analysis - commonly called readouts.
Continuous Experimentation is about constantly considering hypotheses, running experiments, making data-driven decisions, and learning from them. A specific experiment result may validate the current strategy or indicate how the strategy needs to change.
An example of a process in this category that I have seen at multiple companies is an Experiment Review forum. This type of forum is used to discuss ongoing tests that haven't yet been fully rolled out to all users. For each experiment, a data scientist, an engineer, or a product manager goes deep into the results of an A/B test. A document is usually shared ahead of the meeting with the critical data and links or screenshots of the charts for primary/secondary/guardrail metrics. Collectively, the team discusses the results, and the decision maker (typically a product manager or an engineering leader) makes the decision as objectively as possible. The decision could be to abandon the experiment (in case of negative results) or to roll it out to more users or all users (in case of positive results).
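As a rough illustration of what a readout might compute for the primary metric, the sketch below runs a two-proportion z-test on invented engagement counts; the numbers, and the choice of test, are assumptions for the example, not a prescription.

from math import sqrt
from statistics import NormalDist

# Illustrative counts only: users exposed and users who engaged, per variant.
control = {"users": 10_000, "engaged": 1_150}
treatment = {"users": 10_000, "engaged": 1_260}

p_c = control["engaged"] / control["users"]
p_t = treatment["engaged"] / treatment["users"]
lift = (p_t - p_c) / p_c

# Two-proportion z-test on the primary metric (engagement rate).
p_pool = (control["engaged"] + treatment["engaged"]) / (control["users"] + treatment["users"])
se = sqrt(p_pool * (1 - p_pool) * (1 / control["users"] + 1 / treatment["users"]))
z = (p_t - p_c) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"lift: {lift:.1%}, z: {z:.2f}, p-value: {p_value:.4f}")

In an Experiment Review forum, a result like this would be discussed alongside secondary and guardrail metrics before deciding whether to roll out further.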
Continuous Measurement
Continuous Measurement is about constantly tracking the metrics associated with products, engineering health, and overall processes.
Broadly, these metrics can be divided into three types:
Business Metrics – These metrics directly reflect the product’s impact on the business. Business metrics include engagement rates, user acquisition numbers, retention rates, revenue per user, etc. By measuring these, teams understand how their work contributes to larger organizational goals; therefore, engineering leaders must raise awareness about these metrics with their teams.
(Engineering) Operational Metrics – These metrics focus on the stability, reliability, scalability, and performance of the systems. Operational metrics include system uptime, error rates, latency, incident response times, etc. They give teams a view of the health of the tech stack - typically, teams will have monitoring and alerting for these metrics.
Execution Metrics – These metrics track the productivity and efficiency of engineering teams. Examples include sprint velocity, code merge frequency, number of PRs, and build/deployment times. Execution metrics should help engineering leaders find areas for efficiency improvements; for instance, a downward trend in the number of PRs could indicate a bottleneck (a minimal tracking sketch follows this list).
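As a minimal sketch of how these three categories might be tracked side by side, the snippet below tags each metric with a category; the metric names, values, and units are invented for illustration, and in practice the values would come from analytics, monitoring, and developer-tooling systems.

from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    BUSINESS = "business"
    OPERATIONAL = "operational"
    EXECUTION = "execution"

@dataclass
class Metric:
    name: str
    category: Category
    value: float
    unit: str

# Invented sample values for a weekly snapshot.
snapshot = [
    Metric("weekly_retention_rate", Category.BUSINESS, 0.42, "ratio"),
    Metric("p95_latency", Category.OPERATIONAL, 310.0, "ms"),
    Metric("merged_prs_per_week", Category.EXECUTION, 57.0, "count"),
]

for category in Category:
    for m in (m for m in snapshot if m.category is category):
        print(f"[{category.value}] {m.name}: {m.value} {m.unit}")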
Engineering leaders should constantly monitor those metrics and set an example for the team, fostering a data-driven mindset.
For example, regarding engineering operational metrics, a process I experienced at a few companies is a Weekly Operational Metric Review forum. In this session, each engineering leader presents noticeable regressions or improvements in the key metrics of their area. For instance, a Messaging engineering lead might show an ongoing regression in the send message performance metric in a particular week. She will present the details and root causes of the issue and follow up with remediation items the following week. This forum pushes each engineering team to stay accountable for its engineering health metrics.
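To illustrate the kind of check that could feed such a review, here is a rough sketch that flags week-over-week regressions beyond a threshold. The metric names, values, and the 10% threshold are all assumptions for the example, and it only covers metrics where a higher value is worse.

# Hypothetical weekly snapshots of operational metrics (higher is worse).
last_week = {"send_message_p95_ms": 280.0, "error_rate": 0.004}
this_week = {"send_message_p95_ms": 335.0, "error_rate": 0.004}

REGRESSION_THRESHOLD = 0.10  # flag changes worse than 10% week over week

def regressions(previous: dict, current: dict, threshold: float) -> list[str]:
    """Return metrics whose value worsened (increased) beyond the threshold."""
    flagged = []
    for name, old in previous.items():
        new = current.get(name)
        if new is None or old == 0:
            continue
        change = (new - old) / old
        if change > threshold:
            flagged.append(f"{name}: {old} -> {new} (+{change:.0%})")
    return flagged

for line in regressions(last_week, this_week, REGRESSION_THRESHOLD):
    print("REGRESSION:", line)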
Continuous Experimentation and Measurement are fundamental areas of the CSE framework that align software development with business impact. Experimentation encourages teams to make data-driven decisions, validating hypotheses through A/B tests. Measurement drives teams to focus on data, constantly monitoring the health of the business, the tech stack, and team execution.