Impact Over Productivity: Rethinking Engineering Metrics
This post is about Engineering Metrics, but it won’t tell you how to measure the productivity of your engineering teams. In fact, if you don’t already have systems in place to identify low performers and mechanisms to move them out of your organization; if you’re trying to squeeze every hour of coding time out of your engineers; or if you think you can just “turn on” data collection and see improvements, an Engineering Metrics program is likely to fail from the start.
Answering a Basic Question
I started my own journey toward establishing an Engineering Metrics program years ago in an attempt to answer a basic question: How can I know if the Engineering organization I lead is best-in-class? And what does “best-in-class” even mean? Okay, so maybe that’s two questions, but there’s an important distinction here: if you’re trying to use metrics to answer questions like “is my team working enough” or “should people be writing more code?”, you’ll erode any trust you might have built and end up spending more time fighting people who game the metrics than working with your team to improve performance. More on trust and metrics later. But my original question, the very basic question you may also be asking, is surprisingly hard to answer.
Measuring in the Wrong Direction
Like any good Engineering leader, I was well aware of Google’s annual State of DevOps Report (DORA) and the four key metrics the research group espoused as being indicative of elite organizations. For the uninitiated among us, there are plenty of other articles explaining what DORA metrics are and how to measure them, and I will leave researching them in detail as an exercise for the reader. The first time I tried to institute an Engineering Metrics program, I built it around the DORA metrics. After all, if it was good enough for Google, it must be good enough for us, right? As it turns out, starting with DORA was difficult for two reasons.

First, instrumenting systems to reliably measure some of the DORA metrics, such as change failure rate, proved exceedingly difficult. Measuring a failed deployment is relatively easy, but that’s really just a signal of your automation tooling’s quality. If you want to measure a failed software change, you also need a way to reliably determine whether that failure escaped to customers or stayed latent, and that distinction tends to be more of a rule of thumb and a judgement call than something you can measure automatically. And if you can’t measure it automatically, if you rely on people to “mark it down” every time it happens, you’re unlikely to get strong, consistent measurement of any metric as your organization grows. Some organizations do have ways to measure this, but trying to solve for it with a brand-new program tended to be too much inertia to overcome at the start.

Setting aside the hard-to-measure metrics, even the easier ones, like deployment frequency, revealed a second challenge with DORA: no one outside of Engineering really cared. I believe this was ultimately because DORA metrics are engineering metrics, not impact metrics, and Engineering’s ultimate responsibility is to positively impact the business. My first few metrics programs failed to launch because we weren’t able to get broader buy-in from the business on the importance of measuring things that far more research than I could ever commit to the topic had shown to be highly correlated with elite performance among companies. What was I doing wrong?
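To make that measurement gap concrete, here’s a minimal, illustrative sketch (using made-up deployment records, not any particular tool’s API): deployment frequency falls out of data you almost certainly already have, while change failure rate hinges on a flag a human has to set after the fact.

```python
from datetime import date

# Hypothetical deployment records. The "caused_failure" flag is exactly the part
# that has to be "marked down" by a person, which is where consistency breaks down.
deployments = [
    {"date": date(2024, 3, 4), "caused_failure": False},
    {"date": date(2024, 3, 5), "caused_failure": True},   # escaped failure, flagged by hand
    {"date": date(2024, 3, 7), "caused_failure": False},
    {"date": date(2024, 3, 11), "caused_failure": False},
]

# Deployment frequency is mechanical: count deploys over the window they span.
window_days = (max(d["date"] for d in deployments) - min(d["date"] for d in deployments)).days or 1
deploys_per_week = len(deployments) / window_days * 7

# Change failure rate is only as trustworthy as the manual flagging above.
change_failure_rate = sum(d["caused_failure"] for d in deployments) / len(deployments)

print(f"{deploys_per_week:.1f} deploys/week, {change_failure_rate:.0%} change failure rate")
```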
Measuring Where, Not How
DORA metrics are ultimately a measurement of how a team or organization delivers. Teams who want to improve deployment frequency or lead time should first introduce automation and reduce manual toil, for example. The missing link here is that my peers and other leaders were more interested in what we were delivering. Product had their roadmaps and the things they wanted to ship each quarter to move the needle on specific business metrics, and while we were delivering against that roadmap, we knew our time was also being spent in other areas; we just didn’t know how much. This made capacity planning a challenge. We tried fancy spreadsheets with people-to-hours formulas, and we built in plenty of buffer for time off, unplanned work, and even some effort to “pay down tech debt”, but there was still a gap we couldn’t reconcile. Where was our time going? Eventually, this became my new single question to answer, and it became the basis around which a successful metrics program was built. I set out to answer the question “where is Engineering spending our time, and how can we shift those investments into the places we want to spend time as opposed to where we have to spend time?” Okay, again, that was two questions, but if you noted that, at least I know you’re paying attention.

Once we began to measure where teams were spending their time (the details of which I will leave for a future post), we got a much better understanding of why our original approach to planning wasn’t capturing everything that was being worked on. This is an incredibly important story for Engineering organizations to tell, because while no sustainable organization spends all of its time adding new features, stakeholders outside of Engineering are often focused only on the new features and enhancements being built. This makes sense, since these stakeholders are primarily focused on growing the business. What is often missed, though, is that Engineering plays a critical role in running the business as well; there is time invested in keeping software running just to minimally meet customer expectations.

Once we had a better understanding of where we were spending our time, we could focus on improving Engineering impact. I prefer “impact” over “productivity” because the words we choose here matter. “Productivity” implies a machine-like focus on inputs and outputs. It also tends to suggest that we are measuring how hard a person is working, or how many hours they spend behind the keyboard. “Impact,” by contrast, communicates that we trust people to do the right thing and that we are focused on leveraging those people to drive the best possible outcome for the business. By focusing on “impact,” you clarify that you’re measuring the outcomes of the Engineering organization as a whole on the business, not the output of individuals. This focus on impact allowed us to shift the conversation around things like DORA metrics from being the goal to being a tool we could use to measure the impact of changes after we’ve identified a specific challenge or opportunity.
Starting Your Metrics Program
How you start your Engineering Metrics program depends a lot on the size and culture of your organization. If your organization is large (I’d say anything over 50 engineers), it’s best to start measuring metrics on a single team with a single goal and to scale from there. For example, if you’re looking to improve delivery predictability, find a relatively well-performing team and work together to categorize where that team is spending their time. Surprised by how much effort is going into getting software out to Production? Now you can use DORA metrics like lead time or deployment frequency to test changes to team process or to measure improvements from automation. As these metrics improve, you’ll see the time invested in Production deployments go down, freeing that time to be reinvested elsewhere. Now you can use that success story to scale your metrics program to other teams across the organization. Again, the goal here is not “get to 10 deployments a day because that’s what Elite organizations do.” Instead, use deployment frequency to measure whether changes, like additional automation or improved PR processes, actually move the metric.

If your organization is small enough (say, about 10 engineers or fewer), you can take the same measurement approach, but you’ll likely be able to just roll it out to everyone. While starting a metrics program with smaller teams may seem like a premature optimization, using metrics to improve team processes and deliver more business value takes time, and there’s a reasonably large barrier to overcome in building trust throughout the organization that metrics are being used the right way. I’ve found this much easier to accomplish in smaller teams, and it helps to set the culture early on of using metrics and data to measure and improve effectiveness. As you grow, that culture of measurement, and trust in how the data is used, is much easier to scale. You start hiring people who are data-driven and focused on continuous improvement, and a flywheel effect takes hold.

When you first start collecting data, from investment areas to things like DORA metrics, collect passively. Don’t roll into a team retrospective waving your Metrics Bible and telling the team their lead time is too long. Listen to the challenges your teams raise in one-on-ones and retrospectives, and see if you can identify the metrics you can watch to validate whether or not changes on those teams are having an impact. Celebrate the wins publicly, and avoid using metrics to point out failures or things that aren’t moving in the right direction.
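To illustrate “use the metric to test a change” from the example above, here’s a hedged sketch comparing one team’s lead times before and after a process change; the dates, lead times, and the nature of the change are all hypothetical.

```python
from datetime import date
from statistics import median

# Hypothetical lead times (in hours) for a single team's changes, keyed by merge date.
changes = [
    {"merged": date(2024, 1, 10), "lead_time_hours": 96},
    {"merged": date(2024, 1, 24), "lead_time_hours": 120},
    {"merged": date(2024, 2, 14), "lead_time_hours": 80},
    {"merged": date(2024, 3, 6), "lead_time_hours": 30},
    {"merged": date(2024, 3, 20), "lead_time_hours": 24},
]

# The (made-up) date the team introduced deployment automation or a lighter PR process.
process_change = date(2024, 2, 20)

before = [c["lead_time_hours"] for c in changes if c["merged"] < process_change]
after = [c["lead_time_hours"] for c in changes if c["merged"] >= process_change]

print(f"Median lead time: {median(before)}h before the change, {median(after)}h after")
```

The point isn’t the absolute numbers; it’s whether the change you made moved the metric for that team.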
I’ve also found it helpful to identify champions on teams who are data-driven and interested in the topic of using metrics for continuous improvement. Teach them how to review the same metrics dashboards you’re looking at. Tell them what you’re looking for when reviewing this data, and ask what stands out to them. Ask what’s hard about getting work done, and then look together through the data to identify what metrics you can watch to measure improvement based on different initiatives. Finally, encourage them to suggest these initiatives and to advocate for using those metrics to monitor change. When teams are bought into using data to improve the way they get work done and the impact they have on the business, your metrics efforts are far more likely to be successful.
When we decided to fully invest in our metrics program at Paytient, it meant we needed to commit to collecting quality data. For us, this meant adding a new required field to our issue tracking system. There were two aspects of this change I initially underappreciated. First, I was worried that adding a new required field would introduce too much friction in our process and discourage the creation of issues, reducing overall visibility. That turned out not to be the case, I think largely because the issue creation process was already fairly lightweight (one more field didn’t meaningfully change that) and because we focused on investment areas that were mutually exclusive and collectively exhaustive (MECE). Second, I learned that the definitions of the investment areas you choose need regular, consistent reiteration. You want the categories to be clear enough that people don’t need to spend much time deciding which one a piece of work falls into, but that requires talking about the categories, providing examples of prior work and how it was categorized, and regularly reviewing how new work gets categorized. Finally, remember that when it comes to categorizing work into investment areas, good enough is good enough. You’re looking for trends and patterns, not the precise allocation of every activity by every individual. Encourage your team to use their best judgement, and provide the examples and review opportunities that create a shared understanding.
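As an illustration only (these are not our actual field values or categories), a MECE set of investment areas and the “where did our time go?” rollup can be as simple as the sketch below, here counting story points per category from a hypothetical issue export:

```python
from collections import Counter

# A hypothetical MECE set of investment areas: every issue fits exactly one.
INVESTMENT_AREAS = {"new_features", "keep_the_lights_on", "tech_debt", "customer_support"}

# Hypothetical issues exported from the tracker, each with the required field filled in.
issues = [
    {"key": "ENG-101", "points": 5, "investment_area": "new_features"},
    {"key": "ENG-102", "points": 3, "investment_area": "keep_the_lights_on"},
    {"key": "ENG-103", "points": 8, "investment_area": "new_features"},
    {"key": "ENG-104", "points": 2, "investment_area": "tech_debt"},
]

# Guard against categories drifting outside the agreed-upon set.
assert all(i["investment_area"] in INVESTMENT_AREAS for i in issues), "unknown investment area"

totals = Counter()
for issue in issues:
    totals[issue["investment_area"]] += issue["points"]

grand_total = sum(totals.values())
for area, points in totals.most_common():
    print(f"{area}: {points / grand_total:.0%} of effort")
```

Story points are just a stand-in here; issue counts or estimated hours work equally well, because, as above, good enough is good enough.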
Don’t Use Engineering Metrics in Isolation
Engineering Metrics, from investment areas to metrics like DORA, never tell the whole story. You can have the shortest lead time, lowest change failure rate, and spend 80% of your time building against the roadmap, but none of that necessarily means you’re helping to grow the business. You need to work with Product to combine your organization’s metrics with theirs to truly understand the impact the Engineering organization is making on the business. Deployment frequency doesn’t really matter if you keep deploying things your customers don’t want. Use investment area data to improve the product roadmap planning process. Knowing where your team has historically spent their time helps you and your partners in Product to be better informed about how much time you’ll have to invest in the roadmap in the future. And you can tell a much better story about how the time your teams spent improving documentation and application logging led to shorter incident resolution times and ultimately more time being available to build new features!
Benchmarking Internally and Externally
I think the hardest part of implementing a metrics program is figuring out what to do after you’ve started measuring things. If your teams spend 60% of their time on product roadmap work, is that enough? Should you be aiming for 80%? What if a team’s PR cycle time is two days — how much time and energy should you spend trying to shorten that even further? I’ve found benchmarking to be a useful tool in figuring out where to go after your first data points are collected. First, start benchmarking internally and only within the team you’re measuring; trying to benchmark across teams can be difficult due to different team compositions and areas of focus. Where has the team historically invested their time? Does it shift in a predictable pattern, such as spending more time on customer support and incident response after a major feature release? Is time spent deploying to Production steadily increasing over time? The goal here is not to compare teams or to pit them against each other. We’re using metrics for learning and improvement within a team, remember? If each team can make even marginal improvements, those aggregate across the organization.
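A minimal sketch of that kind of internal benchmarking, assuming you already have quarterly allocation percentages for a single team (the numbers below are invented), is simply surfacing how the team’s own allocation shifts from quarter to quarter:

```python
# Hypothetical quarterly allocation for one team: percent of time by investment area.
team_allocation = {
    "2024-Q1": {"new_features": 65, "keep_the_lights_on": 25, "tech_debt": 10},
    "2024-Q2": {"new_features": 55, "keep_the_lights_on": 35, "tech_debt": 10},
    "2024-Q3": {"new_features": 50, "keep_the_lights_on": 40, "tech_debt": 10},
}

# Compare each quarter to the previous one for the same team, rather than
# comparing this team against any other team.
quarters = sorted(team_allocation)
for prev, curr in zip(quarters, quarters[1:]):
    for area, pct in team_allocation[curr].items():
        delta = pct - team_allocation[prev][area]
        if abs(delta) >= 5:
            print(f"{curr}: {area} shifted {delta:+d} points vs {prev}")
```

A steady climb in keep-the-lights-on time, like the one above, is exactly the kind of pattern worth bringing to the team before it becomes a capacity-planning surprise.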
That said, it’s helpful at an organizational level to understand how all teams are performing relative to your peers (or competition), and external benchmarking can be very helpful here. The challenge is where to get this data from, and the most reliable source I’ve found is the benchmark data available in Engineering Performance Management tools like Jellyfish, LinearB, etc. These tools have benchmark data that goes beyond just DORA metrics and includes things like investment allocations, sprint predictability, issue lifecycle, and more. Typically, you can segment the benchmark data by things like industry and company size to ensure you’re benchmarking against companies similar to your own. Investment allocation is where this benchmarking data can be especially useful in conversations with Product and other business stakeholders, because it can help substantiate the story you’re telling about striking a balance between growing the business and running it.
When Metrics Won’t Help
Collecting and using metrics in and of itself won’t solve any of your problems. If you’re trying to use metrics to identify low performers, don’t. If you lack the feedback mechanisms to identify these folks on your teams outside the context of Engineering metrics, you likely lack the psychological safety required to make any metrics program successful in the first place. Launching and scaling a metrics program requires a significant amount of trust from the people doing the work you’re trying to measure, and if they think this data is going to be used to micromanage them, you’re sure to fail. Similarly, do not tie Engineering metrics to performance reviews or financial incentives like compensation or bonuses. These are the kinds of initiatives that fuel the horror stories of failed metrics programs and engineers gaming the system. Metrics should be used to measure engineering effectiveness, to diagnose problems, and to measure the impact of changes to the ways teams work.
Iterate Regularly
Once you’ve established an Engineering Metrics program, it becomes simpler to answer all kinds of questions. But start small. Don’t overload your issue tracker with required fields in the hopes that you’ll one day need or want to slice and dice the data along those dimensions. You’ll be drowning in data at first, and will need time to acclimate to your newfound insights. From there you can start asking new questions and incrementally adding fields, labels, issue types, etc. to gain new insights. Some of the questions I’ve been able to answer by adding new dimensions after the launch of a successful program are things like “how much time do we spend doing work for specific customers?”, “what was the ROI of that engineering initiative?”, and “how much planned versus unplanned work are we doing?” Look for ways to improve the process as you learn more about what data you have, what data you need, and how you can collect it. And remember: never stop improving!
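For example, answering the planned-versus-unplanned question can be as simple as slicing the same issue data by one new flag; the field name and values here are hypothetical, added after the original program launched:

```python
# The same style of hypothetical issue export, now with a "planned" flag added later.
issues = [
    {"key": "ENG-201", "points": 5, "planned": True},
    {"key": "ENG-202", "points": 2, "planned": False},  # e.g. an escalation or incident follow-up
    {"key": "ENG-203", "points": 8, "planned": True},
    {"key": "ENG-204", "points": 3, "planned": False},
]

planned = sum(i["points"] for i in issues if i["planned"])
unplanned = sum(i["points"] for i in issues if not i["planned"])
total = planned + unplanned

print(f"planned: {planned / total:.0%}, unplanned: {unplanned / total:.0%}")
```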

