My very first conference talk ever was on metrics. It was probably 2000 or 2001, and it was a mediocre-at-best presentation with too many bullet points and not enough entertainment. Despite my inexperience, the point was clear - measure things that matter.
I’ve spent a lot of time in the years since working on various metrics programs - sometimes shutting them down, but mostly working with teams on how to use metrics to answer questions.
Measure What Matters
The simple yet often ignored advice on measurement is to measure things that truly matter. Many organizations fall into the trap of measuring whatever is easy, hoping that interesting insights will emerge. I call these 'jeopardy metrics' because they provide answers without clearly defined questions.
I wrote an article just short of twenty years ago on this subject (it’s not horrible if you want to read it). I wrote about the problem of measuring the wrong things and talked about the Goal-Question-Metric (GQM) approach from Victor Basili. Around that time, I also had the chance to work briefly with Dr. Basili on a project with Microsoft Research, which helped me iron out a lot of my opinions on measurement.
GQM is pretty straightforward in concept (from the above linked article):
Generate a set of goals based on the needs of the organization or business.
Develop a set of questions that will let us know whether we are meeting our goals.
Develop a set of metrics that provides answers to these questions.
If your goal is, for example, to increase developer productivity by 20%, first, ask yourself how you would know that developer productivity has increased. This should be a long list, and could include things like more code check-ins, reduction in blocking issues or friction points, tasks completed, developer perception of velocity, etc. Then, put some measurements in place for those things (creating measurements where needed), and see if they tell the story. If not, re-brainstorm the goals, the questions, and the metrics and try again.
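To make that flow concrete, here’s a minimal sketch of GQM as a data structure, in Python. Everything in it is hypothetical - the goal, questions, and metrics are just the developer-productivity example above - but it shows the property that matters: every metric traces back to a question, and every question back to a goal.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    source: str  # where the data comes from: tracker, CI, survey, etc.

@dataclass
class Question:
    text: str
    metrics: list[Metric] = field(default_factory=list)

@dataclass
class Goal:
    statement: str
    questions: list[Question] = field(default_factory=list)

# Hypothetical example: the developer-productivity goal from above.
goal = Goal(
    statement="Increase developer productivity by 20%",
    questions=[
        Question(
            text="Are developers completing more work?",
            metrics=[Metric("tasks completed per sprint", "issue tracker")],
        ),
        Question(
            text="Are developers blocked less often?",
            metrics=[
                Metric("blocking issues open more than 2 days", "issue tracker"),
                Metric("perceived friction", "quarterly developer survey"),
            ],
        ),
    ],
)

# Every metric traces back to a question, and every question to the goal.
# If you can't complete this chain for a metric, you're playing jeopardy.
for question in goal.questions:
    for metric in question.metrics:
        print(f"{metric.name} ({metric.source}) -> {question.text}")
```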
Isn’t that…
Many years later, I read Andy Grove’s High Output Management, where he discusses OKRs (Objectives and Key Results). There’s a lot of overlap between these two approaches, but I’ve drifted towards OKRs because I like the emphasis they put on creating ambitious objectives - I’ve found that these “big rocks” are a better way to align the team in a direction. Rather than a goal of improving dev productivity by 20%, my OKR-style objective may be to have the most productive development organization in the industry. My metrics (key results) may end up being similar, but I have a stronger North Star to shoot for.
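To keep the contrast concrete, here’s that same example restated OKR-style - one ambitious objective with measurable key results. The specifics are hypothetical:

```python
# Hypothetical OKR: an ambitious objective plus measurable key results.
# Note how the key results resemble GQM metrics; the difference is the
# North Star at the top.
okr = {
    "objective": "Be the most productive development organization in the industry",
    "key_results": [
        "Median commit-to-production time under one day",
        "Developer survey 'I can do my best work here' score above 4.5/5",
        "95% of blocking issues resolved within 48 hours",
    ],
}
```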
For those of you who haven’t read the back catalog - I have more thoughts on OKRs here.
This Shit’s Hard
The easiest (and thus most common) metrics I’ve seen teams adopt come from simply copying what someone else has done. I read that <some org> measures bones in mayonnaise, so we should measure bones in mayonnaise. Today, the DORA metrics (deployment frequency, lead time, change failure rate, and time to recovery) are massively popular - and they should be, because they’re based on extensive research. But if you read about the research, you would also discover that these were metrics (measured via survey - not dashboards) that correlated with high-performing teams. While these metrics are indicators of a high-performing team, driving the metric up will not necessarily make a team high performing. Goodhart’s law says, “When a measure becomes a target, it ceases to be a good measure.” I think a lot of software teams have fallen into the trap of trying to improve their DORA metrics rather than improve the performance of their teams.
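If you do track the DORA four, the arithmetic is the easy part. Here’s a minimal sketch, assuming (hypothetically) that you already log deploy and incident timestamps somewhere - the data shapes and field names are invented for illustration. The point is that computing the numbers is trivial; making the team better is not.

```python
from datetime import datetime
from statistics import median

# Hypothetical records - in practice these would come from your CI/CD and
# incident-tracking systems. (The original DORA research collected this
# data via survey, not dashboards.)
deploys = [
    {"deployed": datetime(2024, 6, 3), "committed": datetime(2024, 6, 1), "failed": False},
    {"deployed": datetime(2024, 6, 5), "committed": datetime(2024, 6, 4), "failed": True},
    {"deployed": datetime(2024, 6, 7), "committed": datetime(2024, 6, 6), "failed": False},
]
incidents = [
    {"start": datetime(2024, 6, 5, 10, 0), "resolved": datetime(2024, 6, 5, 13, 0)},
]
period_days = 7

# Deployment frequency: deploys per day over the period.
deploy_frequency = len(deploys) / period_days

# Lead time for changes: median hours from commit to deploy.
lead_time_hours = median(
    (d["deployed"] - d["committed"]).total_seconds() / 3600 for d in deploys
)

# Change failure rate: share of deploys that caused a failure in production.
change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)

# Time to restore service: median hours from incident start to resolution.
restore_hours = median(
    (i["resolved"] - i["start"]).total_seconds() / 3600 for i in incidents
)

print(f"{deploy_frequency:.2f} deploys/day, {lead_time_hours:.0f}h lead time, "
      f"{change_failure_rate:.0%} change failure rate, {restore_hours:.0f}h to restore")
```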
Focus on the goal - not the metric.
More often than I’d like, teams measure stuff because they can measure it (jeopardy metrics), but then make it worse by asking the wrong questions or ignoring the questions entirely.
The Solution(s)
I’ve told this story about my journey in testing a few times. When I first read a book on software testing, I thought I had it all figured out. Then I read a second book and realized that I didn’t. After I read my third, fourth and more books, I began to form my own opinions.
In my experience, sometimes hard problems require hard research. The internet, unfortunately, is full of bad advice on metrics, so you’re going to have to sift through what works, what doesn’t work, and why.
Weinberg taught me much of the foundation of software metrics research in his four-volume Quality Software Management set. From Volume 1 on Systems Thinking to Volume 4 on Anticipating Change, these were the books that gave me a broad and confident base for building and releasing software that I knew was solid. Their emphasis on systems thinking, and on ensuring that metrics are meaningful, reliable, and support continuous improvement, has bled into a lot of what I have done.
Two Paths
There’s a more interesting conundrum in metrics. In decades of measuring things, I’ve discovered two truths that can both contradict and complement each other.
You’re probably measuring the wrong thing.
You probably already have the metrics you need.
People are lazy with metrics. They pick the things they can easily access - bug counts, code coverage, and test pass rates are indeed all things that can be measured, but none are very interesting as an organizational metric. Bug counts measure nothing other than how many bugs your team is reporting; code coverage only tells you how much of your code is completely untested or unreachable; and test pass rates…are mostly useless.
Don’t play jeopardy - start with the questions you want to answer, and then figure out what you need to measure in order to answer those questions.
On the other end of the spectrum are metrics “programs” where teams make long lists of new things to measure. In my experience, programs like this are mostly useless busy work. Chances are that you can answer your questions about your product goals with things you’re already measuring. If you get to a point where you need to measure something new, that is the time to add new metrics to your portfolio.
It’s All The Same
Surprisingly (or maybe not?), the closing points of that twenty-year-old article are still pretty good advice today.
Don’t try to measure too much. Just because you can measure something doesn’t mean that you should.
Understand the goals of your project before you determine what to measure.
Once you determine the goals for your project, determine which metrics support these goals. Try to choose from existing metrics rather than defining new ones. The important point to note is that now you know why you are using each particular measurement.
Don’t let your metrics define the behavior of your team. If the metrics you have chosen can be modified without showing an increase or decrease in quality, either change the metrics or choose a set of relative metrics that cannot be manipulated.
Monitor the metrics throughout the project. Just as you measure the project to assess quality, you should measure the metrics program to define areas for improvement and identify trends you can use to provide better information to the team.
I’m taking a rare week off next week from these posts while I go for another long walk in the woods. As usual, thanks for reading my weekly brain dumps.
-A 11:6