I came across a LinkedIn post a few weeks ago (maybe? time is a blur) from Mike Vermeer where he made the simple statement:
Teams should aim for zero bugs in the backlog.
I was surprised-not-surprised that many of the commenters were confused about what this meant, or got sidetracked into defining bugs or quality - which isn’t the point of having zero bugs at all. The not-so-secret-secret is that all software ships with known bugs. Zero bugs in your backlog doesn’t mean that your product doesn’t have any bugs - it means that you “deal” with every single bug as it is discovered, either by fixing it or by choosing not to fix it.
No Time to Fix Bugs?
This should be a familiar story if you’ve been in software for a while. I worked on products at Microsoft with “feature complete” deadlines and “stabilization” periods in the schedule specifically for teams to fix bugs (or enough bugs that we felt better about shipping). For reasons that baffle me in hindsight, we too often separated building features and software from fixing the bugs we created along the way.
There’s a story (legend?) that used to circulate about an Office developer in the ‘90s who was rushing to complete a feature that would calculate some obscure attribute of a font. They were too busy creating other buggy features, and didn’t even start this functionality until the “feature complete” date - an arbitrary date where “new code” ended, and the remainder of the cycle was spent on bug fixing. There was a dilemma - either stay up all night making this feature work, or hang out with some friends watching Reservoir Dogs and eating pizza. Rumor is that the developer checked in the feature on time, and had time for their friends.
float CalculateSomeObscureFontFeature(HFONT font)
{
    return 1.0f;
}
Then, with that bit of functional but more-than-slightly-buggy code in place, it was movie time. They could fix the “broken” functionality during bug-fixing time.
Later?
You may also think, “even though I’m not going to fix the bug right now, surely I will want to fix it later” - so it must be dutifully logged.
Why do we lie to ourselves?
Back in the old days, as release dates approached, we’d defer huge numbers of bugs to the next release, assuring ourselves we’d fix them eventually. Many teams at Microsoft would kick off a new release with a “milestone zero” where they’d try to address bugs and other tech debt - but they usually didn’t. When I worked on Windows, for example, there were hundreds of bugs that were around for multiple versions of Windows - some even surviving migrations between different bug systems.
What’s the Problem?
Let’s talk about what happens with those 200 (or 2000) bugs you keep in your bug backlog so you can fix them “when you have time”. Those 200 bugs get reviewed frequently by various members of the team, prioritized, reassigned, updated, and reviewed again. That’s a lot of total time for a bug that’s realistically not getting fixed ever. And you thought status meetings were an expensive waste of time!?
Instead, close them all. Every single one of them. They are a waste of time. If they were bugs that really needed to be fixed, they would have been fixed already.
The Policy
Here’s the simple policy that should work for everyone.
If it’s a bug worth fixing, stop and fix it now.
That’s it. Implied in that statement is that if a bug is not worth fixing you don’t fix it. It’s a decision (we’ve talked about those before). You can choose to continue implementing a highly requested feature and delay fixing a bug (which will probably increase the cost of fixing it), or you fix the bug right now, and potentially disappoint customers who think that you're too slow at delivering the feature they want just because you wanted them to avoid a bug. To me, that decision is clear. If it’s worth fixing, fix it now. Otherwise, it doesn’t matter. Close it.
Or don’t even open it.
To Log or Not to Log
A lot of teams don’t bother with a bug database at all. If they find a bug worth fixing, they just fix it. No need to waste time putting details about it into a system. I think that for some teams working on some products, recording some details as bugs are found may be helpful - but I think that most teams don’t need a bug database at all.
I’m not going to argue or judge you if you decide you need to track details on every bug - it’s inefficient in most cases, but you do you. I do, however, reserve my right to judge you if you are wasting multiple hours of person time every week reviewing old bugs that you’re not going to fix. As my Myers-Briggs horoscope tells me, I abhor inefficiency.
The Purge
If you’re not already typing your hate mail to me, and you’re sort-of-onboard with the idea of a zero bug backlog, you may be wondering about the bugs you have now.
Close them all.
If you’re worried there are some gems in there that need to be fixed, you’re probably wrong. If you’re still worried, go ahead and triage them one last time - but for any bug you think needs to be fixed, commit to fixing it NOW, before you do anything else. Otherwise, you’re lying to yourself yet again. At the very least, set a deadline for when your team will have a zero bug backlog.
Or save yourself the trouble and close them all.
ABA
If you’ve somehow read this far and still think you’re an exception to needing a zero bug backlog, I’ll share a metric that I’ve found to be a pretty valuable indicator of product health in an organization: Average Bug Age (ABA).
Average Bug Age, in the short term, measures commitment to reaching a Zero Bug world. Long term, ABA is a measure of feature maintainability (how long it takes to fix a bug in a code base). On a team maintaining a zero bug backlog, this time is measured in hours. But even if you are keeping bugs around for a bit longer, ABA - and the trend of ABA - is (IME) a strong indicator of code health, as not only is it more efficient to fix bugs as they’re found, but it’s important that bugs are easy to diagnose and fix in your code base. I’ll spare you the numerous stories of me taking days or weeks to diagnose and fix bugs in Windows code.
One final note on this metric is that it’s one of the few metrics I find useful where average is much better than median. For a lot of measurements (e.g. page load time), median or p90 is vastly preferred over average, as the outliers can throw off the metric. In the case of ABA, we want the outliers in the equation.
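To make the average-versus-median point concrete, here is a minimal sketch in C++. It assumes you can export the age of each bug in hours (time since the bug was found); the function names and the sample numbers are made up for illustration and don't come from any real bug system.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Average Bug Age (ABA): the arithmetic mean of the bug ages, in hours.
// Returns 0.0 for an empty list (a zero bug backlog).
double AverageBugAgeHours(const std::vector<double>& bug_ages_hours)
{
    if (bug_ages_hours.empty()) {
        return 0.0;
    }
    const double total =
        std::accumulate(bug_ages_hours.begin(), bug_ages_hours.end(), 0.0);
    return total / static_cast<double>(bug_ages_hours.size());
}

// Median bug age, shown only for comparison with the average.
// Takes the vector by value so the caller's data is not reordered.
double MedianBugAgeHours(std::vector<double> bug_ages_hours)
{
    if (bug_ages_hours.empty()) {
        return 0.0;
    }
    std::sort(bug_ages_hours.begin(), bug_ages_hours.end());
    const std::size_t mid = bug_ages_hours.size() / 2;
    if (bug_ages_hours.size() % 2 == 1) {
        return bug_ages_hours[mid];
    }
    // Even count: average the two middle values.
    return (bug_ages_hours[mid - 1] + bug_ages_hours[mid]) / 2.0;
}
```

With hypothetical ages of 2, 4, 6, 8, and 2000 hours, the median is 6 hours and looks perfectly healthy, while the average is 404 hours and exposes the one bug that has been sitting around for almost three months. That outlier is exactly what the metric is supposed to surface.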
This is another of those posts where some people will nod their head and ask what’s new, and others will think I am an idiot. I don’t expect that level of divisiveness this time, but as always, I’m happy to answer sensible questions and engage here, or wherever you prefer to discuss.
I started a QA team for a small company last year. We don't have a standing bug backlog.
The main reason for this is that we're a QA team not a DA team, as in Defect Assurance.
Our focus is on preventing bugs by advertising upfront to the team what will be tested. We want to keep test cycles fast, reliable and passing 100%.
You've addressed the why behind this process but not the how. There always will be dissenters who refuse to take this action and will make up reasons why they need all their bugs floating around. How do you address this? And how do you get to the root of why engineering leaders are often very afraid to take this step? Most of them aren't forthcoming about their real reasons. Often it's just fear.