An often-asked question in software - and one that rarely gets a correct answer - is simply, “When will this be done?” This is such a difficult question to answer that people like Woody Zuill tout No Estimates as a principle - and he’s not entirely wrong. Estimates are practically impossible - and when teams are held accountable for educated guesses on delivery, it’s a short path to blame and padding. In fact, I’ve led no-estimates projects, and they went great…but there’s a challenge I never found a workaround for - stakeholders (dependent teams, leadership, etc.) want to know (fairly) when something is going to be done.
We need magic.
The Science of Magic
Over fifteen years ago, a team member introduced me to Douglas Hubbard’s How To Measure Anything, and it’s something I’ve re-read quite a bit over the years. Early in the book, he sums up my challenges succinctly:
Management cares about measurements because measurements inform uncertain decisions
For any decision or set of decisions, there is a large combination of things to measure and ways to measure them - but perfect certainty is rarely a realistic option
Therefore, management needs a method to analyze options for reducing uncertainty about decisions
Hubbard goes on to discuss estimation via confidence levels as a method of expressing uncertainty, and suggests that for any question around uncertainty, we probably already know enough - or have enough data - to answer, if we estimate a confidence level. Consider these questions:
How Many Lakes are in Minnesota?
What's the population of South America (more examples here)?
I could not answer the questions with certainty, but I could answer with a confidence level - meaning that I don't know the answer, but I can estimate the uncertainty.
Minnesota is known as the land of 10,000 lakes, but I don't know if that's literal or figurative. It probably has at least 5,000 - but may have as many as 20,000. It could even have more or less, but I have an 80% (p80) confidence it's between those numbers.

South America has some huge cities. I also know that the US has about 350 million people, so I expect South America has at least that many people, but I don't think it has a billion. Therefore, I estimate (p80) that the population of SA is between 350 million and a billion.
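If it helps to see the idea written down, here's a minimal Python sketch (purely illustrative) of what one of these interval estimates looks like as data - the numbers are just my guesses from above, and the `contains` check is the only thing you ever do with it.

```python
from dataclasses import dataclass

@dataclass
class IntervalEstimate:
    """A range I believe contains the true value at a stated confidence level."""
    low: float
    high: float
    confidence: float  # 0.8 for a p80 estimate

    def contains(self, actual: float) -> bool:
        return self.low <= actual <= self.high

# The two guesses above, written down as data.
lakes_in_mn = IntervalEstimate(low=5_000, high=20_000, confidence=0.8)
population_sa = IntervalEstimate(low=350_000_000, high=1_000_000_000, confidence=0.8)

# Minnesota's commonly cited count is 11,842 lakes of ten acres or more,
# so the first guess turns out to be a hit.
print(lakes_in_mn.contains(11_842))  # True
```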
At the time I first read HTMA, most of Microsoft was still shipping things on yearly - or bi-yearly - schedules, and I was using the concepts to help measure different quality aspects, so I didn’t think of applying any of this to estimates for many more years.
The Sweet Spot
In more recent years, I’ve studied everything I could about agile estimation (I guess I have an addiction to hard problems), and discovered the writing and books of Mike Cohn - I’ve been a big fan ever since. In addition to reading all of his books, I’ve probably watched a huge number of Mike Cohn talks. Somewhere in the back of my head - or maybe it’s just a dream - I swear at one time I read or heard Mike talk about estimating with a 50% confidence level (picking a date that the team has 50% confidence in). Today, I can find no reference to this anywhere, but someone at a previous job had similar thoughts, and I had my teams give p50 estimates for many years.
In practice, delivery estimates with a 50% confidence level work really well - I’ve found that p50 is an absolute sweet spot between overplanning and taking off before you know what you’re doing. It’s an estimate based on the knowledge that you have - but purposely, not all of the knowledge. I’m going to estimate (with an 80% confidence level) that my teams shipped between 500 and 800 projects (my definition of a project was a deliverable with customer impact), all with p50 dates. Your mileage may vary.
The Less Sweet Spot
There are caveats with p50 estimates. The obvious one is that nearly all of your estimates are going to be wrong - especially at first. Hubbard talks a lot in HTMA about calibration - that you have to estimate a lot with confidence levels before you get good at it. Because p50 dates are wrong a lot, a team needs psychological safety to pull them off - when a date moves, there should be no blame - but teams should be accountable for learning every time a date is moved (that’s how calibration works).
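To make “calibration” a little more concrete, here’s a rough sketch of the only bookkeeping it really takes - the projects and dates below are invented: log each p50 date next to the actual ship date, and see how often you beat it.

```python
from datetime import date

# Hypothetical log of p50 estimates: (project, p50 date, actual ship date).
history = [
    ("search box",   date(2024, 3, 1),  date(2024, 3, 10)),
    ("new menu",     date(2024, 5, 15), date(2024, 5, 12)),
    ("login rework", date(2024, 8, 1),  date(2024, 9, 3)),
    ("home widget",  date(2024, 8, 20), date(2024, 8, 19)),
]

hit_rate = sum(actual <= estimate for _, estimate, actual in history) / len(history)

# A calibrated p50 estimator lands near 0.5 over many projects.
# Well above 0.5 suggests padding; well below suggests optimism.
print(f"Shipped by the p50 date {hit_rate:.0%} of the time")
```

Over a handful of projects the number is noisy; over dozens, it tells you whether the team is padding, optimistic, or actually calibrated.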
The other challenge is with stakeholders. If you tell them something is going to be done on June second - but that you only have 50% confidence in that date, they get antsy. It takes some work to get everyone involved to a state where confidence levels inspire…confidence, but from experience, I can tell you it’s possible.
Possible, but not always optimal.
Back to Hubbard
Hubbard mentions Four Useful Measurement Assumptions:
Your problem is not as unique as you think
You have more data than you think
You need less data than you think
An adequate amount of new data is more accessible than you think
These apply directly to estimating software delivery:
The thing we’re making (or the part of the thing) has probably been done before
We probably already know enough about building it to know sort of how long it will take
We could get more data, but it probably wouldn’t help us that much in the long run
If we do run into unknown unknowns, we will be able to learn and adapt quickly.
In my experience, when teams have to deliver on dates, they spend too much time planning and not enough time learning (and they pad estimates and cut corners). They’re too afraid of being wrong (not hitting their date) to deliver efficiently.
The New Magic
I think there’s an even better way, and it’s right out of Hubbard. Instead of estimating a single date with a 50% confidence level, what about estimating a range of dates at an 80% or 90% confidence level?
Need to deliver a new menuing system? I may have an 80% confidence level that it will be done in 2 to 4 months. If I need to add a new widget to our home page, I have an 80% confidence level that it will be done in 1-2 days.
It’s important to note that the range isn't a wild guess - it comes from the knowledge the team already has about what they’re building, balanced against known risks. For the menuing system example, that date range should be accompanied by a list of known risks - and if you squint a little, the front end of that estimate is close to the date we’d hit if few to none of those risks occurred, and the later date is the date we’d hit if most to all of them occurred. It’s not magic (or science) - it’s an art that helps to tell a story.
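If you want a quick sanity check on that story, one option - and this is just an illustration, with made-up risks and durations, not a prescription - is a tiny Monte Carlo pass over the risk list: take the base estimate, let each risk hit with some probability, and read the 80% range off the results.

```python
import random

# Made-up numbers for the menuing system example: a base estimate in weeks,
# plus known risks with a rough probability and a range of added weeks if they hit.
BASE_WEEKS = 8
RISKS = [
    ("design churn on the top-level nav", 0.5, (1, 3)),
    ("accessibility rework",              0.3, (1, 2)),
    ("dependency on the new theming API", 0.4, (2, 4)),
]

def simulate_once() -> float:
    weeks = BASE_WEEKS
    for _name, probability, (low, high) in RISKS:
        if random.random() < probability:
            weeks += random.uniform(low, high)
    return weeks

samples = sorted(simulate_once() for _ in range(10_000))
p10, p90 = samples[len(samples) // 10], samples[len(samples) * 9 // 10]
print(f"80% range: roughly {p10:.0f} to {p90:.0f} weeks")
```

The 10th and 90th percentiles bracket 80% of the simulated outcomes, which maps neatly onto the “few risks hit” and “most risks hit” ends of the range - the storytelling still matters, but the arithmetic keeps the story honest.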
End(?) With Why?
As I think about it, there are a few things I like a lot about this proposed method of estimation. First, it gives us a consistent way to communicate what outcomes teams are generating, and when they’re going to happen. Second (but more importantly), it gives teams an opportunity to practice accountability and gives them permission to "fail" when they need to adjust estimates (even with 80% confidence, some things are going to slide).
I’m sure I will hear opinions. I can’t wait.
In my short experience I find the estimation process incredibly arbitrary. I work for a software agency (though today is my last day before moving on) and it’s crazy to see how estimations differ between projects and clients.
One project that stands out to me right now is one that faced huge amounts of unknown unknowns, yet none of the estimations changed. Granted, the estimations were for FE and BE work that was broken down at a coarse granularity, and that work will still exist, but resolving the unknowns resulted in additional work items being created that would either augment or invalidate other work items. I wonder if that’s just that particular client, however, as they’re very fond of the vanity metrics in Jira.
I wasn't familiar with Hubbard, but I'm reminded of my experience.
In my first management role, I was on a team with a big estimation problem. There were many unknowns and it was frustratingly difficult for the team to know when they would be done. On the other hand, I was struck by how much we needed to be right. The Marketing folks needed to buy ad space in advance. The Sales team needed to plan their activities and reach out to major customers. The Manufacturing team had to plan when to buy parts inventory and ramp up production. None of these stakeholders would be satisfied with the engineers saying "It's too hard to say."
Instead of asking the teams for estimates and holding them accountable for meeting them, we asked the teams for ranges of dates that they felt comfortable with. When will you be done if things go well? When will you be done if there are unforeseen problems? We didn't try to measure the confidence to 50% or 80%. We just let each team determine their own range. Some teams were much more confident in their abilities to estimate their work. They had done similar things before. They had narrower ranges. Other teams working on newer features were more wary about what could go wrong. They had larger ranges. The project managers had to think differently, but they were able to pick a ship date with an acceptable level of risk.
The estimates could have been better, but that would have taken more time. That time would be better spent doing the work and learning from what went wrong than doing more planning to anticipate better what might go wrong.
The overall project was large - months, not days. So, the next step was to have the teams think about indicators in their work. How can we tell now whether we're tracking toward the short end of the timeline or the long end of the timeline? The closer they get to done, the more confidence they should have in the estimated dates.
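To make “indicator” concrete, here’s a toy sketch with invented numbers - nothing more than projecting a finish date from progress so far and seeing which end of the range it points at.

```python
from datetime import date, timedelta

# Invented numbers: the team gave a range up front, and we track progress against it.
start = date(2024, 1, 8)
range_low, range_high = date(2024, 4, 1), date(2024, 6, 1)
items_done, items_total = 14, 40
today = date(2024, 2, 19)

# Naive linear projection -- a trend indicator, not a new commitment.
elapsed_days = (today - start).days
projected_finish = start + timedelta(days=elapsed_days * items_total / items_done)

print(f"Trending toward {projected_finish} (range {range_low} to {range_high})")
```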
Regularly looking at the indicators and reassessing the risk allowed us to make the re-scoping changes to the project plan needed to reduce the risk. Should we move a component from one team to another that had similar skills and was running ahead of schedule? Should we cut a feature with an unacceptable level of risk? Should we skip some "low-priority" testing? (I know, I know…) Should we switch to the simpler design that was rejected earlier for reasons less important than shipping on time? It's critical that these discussions are blameless and focused on delivering the whole product. But they're exactly the discussions that were needed.