The Last Time
a post about feedback loops and the "other" kind of testing
A lot of people who read my post, Don’t Blame Me, didn’t read the second half, where I talked about looking at customer usage patterns and fast feedback loops. Let’s dive in a bit deeper.
Feedback Loops and the Ways of DevOps
Feedback Loops. Feedback Loops. Feedback Loops.
The DevOps Handbook by Kim, Humble, et al. describes the Three Ways of DevOps. The first is Flow (systems thinking and optimizing the system - worth mentioning here is one of my favorites, The Principles of Product Development Flow by Reinertsen). The second and third are unfortunately neglected or ignored too often. These are Feedback, and Experimentation & Continual Learning.
We want feedback loops to be as fast as possible. This guidance tells us if we’re doing the right thing - and if we aren’t, we adjust. For example, I’ve written background processes that detect code changes, and then automatically run tests against that changed code, alerting me if there is a test failure, a static analysis error, or a drop in code coverage. This fast (almost instant) feedback is critical and keeps me on track.
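A minimal sketch of that kind of background process, assuming a Python project and simple polling of file modification times. The function names and the idea of passing in a test command are mine for illustration; this is not the tooling described above.

```python
import os
import subprocess
import time


def snapshot(root):
    """Map every .py file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.stat(path).st_mtime_ns
    return mtimes


def changed_files(before, after):
    """Return paths that are new, or whose modification time differs."""
    return {path for path, mtime in after.items() if before.get(path) != mtime}


def watch(root, command, interval=1.0):
    """Poll root forever; rerun command whenever a source file changes."""
    before = snapshot(root)
    while True:
        time.sleep(interval)
        after = snapshot(root)
        if changed_files(before, after):
            result = subprocess.run(command, cwd=root)
            print("OK" if result.returncode == 0 else "FAILED: check the output above")
        before = after
```

Calling something like `watch(".", ["python", "-m", "pytest", "-q"])` would rerun the tests on every save, assuming pytest is installed. The same loop could just as easily run a static analysis or coverage command.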
Continuous Integration (CI) systems exist so that when we check in code, a wide variety of other issues are detected and flagged as fast as possible. I wrote a CI system (I didn’t know what CI was at the time, so I called it a check-in system) in ~2002 or so. I didn’t know the concept of feedback loops either, but since our product took several hours to build, it gave fast feedback by quickly rejecting code submissions that didn’t pass a specific set of validation criteria.
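The idea of a check-in gate can be sketched in a few lines. This is an illustration only, not the 2002 system: the two checks and the shape of a submission (a dict with a description and file contents) are hypothetical stand-ins for whatever validation criteria a real gate would apply.

```python
def gate(submission, checks):
    """Run checks in order and reject on the first failure."""
    for name, check in checks:
        if not check(submission):
            return (False, name)
    return (True, None)


# Hypothetical criteria, cheapest first, so a rejection arrives quickly.
CHECKS = [
    ("has description", lambda s: bool(s["description"].strip())),
    ("no merge conflict markers",
     lambda s: not any("<<<<<<<" in text for text in s["files"].values())),
]

submission = {"description": "Fix crash", "files": {"a.c": "<<<<<<< HEAD\nint x;\n"}}
print(gate(submission, CHECKS))
```

Ordering the checks from cheapest to most expensive is the design choice that matters here: the author of a bad submission hears back in seconds rather than after a multi-hour build.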
All of the above (and variations) can be referred to as the “inner feedback loop” of software development. All we’re getting feedback on is our internal quality - which is important, but only a baseline of what customers expect in quality software.
The “Outer Loop”
The other feedback loop we need - and the one that massively accelerates our ability to deliver high quality software - is the loop that gets us feedback from customers. Back in the ‘90s, I worked on a few different versions of Windows. The feedback loops were long. Windows 95 shipped to a set of beta testers frequently (a small set weekly, if I recall), with milestone releases to larger audiences with (supposedly) higher quality. In the best case (which I suppose could have been worse), a weekly beta tester would report a bug that was quickly fixed in the release that came out the next week - although turnarounds were rarely that fast. Feedback loops that take weeks - or longer - are inefficient. We can do additional “inner loop” testing to mitigate the risk of longer feedback loops. In my experience, there’s a direct correlation between the speed of the outer/customer feedback loop and the need for dedicated test roles: the slower the loop, the greater the need. If our feedback loops are slow, having a separate person or team do additional testing can be an effective mitigation strategy.
Someone on LinkedIn asked me about editors - asking whether there was an equivalence between developers & testers and authors & editors. An editor can't help me determine if a customer derives value from a book. Editors can supply some opinions on functional correctness, but in my experience working with editors on HWTSAM, all MS Press did was nag me about deadlines and ask me to supply more screenshots. I’m sure other editors are better, but editors have two purposes. The first is to help with functional correctness (grammar, structure, clarity, etc.). The second is (often) to act as a proxy and give feedback on the experience of the writing. Here too, I think static analysis tools (e.g. Grammarly) can provide feedback on the “correctness” of my writing, but I think the experience is probably better evaluated by my readers. If I were to write another book, I’d release chapters weekly to a subset of beta-readers, take their feedback, and rewrite. I’d repeat - perhaps with larger audiences - until the feedback was “good enough” for an official release. I could also build measurements into a web site hosting my new book and note what phrases people highlight, or pages they seem to spend more time on, and get feedback even faster. If I didn’t have a way to get that feedback though, I’d rely much more on the editor for sure.
Short story is that no matter what I'm creating, I want feedback from my customers as fast as I possibly can.
We want fast feedback loops, and the best way to get fast feedback loops is to release to customers and then collect metrics and data that inform us on whether or not they are finding value. My stance is that most software can take much more advantage of these fast feedback loops.
The Other Kind of Testing
In Trustworthy Online Controlled Experiments, Kohavi et al. say,
Features are built because teams believe they are useful, yet in many domains, most ideas fail to improve key metrics. Only one third of the ideas tested at Microsoft improved the metric(s) they were designed to improve.
Fareed Mosavat, Slack’s Director of Product and Lifecycle, tweeted that with all of Slack’s experience, only 30% of monetization experiments show positive results.
As much as you think your new features and product are awesome, you will likely be wrong. More inner loop testing doesn’t help with that. Instead, design tests (experiments) that help you understand whether the improvement you’re trying to implement is making an improvement in something customers care about.
For example, let’s say that I’m adding a new formatting feature for a code editor. Adding this feature because “I think it’s cool” would be foolish. Instead, I have a hypothesis on what it will do for customers. For example, I expect that 20% of my customers will use this feature at least once per editing session. Then, I make sure my code collects the right data for me to know whether I’ve hit that target or not. Ideally, I want to tie these measurements to business outcomes or revenue, but in this case, I just want to attempt to understand if the feature seems useful for my customers.
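Here is a rough sketch of checking a hypothesis like that against collected usage events. Everything in it is made up for illustration: the event shape (user, session, command), the command name `format_code`, and the sample data. It also takes one particular reading of the hypothesis, namely the share of users who used the feature in every one of their sessions.

```python
from collections import defaultdict

FEATURE = "format_code"  # hypothetical command name
TARGET = 0.20            # the 20% hypothesis


def adoption_rate(events, feature=FEATURE):
    """Share of users who used `feature` in every one of their sessions."""
    sessions = defaultdict(dict)  # user -> {session -> used the feature?}
    for user, session, command in events:
        used = sessions[user].get(session, False)
        sessions[user][session] = used or command == feature
    if not sessions:
        return 0.0
    adopters = sum(all(by_session.values()) for by_session in sessions.values())
    return adopters / len(sessions)


# Fabricated sample events: (user, session, command)
events = [
    ("ana", 1, "open"), ("ana", 1, "format_code"),
    ("ana", 2, "format_code"),
    ("bo", 1, "open"), ("bo", 1, "save"),
    ("cy", 1, "format_code"), ("cy", 2, "save"),
    ("di", 1, "open"),
]
rate = adoption_rate(events)
print(f"adoption: {rate:.0%}, hypothesis met: {rate >= TARGET}")
```

In this toy data only one of four users formats in every session, so the rate is 25%. The useful part is less the arithmetic than writing down the target before shipping, so the data can say I was wrong.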
In a perfect world, the metrics come back as expected, I pat myself on the back, and move on to the next feature. In reality (based on the data from Kohavi above), I’ll get it wrong. Let’s say I discover that 70% of users are trying my formatting feature in the first session, but only 5% use it in subsequent editing sessions. Something is up, but I don’t know what yet. Is it because they didn’t need to use my cool formatting feature more than once? One thing I can do, because I have other data I collect, is to poke around a little more. I spent a chunk of my career basically as a professional debugger, and I was, for a time, pretty good at it. To me, exploring data to understand what’s happening with customers is just like debugging. It’s insightful and informative.
In this (mostly fictitious) story, I discovered that while 70% of my users tried the new formatting feature, the most common command used after trying my feature was undo. They were happy to try my feature, but didn’t like what I did. Fortunately, I found this out within hours or days instead of sticking my customers with a feature they didn’t like for months or years.
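The kind of poking around that surfaces a finding like this can be very simple. The sketch below assumes each session is logged as an ordered list of command names (a made-up format, with made-up data) and counts which command immediately follows each use of the feature.

```python
from collections import Counter


def next_commands(sessions, feature):
    """Count which command immediately follows each use of `feature`."""
    counts = Counter()
    for commands in sessions:
        for current, following in zip(commands, commands[1:]):
            if current == feature:
                counts[following] += 1
    return counts


# Fabricated sessions, each an ordered list of commands
sessions = [
    ["open", "format_code", "undo", "save"],
    ["open", "type", "format_code", "undo"],
    ["format_code", "save"],
    ["open", "format_code"],
]
counts = next_commands(sessions, "format_code")
print(counts.most_common(1))  # [('undo', 2)]
```

A feature use at the very end of a session has no following command and is simply not counted, which is worth remembering before reading too much into the totals.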
A former colleague from Microsoft (Hi Seth) has been talking about this for at least a decade (and, IIRC, Seth worked with Ronny Kohavi for some time at Microsoft). At the time, we often referred to the metrics debugging I described above as Testing In Production. It’s an accurate name when you consider the definition of testing, but I don’t like using the term, because when a lot of people see the phrase they ask “Why would you cause your customers pain by making them test?” and insist that I’m demanding that customers test for functional correctness - which is wrong, but unfortunately somewhat expected.
The Past is the Future
In a presentation from over twelve years ago, I talked about some of this stuff. I couldn’t even find the deck, but fortunately, I found the slides hosted elsewhere. At the time, I talked about feedback loops - sort of, I don’t think I knew the term, and The DevOps Handbook was still four years away, but the gist of the presentation stands up today. Customers are a much better option for you to discover if you’re making the right product, and fast feedback loops lead to higher quality.
Thank you if you have read this far. Like I predicted in my last post, some of you will nod your head and wonder why I’m discussing basics, and the rest of you will think I’m the stupidest human on the planet.
But let me be honest. I’m not an inventor. None of what’s in this post, the last post, or in any other post is stuff I’m making up - it’s what I’m seeing. It’s what I’m reading about in books and articles, and what I see in practice - both in teams I work directly with and organizations where I don’t know a soul. I understand that you may not like it, or may disagree for your own reasons, but I’m just sharing observations. I hope that you can do with those what you will.
Finally, please subscribe and share to keep up on my random ramblings.