4 Comments

I avoided the trap of blaming CS’s lack of testing or even Gary himself. I do wonder what their rollout strategy is (or was) and, as you pointed out, which systems failed and allowed this change to become problematic in the wild.

In the end I hope the postmortem is publicly available so we can all learn a bit more.

Clearly something happened where all of the holes in the Swiss cheese lined up to let this slip through. Human nature is such that when something works flawlessly over time, complacency sets in. My personal speculation (which is worth nothing) is that *somebody* (or somebodies) were in the middle of summer vacation and "Gary" was the poor schmuck who had to do this rollout. Start with a latent weakness in the system (that the vacationers knew to avoid), add a dash of inadequate training and lack of work shadowing, and maybe lack of sleep on Gary's part and ... boom.

author

Yes. I should have called out the book Drift Into Failure, which points out that most errors like this occur because a long sequence of minor errors aligns in a way that enables the bigger error to occur (or, as you say, the holes in the Swiss cheese align).
