Call It What You Want

a small dump of thought about devops, sre, platforms, and devex

May 13, 2023

I have spent a lot of my career in testing circles - but I haven’t done nearly as much hands-on testing as most of my industry peers. I studied and taught testing a lot, however, which has helped me form solid and evidence-based opinions on when and where testing is helpful in software development. This is something that has accelerated my knowledge of software delivery, but frequently pushes the buttons of those other folks.

I moved to building systems to help testers and developers build quality software faster. I readily admit that I am far more interested in quality than I am in testing - and while that seems right, I know that it’s the opposite approach of many in testing.

I’ve spent a big chunk of my career building tools and systems that help people do their jobs better. I started with test tools and systems - including a test distribution system I described in chapter 8 of Beautiful Testing. Tools for testers quickly expanded to tools for the whole team. I wrote a code check-in / preflight system before I had ever heard the term Continuous Integration (CI). I began leading teams building these tools, and had a bit of an epiphany when I realized that in building CI, monitoring, and deployment systems, I had so much more influence over quality than I ever did as a tester.

Epiphany

Somewhere in my growth and evolution, The Phoenix Project was released - along with The DevOps Handbook - and DevOps became a “thing”. The idea behind DevOps was to get rid of the wall that existed between development and operations teams - or really, to get rid of the walls everywhere in software development. Over time, for better or for worse, the DevOps culture at many companies transitioned into a DevOps team (who often do Operations work handed to them from a development team - uggh).

More often though, the teams that were unfortunately called “DevOps Teams” worked on the base level infrastructure and tools used by development teams to deliver quickly and safely. While I still don’t like calling these teams “DevOps” (DevOps is a culture, not a role), I am now ok to squint at it, think “DevOps (Enablement?!?) team”, and not hate it so much. As a(n important) aside, I can’t be the only person completely annoyed by DevTestSecPMPerfOps adaptations. Since DevOps means no walls anywhere, from now on I humbly ask that you think of ∞Ops. All that said, these days I see these teams referred to much more often as Platform Teams, Infrastructure Teams, or Developer/Engineering Experience Teams, which I think suits them better.

The Platform Evolution

Over the years, I’ve noticed some patterns in how teams adopt a culture of DevOps. I don’t yet know if these are widespread, so I’m sharing here so people can tell me if I’m wrong.

Decentralized - The Wild West

When teams or organizations first begin to move to more frequent delivery or to using the cloud, it almost always starts decentralized. Everyone tries to figure out the best way to move their business forward. For build systems, one team is using Jenkins, another team is using CircleCI, and another team is using some shell scripts that Brent wrote over the weekend. When you’re small and learning, I think this is totally fine. You don’t yet know enough to make any sort of mandates on tool X or tool Y, and it’s faster (for now) to just go-go-go.

But - before you know it, there are five different CI tools, seven different logging tools, nine alerting systems (including one that uses a Python script to talk to Salesforce), and you’re in four different public clouds. A thousand flowers bloomed, and everyone has allergies.

Centralized - There Must Be Order

Then, someone has a moment of clarity and notices that even though things started off great, now that the team is much bigger and everyone is still doing their own thing, shit is getting wasteful (they’re right), and that it may be time to centrally align on tools and processes (they’re a little right).

So - they move platform and infrastructure work to a central organization and slowly begin moving teams towards a central set of tools. I am willing to bet that anyone in any sort of devex, platform, or engineering role is in the middle of some sort of migration right now. In fact, when I interview folks for roles in my teams, I always ask for details about the last migration they did. This gives a lot of insight into how they deal with the non-technical parts of the migration - which are often the most difficult.

Eventually, you will get to a stage where most teams use most of the tooling and processes defined by the central organization. IMO, the absolute wrong thing to do here is to try and get every team using all of the tooling and processes. Instead, it’s time to let go.

I’m not sure where I first heard of the paved-road approach at Netflix, and I’m almost positive it was well before this 2017 presentation, but I’ll use it as a reference anyway.

The Paved Road is …

A concept, formalizing a set of expectations and commitments between the centralized teams and our engineering customers

While this sounds a whole lot like full centralization, the presentation goes on to say that the paved road is not mandatory. In fact, I like to extend the metaphor by saying that my teams provide the paved road because it’s easier, and has less risk. However, if a team needs to go on a dirt road (or a hiking trail), that’s fine. It’s their business, and they are accepting the risk (and probably for a good reason that’s far outside of my mandate).

Federated - This is the Way

A centralized team building paved roads accelerates the rest of the organization. In fact, if you look at the key metrics (aka DORA metrics) from Accelerate:

Deployment Frequency—How often an organization successfully releases to production
Lead Time for Changes—The amount of time it takes a commit to get into production
Change Failure Rate—The percentage of deployments causing a failure in production
Time to Restore Service—How long it takes an organization to recover from a failure in production

…centralization and paved roads have a huge influence on all of those.
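If you want to see how those four definitions turn into numbers, here is a minimal sketch. It is not from Accelerate or any DORA tooling: the `Deployment` record, the `dora_metrics` helper, and the sample data are all made up for illustration, and I’m assuming you can pull commit, deploy, and restore timestamps out of whatever CI and incident tools you have.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from your CI and incident tools.
    commit_time: datetime                      # when the change was committed
    deploy_time: datetime                      # when it reached production
    failed: bool = False                       # did it cause a failure in production?
    restored_time: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments, window_days):
    """Compute the four key metrics for deployments made within a window of days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_time - d.deploy_time
        for d in failures
        if d.restored_time is not None
    ]
    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "median_lead_time": median(d.deploy_time - d.commit_time for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up sample: four deployments over two days, one of which failed.
sample = [
    Deployment(datetime(2023, 5, 1, 9), datetime(2023, 5, 1, 10)),
    Deployment(datetime(2023, 5, 1, 12), datetime(2023, 5, 1, 15),
               failed=True, restored_time=datetime(2023, 5, 1, 15, 30)),
    Deployment(datetime(2023, 5, 2, 9), datetime(2023, 5, 2, 11)),
    Deployment(datetime(2023, 5, 2, 13), datetime(2023, 5, 2, 17)),
]
metrics = dora_metrics(sample, window_days=2)
```

I used medians rather than averages so one terrible day doesn’t swamp the picture, but that is a choice on my part, not something the definitions above require.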

But it’s not always enough. While the centralized team enables development teams to focus on product, and not infrastructure, inevitably, they’ll get to a point where they need to invest a small bit of their development team into specialized bits of infrastructure unique to their team. While this may feel like a return to decentralization, it is (in my experience) actually a really cool thing. These small pockets of embedded experts work with the central team - and essentially function as a highly specialized extension of a central team to further accelerate the ability of teams to deliver high quality products more quickly. I’ve called this a “hub and spoke” model quite a bit, but I’ve recently settled on calling this a Federated model of infra development (likely because I’ve spent way too much time reading about politics over the last few years).

What’s In a Name

There was a time when I worried a lot about what to call the central teams I’ve described here. As I mentioned above, I don’t like calling a team DevOps - but now I don’t care. I have extremely strong opinions on what SREs do - but I also realize that term, too, has been used for many different things. I have finally reached a state of Zen where I just don’t care. Teams like these exist to accelerate the business, and it doesn’t matter what they’re called - just that they’re effective.

Call it what you want.

-A