I held a post back from my Five for Friday series over on angryweasel.com so I could write a bit about it here. This week, I read an article on the invention of modern CI, and it brought back some memories. I experimented with Fagan Inspections in 1999 (please forgive me). In 2001, I taught developers in my org how to write unit tests, and several of them asked me, "if we write these tests, what will the test team do?"
But this quote from Kohsuke Kawaguchi, the inventor of (what would become) Jenkins, gave me a case of whiplash head-nodding. He wrote a CI system because he:
“got tired of incurring the wrath of his team every time his code broke the build.”
Tools for Teams
For better or for worse, I'm known for having some testing knowledge. At one point in my career, I studied a lot about testing, taught a lot about testing, and even wrote more than most about testing. But I've really spent most of my career writing tools that help teams get their jobs done. At first that meant test teams, but as quality moved earlier in the development cycle, my efforts on tooling followed suit.
When I joined Microsoft, I was testing networking on Windows 95. It was there that I learned about all of the fun and challenges of mixed-byte character systems and how they caused problems. In my first few months, I wrote a tool that would convert double-byte characters to their per-byte hex values. I shared it with a few people, and before I knew it, a hundred people across the org were using it. Without the tool, it was difficult and time-consuming to figure out why a certain character in a string wasn't working. For example, 太平洋 was failing in file names in an application. We needed a quick way to diagnose and isolate the issue, so we used my tool to break down the bytes (and in this case, find a 0x5c '\' character). Then, to isolate the problem, we'd use my tool to generate a different character with a 0x5c trail byte - e.g. 十 - and (almost inevitably) reproduce the problem.
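That diagnostic is easy to reproduce today. The sketch below is my own reconstruction, not the original tool: it encodes each character in a double-byte encoding (Shift-JIS, as an assumed example) and returns its per-byte hex values, which makes a stray 0x5c trail byte easy to spot.

```python
def per_byte_hex(text, encoding="shift_jis"):
    """Return (character, per-byte hex) pairs for each character in text."""
    return [(ch, " ".join(f"{b:02x}" for b in ch.encode(encoding)))
            for ch in text]

# In Shift-JIS, the trail byte of some kanji collides with 0x5c ('\'),
# which naive path-handling code treats as a directory separator.
for ch, hex_bytes in per_byte_hex("十"):
    print(ch, "->", hex_bytes)
```

Running this on 十 shows a 0x5c trail byte, which is exactly why it made such a handy repro character.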
As I said above - I learned a lot about testing, but my passion has always been building tools to help people do their jobs more easily and efficiently.
The Perils of the Past
From ~2000 to 2003, I was working on Windows CE. I was leading a test team, part of which was dedicated to writing tools and infrastructure.
If you’ve ever read Showstopper, you’ll know that in those days, most teams at Microsoft had a “Build Lab” that would be in charge of the nightly build and Basic Verification Tests (BVTs). Every afternoon, they’d kick off a build, nurse it to completion, and then run some basic tests to ensure that it was testable before “releasing” the build to the team for further work and dogfooding.
It generally worked, but sometimes it was a shitshow.
I remember getting paged (yes - with a pager) early one morning because some code I had integrated from another team a week or so earlier wasn't building. I drove to work, marched to the build lab, and took a look. Sure enough, the code wasn't compiling (failing to link, to be precise), but I pointed out that the code hadn't changed in a week, and we'd had several successful builds since then. We stared at it and argued for a bit before we deleted cached files and rebuilt - and everything went fine.
More often though, people just broke stuff. Someone would check in code just before the “build lab cutoff”, and go home - or on vacation. The build lab would dissect the hundreds of check-ins from the day and try to figure out which ones broke the build or functionality. It was a big cringey mess.
Minor Improvements
As you can imagine, this was frustrating for the team, and I worked with the build team a lot to try and make their jobs easier. Our first attempt to ease the pain was a script that would run the BVTs locally before check-in. The tests were just flaky enough on the hodgepodge of developer machines that eventually the developers (at least those who remembered to run the BVTs) ignored any errors and checked in anyway. Or, often enough, the local BVTs would pass, but the developer would forget to check in a file and break the build in a different way.
So Marty, an employee of mine, and I talked about building a check-in system. Basically, instead of committing directly, you'd send your changes to our system (named "Slate"), and Slate would unpack your changes, build, test, and then, if successful, submit on your behalf.
Except it was more complicated. We had over a hundred developers committing changes and a finite amount of compute, so we implemented batching: we would take ~10 or so changes at a time and queue the rest. This solved the capacity problem, but in practice, it meant that one developer's bad code would kick out the changes of nine others. Those other nine didn't like this. At all. Marty used to get email with titles like, "Slate rhymes with Hate". Our development team was close to revolting.
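The batching logic was conceptually simple. Here's a minimal sketch of the idea (the names and exact batch size are my own for illustration; Slate's actual implementation is long gone): drain the queue in batches, build and test each batch, and reject the whole batch if anything fails.

```python
from collections import deque

BATCH_SIZE = 10  # roughly what Slate used, per my memory above

def process_queue(pending_changes, build_and_test):
    """Drain queued changes in batches; one bad change rejects its whole batch."""
    queue = deque(pending_changes)
    accepted, rejected = [], []
    while queue:
        batch = [queue.popleft() for _ in range(min(BATCH_SIZE, len(queue)))]
        if build_and_test(batch):
            accepted.extend(batch)   # submit on the developers' behalf
        else:
            rejected.extend(batch)   # the nine innocent changes get kicked out too
    return accepted, rejected
```

A failed batch bounces every change in it - exactly the behavior that earned us the "Slate rhymes with Hate" email.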
But then something …interesting happened. Those same developers who would check in crap code and let the build lab figure it out felt much worse about pissing off their peers, so they began to do more due diligence before submitting their code to Slate. They did more code review, and they ran more tests. Changes were rejected much less often. The build lab stopped paging people.
It gets better. Over time, Marty and I added more to Slate. It ran analysis tools and rejected code that introduced new errors. Developers wanted to add more checks to the tool. This tool that initially slowed developers down and angered them made them more productive within months. Those same developers who sent the Slate/Hate emails wondered how they ever lived without a check-in tool.
Back to the Present
Things are much better today. Most teams have some flavor of CI with some amount of testing - but it’s certain that CI will continue to evolve.
I’ve always been a fan of code analysis tools. I’ve written tools to evaluate risk of a check-in based on complexity metrics, bug density and other factors, and I’ve written far too many syntax checkers to remember.
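As an illustration of the kind of risk scoring I mean (the specific weights and inputs here are invented for the example, not the ones I actually used), a check-in risk tool might combine a few signals into a single number:

```python
def checkin_risk(cyclomatic_complexity, historical_bug_density, churn):
    """Toy risk score for a check-in; higher means riskier. Weights are illustrative."""
    score = (0.5 * cyclomatic_complexity
             + 0.3 * historical_bug_density
             + 0.2 * churn)
    return round(min(score, 10.0), 2)  # clamp to a 0-10 scale

print(checkin_risk(4, 2, 1))    # a small, simple change scores low
print(checkin_risk(30, 8, 9))   # a big, complex change clamps to 10.0
```

A reviewer (or a CI gate) could then use the score to decide how much scrutiny a change deserves before it merges.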
AI?
The article linked above imagines how AI will aid the future of CI - and it will be interesting to see how generative AI is integrated into CI systems. An obvious starting point may be AI-driven code review (which is undoubtedly a better choice than Fagan inspections). For example, consider the following Python function to count the occurrences of a word in a file.
def count_word_occurrences(filepath, search_word):
    """
    Count the number of times a specific word appears in a text file.

    :param filepath: str, the path to the text file.
    :param search_word: str, the word to search for in the file.
    :return: int, the count of how many times the search word appears.
    """
    count = 0
    try:
        with open(filepath, 'r') as file:
            for line in file:
                words = line.split()
                count += words.count(search_word)
    except FileNotFoundError:
        print("The file was not found.")
        return -1
    except Exception as e:
        print("An error occurred:", e)
        return -1
    return count

# Example usage
result = count_word_occurrences('example.txt', 'the')
print("The word 'the' appears", result, "times in the document.")
Rather than play a game of spot the bug, we can use AI to get feedback. In fact, if I ask CoPilot “what bugs are in this code”, it tells me:
The code provided seems to be correct in terms of syntax and logic. However, there are a few potential issues that might arise depending on the context:
Case Sensitivity: The count_word_occurrences function does not handle case sensitivity. If the search_word is 'the' and the text contains 'The', it will not count it.
Word Boundaries: The function does not consider word boundaries. If the search_word is 'the' and the text contains 'there', it will count it as an occurrence of 'the'.
Error Handling: The function prints an error message and returns -1 when an exception occurs. Depending on the context, it might be more appropriate to re-raise the exception or handle it differently.
File Encoding: The function assumes that the file is encoded in UTF-8. If the file is encoded in a different format, it might raise a UnicodeDecodeError.
File Path: The function does not check if the filepath is a valid file path. If an invalid file path is provided, it will raise a FileNotFoundError.
Return Value: The function returns -1 in case of an error. Depending on the context, it might be more appropriate to return None or raise an exception.
All valid feedback, delivered in a fraction of the time of a code review. But - as with feedback from an in-person review - some items can be dismissed or simply commented on as needed.
Back to the Future?
Not sure what my old employee Marty is up to these days, but I hope he's looking to the future vs. dwelling in the past. I can easily see a future where these code review comments generate squigglies or annotations just as compiler warnings do today. They could also come with a confidence level for how likely they are to manifest as a bug - or better yet, with the AI trained on upstream and downstream usage, AI code review could tell you exactly what you're breaking, as you're breaking it - and maybe even suggest new tests to write based on that knowledge. Some of the tools I wrote to analyze risk would likely be even more accurate assisted by AI. Graphite is already experimenting with this, but I think there's a lot of yet-untapped potential in this area.
It’s a fun time to be infatuated with CI (and productivity tools in general).
-A