Code is code

With the rise of code-generation tooling, there has been a lot of debate about how to review generated code and where to draw the limits (Pull Request size, scope, tests, …).
Generated code is still code: it should be reviewed, tested, and maintained the same way.

Code ownership

In my day job I mostly review code and write far less than before. That means I have not used the new tools (Claude, Codex, …) to their full extent. Some colleagues have automated most of their workflow, from taking a ticket to creating specifications and then coding, up to opening the Pull Request.
What I focus on here is the code that gets written.

I will not go into details about the ownership of code written by an LLM (Large Language Model), because that has already been written about a lot and I do not have the expertise.

At the beginning of my career, my first job involved the Extreme Programming (XP) methodology, and one idea that stayed with me was Collective Code Ownership: everybody can make changes and is encouraged to.
Pushing the concept a bit further, code belongs to no one, or to everybody. Nobody is to blame for a bug or designated as responsible for it. There are of course areas of expertise, but they do not make anyone the owner of the code.

The older I get, the less I care about who or what actually wrote the code: its source matters less than its quality and maintainability.

Code maintenance

These new tools make writing code much easier, and the code they produce is acceptable, but they:

  • tend to generate far too much code, even with specific rules to restrain them
  • often add backward compatibility when none is needed
  • generate lots of tests, some of which are not meaningful

I had an example of a Pull Request where Codex generated lots of tests for a small Python script. Some of them were just validating the arguments of the program by invoking it. This was overkill and not useful. Tests are also code that needs to be run and maintained. Useless tests increase cognitive load and CI time.
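
As a rough illustration of the difference, here is a minimal sketch of a leaner alternative. It assumes a hypothetical script (the `resize` name, the `path` argument, and the `--width` flag are made up for this example, not taken from the Pull Request in question) whose argument parsing lives in a function that a single in-process test can call directly:

```python
import argparse


# Hypothetical script: the program name and flags are illustrative only.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="resize")
    parser.add_argument("path")
    parser.add_argument("--width", type=int, default=800)
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    return build_parser().parse_args(argv)


# One in-process test covers the script's own parsing choices.
# Spawning the script once per flag (for example with subprocess.run)
# mostly re-tests argparse itself and adds process start-up time to CI.
def test_parse_args() -> None:
    args = parse_args(["photo.png", "--width", "320"])
    assert args.path == "photo.png"
    assert args.width == 320
    # The default applies when the flag is omitted.
    assert parse_args(["photo.png"]).width == 800


test_parse_args()
```

Whether even this single test is worth keeping depends on how much custom logic the parser carries. The point is only that the same coverage can cost far fewer lines and far less CI time.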

It is now far easier to create a big Pull Request that “does everything”, or “does too much”. But this is still code that will go through the same lifecycle:

  • review
  • testing + acceptance
  • production

The more code there is, the more the subtle details and edge cases get buried in it. Because the code was generated, those details were not carefully written by an engineer, and they are hard for the reviewer to pinpoint.
The result may eventually fail randomly in production.

The good news is that LLMs also do an excellent job helping review code. That is how I use them now: as an entry point to navigate large Pull Requests and to focus on specific, critical components.

Code in production

Quite a few articles already cover this, but code written by an LLM that ends up in a production system may still hide surprises: it looked acceptable during review, and the human reviewers often skipped parts of it or went through it too fast.

When it fails, the team discovers a small incorrect behavior, hidden deep in an obscure function, that takes the whole system down. Understanding and fixing it then becomes a rush. It is even harder if some customers have started to depend on that behavior.

Code is just code

When reviewing, I try to be open-minded about the code. Consistency is usually more important than fancy stuff, and I focus on the behavior, complexity, and maintenance cost.
In the end, code written by a human, auto-generated, or written by an LLM is the same: it ends up in the application and must be reviewed with the same rigor, regardless of its origin.

Code has no ego.