Large PRs are Bad

I heard a joke years ago that went something like this. When a developer gets a pull request for code review that’s 100 lines long, they will open the file(s), look at the code, and ensure standards are being followed. They might run some code, they’d walk through the algorithm in their mind, and spend 10, 20, or more minutes examining how the change is built. If you give them a code review of 1000 lines, they’ll just assume it’s great and approve it in seconds.

I know that many developers don’t find that funny. Often I meet people that think they’re professional and they spend the time needed to examine the changes and ensure they work. I’m sure that many people do this often, and some people are very meticulous about their evaluation of the change. However, everyone gets busy and I know that often depending on who submitted the PR and how busy the reviewers are, the joke does reflect reality. The longer the PR, the less likely someone will either a) spend the time to carefully review it, or b) catch small mistakes.

There have been a few large profile outages in 2023, one of which was an Azure DevOps outage in Brazil. I’m not picking on Microsoft as AWS, GitHub, GCP, and others have had issues. I know GitHub is part of Microsoft, but it’s also a separate enterprise that really runs on its own in many ways. The point is more that there will be issues, and some of these are related to the rapid changes of a DevOps or GitOps workflow where PRs aren’t always reviewed clearly and cleanly.

In this case, there was a typo in how a process worked. A cleanup process was supposed to delete databases in Azure, but the typo had it deleting the logical servers. Those had many databases, not just old ones that needed cleanup. This PR, however, contained a lot of changes, as there was an upgrade to swap out older Azure Manager packages with Resource Manager packages. I don’t know if the cleanup job was related here or included in a large PR, but in any case, the PR was reviewed as in the joke above. It was approved and things started failing.

This wasn’t caught in testing as there wasn’t great test coverage. You can say MS should have more tests, and they should, but there will never be enough test coverage. There also weren’t any systems in their ring 0 (first) deployment that triggered this typo, so no one realized there was an issue. Again, ring 0 systems might not be representative of larger rings. Another reality that we aren’t likely to fix in every situation.

Microsoft recovered the data, but it took a long time. I don’t know how easy or feasible it is to create smaller PRs with something like this when you are upgrading packages in many files. I just know that the time that I make large-scale changes in code, with large PRs, often I find some problem somewhere. Especially if there are changes that aren’t all the related. If too many different things get included, the potential for mistakes and problems goes up.

I think this is actually a good DevOps story. They ran their process and there was a mistake. They fixed it and have started to adjust their process to add more testing in this area and potentially ensure this doesn’t happen again. The logging helped them diagnose the issue quickly once it was reported. Their ability to deploy on-demand meant that once the problem was understood, a fix could be quickly deployed. That’s what DevOps is: it’s not perfect, but it does allow us to understand, learn, and adapt quickly.

Now we just need to ensure that humans use the process in a way that other humans can more easily understand, with smaller PRs.

Steve Jones

Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.

Unknown's avatar

About way0utwest

Editor, SQLServerCentral
This entry was posted in Editorial and tagged . Bookmark the permalink.

1 Response to Large PRs are Bad

  1. 1) ACCEPTENCE, NOT PERFECTION: Nothing is perfect, you are always going to have issues and anyone who believes being %100 issue free all the time is a reasonable expectation is a fool. You have to find that balance where any additional gains from additional research are too small to warrant their effort. Even within nature we see compromises like this because perfection is impossible but getting clsoe to it isn’t.

    2) COST ANALYYSIS not Personal Bias: If you’re the type of persons who is going to see that once every 10 years data issue (where somethings serious happens but only once every decade and not always the same thing but just something serious) convince you that it’s because your goals and procedures aren’t as good as they should be than you\re not being realistic. Because perfection is not possible there is always some level of risk and so every day you risk that day being the day that something significant breaks and the finger pointing begins because no one at the top wants to concede that nothing is perfect and by definition that means there’s always some level of risk each day. This is when cost analysis matters more than ones personal biases. if the problem is rotted out and corrected an it falls within that amount of acceptable risk than you’ve done all that is reasonable expected and anyone demanding or promising more is a con man.

    What boils my backside and I’ve born witness to this is when you have decision makers who don’t listen to those under them when they raise the warning flags which in turn eventually leads to one of those warnings going off after which the decision maker’s tone about it changes and unlike before where they were blowing it off as no big deal they’re now trying to act like as if that’s not how they ever approached the issues; that they too have taken it seriously and the finger pointing begins. The best manager/leader isn’t the one who knows everything every employee under them knows and can do but one who knows how best to delegate and schedule and who listens to what those who work for him say and take it seriously. The best supervisor i ever had knew almost nothing about It/Dev stuff (that which I mainly handle) but he did know that I knew what i was doing and he knew when to see I was on target and when maybe I was letting my own biases get in the way and skew my perspective. The worst was the one who knew best how to always cover his backside no matter who was really at fault for something. While not always teh case the 2 worst supervisors I’ve had were both family members of the store/chain owner and both were clearly still being paid because they were family and not because they deserved to be.

    Like

Comments are closed.