Large PRs are Bad

I heard a joke years ago that went something like this. When a developer gets a pull request for code review that’s 100 lines long, they will open the file(s), look at the code, and ensure standards are being followed. They might run some code, they’d walk through the algorithm in their mind, and spend 10, 20, or more minutes examining how the change is built. If you give them a code review of 1000 lines, they’ll just assume it’s great and approve it in seconds.

I know that many developers don’t find that funny. Often I meet people that think they’re professional and they spend the time needed to examine the changes and ensure they work. I’m sure that many people do this often, and some people are very meticulous about their evaluation of the change. However, everyone gets busy and I know that often depending on who submitted the PR and how busy the reviewers are, the joke does reflect reality. The longer the PR, the less likely someone will either a) spend the time to carefully review it, or b) catch small mistakes.

There have been a few large profile outages in 2023, one of which was an Azure DevOps outage in Brazil. I’m not picking on Microsoft as AWS, GitHub, GCP, and others have had issues. I know GitHub is part of Microsoft, but it’s also a separate enterprise that really runs on its own in many ways. The point is more that there will be issues, and some of these are related to the rapid changes of a DevOps or GitOps workflow where PRs aren’t always reviewed clearly and cleanly.

In this case, there was a typo in how a process worked. A cleanup process was supposed to delete databases in Azure, but the typo had it deleting the logical servers. Those had many databases, not just old ones that needed cleanup. This PR, however, contained a lot of changes, as there was an upgrade to swap out older Azure Manager packages with Resource Manager packages. I don’t know if the cleanup job was related here or included in a large PR, but in any case, the PR was reviewed as in the joke above. It was approved and things started failing.

This wasn’t caught in testing as there wasn’t great test coverage. You can say MS should have more tests, and they should, but there will never be enough test coverage. There also weren’t any systems in their ring 0 (first) deployment that triggered this typo, so no one realized there was an issue. Again, ring 0 systems might not be representative of larger rings. Another reality that we aren’t likely to fix in every situation.

Microsoft recovered the data, but it took a long time. I don’t know how easy or feasible it is to create smaller PRs with something like this when you are upgrading packages in many files. I just know that the time that I make large-scale changes in code, with large PRs, often I find some problem somewhere. Especially if there are changes that aren’t all the related. If too many different things get included, the potential for mistakes and problems goes up.

I think this is actually a good DevOps story. They ran their process and there was a mistake. They fixed it and have started to adjust their process to add more testing in this area and potentially ensure this doesn’t happen again. The logging helped them diagnose the issue quickly once it was reported. Their ability to deploy on-demand meant that once the problem was understood, a fix could be quickly deployed. That’s what DevOps is: it’s not perfect, but it does allow us to understand, learn, and adapt quickly.

Now we just need to ensure that humans use the process in a way that other humans can more easily understand, with smaller PRs.

Steve Jones

Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.

Posted in Editorial | Tagged | 1 Comment

A New Word: Fitzcarraldo

fitzcaraldo – n. a random image that becomes lodged deep in your brain – maybe washed there by a dream, or smuggled inside a book, or planted during a casual conversation – which then grows into a wild and impractical vision that keeps scrambling around in your head, itching for a chance to leap headline into reality.

I feel this most often with projects around the ranch. I’ll get an idea to build something, usually to make my life easier, but sometimes as a work of art, and in my mind, I’ll think about the exotic woods, the clever design, the neat ways to fasten things.

Then I realize that I can’t find time for many of the more practical things I’d want to do. I’ll never get to most of my ideas.

Maybe, however, I’ll tackle one or two. We’ll see. For now, lots of fitzcarraldo in my head.

From the Dictionary of Obscure Sorrows

Posted in Blog | Tagged , | Comments Off on A New Word: Fitzcarraldo

Which Schema is Queried First?–#SQLNewBlogger

I had to test something for a customer, and as a part of this there as a need to have a different default schema for a user. I wrote about that, but one of the things that occurred to me is the need to know how security works with schemas and the priority of queries.

The Scenario

A user in a database needed to access certain objects, which were going to be located in a separate schema. The previous post looked at the setup of the user, but what happens with queries. Let’s set up a few tables that will help us learn.

I’ll create two tables, one in each schema.

CREATE TABLE dbo.Location (locationname VARCHAR(20))
GO
INSERT dbo.Location (locationname) VALUES ('dbo schema')
GO
CREATE TABLE webapi.Location (locationname VARCHAR(20))
GO
INSERT webapi.Location (locationname) VALUES ('webapi schema')
GO

Let’s see what happens when we query the tables. If I write a query that qualified objects, I get what expected. I’ll login as APIUser and then run this code.

SELECT * FROM dbo.location
SELECT * FROM WebAPI.location

When I do this, I get the results expected.

2023-06-27 09_48_54-SQLQuery3.sql - ARISTOTLE.test (ApiUser (56))_ - Microsoft SQL Server Management

If I don’t qualify the table, do you know what happens? You should, and this post shows the results that the default schema is queried first.

2023-06-27 09_50_13-SQLQuery3.sql - ARISTOTLE.test (ApiUser (56))_ - Microsoft SQL Server Management

If I didn’t have a WebAPI.Location table, the dbo.location result would be shown. It is important that you understand SQL Server security if you manage instances or write queries, as you could otherwise feel you need to grant far too many permissions.

SQL New Blogger

This was a minor part of something else I was doing. I noted this in the previous post, but also realized the ALTER was a good second post and verifying the permission hierarchy was a good third post. I could have added this to the first one, but I like separating and focusing posts. Better for SEO if you care, but better for your workload and producing most posts.

Outside of the work I was doing, the sketch of these notes took about 2 minutes, and then the entire post was a little longer, with more code to write. This was about a 15 minute post.

You can do this.

Posted in Blog | Tagged , , | Comments Off on Which Schema is Queried First?–#SQLNewBlogger

Republish: The Minimum Upgrade Point

I guess I’m technically working today. I’m at That Conference in Wisconsin, speaking and networking with fellow software geeks, but I’m expecting to be mostly out of touch, I’m republishing The Minimum Upgrade Point

Posted in Editorial | Tagged | Comments Off on Republish: The Minimum Upgrade Point