Hidden Heroes

A few weeks ago I was at Bletchley Park, the facility where the Allies broke many of Nazi Germany's encrypted messages in World War II. It's quite a facility and museum, and I hope to go back. I was distracted that day and didn't get a lot of time to enjoy the exhibits and really learn more about what happened there.

I was there for our Redgate 2026 Company Kickoff, and as part of that, two different executives in our company shared stories of people who had worked there. What was interesting is that until we planned this event, neither of them had any idea that people they knew well had been part of the effort at Bletchley Park to end World War II. The facility can be considered one of the birthplaces of computing.

They were some of the hidden heroes of World War II.

Our executives drew a parallel between these people and our customers, notably the DBAs we work with. These are often the hidden heroes of their organizations, toiling away, getting things done, and keeping systems running, without the recognition or gratitude they might get if their efforts were more widely broadcast.

That might be a bit of a stretch. However, many of us who work on database systems are doing essential work that our organizations depend on. If we do a poor job, many people complain. If we make mistakes, (usually) lives aren't lost, but profits can be. Often no one knows our names, we don't get a lot of thanks when databases work well, and our customers have high expectations of us.

Many of us know that data is critical for most organizations today. Many in management are acknowledging this as well, at least to investors. I don't know if they'll ever start to truly appreciate data professionals with kudos and compensation. That might not be a step they're willing to take, since many other departments, which also work with data in some way, might claim to be just as important.

Perhaps we will remain the hidden heroes in today’s organizations.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.


A New Word: los vidados

los vidados – n. the half-remembered acquaintances you knew years ago, who you might have forgotten completely if someone hadn’t happened to mention them again – friends of friends, people you once shared classes with, people you heard stories about, who you didn’t know well but who still made up the fabric of your intense little community – making you wonder who else might be out there somewhere, only just remembering that you exist.

I have a lot of los vidados in my life. I'm old (late 50s), so high school and college were a long time ago. One thing I love about Facebook and Instagram is that I'll get a memory, a glimpse of the past, of someone I knew, but not well. A post by someone closer leads me to the los vidados of my life.

I experienced this recently at a Redgate event. Someone came up to me and the face was familiar, but I couldn't place them or their name. However, they reminded me we'd met years ago at an event and had a chat.

I might be more memorable since I speak often, but I'm sure there are people out there who barely remember me, or remember something I've said or written but don't know my name.

I think los vidados exist for all of us, mostly because of faulty human memory.

From the Dictionary of Obscure Sorrows


Changing Data Types

Recently I was working with a customer who is trying to smooth out their database development process. They noted that a lot of changes tend to take a significant amount of time and impact their availability. They came to Redgate to see if Flyway might be a way to improve the effectiveness of the internal groups that build, deploy, and manage their database software.

We can help, but often when I get called in to help architect things, we are trying to decide if the customer is a good fit for our solutions. Since we work on a subscription model, we want to be sure customers are happy and get value from their decision. Otherwise they have a poor experience and don't renew. This might be because they aren't ready, or it might be that no one considered whether our solution fits their environment well. In any case, I usually dig into the goals and challenges they've faced with their current process.

In this case, they found that developers often changed the data types of columns to better match the data being stored. That can be a disruptive change, and while Flyway does some amazing things, the software is still bound by the rules for making changes on the platform. In SQL Server, some type changes are metadata-only and fast; others require rewriting every row in the table.
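As a rough sketch of the difference (the table here is a made-up example, and the exact behavior depends on version, compression, and indexing), widening a varchar is typically a metadata-only change in SQL Server, while changing an int to a bigint is a size-of-data change that touches every row:

```sql
-- Hypothetical demo table, not from the customer's schema
CREATE TABLE dbo.TypeChangeDemo
(
    id   INT NOT NULL,
    note VARCHAR(20) NOT NULL
);
GO
-- Widening a varchar: usually metadata-only, fast even on large tables
ALTER TABLE dbo.TypeChangeDemo ALTER COLUMN note VARCHAR(50) NOT NULL;
GO
-- INT to BIGINT: generally a size-of-data change; every row is rewritten,
-- with locking and transaction log use proportional to the table size
ALTER TABLE dbo.TypeChangeDemo ALTER COLUMN id BIGINT NOT NULL;
GO
```

On a table with millions of rows, the first ALTER returns almost instantly while the second can block writers for the duration, which is exactly the availability impact this customer was seeing.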

After the call, I started to wonder how often people change data types. I've rarely done this, and honestly, I've used a zero-downtime pattern a few times instead: add a new column, move the data, keep things in sync with triggers, and then, once I'm sure all the data has moved, drop the old column. Changing types in place seems like a bad idea most of the time.
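That pattern might look roughly like this (a sketch only; the table, column, and trigger names are hypothetical, and a real migration would backfill in batches):

```sql
-- 1. Add a new column with the desired type
ALTER TABLE dbo.Orders ADD amount_new DECIMAL(19, 4) NULL;
GO
-- 2. Keep new and changed rows in sync while we backfill
CREATE TRIGGER trg_Orders_SyncAmount ON dbo.Orders
AFTER INSERT, UPDATE
AS
    UPDATE o
       SET amount_new = CAST(i.amount AS DECIMAL(19, 4))
      FROM dbo.Orders AS o
      JOIN inserted AS i ON o.orderid = i.orderid;
GO
-- 3. Backfill existing rows (batched on a real system)
UPDATE dbo.Orders
   SET amount_new = CAST(amount AS DECIMAL(19, 4))
 WHERE amount_new IS NULL;
GO
-- 4. Once all data is verified, drop the trigger and the old column,
--    then rename the new column into place
DROP TRIGGER trg_Orders_SyncAmount;
ALTER TABLE dbo.Orders DROP COLUMN amount;
EXEC sp_rename 'dbo.Orders.amount_new', 'amount', 'COLUMN';
GO
```

Each step is small and reversible until the final drop, which is why this approach avoids most of the downtime of an in-place type change.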

Sometimes it’s not a big deal, but I prefer to avoid any potential problems if I can.

That got me wondering. How often do you change data types in existing tables/columns? Is this because of poor data modeling? Changing requirements? Or are you lucky enough to avoid this?

Or maybe you’re unlucky in that you can’t change those poorly chosen types.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.


Finding and Updating Duplicate IDs: #SQLNewBlogger

Finding duplicates was an interview question for me years ago, and I’ve never forgotten it. Recently I got asked how to easily do this and delete them, so I decided to write a couple of posts on the topic. This one looks at simple, single column IDs. The next one will look at more complex situations.

Another simple post for me, and hopefully an example for people trying to get started blogging as #SQLNewBloggers.

A Simple Scenario

Like many people, I like identity fields for primary keys. However, lots of people build tables like this:

CREATE TABLE PurchaseOrder
(
     poid INT IDENTITY(1, 1),
     purchaseordernumber VARCHAR(20),
     podate DATETIME,
     active INT
)
GO

No explicit nullability, and no PK constraint. People think an identity prevents duplicates. Run the script above and then the script below.

INSERT INTO PurchaseOrder
(
     purchaseordernumber,
     podate,
     active
)
VALUES
('PO-2023-00001', '2023-01-15 09:30:00', 1),
('PO-2023-00002', '2023-01-22 11:45:00', 1),
('PO-2023-00003', '2023-02-05 14:20:00', 1),
('PO-2023-00004', '2023-02-18 10:15:00', 0),
('PO-2023-00005', '2023-03-03 16:30:00', 1),
('PO-2023-00006', '2023-03-17 08:45:00', 1),
('PO-2023-00007', '2023-04-02 13:10:00', 0),
('PO-2023-00008', '2023-04-15 15:25:00', 1),
('PO-2023-00009', '2023-05-01 09:50:00', 1),
('PO-2023-00010', '2023-05-14 12:05:00', 1),
('PO-2023-00011', '2023-06-01 14:40:00', 0),
('PO-2023-00012', '2023-06-15 10:35:00', 1),
('PO-2023-00013', '2023-07-02 16:55:00', 1),
('PO-2023-00014', '2023-07-17 08:20:00', 0),
('PO-2023-00015', '2023-08-03 11:30:00', 1),
('PO-2023-00016', '2023-08-18 13:45:00', 1),
('PO-2023-00017', '2023-09-04 15:10:00', 1),
('PO-2023-00018', '2023-09-19 09:25:00', 0),
('PO-2023-00019', '2023-10-05 12:40:00', 1),
('PO-2023-00020', '2023-10-20 14:15:00', 1)
GO
SET IDENTITY_INSERT dbo.PurchaseOrder ON
GO
INSERT INTO PurchaseOrder
( poid,
     purchaseordernumber,
     podate,
     active
)
VALUES
(19, 'PO-2023-00021', '2026-01-15 09:30:00', 1),
(14, 'PO-2023-00022', '2026-01-22 11:45:00', 1)
GO
SET IDENTITY_INSERT dbo.PurchaseOrder OFF
GO

Now if we select all the rows from this table, we might think things are fine. After all, all the purchaseordernumber fields are unique.

Checking Duplicates

I’ll run this query. Notice I use a GROUP BY on the poid to list these together with a count. In the image, we see some counts that are greater than 1, which indicates a duplicate. We are grouping, or putting all the rows with the same value together.

[Image: GROUP BY query and results, with some poid counts greater than 1]
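Reconstructed from the description above (the query in the screenshot is presumably along these lines), the grouping query is:

```sql
SELECT poid,
       COUNT(*) AS RowCnt
FROM dbo.PurchaseOrder
GROUP BY poid;
```

Any poid with a RowCnt above 1 is duplicated in the table.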

I often will add a HAVING clause to this, which lets me filter the grouped items. When I do that, I just see two items.

[Image: the same query with a HAVING clause, showing only the duplicated poid values]
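The filtered version is likely something like this (a reconstruction, not the exact code from the screenshot):

```sql
SELECT poid,
       COUNT(*) AS RowCnt
FROM dbo.PurchaseOrder
GROUP BY poid
HAVING COUNT(*) > 1;
```

With the test data above, this returns just poid values 14 and 19, each with a count of 2.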

Notice if I change this to purchaseordernumber, I don’t get duplicates. This is because those are unique.

[Image: grouping by purchaseordernumber, with no counts greater than 1]

However, a lot of people build software that edits rows using the underlying surrogate key, which lets users edit the PO number itself. Let's do that. I'll change the PO number for id 19. First we'll get the current values, and then re-query.

If we look below, we see separate purchase order numbers, but when we try to update one of them, two rows change, because we have duplicate values in the hidden surrogate key.

[Image: SELECT and UPDATE by poid, with the update affecting two rows]
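As a sketch of what happens (the replacement PO number here is made up), an update targeted by the surrogate key hits both rows:

```sql
-- Two rows come back for what should be one purchase order
SELECT poid, purchaseordernumber
FROM dbo.PurchaseOrder
WHERE poid = 19;

-- Intended to change one PO, but with the test data above this
-- reports (2 rows affected), changing both rows with poid = 19
UPDATE dbo.PurchaseOrder
SET purchaseordernumber = 'PO-2023-00099'
WHERE poid = 19;
```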

We want to fix this, so what can we do?

What we want to do is find the duplicate 14s and 19s (and others) and change them.

Fixing the Issue

While trying to fix this, I realized that you can't update an identity column. That constraint actually makes the fix really simple.

Since I want to give the rows new poid values, I need to find those rows which are duplicates and then re-insert them into the table. I also need a way to delete the old duplicate values as well.

This can be tricky, as the purpose of an identity (usually) is to ensure there are no duplicate rows. It's possible every field in a row is duplicated, which could be an issue. In that case, I'd likely just copy the data back in and delete all the "old" rows, which were the same.

In my case, the purchaseordernumber is different, so we can use that with the date to decide which row is a duplicate and which row we keep.
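One way to implement the fix described above (a sketch of my own, not necessarily the exact code from the post): use ROW_NUMBER() to pick which copy of each duplicated poid to keep, stage the others, delete the old copies by their unique purchaseordernumber values, and re-insert them so the identity assigns fresh poid values.

```sql
-- Stage the duplicate copies to move; keep the row with the earliest podate
SELECT poid, purchaseordernumber, podate, active
INTO #ToMove
FROM
(
    SELECT poid, purchaseordernumber, podate, active,
           ROW_NUMBER() OVER (PARTITION BY poid ORDER BY podate) AS rn
    FROM dbo.PurchaseOrder
) AS d
WHERE rn > 1;

-- Delete the old copies; purchaseordernumber is unique, so this is safe
DELETE po
FROM dbo.PurchaseOrder AS po
JOIN #ToMove AS m
    ON po.purchaseordernumber = m.purchaseordernumber;

-- Re-insert them without IDENTITY_INSERT, so each gets a new, unique poid
INSERT INTO dbo.PurchaseOrder (purchaseordernumber, podate, active)
SELECT purchaseordernumber, podate, active
FROM #ToMove;
```

Deleting before re-inserting keeps the purchaseordernumber values unambiguous; afterward, the GROUP BY/HAVING check from earlier should return no rows.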

SQL New Blogger

This is a little longer post, and it somewhat got away from me, but this isn't an easy thing to write about, nor a short one. It's easy to mess this up.

This post took me about 45 minutes to write. The code part wasn't long, but I had to think about how to frame the issue with test code and explain it. SQL Prompt made the coding easy once I knew what I wanted. I built this over 3-4 days, working on it 5-10 minutes at a time.

That’s a great way to tackle complex topics.

You could do this and impress an interviewer. Highlight this post in your resume/LinkedIn/etc.
