Changing Data Types

Recently I was working with a customer who is trying to smooth out their database development process. They noted that a lot of changes take a significant amount of time and impact their availability. They came to Redgate to see if Flyway might be a way to improve the effectiveness of the internal groups that build, deploy, and manage their database software.

We can help, but when I get called in to help architect things, part of the job is deciding whether the customer is a good fit for our solutions. Since we work on a subscription model, we want to be sure customers are happy and get value from their decision; otherwise they have a poor experience and don’t renew. That might be because they aren’t ready, or because no one considered whether our solution fits their environment well. In any case, I usually dig into the goals and challenges they’ve faced with their current process.

In this case, they found that developers often changed the data types of columns to better match the data being stored. That can be a disruptive change, and while Flyway does some amazing things, the software is still bound by the rules of the platform it runs against. In SQL Server, some data type changes are metadata-only operations, which are fast; others have to rewrite every row in the table.
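As a rough sketch of that difference (the table here is made up for illustration): widening a varchar is a metadata-only change in SQL Server, while converting a populated INT column to BIGINT is a size-of-data operation that rewrites every row.

```sql
-- Hypothetical table for illustration
CREATE TABLE dbo.OrderDemo
(
     orderid INT NOT NULL,
     note VARCHAR(50) NOT NULL
);
GO
-- Metadata-only: widening a varchar doesn't touch existing rows
ALTER TABLE dbo.OrderDemo ALTER COLUMN note VARCHAR(200) NOT NULL;
GO
-- Size-of-data: INT to BIGINT rewrites every row in the table,
-- holding a schema modification lock while it runs
ALTER TABLE dbo.OrderDemo ALTER COLUMN orderid BIGINT NOT NULL;
GO
```

On an empty table both statements return instantly; on a large table, the second one is the kind of change that impacts availability.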

After the call, I started to wonder how often people change data types. I’ve rarely done this, and honestly, I’ve used a zero-downtime pattern a few times instead: add a new column, move the data, keep it in sync with triggers, and then, once I’m sure all the data has moved, drop the old column. Changing types in place seems like a bad idea most of the time.
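In rough T-SQL, that pattern might look like the sketch below. The table, column, and trigger names are all hypothetical, and a real migration needs batching tuned to your workload.

```sql
-- Step 1: add the new column alongside the old one
ALTER TABLE dbo.Customer ADD phone_new VARCHAR(20) NULL;
GO
-- Step 2: keep new writes in sync while the backfill runs
CREATE TRIGGER tr_Customer_SyncPhone ON dbo.Customer
AFTER INSERT, UPDATE
AS
BEGIN
     SET NOCOUNT ON;
     UPDATE c
        SET c.phone_new = CAST(c.phone AS VARCHAR(20))
       FROM dbo.Customer AS c
      INNER JOIN inserted AS i ON i.customerid = c.customerid;
END;
GO
-- Step 3: backfill existing rows in small batches
UPDATE TOP (1000) dbo.Customer
   SET phone_new = CAST(phone AS VARCHAR(20))
 WHERE phone_new IS NULL;
-- repeat until no rows remain, then drop the trigger,
-- drop the old column, and rename phone_new to phone
```

The batched backfill keeps transactions short, which is what keeps the table available while the data moves.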

Sometimes it’s not a big deal, but I prefer to avoid any potential problems if I can.

That got me wondering. How often do you change data types in existing tables/columns? Is this because of poor data modeling? Changing requirements? Or are you lucky enough to avoid this?

Or maybe you’re unlucky in that you can’t change those poorly chosen types.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Tagged | Leave a comment

Finding and Updating Duplicate IDs: #SQLNewBlogger

Finding duplicates was an interview question for me years ago, and I’ve never forgotten it. Recently I got asked how to easily do this and delete them, so I decided to write a couple of posts on the topic. This one looks at simple, single column IDs. The next one will look at more complex situations.

Another post that is simple for me and hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

A Simple Scenario

Like many people, I like identity fields for primary keys. However, lots of people build tables like this:

CREATE TABLE PurchaseOrder
(
     poid INT IDENTITY(1, 1),
     purchaseordernumber VARCHAR(20),
     podate DATETIME,
     active INT
)
GO

No nullability specified, and no PK constraint. People often assume an identity prevents duplicates. Run the CREATE TABLE script above and then the script below.

INSERT INTO PurchaseOrder
(
     purchaseordernumber,
     podate,
     active
)
VALUES
('PO-2023-00001', '2023-01-15 09:30:00', 1),
('PO-2023-00002', '2023-01-22 11:45:00', 1),
('PO-2023-00003', '2023-02-05 14:20:00', 1),
('PO-2023-00004', '2023-02-18 10:15:00', 0),
('PO-2023-00005', '2023-03-03 16:30:00', 1),
('PO-2023-00006', '2023-03-17 08:45:00', 1),
('PO-2023-00007', '2023-04-02 13:10:00', 0),
('PO-2023-00008', '2023-04-15 15:25:00', 1),
('PO-2023-00009', '2023-05-01 09:50:00', 1),
('PO-2023-00010', '2023-05-14 12:05:00', 1),
('PO-2023-00011', '2023-06-01 14:40:00', 0),
('PO-2023-00012', '2023-06-15 10:35:00', 1),
('PO-2023-00013', '2023-07-02 16:55:00', 1),
('PO-2023-00014', '2023-07-17 08:20:00', 0),
('PO-2023-00015', '2023-08-03 11:30:00', 1),
('PO-2023-00016', '2023-08-18 13:45:00', 1),
('PO-2023-00017', '2023-09-04 15:10:00', 1),
('PO-2023-00018', '2023-09-19 09:25:00', 0),
('PO-2023-00019', '2023-10-05 12:40:00', 1),
('PO-2023-00020', '2023-10-20 14:15:00', 1)
GO
SET IDENTITY_INSERT dbo.PurchaseOrder ON
GO
INSERT INTO PurchaseOrder
(
     poid,
     purchaseordernumber,
     podate,
     active
)
VALUES
(19, 'PO-2023-00021', '2026-01-15 09:30:00', 1),
(14, 'PO-2023-00022', '2026-01-22 11:45:00', 1)
GO
SET IDENTITY_INSERT dbo.PurchaseOrder OFF
GO

Now if we select all the rows from this table, we might think things are fine. After all, all the purchaseordernumber values are unique.

Checking Duplicates

I’ll run this query. Notice I use a GROUP BY on the poid to list these together with a count. In the image, we see some counts that are greater than 1, which indicates a duplicate. We are grouping, or putting all the rows with the same value together.
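The query behind the image is essentially a grouped count, something like:

```sql
SELECT poid,
       COUNT(*) AS poidcount
  FROM dbo.PurchaseOrder
 GROUP BY poid
 ORDER BY poid;
```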

[screenshot: each poid with its count; a few counts are greater than 1]

I often will add a HAVING clause to this, which lets me filter the grouped items. When I do that, I just see two items.
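With the HAVING clause added, the query might look like this; it returns only the duplicated poid values.

```sql
SELECT poid,
       COUNT(*) AS poidcount
  FROM dbo.PurchaseOrder
 GROUP BY poid
HAVING COUNT(*) > 1;
```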

[screenshot: the two duplicated poid values, each with a count of 2]

Notice that if I change this to group by purchaseordernumber, I don’t get duplicates, because those values are unique.
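For example, grouping on the PO number instead:

```sql
SELECT purchaseordernumber,
       COUNT(*) AS ponumbercount
  FROM dbo.PurchaseOrder
 GROUP BY purchaseordernumber
HAVING COUNT(*) > 1;
-- returns no rows; every PO number appears once
```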

[screenshot: grouping by purchaseordernumber returns no counts greater than 1]

However, a lot of software edits rows using the underlying surrogate key, so that users can change the PO number itself. Let’s do that. I’ll change the PO number for id 19. First we’ll get the current values, then run the update, and then re-query.

If we look below, we see separate purchase order numbers, but when we try to update one of them, two rows change, because we have hidden duplicate surrogate keys.
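The update might look like this (the new PO number here is made up):

```sql
UPDATE dbo.PurchaseOrder
   SET purchaseordernumber = 'PO-2023-00099'
 WHERE poid = 19;
-- (2 rows affected): both rows sharing poid = 19 are changed
```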

[screenshot: the update for poid 19 reports two rows affected]

We want to fix this, so what can we do?

What we want to do is find the duplicate 14s and 19s (and others) and change them.

Fixing the Issue

While trying to fix this, I realized that one can’t update an identity field. That actually makes the fix really, really simple.

Since I want to give the rows new poid values, I need to find those rows which are duplicates and then re-insert them into the table. I also need a way to delete the old duplicate values as well.

This can be tricky, as the purpose of an identity (usually) is to ensure there are no duplicate rows. It’s possible every field in a row is duplicated, which could be an issue. In that case, I’d likely just copy the data back in and delete all the “old” rows, since they were the same.

In my case, the purchaseordernumber is different in each row, so we can use that along with the date to decide which row is the duplicate and which row we keep.
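One way to sketch that approach in T-SQL: number the rows in each duplicate poid group, keep the row with the earliest podate (adjust the ORDER BY to match whatever rule you choose), delete the others while capturing them with OUTPUT, and re-insert them so they get fresh identity values. Test this carefully on your own data first.

```sql
-- Holding table for the rows we move
CREATE TABLE #moved
(
     purchaseordernumber VARCHAR(20),
     podate DATETIME,
     active INT
);
GO
-- rn = 1 is the keeper in each poid group; delete the rest and
-- capture their values so they can be re-inserted
WITH dupes AS
(
     SELECT poid,
            purchaseordernumber,
            podate,
            active,
            ROW_NUMBER() OVER (PARTITION BY poid ORDER BY podate) AS rn
       FROM dbo.PurchaseOrder
)
DELETE FROM dupes
OUTPUT deleted.purchaseordernumber, deleted.podate, deleted.active
  INTO #moved (purchaseordernumber, podate, active)
 WHERE rn > 1;
GO
-- Re-insert the moved rows; each gets a fresh identity value
INSERT INTO dbo.PurchaseOrder (purchaseordernumber, podate, active)
SELECT purchaseordernumber, podate, active
  FROM #moved;
GO
```

Because we never try to update the identity column, this works with the restriction mentioned above: the duplicates simply get new poid values on the way back in.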

SQL New Blogger

This is a little longer post, and it somewhat got away from me, but this isn’t an easy thing to write about, nor is it short. It’s easy to mess this up.

This post took me about 45 minutes to write. The code part wasn’t long, but I had to think about how to frame the issue with test code and explain that. SQL Prompt made the coding easy once I knew what I wanted. I built this over 3-4 days, working on it 5-10 minutes at a time.

That’s a great way to tackle complex topics.

You could do this and impress an interviewer. Highlight this post in your resume/LinkedIn/etc.


Whiling away an afternoon, thinking

I come to Heathrow often; today marks somewhere close to my 60th trip to this airport. After over 19 years of working with Redgate (plus a trip in college), it’s the airport I’ve visited most outside of Denver (my home).

This is the view outside my hotel window. You can see an A320 (or a Boeing 737) taking off. This is a good-sized plane that I fly all over the US; its wingspan is about 35 meters (115ft), and it carries 150-180 people.

As it took off beside the A380 pulling into the terminal, it looked like a toy plane. Those things are massive.

[photo: the view from the hotel window as a plane takes off]

I’ve stayed here a few times after landing, to try to get some work done and to check in early for a nap. Today I landed at 10am after flying all night, and I could use some rest. Rather than heading into London and getting stuck waiting in a restaurant until check-in, I booked a room here. I’ve already had a nap and am ready to work.

Thinking Time

This is good thinking time, which can be rare to get. Years ago, on one of our multiple-times-a-week SQL Server Central calls, Andy told me that he needed time to think about his work as a DBA and manager of others: time to consider how things are going, why they are going well or not, and which direction to head.

Time just thinking about a topic.

It has stuck with me all these years that time to just think is important, and it helps me clarify how I view things.

It’s often how I get editorials written: turning vague ideas into something concrete. I’ll look at a note I’ve made or re-read something and think about it. When inspiration strikes, I’ll start writing.

Today I was thinking and watching planes take off when I realized I hadn’t seen one in a while. Normally they take off about every 45 seconds at LHR. Then I saw this little vehicle going down the runway, something I’d never seen.

[photo: a small vehicle driving down the runway]

I watched for a few minutes and realized there was another one coming the other way. They met in the middle and circled a few times before leaving the runway.

[photo: the two vehicles meeting in the middle of the runway]

I’m sure this is some FOD (foreign object debris) detection process, looking for pieces that might have fallen off planes (yikes) or vehicles. A pilot friend said this is also done in the military, sometimes by grunts walking the runway, looking for anything that might get pulled into a jet engine.

In any case, it was something new and unusual and inspired me.

I don’t know that everyone needs thinking time, but if you manage or lead or are trying to develop strategy, I do believe you need some quiet thinking time when you’re not really getting anything done, but just thinking about work.

Glad that I took the time today and grateful to have the opportunity.

Now back to writing about AI and other things impacting the data professional.



Breaking Down Your Work

I saw an interesting LinkedIn post on Kyler Murray and how he approaches the game of American football. I don’t know if the meme is true, but certainly his preparation efforts have been a reported issue during Murray’s career. The post actually deals with sales and analyzing the reasons for deal success or failure, something I’ve been able to witness at Redgate over the last few years. It’s interesting to me to see the sales process examined, though I don’t work in sales.

Incidentally, one of the comments is one I appreciate, referencing Kobe Bryant and the Mamba Mentality. I like the approach of working and asking questions to become better.

Most of us technical people aren’t thinking of sales, but do we break down and re-examine how we do our jobs? Do we aim to improve the skills we have and develop more depth in the areas we work? I know lots of technical people like learning new skills, but is looking at the improvement (or refinement) of existing skills on your list?

My experience has been that most people don’t look to grow deeper in many ways. They learn a thing and then often use that skill, but don’t often re-examine to see if they could actually do that thing in a new way. Technology changes, and it’s easy to think that the way you write SQL or build servers or implement security is good enough. It can pay to not only learn new things, but re-examine your existing patterns and practices to see if there are better ways to accomplish those tasks.

This is where “one year of experience repeated ten times” comes from: candidates who don’t interview well because they’ve been repeating patterns without improving them.

I promote the idea of regularly improving your skills, sharpening your tools, and growing your abilities in a way that provides value for your employer and ensures you have a successful career.

This is going to be more important in the future, especially with AI impacting the way many managers view technical work.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.
