T-SQL Tuesday #198–Change Detection

This month we have a new host, Meagan Longoria, who graciously agreed to help me this month. I’ve known Meagan for a number of years and she’s been a person whom I’ve asked questions about data visualization and analysis in the past. I was slightly surprised by the topic this month, but only slightly. I’ll write my answer below, but if you want to try blogging and host a future month, let me know.

Change Detection

The topic this month is change detection, which is important for efficient ETL work, but also for other areas, such as auditing. I haven’t tended to work in high volume systems where we did a lot of ETL and needed to very efficiently detect changes. Most of the time I’ve had ETL pipelines they were busy, but not excessively so.

As a result, in the past, I’ve often used a roll-your-own approach in the SQL 6.5->2008 era. As I’ve worked on those systems, we’ve usually used a simple update or modified date in the table that tracked when something was altered. By knowing the last time a pipeline ran, we could gather all data from that point forward and extract it.

We used a similar approach to send a lot of emails from SQL Server Central years ago. That can work well, and as long as you track the last execution of your pipeline process, whatever that is, you minimize the data being transferred.

A Modern Approach

I ran into dbt a few years ago and did a one day class on how it works. It was interesting to me, and I could see the appeal. Recently I had a conversation with John Miner, who’s used it in his Fabric Modern Data Platform series. I would be very tempted to use dbt, in conjunction with a modified date as described above, to ETL data around today.

However.

I have been seeing that Change Data Capture (CDC) is being used by lots of products these days. It’s behind the Fabric mirroring, Oracle has used it for years, when you look to move data into Databricks, CDC is common, it seems like it’s everywhere. 

If I were going to be regularly moving data in 2026, CDC seems like something I’d experiment with and test, since it’s a known technology that works across many platforms and there is a lot of knowledge out there on how it works. AI can certainly help with experiments, and with understanding the overhead on your system, because there is some overhead.

I haven’t looked at the new Change Event Streaming, though I’m always wary of anything that limits me to one cloud. The more generic CDC, with AI assisted configuration and maintenance, seems like a better approach.

Unknown's avatar

About way0utwest

Editor, SQLServerCentral
This entry was posted in Blog and tagged , , . Bookmark the permalink.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.