Using Data Compare with Recent Data Only

This is a post that looks at how to compare data changes in recent data. A customer recently asked me about looking at a table, and choosing specific data to compare. In this case, the data they were looking to compare was the most recent data.

Scenario

I decided to set up a quick scenario to showcase this for the customer. I created a table that has some data:

CREATE TABLE [dbo].[DataWithTime](
     [myid] [int] IDENTITY(1,1) NOT NULL,
     [Mydata] [varchar](20) NULL,
     [mytime] [datetime] NULL,
  CONSTRAINT [DataWithTimePK] PRIMARY KEY CLUSTERED
(
     [myid] ASC
)
GO
INSERT INTO dbo.DataWithTime (Mydata, mytime)
VALUES
( 'A', N'2021-01-22T12:42:33.213' ),
( 'B', N'2021-01-22T12:52:33.213' ),
( 'C', N'2021-01-22T13:02:33.213' ),
( 'D', N'2021-01-22T13:07:33.213' ),
( 'E', N'2021-01-22T13:12:33.213' )

I put this in my sandbox database. I wanted a second copy of this same table, but with less data, in another database. I edited the insert statement to look like this:

INSERT INTO dbo.DataWithTime (Mydata, mytime) 
VALUES
( 'A', N'2021-01-20T12:42:33.213' ),
( 'B', N'2021-01-21T12:52:33.213' ),
( 'CC', N'2021-01-23T13:02:33.213' ),
( 'D', N'2021-01-24T13:07:33.213' ),
( 'EE', N'2021-01-25T13:12:33.213' ),
( 'F', N'2021-01-26T13:07:33.213' )

Now I have two copies of my table, with disparate data. What’s different?

Data Comparison Filters

If I open SQL Data Compare, you get the default comparison. I’ll set this up with my two test tables:

2021-01-25 11_12_32-(local)_SQL2017.SimpleTalk_1_Dev v (local)_SQL2017.SimpleTalk_5_Prod.sdc_

When I do the comparison, I see the differences between the tables. As you can see, I edited two rows and added one.

2021-01-25 11_13_25-SQL Data Compare - E__Documents_SQL Data Compare_SharedProjects_(local)_SQL2017.

That’s great, and in a table of a few rows, this isn’t an issue. What if this table has a million rows? Or a billion? I don’t want to scan everything.I want to limit things.

I can, if I click “Edit Project”.

2021-01-25 11_17_43-SQL Data Compare - E__Documents_SQL Data Compare_SharedProjects_(local)_SQL2017.

and then choose Tables and Views. I’ll see my table listed.

2021-01-25 11_17_59-(local)_SQL2017.SimpleTalk_1_Dev v (local)_SQL2017.SimpleTalk_5_Prod.sdc_

I can select the row with my table, DataWithTime, and then I can click the “Where clause” link in the upper right.

2021-01-25 11_18_06-(local)_SQL2017.SimpleTalk_1_Dev v (local)_SQL2017.SimpleTalk_5_Prod.sdc_

This pops up a dialog where I can enter a WHERE clause to be used for the table. I can set the same clause for both the source and target, or use separate ones. I’ll use the same one here.

2021-01-25 11_18_51-(local)_SQL2017.SimpleTalk_1_Dev v (local)_SQL2017.SimpleTalk_5_Prod.sdc_

I can click OK for this and then Compare now to re-run the project. This gives me the data compared, but without looking at any data before the 25th of Jan. Notice only two rows below instead of 3.

2021-01-25 11_19_07-SQL Data Compare - E__Documents_SQL Data Compare_SharedProjects_(local)_SQL2017.

Am I sure this still didn’t impact my SQL Server with a large query? This works great with 5 rows, but what about 1billion? Well, I ran the XEvent Profiler while I was editing the project, and then filtered this down to the SQL tools. When I do that, I see this:

2021-01-25 11_21_16-ARISTOTLE - QuickSessionStandard_ Live Data - Microsoft SQL Server Management St

The query being issued has my WHERE clause, which filters out data at the query processing level. This doesn’t guarantee a seek or limited reads, but if I have the column indexed, then I would get an efficient a plan as I could get.

SQL Data Compare is fantastic tool for finding data differences. Comparing large volumes of data can be slow, but if you use filters, you can dramatically speed things up. If you haven’t tried SQL Data Compare, download an evaluation today and see what you think.

About way0utwest

Editor, SQLServerCentral
This entry was posted in Blog and tagged , , . Bookmark the permalink.

1 Response to Using Data Compare with Recent Data Only

  1. Jeff Moden says:

    This is really cool, Steve. Thanks for taking the time to write up such a nice, clear, demo.

    My concern (actually, an observed problem from the past) is that SQL Data Compare creates RBAR INSERTs. While it works mostly fine for smaller stuff, it would take a heck of a long time to INSERT a couple of million rows difference. It’ll be interesting to see if SQL Data Compare will even handle such large differences.

    Yeah… that means that I’m going to have to test it for such a scenario.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.