A customer had a question recently about using Data Masker for SQL Server and temporal (or system versioned) tables. I decided to make a quick demo that will show how this works.
This is a simple demo, using just a few changes, but the concepts are useful for larger systems.
Setting up a temporal table
In the demo database for Data Masker, there is a dbo.dm_customer table. I’m going to make this a temporal table with a few schema changes. First, I’ll add two columns that will be used to store the valid datetime ranges.
ALTER TABLE dbo.DM_CUSTOMER
ADD StartTime DATETIME2(3) GENERATED ALWAYS AS ROW START
EndTime DATETIME2(3) GENERATED ALWAYS AS ROW END
DEFAULT CONVERT(DATETIME2(3), '9999-12-31 23:59:59.999'),
PERIOD FOR SYSTEM_TIME(StartTime, EndTime);
Next, I’ll enable system versioning, and specify the history table.
ALTER TABLE dbo.DM_CUSTOMER SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE=dbo.DM_Customer_History));
At this point, let’s test. I’ll query the history table and show it’s empty.
Now, let’s make a couple changes.
UPDATE dbo.DM_CUSTOMER SET customer_firstname = 'Jillian' WHERE customer_id = 1000002
UPDATE dbo.DM_CUSTOMER SET customer_firstname = 'Sian' WHERE customer_id = 1000004
INSERT dbo.DM_CUSTOMER ( customer_id,customer_firstname,customer_lastname, customer_gender, customer_company_name,
customer_street_address, customer_region, customer_country, customer_email, customer_telephone,
customer_zipcode, credit_card_type_id, customer_credit_card_number)
( '1000', 'Steve', 'Jhones', 'M', 'Redgate', '123 My St', 'CO', 'US',
'email@example.com', '30333333333', '80014', '1', '12`345566')
SET customer_lastname = 'Jones' WHERE customer_id = 1000
SET customer_lastname = 'Jilly' WHERE customer_id = 1000002
SET customer_lastname = 'Jill' WHERE customer_id = 1000002
We requery, and you can see there is data in the history table (at the bottom). All of my changes are captured in a sequence of copies of the row.
There are a few more rows in the history table, but they got cut off in the image. That isn’t that important. Just note that the names match up, for the most part, between the tables.
Using Data Masker
Now the interesting part. I have a masking set that’s similar to ones I’ve used in other articles. In this case, I’m changing names in two substitution rules. However, these rules are set for the dbo.dm_customer table. What about the dbo.dm_customer_history table?
To change this, I need to do a few things. First, I need a Command rule to disable system versioning. This is the way I disconnect the tables so that I can update data.
This has to take place before I change dbo.dm_customer, otherwise I’ll insert a bunch of new data in the history table. My ordering looks like this:
Once this is complete, I now need a Table-Table sync rule to ensure data in one table matches the other. When I create the rule, I need to have a way to join the two tables together. In this case, I use the customer_id column.
Note that I’m ensuring the names and credit card numbers match here. This will wreck my history table slightly, in that all of the rows for a customer_id will have the same value, but this does ensure I don’t have sensitive information leaking.
Once I’m done, I make this a child of the other rules. This ensures that the substitutions run on the main table, then are synchronized to the history table.
You’ll notice I also have another command rule at the bottom. The text for this rule is below, and this rule enabled the temporal link again. I’m assuming the application is somehow using this. If not, you can ignore this rule and leave the tables disconnected if that’s Ok in your application.
All I need to do now is run the rules. Let’s do that.
Once the rules are complete, let’s query the tables again. I’m only looking for those rows that were changed previously.
As you can see, the dbo.dm_customer (top) table and the dbo.dm_customer_history table (bottom) are synchronized. The multiple values I had for different versions of customer 1000002 are now all the same. I’ve lost history, but if this were sensitive information, that works out. I could have used random values instead for the history table, but here I can see linkages if I’m checking for similar data.
This is a Hard Problem
Trying to determine how to rewrite history for audit tables is hard. Data Masker can do pretty much anything you would like, but here we need to be sure that we have an idea of what we want to do.
I’ve presented a simple version of setting all history to the current value, which will work in some cases. If not, let me know what you need to see, and I’ll build you a scenario.
Data Masker for SQL Server is an amazing tool in the SQL Provision bundle. Combined with SQL Clone, you can deploy databases with masked data for dev and test environments in seconds. Give it a try today and see what you think.