I learned a new trick with SQL Data Generator that I wasn’t aware of previously. I think this is a good idea for masking some of that production data that you might not want developers to have.
Let’s start with a production table. In my case, I’ve created a Sandbox_Prod database with a table in it for employees. I’ve added a few records that contain some sensitive information.
CREATE TABLE Employees ( empid INT IDENTITY(1,1) , EmployeeName VARCHAR(250) , EmpployeeEmail VARCHAR(250) , active TINYINT , salary money , pwd VARBINARY(max) ) ; GO INSERT INTO Employees VALUES ( 'sjones', 'firstname.lastname@example.org', 1, 10000, ENCRYPTBYPASSPHRASE('The User Sample', 'MyS%83ongPa44#word')) , ( 'awarren', 'email@example.com', 1, 20000, ENCRYPTBYPASSPHRASE('The User Sample', 'Ano$2therS%83ongPa44#word') ) , ( 'rsmith', 'firstname.lastname@example.org', 1, 5000, ENCRYPTBYPASSPHRASE('The User Sample', 'Kedj93m@@83ongPa44#word')); GO
I’ve got a second database, called Sandbox, that simulates my development environment. I’ve got the same table, but without any data in it.
What I want to do is move some of the production data to my development area, but not all of it. Some of the production data needs to be masked.
SQL Data Generator Sources
I can use SQL Data Generator from Redgate to do this, by using a data source that actually exists. In this case, I’ll create a new project and point it at my Sandbox database. I’ve deselected all of the tables except my Employees table.
When I pick my Employees table, I see the familiar generation screen on the right. Most of you are like me and notice the number of rows and the option to delete data.
However there’s another option. I can select the “Use existing data source” radio button instead. When I do this, I have a few choices for data. I can use an existing table or a CSV file. Both of those can be good choices, especially if I have sets of data I want to load into the table. Either one can help me to build known, specific data sets for development (or testing).
In my case I will choose an existing table. When I do this, I click the “Browse” button and I get a connection dialog for SQL Server. I pick my instance and the production database.
I click “Next” and then get the chance to select the table to use. In this case, I’ll pick the Employees table.
When I return to the main SDG screen, I see the table listed as the source, but my preview shows the actual production data. This is because I’ve mapped the production table as a source, and it will be used as it currently exists.
That’s not what I want. I want to mask the email address and the salary. However, now I can change things like I might do for any random data generation.
Let’s first click in the EmployeeEmail column. When I do that, I see the following, the column with its source set as the existing column in the production table.
However the drop down gives me lots of choices, including an Internet email generator.
If I select, then my preview changes. Now the image below shows production data for all columns other than the email.
I can repeat this for the salary (and password to be safe). When I do that, I’ll see random data for those columns and production data for others.
I can repeat this for all tables in my project, mapping through data that isn’t sensitive, and masking data that is. It’s a tedious process, but it’s a one time process for specific data. Once this is done, every restore can have the project run and the data masked. If production DBAs do this refresh, then developers never see sensitive information