Yesterday I republished an editorial from 2014 for the holiday. The topic was production subsets of data, something many data professionals have struggled with for years. Many of us have built scripts to delete, change, obfuscate, or alter production restores as a way of providing useful but manageable development data sets. Or maybe it’s just some of us. I’m sure more than a few of us have given up on this task and just restored production databases in their entirety to test and development systems.
Over my career, I’ve become a fan of additively building a known dataset rather than deleting extra data. I advocate adding rows from production (properly masked/obfuscated) and maintaining this set over time as requirements change. However, this isn’t without its own administrative headaches. I think it’s easier, but it does require commitment from everyone to keep the set current over time. It’s certainly better than each developer adding their own 10 rows of data to a table for testing.
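As a rough illustration of the additive approach, here is a minimal sketch in Python. It uses in-memory SQLite as a stand-in for real production and development servers, and the `customers` table, the `mask_email` and `add_masked_rows` helpers, and the `make_db` setup function are all hypothetical names invented for this example. The hashing shown is only a placeholder for whatever masking your organization actually requires.

```python
import hashlib
import sqlite3


def mask_email(email: str) -> str:
    """Swap a real address for a deterministic placeholder (illustrative only)."""
    digest = hashlib.sha256(email.encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@example.com"


def add_masked_rows(prod, dev, customer_ids):
    """Copy only the requested customers into dev, masking each row on the way."""
    for cid in customer_ids:
        row = prod.execute(
            "SELECT id, name, email FROM customers WHERE id = ?", (cid,)
        ).fetchone()
        if row is None:
            continue  # id not present in production; nothing to add
        dev.execute(
            "INSERT OR REPLACE INTO customers (id, name, email) VALUES (?, ?, ?)",
            (row[0], f"Customer {row[0]}", mask_email(row[2])),
        )
    dev.commit()


def make_db(rows=()):
    """Hypothetical setup helper: an in-memory database with one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn
```

Because the insert is keyed on `id`, rerunning the script with a longer list of ids grows the development set without duplicating rows, which is the habit that keeps a known dataset maintainable as requirements change.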
A year ago, Redgate released SQL Clone, designed to solve some of these issues. Once an image is created, new databases for test and development can be provisioned in seconds. I found this to be an amazing product that really changes how I develop against databases, though it does require me to stop getting caught up trying to undo changes or manage a single database. Instead, I need to ensure I am saving code to version control and then build the habit of dropping and rebuilding a baseline database.
As we’ve worked on SQL Clone, I’ve found that there are lots of companies that offer similar ways of virtualizing your data, giving you access to large, production-scale systems in seconds. Data masking, obfuscation, and more are features, with some vendors requiring specific hardware. Others, like Redgate, have software add-ons (Data Masker). All of these products cost money, which can be an issue for many organizations, but I’m glad that this technology is growing and advancing. With GDPR and other draft legislation, many of us need to take better care of our data and build more secure architectures.
Containers are another interesting way to virtualize data, though they don’t solve the scale issues. If you can work with a smaller data set, and maintain that, then containers might provide a fantastic way for you to learn to build, tear down, and rebuild databases in seconds.
The world of databases hasn’t changed a lot in some ways across my career, but in others, I’m amazed. Data virtualization is one of these areas, and if you haven’t trialed the technology, maybe you should give it a whirl this year.