As someone learning about DevOps, I follow a number of people, one of whom is Gene Kim. When I see him get excited about a post, I usually read it. That’s how I found this post on Demo Data as Code. It’s a short but interesting read. I think this is actually something more people ought to implement in their environments, and not just for demos.
DevOps is about reliability and repeatability, among other things, and both are typically achieved by automating a known process. We don’t want simple, silly mistakes, or even complex errors, undermining our ability to move forward and create value. We don’t want simple errors eating up resources and the time of expensive talent with unnecessary work. Part of ensuring both repeatability and reliability involves using the data in our databases to evaluate our application. This isn’t necessarily for demos, though it could be used for demos.
One of the areas that is often left out of the process is the data we use in building our systems. We need some data for developers, for QA, and often for demos. In all of those cases, when humans need to repeatedly look at how well the software performs, and want to re-test things, they need consistent data. I’d also argue that the need for agility means we need a manageable data set. I think SQL Provision from Redgate is amazing, but I still don’t want to always develop with 2TB of masked data. I certainly don’t want to demo with that much data for customers from a laptop, and I might not want to share it in the cloud.
At Redgate, we sell masking with SQL Provision, and it supports most of the process that’s outlined in the Demo Data as Code article. What it needs, however, is a small set of data that can be masked in a deterministic fashion. What I recommend to most clients is that they build a known set of test data, which could be used for demos. This can include all your edge cases and show off new features. It’s helpful for developers, testers, and salespeople, who will always have a known, useful set of data.
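To make the idea of deterministic masking concrete, here is a minimal, hypothetical Python sketch. This is not how SQL Provision works internally; the `FAKE_NAMES` pool and salt are illustrative assumptions. The point is only that hashing the original value with a fixed salt makes the same input always mask to the same output, so a known test data set stays consistent across rebuilds.

```python
import hashlib

# Illustrative assumptions, not a SQL Provision API: a pool of fake
# values and a per-project salt.
FAKE_NAMES = ["Alex Smith", "Sam Jones", "Pat Lee", "Chris Kim"]
SALT = b"per-project-secret"

def mask_name(original: str) -> str:
    # Hash the original value with the salt, then index into the pool.
    # Identical inputs always map to the same fake name, which keeps
    # masked data deterministic between runs.
    digest = hashlib.sha256(SALT + original.encode("utf-8")).digest()
    return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]
```

Because the mapping is repeatable, testers and salespeople see the same masked records every time the set is regenerated, which is what makes edge cases reliably demonstrable.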
This can’t be a build-it-and-forget-it effort, much like what is emphasized in the article. The data set will need to be altered over time. There ought to be a process to build it, likely from production data that gets sanitized. It can then be distributed through SQL Provision (or similar technology), with backups, or even as a set of scripts in your VCS. Ensure an environment can be hydrated instantly on any platform, from a developer workstation to a sales laptop to a QA server. Once you have this, everyone can evaluate your software from a known baseline.
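For the scripts-in-VCS option, the hydration step can be sketched as follows. This is a hypothetical example, not a prescribed tool: it uses SQLite as a stand-in target, and the convention of numbered `.sql` files (`001_schema.sql`, `002_data.sql`, ...) is an assumption. A real shop might instead restore a backup or clone an image with SQL Provision.

```python
import sqlite3
from pathlib import Path

def hydrate(db_path: str, script_dir: str) -> None:
    """Build a fresh database from ordered seed scripts kept in VCS.

    Hypothetical sketch: applies every *.sql file in filename order,
    so schema scripts (001_...) run before data scripts (002_...).
    """
    conn = sqlite3.connect(db_path)
    try:
        for script in sorted(Path(script_dir).glob("*.sql")):
            conn.executescript(script.read_text())
        conn.commit()
    finally:
        conn.close()
```

Because the whole data set lives in version control, any workstation, laptop, or QA server can rebuild the same baseline on demand, and changes to the data go through the same review process as code.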
And if you find you need more data, just add it. You have a process, so add an additional step that covers the holes you inevitably find.