I ran across this post on developing databases with SSDT. It has a lot of steps, and reading through it, I find it makes some sense, but I'm not sure it's easy. I can see why developers find databases a pain to work with: there are a lot of steps in that post just to set up and configure a database project. Databases are fundamentally hard to work with, as maintaining state between changes while ensuring data is not lost can be difficult. The concept is similar to keeping track of configuration files, but the scale of data in a database is vast and ever-changing, and tooling that preserves every piece of data that might need to be recovered isn't very practical.
I ran across a developer who was trying to automate their database development. They used tooling to deploy a table to a production system, the application connected, and data was stored in the table. The developer then dropped a column from the table and deployed the change. It worked, but the drop was a mistake, so this person deployed the previous version of the database, with the column restored. That deployment also worked, but there was no data in the column that had been dropped and added back. Why not, the developer asked?
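The sequence is easy to reproduce. This sketch uses Python's built-in sqlite3 module as a stand-in for the production database, with a hypothetical `customers` table and `phone` column; `ALTER TABLE ... DROP COLUMN` requires SQLite 3.35 or newer (bundled with recent Python builds). The point holds for any relational engine: redeploying the old schema brings the column back, not its contents.

```python
import sqlite3

# In-memory database standing in for "production" (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Anna', '555-0100')")

# Deployment 1: the schema change the tooling pushes out.
conn.execute("ALTER TABLE customers DROP COLUMN phone")

# Deployment 2: "roll back" by redeploying the previous schema version.
conn.execute("ALTER TABLE customers ADD COLUMN phone TEXT")

# The column exists again, but the data it held is gone for good.
row = conn.execute("SELECT name, phone FROM customers WHERE id = 1").fetchone()
print(row)  # ('Anna', None)
```

Nothing in the tooling misbehaved here: the DROP COLUMN physically discarded the values, so the second deployment had nothing to restore.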
People are of two minds here. For some, any tooling or automation should preserve data and allow for rollbacks. In the application world this makes perfect sense, and even the data our application uses (reference files, configuration files, etc.) is restored if we roll back to a previous version.
For the data people, this makes no sense. The data in a column could be of significant size. We often plan for systems to reach millions (or more) of rows, and trying to save the state of this data before every change isn't practical. Even if we were to store changed items for a few deployments, it's entirely possible that putting the data back wouldn't make sense, as related information in other tables might no longer match up correctly. Consider the case of a financial system and restoring old monetary values. Who knows what issues would be created?
For developers, this highlights one of the things they dislike about databases. They must manage state transitions across deployments, and rolling back to previous versions often isn't possible. This is one reason I have often performed a backup before major deployments, and even today, in an automated DevOps process, I'd want to perform a backup if any significant data were being changed or deleted.
I've spent a lot of time over the last few years advocating for DevOps and smoother, more modern database development practices. I don't want the database to be an impediment to change, but I also don't want to compromise the integrity or safety of data. My pitch has always been that our database automation tools at Redgate don't perform any magic. They smooth, and hopefully speed up, the process of making database changes that we've used for decades. They save you time and effort, just as other tools do, but they can't change the rules of relational database changes.
Everyone developing code inside or connecting to a database needs to understand how transactions and data changes work. There are rules and restrictions that protect our data, which means we sometimes need to plan and consider the consequences of our actions. Despite wanting to move faster and make changes, we can't treat data placed in columns as malleable in the way a method in C# can be changed back and forth. We need to account for, and protect, the information stored in our systems. There are patterns that can help you evolve your database from one state to the next, but there isn't any magic that lets you drop and add data storage elements without some preparation for handling the data itself.
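One such pattern is to move the data aside before the destructive schema change, so a later restore is a deliberate data operation rather than an impossible rollback. This is a minimal sketch, not any particular tool's behavior, using sqlite3 and a hypothetical archive table; `DROP COLUMN` again assumes SQLite 3.35 or newer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Anna', '555-0100')")

# Step 1: copy the doomed column's data aside before touching the schema.
conn.execute("CREATE TABLE customers_phone_archive AS SELECT id, phone FROM customers")

# Step 2: drop the column, knowing its contents are preserved.
conn.execute("ALTER TABLE customers DROP COLUMN phone")

# Step 3 (only if the drop turns out to be a mistake): re-add the column
# and repopulate it from the archive, matching rows by key.
conn.execute("ALTER TABLE customers ADD COLUMN phone TEXT")
conn.execute("""
    UPDATE customers SET phone =
        (SELECT phone FROM customers_phone_archive
         WHERE customers_phone_archive.id = customers.id)
""")
print(conn.execute("SELECT name, phone FROM customers WHERE id = 1").fetchone())
# ('Anna', '555-0100')
```

The preparation is the point: the data survives the schema change only because someone planned for it before the deployment ran.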