I caught a post from the Slack Engineering team titled: Re-architecting Slack’s Workspace Preferences: How to Move to an EAV Model to Support Scalability. In the post, an engineering intern describes the move from a blob table (actually a table storing JSON data) to an EAV table. EAV tables generally don’t perform well, which isn’t the same as scalability, but in the real world these two items are interrelated. I likely would have chosen a hybrid approach, using a wider table for known items, but keeping an EAV table for potential one-offs.
In any case, I don’t want to discuss EAV solutions. Instead I want to discuss schema migrations. That’s part of the focus I have at Redgate with our Compliant Database DevOps solutions, as deploying schema changes is a challenge for many customers. It’s why DBAs usually have jobs, but the traditional built-the-script-from-developer-descriptions-and-lists-of-changes doesn’t always work smoothly. This creates stress for the individuals and risk for the organizations.
One of the items in Slack’s post is about a dark mode of deployment. That’s similar to what I’d call dark launching, but the idea is the same. In this case, there’s a good description of how this is helpful. In this case, there is the migration of data from one table to another. The new table was populated as part of the deployment, but rather than just using the new or old table, the application was altered to pull data from both sources and compare them. This helps to ensure the data was moved correctly. I assume issues resulted in the old data being used.
There were a couple interesting things with this approach. First, instrumentation was used to measure the time spent pulling data from the new table, as a way of measuring performance. This also allowed the system to discover read/write bugs in the new process. If your system has any headroom and a decent workload, this is a great way of trying to ensure that a data migration worked.
If you have a new feature, you can also use this technique. Make the database changes and add application code for your feature, but don’t expose that to the user. Instead, add code that uses the feature, sending random data to the database and reading it back. In this way, you can test that your methods work and measure the load on the database. Of course, if you do this, make sure you can silently turn off the random generation if the database is negatively affected.
I’m a big fan of dark launching and measuring the impact of changes. In so many cases our deployments might be delayed for any number of reasons, so the pressure to release the feature today rather than tomorrow or next week is silly. I’d argue the ability to measure impact, even for a day, will help ensure better code quality for the user.
That’s if you are given time to fix any issues you find..