Acting with Confidence

Recently, I saw a graph about making decisions that showed the impact of both reversibility and consequences. Here is an example of such a graph and how one might approach decisions. If things are easily reversible or have low consequences, we tend to make a decision and move on, or at least we are willing to make one. One of the examples of such a decision was choosing what to wear out to dinner. It’s easy to change and (in general) of little consequence. Sending a large amount of money to someone through Venmo (or some other mechanism), on the other hand, can be hard to reverse and have substantial consequences.

This made me think of some of the DBA and developer decisions I’ve made in the past. When we work with databases, the changes we make can have a large impact and be quite consequential to our organization. Downtime, data quality, etc. could all impact revenue, profit, reputation, or even future prospects of survival. That can be a lot of pressure when you are deciding to refactor a data model or adjust a lot of data during a deployment.

We might think we can roll back or undo changes, but often we end up applying reversing transactions instead. If I change a data type, the data is changed (assuming the change completes). Once that change is committed, I can’t roll it back short of a restore. I would have to change the type back, which means an equally large and long transaction. Having HA or replication technologies in the mix can dramatically affect the scope of both the initial and reversing transactions.
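To make that concrete, here is a minimal Python sketch using the standard library's sqlite3 module. It is an illustration under assumptions, not a description of how any particular platform behaves: the orders table, the amount column, and the run_demo function are all hypothetical, and because SQLite has no ALTER COLUMN, the type change is simulated with an UPDATE that rewrites every row.

```python
import sqlite3


def run_demo():
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN / COMMIT / ROLLBACK statements below control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # "amount" is declared without a type so SQLite stores values as written.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount)")
    conn.executemany(
        "INSERT INTO orders (amount) VALUES (?)", [("10",), ("25",), ("40",)]
    )

    def stored_types():
        rows = conn.execute("SELECT DISTINCT typeof(amount) FROM orders")
        return sorted(row[0] for row in rows)

    results = {}

    # Case 1: the change is still uncommitted, so ROLLBACK undoes it.
    conn.execute("BEGIN")
    conn.execute("UPDATE orders SET amount = CAST(amount AS INTEGER)")
    conn.execute("ROLLBACK")
    results["after_rollback"] = stored_types()

    # Case 2: the change is committed, so there is nothing left to roll back.
    conn.execute("BEGIN")
    forward = conn.execute("UPDATE orders SET amount = CAST(amount AS INTEGER)")
    results["forward_rows"] = forward.rowcount
    conn.execute("COMMIT")
    results["after_commit"] = stored_types()

    # The only way back (short of a restore) is a reversing transaction
    # that rewrites every row again.
    conn.execute("BEGIN")
    reverse = conn.execute("UPDATE orders SET amount = CAST(amount AS TEXT)")
    results["reverse_rows"] = reverse.rowcount
    conn.execute("COMMIT")
    results["after_reverse"] = stored_types()

    conn.close()
    return results


if __name__ == "__main__":
    for step, value in run_demo().items():
        print(step, value)
```

In this toy case, the reversing transaction touches exactly as many rows as the original change. On a large production table, that is the equally large and long transaction I mentioned.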

How confident must you be in your actions before you undertake something of consequence? Do you require an easy rollback, or are you willing to act even if the rollback is painful?

Maybe a better question is: how do you appraise the consequences of an action? Do you look at the application/database, or perhaps the data? You might consider all the other applications or pipelines that depend on what you are changing. I know many DBAs worry about the performance impact of changes that can slow or stop other work. I constantly see people asking if Flyway can estimate how long a change will take, especially when dev/test environments are a poor representation of production sizes, scales, and workloads.

If you have a 2 GB database, you might just make changes. A restore is quick, and I’ve often found greenfield applications taking this approach, since the data sizes are small and even consequential actions can be undone with a restore operation. Many of the “code-first” technologies work great in these situations, but once we have multiple application dependencies and large data sets, restores can be non-trivial or even unacceptable ways to deal with issues.

The image linked above talks about gathering data and analyzing, which sounds like the prudent thing to do, but this is easier said than done. Deciding what analysis to undertake and how long to spend on it are the real tricks. Those are the judgment calls that only experienced humans can make. While AI might help, this is an area where I really want capable humans to have the final say.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

About way0utwest

Editor, SQLServerCentral