The Challenge of Deleting Data

We collect a lot of data in our databases. Not as much in bytes as a lot of the video/audio/TikTok/Instagram sites, but still enough that many of us are constantly adding storage to our systems. All this data is not only a challenge to manage, but it also means that we are regularly dealing with query tuning issues. Better code, indexes, and more become regular challenges with large volumes of data.

I am a big fan of trying to reduce the data you manage where possible. Archive, delete, remove older data, do something. This not only makes your systems easier to manage and improves performance, but it also reduces your risk. Any PII you store is an ongoing risk in the event of a data breach. I don’t pretend this is easy to do in any way, but it’s a good idea.

If you can remove data (or must because of a regulation like the GDPR), how do you ensure that data is deleted? Most of us know how to submit a DELETE statement, but that just removes the data from an online system. If you restored or recovered this database tomorrow, would you remember to delete the data again? What if you lose a copy of a data or log backup? What about older dev/test systems that were refreshed from production? The data might still be in there. If you work through the possible problems, deleting data from a system isn’t as simple as you might expect.
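One way to avoid forgetting a deletion after a restore is to keep a small ledger of deletion requests outside the database and replay it against any restored copy. The following is a minimal sketch of that idea, not a production design. It uses SQLite so that it runs anywhere, and the `customers` table, the `deletions` ledger, and every function name are hypothetical, invented only for illustration. It also assumes the ledger stores nothing more than a surrogate key, so the ledger does not become another copy of the PII.

```python
import sqlite3


def make_db(rows):
    """Build a hypothetical in-memory 'customers' database from the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()
    return conn


def make_ledger():
    """Build a separate ledger that records which ids must stay deleted."""
    ledger = sqlite3.connect(":memory:")
    ledger.execute("CREATE TABLE deletions (customer_id INTEGER PRIMARY KEY)")
    ledger.commit()
    return ledger


def record_and_delete(conn, ledger, customer_id):
    """Delete one customer and log the request in the ledger."""
    conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
    conn.commit()
    ledger.execute(
        "INSERT OR IGNORE INTO deletions (customer_id) VALUES (?)", (customer_id,)
    )
    ledger.commit()


def replay_deletions(conn, ledger):
    """Re-apply every logged deletion, for example right after a restore.

    Returns the number of rows that had come back and were removed again.
    """
    ids = [row[0] for row in ledger.execute("SELECT customer_id FROM deletions")]
    removed = 0
    for customer_id in ids:
        cur = conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        removed += cur.rowcount
    conn.commit()
    return removed


def remaining_ids(conn):
    """Return the ids still present in the customers table."""
    return [row[0] for row in conn.execute("SELECT id FROM customers ORDER BY id")]
```

In use, you would call `record_and_delete` for each request on the live system, and then run `replay_deletions` as a step in every restore or dev/test refresh. This does nothing about the backup files themselves, which still contain the data until they age out.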

This might be even more complex in the age of cloud computing, where we don’t control the hardware for primary systems, or for backups. There is an article on deleting data in the cloud that talks about the government standards that require that you not only delete data, but that you overwrite the physical hardware to ensure it can’t be recovered. This still doesn’t address backup systems, but it does help to clarify that many of us might start to demand cloud vendors not only de-allocate the disks we use (or the backup storage), but they also overwrite the storage with zeros.
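To make the "overwrite with zeros" idea concrete, here is a minimal sketch of zero-filling a file before removing it. The function names are hypothetical. It only illustrates the concept at the file level: on SSDs, copy-on-write filesystems, and cloud block storage, an in-place overwrite like this does not guarantee that the old physical blocks are gone, which is exactly why the overwrite has to be something the vendor does underneath us.

```python
import os


def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite the file's current contents with zero bytes, in place."""
    size = os.path.getsize(path)
    zeros = b"\0" * chunk_size
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the zeros to the device
    return size


def zero_and_remove(path):
    """Zero-fill the file, then delete it. Returns the bytes overwritten."""
    size = zero_fill(path)
    os.remove(path)
    return size
```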

Data security, and the risk of not taking it seriously, is becoming a bigger issue all the time. I don’t know that poor security will cause your organization to fail, but there can be significant costs and possibly reduced employment opportunities. While you might not want to be overly paranoid or concerned about every possible issue, it is worth asking questions of vendors, working through likely scenarios, and trying to quantify risk.

More and more systems are regularly under attack from malicious groups, which means we want to minimize simple mistakes, reduce human error, and limit our exposure by storing only the data we need.

Steve Jones


About way0utwest

Editor, SQLServerCentral