I ran across a neat post from Michael Swart recently. In doing some spring cleaning, he was looking to remove unused, or maybe unnecessary, tables from his database. He published a script that looks through the plan cache to determine what connections exist between tables. He joked that if he doesn’t find any joy when examining a table, he drops it. At least I hope he was joking. If not, I suspect we’ll see a note on LinkedIn soon.
Actually, I’m sure he’s joking, and he makes a good point at the end of the his post. His script isn’t used to make decisions, but rather it provides a place to begin to investigate more about what the table might be used for in an application. It provides a starting point for more questions, such as is there value in removing this table. His company is embracing AWS, and they are becoming more cost conscious. This means keeping less data, and possibly moving cheaper data stores where possible to reduce license and/or hardware costs.
I suspect that other organizations will start to embrace similar attitudes as more move to the pay per month model of the cloud. Many of us rent hardware, and when we do, the recurring costs become an issue. Suddenly we might rethink the amount of data we keep and archive or remove older data, either to reduce costs, or risks. The GDPR brings with it a push to not keep all data in perpetuity.
The goal of better understanding our databases, and specifically tables, makes a lot of sense to me. Far too often I’ve inherited some system and only understood portions of the database. I’ve performed hours of investigation to try and better comprehend how data is stored, retrieved, and manipulated. I’ve found my share of unused tables, often renaming them for months and eventually deleting them.
I do think that we poorly document databases, almost treating them like a file share where we drop a new item when we need it, without thinking through the usage, ensuring others know about the entity, and often forgetting it exists if we don’t regularly use it. A RDBMS isn’t a file share, or at least it’s not an inexpensive one, so we ought to be cognizant of the data we keep and trim unnecessary waste over time.
I empathize with Michael and would relish the challenge to review and trim old tables where I could. However, I also know that often there are tables that won’t “make a difference”, either in cost or any other savings and aren’t necessarily worth the time to investigate and remove. Spending even an hour to decide if I should remove a 10 row table whose purpose is unclear doesn’t seem like a good use of time. Unfortunately, I’ve run into plenty of those, which nag me, but really shouldn’t be something I spend time on, and I have to leave them be.
You can hear the podcast here: http://traffic.libsyn.com/voiceofthedba/understanddb_50_v2068.mp3