I love this quote, though I’m not sure it’s accurate. It comes from a piece called The Future of Data Storage: “What’s the most expensive thing you can do with a piece of data? Throw it out.”
That’s from a storage vendor, and obviously they’d prefer that you keep all your data, which means more storage and backup space. I do think that losing valuable data can be expensive, but I also think that we often keep older data that we don’t use, or won’t use, and that is expensive too. Not for individual pieces, but in aggregate, the cost adds up. This is especially true if you move to a service model where you pay for what you use, as opposed to investing in a large chunk of storage that has a fixed cost.
I didn’t really think a lot of the piece, though it did get me thinking about backups. I’ve run backups for my entire career, and in 99 point some number of nines percent of cases, I’ve never used the backup file again. These were insurance against potential problems. Even in places where I restored the backup to verify the process worked, I often just discarded the backup file at some point.
Early in my career, we had tape rotation systems to reuse the media a certain number of times, while also ensuring that we had off-site copies and specific points in time saved. Today there are plenty of backup systems that perform deduplication and complex disassembly and reassembly of files from blocks to use space more efficiently. That doesn’t always work well for database restores, especially when time is of the essence.
As vendors look to add more intelligent, or at least more efficient, processing to backup systems, I wonder if they really think about databases and how we use files. I hope so, and I’d like something that is optimized for database restores. I don’t mind combining the duplicate parts of files into some index, but I need to have the latest files available for quick restores. What about backing up a database to a file and keeping this file online and ready? Then, after the next backup, move the previous one to an area that deduplicates it, maybe takes it offline, etc. That way I have the best of both worlds. I rarely go back further than the latest full backup for a restore, so keep this one ready.
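To make that idea concrete, here is a minimal sketch of the tiering decision in Python. It only decides which full backup stays online and which ones are candidates for deduplication; it doesn’t move any files. The `FullBackup` class, the `plan_tiers` function, and the file names are all hypothetical, made up for illustration, and not taken from any real backup product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FullBackup:
    # Hypothetical record for one full backup file.
    path: str
    finished_at: datetime


def plan_tiers(backups):
    """Return (hot, cold): the newest full backup stays online and intact,
    and every older full backup becomes a candidate for deduplication."""
    if not backups:
        return None, []
    ordered = sorted(backups, key=lambda b: b.finished_at)
    return ordered[-1], ordered[:-1]


# Illustrative file names and times only.
full_backups = [
    FullBackup("sales_full_tue.bak", datetime(2024, 1, 2, 2, 0)),
    FullBackup("sales_full_mon.bak", datetime(2024, 1, 1, 2, 0)),
    FullBackup("sales_full_wed.bak", datetime(2024, 1, 3, 2, 0)),
]
hot, cold = plan_tiers(full_backups)
print("keep online:", hot.path)
print("deduplicate:", [b.path for b in cold])
```

A real system would also have to confirm the newest backup completed successfully before demoting the previous one, which this sketch assumes has already happened.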
Of course, we need to consider log backups, which really need to be kept online and intact if they have been made since the last full backup. Keeping track of that is a pain, but it’s something software could easily handle. Once we’ve made a new full backup, we can mark older log backups for deduplication. And if you’re building this into a system, perhaps an automatic test restore of the full backup files should be included as well.
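The bookkeeping for log backups could look something like the following Python sketch. It assumes a simple in-memory catalog in which each entry is tagged as a full or log backup; the `Backup` class, the `split_log_backups` function, and the file names are hypothetical. Logs taken after the latest full backup stay online, older ones are marked for deduplication, and if no full backup exists yet, every log stays online.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    # Hypothetical catalog entry; kind is "full" or "log".
    path: str
    kind: str
    finished_at: datetime


def split_log_backups(catalog):
    """Return (keep_online, deduplicate) lists of log backups,
    split around the finish time of the latest full backup."""
    logs = [b for b in catalog if b.kind == "log"]
    fulls = [b for b in catalog if b.kind == "full"]
    if not fulls:
        # No full backup yet, so nothing is safe to demote.
        return logs, []
    cutoff = max(b.finished_at for b in fulls)
    keep_online = [b for b in logs if b.finished_at > cutoff]
    deduplicate = [b for b in logs if b.finished_at <= cutoff]
    return keep_online, deduplicate


# Illustrative file names and times only.
catalog = [
    Backup("sales_full_sun.bak", "full", datetime(2024, 1, 7, 1, 0)),
    Backup("sales_log_sun_1200.trn", "log", datetime(2024, 1, 7, 12, 0)),
    Backup("sales_full_mon.bak", "full", datetime(2024, 1, 8, 1, 0)),
    Backup("sales_log_mon_0600.trn", "log", datetime(2024, 1, 8, 6, 0)),
    Backup("sales_log_mon_1200.trn", "log", datetime(2024, 1, 8, 12, 0)),
]
keep_online, deduplicate = split_log_backups(catalog)
print("keep online:", [b.path for b in keep_online])
print("deduplicate:", [b.path for b in deduplicate])
```

Comparing finish times is a simplification. A production tool would more likely track the log chain itself rather than rely on timestamps.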