Data storage has always been a concern for data professionals. Early on in my career, we dealt with large ESDI, IDE, and SCSI drives, all of which would fail unexpectedly in servers, sometimes after a few years, sometimes after a few weeks. We learned to use RAID and tape backups to ensure that our data was recoverable.
In many places, tape was the long-term storage medium. These days, I know many people have moved to secondary disk storage of some sort, often rotating data across a few disk types that give you recovery for days, weeks, or longer. I don’t know how long-term storage works in Azure or AWS, but I assume some combination of technologies is in use. I also know I don’t completely trust them to be readily available and recoverable after a few years.
For most of us, database backups aren’t really relevant after some number of weeks or months. We usually just don’t need to recover things from long-term storage. The exception might be some types of data that do need to be archived for legal or financial purposes. I know we used to keep an end-of-year tape for 7 years after we’d closed the financial records at one company. I don’t know if that would be the case today, especially with so many “digital records” of transactions. Would we really need to recreate a system as it looked on a particular day 5 years ago?
However, there are types of data that we might want to archive for a long time. An example might be the arts, where we have lots of music and video that can be preserved. There might be other records, such as historical government records, which are suitable for WORM (write-once-read-many) systems.
A new type of recording uses glass and may provide archival storage for thousands of years. Obviously we don’t know that this is the case, as we haven’t been recording digital information for thousands of years, but it’s an interesting medium. It also doesn’t require the algorithms to be maintained, as the idea is that machine learning systems can read back the data and learn to interpret it.
To me, that might be the most interesting part of this project: using computers to learn to read the data rather than requiring us to have an MP3 player, a database system like SQL Server, or any other particular technology. Instead, we can let the computer learn how to read the data and then play back that recording of Prince in the year 3510.
Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.