Not many database systems are designed with a well-thought-out archiving plan at the beginning of their lifecycle. In most cases that's fine, since few databases gather enough data to require archiving, and any extra time early in the lifecycle of an application is probably better spent tuning queries for efficiency anyway.
However, when archiving is implemented, it can't be cumbersome. A clumsy process almost always results in some sort of data hoarding by individuals, which in turn produces applications built on Excel rather than SQL Server. This article talks about the problem of data access from a storage point of view, but it could easily apply to databases. If accessing archived data requires some special function, or intervention by the DBA, it's quite possible that users will take matters into their own hands.
As SQL Server has grown, we have gained a number of features that should make the archival of data much easier. Partitioned views and partitioned tables can, in many cases, allow an application to function without needing to know that the underlying data is stored in multiple locations, and potentially even in multiple systems. It does take a little more administrative work to implement these features seamlessly, but it's a skill that DBAs should be able to master.
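As a rough illustration, here is a minimal local partitioned view. The table and column names are hypothetical; the key idea is that CHECK constraints on the partitioning column let the optimizer skip tables that can't contain the requested rows, while the application only ever sees one view.

```sql
-- Hypothetical yearly tables; the CHECK constraint on OrderDate is
-- what allows the optimizer to eliminate partitions at query time.
CREATE TABLE dbo.Orders_2023 (
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL
        CHECK (OrderDate >= '20230101' AND OrderDate < '20240101'),
    CONSTRAINT PK_Orders_2023 PRIMARY KEY (OrderID, OrderDate)
);

CREATE TABLE dbo.Orders_2024 (
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL
        CHECK (OrderDate >= '20240101' AND OrderDate < '20250101'),
    CONSTRAINT PK_Orders_2024 PRIMARY KEY (OrderID, OrderDate)
);
GO

-- The view presents one logical table; the application never needs
-- to know the data lives in two physical tables.
CREATE VIEW dbo.Orders
AS
SELECT OrderID, OrderDate FROM dbo.Orders_2023
UNION ALL
SELECT OrderID, OrderDate FROM dbo.Orders_2024;
```

A query against dbo.Orders with a WHERE clause on OrderDate touches only the table whose constraint can satisfy it, so older tables can sit on cheaper storage, or even on another server in the distributed-view case, without the application changing at all.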
The problem of large data sets will continue to grow over time, and it's a problem I suspect most DBAs will face at some point in their career. Take some time to learn about partitioning and how you can both manage data more efficiently over time and improve performance when access patterns vary for different sets of data.
It’s also a good reason to ensure that your applications don’t ever expect to pull back an entire table for any reason. Working with a few rows of data at a time is prudent when you have thousands of rows and essential when you have billions.
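One way to sketch this, with a hypothetical dbo.Orders table keyed on OrderID, is keyset (seek-based) paging: the application asks for the next page after the last key it saw, rather than pulling everything back.

```sql
-- Fetch one page at a time instead of the whole table.
-- @LastOrderID is the highest key seen on the previous page (0 to start).
DECLARE @LastOrderID int = 0;

SELECT TOP (100) OrderID, OrderDate
FROM dbo.Orders
WHERE OrderID > @LastOrderID
ORDER BY OrderID;
```

Because the WHERE clause seeks directly to the start of the next page via the index on OrderID, the cost per page stays flat whether the table holds thousands of rows or billions, which is exactly the habit that keeps an application from ever expecting the entire table at once.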