I regularly see questions posted online from people struggling to manage a large archive process. Often the inserts into an archive table and the deletes from the source require large amounts of resources that strain systems and block other activity. These types of processes can also require lots of transaction log space, something many people forget about.
The best solution, in my mind, is to archive regularly. Decide how much data you need, then trim your working set back to that level on a regular, preferably daily, schedule. That way you limit the amount of resources you need at any one time.
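As a minimal sketch of what that regular trim might look like, here is one common T-SQL pattern: delete in small batches and route the deleted rows into the archive table in the same statement. The table names (dbo.Orders, dbo.OrdersArchive), the 90-day retention window, and the batch size are all hypothetical; you'd swap in your own schema and numbers.

```sql
-- Hypothetical tables: dbo.Orders (source) and dbo.OrdersArchive
-- (an identical-schema archive with no triggers or foreign keys).
-- Moving rows in small batches keeps each transaction, and the
-- transaction log growth, small.
DECLARE @BatchSize int = 5000,
        @Cutoff    datetime2 = DATEADD(DAY, -90, SYSUTCDATETIME());

WHILE 1 = 1
BEGIN
    -- DELETE ... OUTPUT moves the rows in a single atomic statement.
    DELETE TOP (@BatchSize) o
    OUTPUT DELETED.* INTO dbo.OrdersArchive
    FROM dbo.Orders AS o
    WHERE o.OrderDate < @Cutoff;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left in the retention window
END;
```

Run daily, the loop rarely has much to do, which is exactly the point: small, frequent trims instead of one massive, blocking purge.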
However, that’s not always easy to retrofit into a system. I’ve encountered many applications that were built to read from a single table, so archiving data means no one can access that data unless it’s inserted back into the main table.
I wish we had better patterns and frameworks for building archiving into systems from the start. I almost wonder if we should be using a view built across a regular table and an archive table from the start for accessing data. This isn’t a complete solution, and there are certainly problems here. Maybe we should have partitioning implemented early on, though the edition limits on that technology are problematic.
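The view idea above can be sketched in a few lines of T-SQL. Again the object names (dbo.Orders, dbo.OrdersArchive, dbo.OrdersAll) and columns are hypothetical; the point is that applications query the view and never need to know which table a row lives in.

```sql
-- A view that unions the working table and the archive table,
-- so readers see one logical set of rows regardless of location.
CREATE VIEW dbo.OrdersAll
AS
SELECT OrderID, OrderDate, CustomerID   -- same column list in both branches
FROM dbo.Orders
UNION ALL
SELECT OrderID, OrderDate, CustomerID
FROM dbo.OrdersArchive;
```

If both tables carry a CHECK constraint on the date column, this becomes a partitioned view and the optimizer can skip the branch that can't contain the requested rows; without that, every query scans both tables, which is one of the problems with this approach.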
Microsoft is bringing us “stretch” tables in SQL Server 2016. Not a bad idea for archiving cold data, but as Brent Ozar points out, are they going to do better partition elimination? Will Microsoft give us a good way to handle backups? There are certainly issues here, and likely room for improvement in later versions.
Ultimately I think the complexity of archival data doesn’t matter for most systems. These days 10GB of data is a pittance, and since many applications (and their databases) are smaller than this, why bother? If you have a larger system, however, I’d encourage you to think about archival before you need it, when you actually have time to implement it well.