I regularly see questions posted online from people struggling to manage a large archive process. Often the inserts into an archive table and the deletes from the source require large amounts of resources that strain systems and block other activity. These types of processes can also require lots of transaction log space, something many people forget about.
The best solution, in my mind, is to archive regularly. Decide how much data you need, then trim your working set back to that level on a regular, preferably daily, schedule. That way you limit the amount of resources you need at any one time.
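As a minimal sketch of what that regular trim might look like, here is one common T-SQL pattern: delete in small batches and route the deleted rows into the archive table in the same statement. The table names (dbo.Orders, dbo.OrdersArchive), the 90-day retention window, and the batch size are all hypothetical; you'd swap in your own schema and numbers.

```sql
-- Hypothetical tables: dbo.Orders (source) and dbo.OrdersArchive
-- (an identical-schema archive with no triggers or foreign keys).
-- Moving rows in small batches keeps each transaction, and the
-- transaction log growth, small.
DECLARE @BatchSize int = 5000,
        @Cutoff    datetime2 = DATEADD(DAY, -90, SYSUTCDATETIME());

WHILE 1 = 1
BEGIN
    -- DELETE ... OUTPUT moves the rows in a single atomic statement.
    DELETE TOP (@BatchSize) o
    OUTPUT DELETED.* INTO dbo.OrdersArchive
    FROM dbo.Orders AS o
    WHERE o.OrderDate < @Cutoff;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left in the retention window
END;
```

Run daily, the loop rarely has much to do, which is exactly the point: small, frequent trims instead of one massive, blocking purge.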
However, that’s not always easy to retrofit into a system. I’ve encountered many applications that were built to read from a single table, so archiving data means no one can access that data unless it’s inserted back into the main table.
I wish we had better patterns and frameworks for building archiving into systems from the start. I almost wonder if we should be using a view built across a regular table and an archive table from the start for accessing data. This isn’t a complete solution, and there are certainly problems here. Maybe we should have partitioning implemented early on, though the edition limits on that technology are problematic.
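The view idea above can be sketched in a few lines of T-SQL. Again the object names (dbo.Orders, dbo.OrdersArchive, dbo.OrdersAll) and columns are hypothetical; the point is that applications query the view and never need to know which table a row lives in.

```sql
-- A view that unions the working table and the archive table,
-- so readers see one logical set of rows regardless of location.
CREATE VIEW dbo.OrdersAll
AS
SELECT OrderID, OrderDate, CustomerID   -- same column list in both branches
FROM dbo.Orders
UNION ALL
SELECT OrderID, OrderDate, CustomerID
FROM dbo.OrdersArchive;
```

If both tables carry a CHECK constraint on the date column, this becomes a partitioned view and the optimizer can skip the branch that can't contain the requested rows; without that, every query scans both tables, which is one of the problems with this approach.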
Microsoft is bringing us “stretch” tables in SQL Server 2016. Not a bad idea for archiving cold data, but as Brent Ozar points out, are they going to do better partition elimination? Will Microsoft give us a good way to handle backups? There are certainly issues here, and likely room for improvement in later versions.
Ultimately I think the complexity of archival data doesn’t matter for most systems. These days 10GB of data is a pittance, and since many applications (and their databases) are smaller than this, why bother? If you have a larger system, however, I’d encourage you to think about archival before you need it, when you actually have time to implement it well.