Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.
The other day I saw someone asking about a way to determine which databases have a log file larger than a data file. It’s an interesting query, but not necessarily an issue. I did query as to what their thinking was, and this was more a first step to investigation rather than an alert or concern. That’s good, because that’s what this query is.
I’ll write a query to check this in another post, but here I wanted to just discuss the meaning of data file size and log file size.
Data File Size
In most databases, you likely have a single .mdf file. Some might have more with other .ndf files, and others could have Filestream/MOT objects as well. The file sizes here are a combination of two things.
- your data
- free space allocated to the database for data in tables and objects, but not used.
These files make up your data, and are roughly the equivalent of the characters in a Word file, plus any whitespace you’ve added. This isn’t quite right as Word doesn’t pad out some allocation, but it’s similar. If I create an 8MB file for my database, I use 8MB on disk. This whether I’ve added a 1 table with 1 character, 1MB, or 7.999MB of data.
Log File Size
The log captures transactions, or the changes to my data. This includes inserts, updates, and deletes. The size of the log file is an indication of a couple things.
- log backups
The more activity in my database, the more log data I’ll capture. A busy database could have a lot of activity, as each change needs to be recorded. If I have a lot of inserts, I’ll grow my data file and my log file. If I have a lot of updates, I grow my log file, but may or may not grow my data file. If I have a lot of deletes, I grow the log file, but the data file remains the same, though I would have the option to shrink it.
Side Note: DO NOT USE AUTOSHRINK
The second influencing factor is the log backup frequency. If I generate 24MB of log records every day, do I need a 24MB log file? Not necessarily.
I could run log backups every hour (24 a day), which would mean I only need a log file to contain the largest amount of activity that occurs during any hour. If my log gets even levels activity every hour, I’d need roughly 1MB of log file space, which would then be marked for reuse after every hourly log backup. Of course, I’d want some padding so maybe 2MB is enough.
If I get 4MB of log record generation every hour during the business day and none outside those hours, maybe I need a 4 or 5 MB log file.
If I’m in simple mode, I need a log file big enough to contain my largest transaction x2 (to allow for rollback).
Is the Log File Larger than the Data File Normal?
I wouldn’t say this is common, but it’s perfectly normal, if you follow a certain pattern in your database. If your workload consists mostly of updates or deletes, and few inserts, you might have a need for a large log file. This would also mean the level of activity is larger than your data size, and that you don’t have frequent enough log backups to allow for a lower size.
Have I seen this? Yes.
I managed a database for our other sysops that powered our anti-virus application. This tracked the activity for all workstations and servers. The number of nodes was relatively fixed (small additions and deletions every day or so), but the number of transactions was high. We had reporting every 30 minutes, which was really changing statuses for the nodes, so constant update activity. To prevent the server from being overloaded, and because the data was mostly replaceable, we only backed up the log every 4 or 6 hours.
In this case, the change activity was higher than the data size, so we had a large log and a slightly smaller data size. Not common, but it happens.
This is a great post for everyone to write. Explain what you understand and think about data size v log size, use your own words, and examples from your career.