You come into work one day and, as you sit down, your phone rings. It’s one of the business groups complaining that the database is running slow. You check the server and find CPU at 80%, 800 pages/sec, 230 disk IOPS, and 124 transactions/sec. Is the database the problem?

Baselines are important to understand how your system is performing.

Good DBAs know that baselines are essential. If you don’t know what values to expect from your server, it’s often hard to determine if the system is running slower than normal. Normal is something you need to define for each system, preferably in an automated way that updates your baseline over time.

When building a baseline, however, how do you average out the information?

That’s the poll this Friday. Let’s assume that you are examining the CPU percentage for a SQL Server and you have data points from every 5 minutes across the last month. What’s the average? Do you take the straight average? Do you break this down to hourly segments and then create further analysis that looks at different business periods?

It can become problematic very quickly. Many of us have slow and busy periods. Do we want an average that’s perhaps lowered by the slow periods in our workload? Do we want to break out the averages for maintenance periods separately from normal operations? If you are looking to compare today’s values, do you look at yesterday’s for the same time period? Last week? An average of all points across the last week?
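To make the trade-off concrete, here is a minimal Python sketch comparing a straight average with a baseline bucketed by day of week and hour. The data is synthetic and the function names (`straight_average`, `hour_of_week_baseline`) are made up for illustration; it assumes your samples are already available as (timestamp, CPU percent) pairs.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def straight_average(samples):
    """Single mean over every (timestamp, cpu_percent) sample."""
    return mean(cpu for _, cpu in samples)


def hour_of_week_baseline(samples):
    """Mean CPU per (weekday, hour) bucket, keeping busy and quiet periods apart."""
    buckets = defaultdict(list)
    for ts, cpu in samples:
        buckets[(ts.weekday(), ts.hour)].append(cpu)
    return {key: mean(values) for key, values in buckets.items()}


# Synthetic data for illustration only: four weeks of 5-minute samples,
# 80% CPU during weekday business hours (09:00-16:59) and 20% otherwise.
start = datetime(2024, 1, 1)  # a Monday
samples = []
for i in range(4 * 7 * 24 * 12):
    ts = start + timedelta(minutes=5 * i)
    busy = ts.weekday() < 5 and 9 <= ts.hour < 17
    samples.append((ts, 80.0 if busy else 20.0))

overall = straight_average(samples)
baseline = hour_of_week_baseline(samples)

print(f"straight average:      {overall:.1f}%")           # 34.3%
print(f"Monday 10:00 baseline: {baseline[(0, 10)]:.1f}%")  # 80.0%
print(f"Sunday 03:00 baseline: {baseline[(6, 3)]:.1f}%")   # 20.0%
```

With this made-up workload, the straight average of about 34% describes neither the busy period nor the quiet one, while the bucketed baseline gives a value you can actually compare against a reading taken at the same time of week.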

Let us know what methodology you use. If you’d like to describe it in more than a paragraph or two, we’d love to publish some articles here on the site.

Steve Jones

About way0utwest

Editor, SQLServerCentral

1 Response to Baselines

  1. SqlChow says:

    To me, the problem of baselining is more mathematical. We have a large set of discrete samples and we need to confidently tell whether a given set of values falls within an acceptable range x% of the time. We can then use the standard deviation of those samples as the maximum deviation from the baseline (mean ± 2 standard deviations, which covers roughly 95% of values if they are approximately normally distributed).
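The check described in this comment can be sketched in a few lines of Python. This is only an illustration: the readings are hypothetical, the helper names `baseline_band` and `is_outside_baseline` are invented here, and the 95% figure holds only if the samples are roughly normally distributed, which CPU readings often are not.

```python
from statistics import mean, stdev


def baseline_band(history, k=2.0):
    """Return (low, high): the mean minus and plus k sample standard deviations."""
    centre = mean(history)
    spread = stdev(history)
    return centre - k * spread, centre + k * spread


def is_outside_baseline(value, history, k=2.0):
    """True when value falls outside the band computed from history."""
    low, high = baseline_band(history, k)
    return value < low or value > high


# Hypothetical CPU readings (percent) for the same hour on previous days.
history = [40.0, 45.0, 50.0, 55.0, 60.0]
low, high = baseline_band(history)

print(f"band: {low:.1f}% to {high:.1f}%")   # band: 34.2% to 65.8%
print(is_outside_baseline(80.0, history))   # True
print(is_outside_baseline(52.0, history))   # False
```

Computing the band per time bucket, rather than over the whole month, keeps a quiet weekend from widening or shifting the range used to judge a busy weekday.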


