Can you monitor every configuration change and setting on all your SQL Server instances? Can you get an alert every time code changes, or even if an option for an object changes, such as the changing of an EXECUTE AS or the rebuild options for an index? Do you want to know about every security change (new logins, grants, revokes, etc.). Can you keep up with every alteration of a SQL Agent job?
You certainly can, but across any busy enterprise, all these alerts might result in a constant stream of items to review each day. In fact, in some companies, the volume might be high enough that this becomes a full-time job for someone. I’m not sure many of us think that it’s worth an employee’s time to actually review every change.
Even if you decided it’s worth reviewing every change, is it possible to do a good job actually doing so? Too many alerts usually result in an individual starting to treat all of them as though they are the same priority. It becomes hard to differentiate what’s important to review and what’s not over time if you are always looking at a stream of changes across disparate systems. This is one reason why I never want to get success message, but even failure or change messages can be problematic when the volume is high.
What can you do in a situation like this? Certainly there are alerts that are critical and need to be addressed right away, aren’t there? There are, but they are probably few. Making the decision about which items are important enough to review daily can be hard. I’ve typically only wanted critical alerts for backup tasks (after some retry) and privileged security alterations (add/change/remove sysadmin/securityadmin/processadmin/serveradmin). Those are items I need to take action on. Most other items, such as failed index rebuilds, job schedule changes, configuration alterations, I just want to capture and log.
In one of the SQLskills newsletters recently, Paul wrote about an issue where replication settings had changed. The distributor had gone from 72 hours to 72 days, resulting in memory pressure for the workload. While I know this is a problem, is this the type of alert you’d define as critical? I’m not sure I would. In fact, this is the type of alert I’d want logged as a set of changes on this system so that I could review it when it seems that the system is not running as smoothly as it otherwise might be. In fact, this is the type of problem I’d hope I’d catch through performance monitoring, which might pro-actively catch performance degradation that would lead a DBA to review changes and metrics before a user reported the situation.
Ultimately I want monitoring systems to help me find issues, and only notify me when I might need to take some action. To do this, the system might need to capture everything, but I want most of the items filtered until such time as the information might actually help me solve an issue. This takes some time to setup and tune as you discover holes in your monitoring, or you find that too much data is being passed on. Don’t be more afraid to add more data to capture, but be ruthless about removing extraneous notifications from the system. This is a case when too much information sent to the administrator is as bad as too little.