T-SQL Tuesday #70 – The Enterprise

It’s hard to believe this is the 70th edition of T-SQL Tuesday. I haven’t missed many, and I’ve enjoyed them all. I hope more of you are participating in this monthly blog party started by Adam Machanic (b / t). Whether you write today or at some point in the future, T-SQL Tuesday topics are a good way to showcase your knowledge.

This month’s invitation comes from Jen McCown, of Midnight DBA. Jen asks us to write about managing an Enterprise of SQL Servers. That’s a good topic, since many of us struggle to manage multiple instances. Whether you have 5 or 500, there are some good ideas that you might implement in any environment, and I’m looking forward to reading what people have to say.

Decoupled Consistency

A long time ago I was thrust into the role of managing hundreds of SQL Server instances at a large company. I’d managed dozens of machines before, but this was a whole new level of scale. What’s more, the majority of machines had been set up completely independently of each other, with nothing in common beyond a mandate to try to keep versions close together. That was a challenge in and of itself.

The only common tool in use was Patrol, which was mainly used to monitor performance counters. We didn’t have the SQL Server-specific pieces, so we could only collect general performance counters, and even those were difficult to get at when thousands of hosts were being monitored.

I brought an idea with me from a previous job that had served me well with a handful of disparate instances. We set up each instance consistently, covering installation, configuration, and monitoring, but we decoupled each instance from the others. Our goal was for each machine to be capable of operating independently of all the others.

We had found that central servers go down, that we had no good way of tracking the status of individual machines when that happened, and, most importantly, that there are always exceptions.

With this in mind, we:

  • built a procedure to install SQL Server, along with a number of post-installation scripts that would standardize settings. This allowed our server build people to easily handle SQL Server installation as part of their job. These days I’d use something like FineBuild to make this easy.
  • set up a DBA database on each instance that monitored jobs and other important status for the instance. If the instance was up and SQL Agent was running, we’d know the status of that instance (a minimal sketch of this kind of check follows the list).
  • performed basic monitoring of a few key performance counters to get a quick trend of performance over the last week, similar to what SQL Monitor shows. Getting a local snapshot was quicker than going to central monitoring, especially in a crisis.
  • assembled a report for the instance each day, calling out exceptions at the top and leaving expected data below. We needed documentation for our ISO certification, and our auditors loved this.
  • used a central server to assemble the reports from all instances and compile them for the DBAs to review. All exceptions were at the top, and we used a left join to compare the list of instances with current reports; if any were missing, we bubbled that to the top as an exception (see the second sketch below).
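
The job-status piece of that DBA database needed nothing fancy. Here’s a minimal sketch, assuming a scheduled step on each instance that flags any SQL Agent job whose outcome over the last day wasn’t a success; the msdb tables and the agent_datetime() helper are standard, while everything around the query (the DBA database and the daily report it feeds) is the part described above.

-- Minimal sketch: flag Agent jobs that failed (or were cancelled) in the last 24 hours.
-- msdb tables are standard; agent_datetime() is a helper function that ships in msdb.
SELECT  j.name AS job_name,
        msdb.dbo.agent_datetime(h.run_date, h.run_time) AS run_datetime,
        h.message
FROM    msdb.dbo.sysjobs AS j
        JOIN msdb.dbo.sysjobhistory AS h
          ON h.job_id = j.job_id
WHERE   h.step_id = 0        -- job outcome rows only
  AND   h.run_status <> 1    -- 1 = succeeded
  AND   msdb.dbo.agent_datetime(h.run_date, h.run_time) >= DATEADD(DAY, -1, GETDATE());

Any rows returned went into the exception section at the top of that instance’s daily report; no rows meant the job section stayed down with the expected data.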

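The central compilation step worked the same way. Here’s a sketch of the left join mentioned in the last bullet, assuming a hypothetical inventory table (dbo.Instances) of every instance that should report and a table (dbo.DailyReports) of the reports that actually arrived; an instance with no report today becomes an exception itself.

-- Minimal sketch on the central server; both table names are hypothetical.
SELECT    i.instance_name,
          'No daily report received' AS exception_reason
FROM      dbo.Instances    AS i
LEFT JOIN dbo.DailyReports AS r
       ON r.instance_name = i.instance_name
      AND r.report_date   = CAST(GETDATE() AS date)
WHERE     r.instance_name IS NULL;   -- silence is itself an exception
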
These were all good, simple ideas that allowed a team of 3 DBAs to manage 400-500 instances of SQL Server. We were flexible enough to handle the needs of mission-critical Finance instances as well as frequently changing development machines. Our data was simple and always available to give to clients when they had questions.

Most importantly, we built a system that let us deal with exceptions without having to review the mundane stuff. Our time was precious, and that’s the case in any enterprise situation. You need to focus on the 10-20% of systems that really need regular attention while ensuring the other 80-90% are still being maintained.

About way0utwest

Editor, SQLServerCentral