At one point in my career, I worked for a wealth management firm. We managed funds for various customers, which included making trades in the financial markets. Various brokers and companies used our platform to run their businesses, and performance was always an issue. At the time we were an NT 4.0/SQL Server 6.5 shop, though we moved to Windows 2000 and SQL Server 2000 while I was there.
While we were looking at upgrades, a number of potential customers asked why we weren't using AIX, Solaris, or Linux. Management would come to a few of us technical leads and ask, and we usually had to provide some justification. Our success was hit-or-miss. We did run into a few companies that were doing real-time trading on the Windows platform, and a few of us had the chance to talk to them about how they managed their systems at a time when Windows hosts needed frequent patching, and patching meant rebooting. If you're curious, one of those companies had a fairly service-oriented architecture built on top of Windows. It essentially managed work by connecting to whichever boxes happened to be running.
Things have changed. Microsoft has been working hard to build a better Windows OS to power Azure, and they've done some amazing work. I saw a post about one of their efforts: finding ways to patch the underlying OS without disturbing applications, including the VMs running on the host.
This is impressive work, and it's where all operating systems should have been heading. We ought to be able to patch them without downtime, and certainly without disturbing the guests or programs running on them. As this rolls out to the Windows hosts in our data centers, I can imagine an era when monthly Windows patches never cause downtime for SQL Server.
Well, I guess there is some downtime. Applications such as the hypervisor are paused, which some of us might consider downtime. However, if the pause lasted only a few seconds, I'm not sure many clients would see it as downtime. It's along the lines of a network hiccup or a momentarily busy server. In that sense, this would be a great step forward for HA.
On the other hand, this would raise expectations. Clients and customers would become less tolerant of downtime. We can't control downtime caused by Windows and SQL Server patches, but we can control the downtime caused by our own deployments. A system that is almost never down for patching would put pressure on us to ensure that our enhancements to the database don't cause downtime either.
While there's no magic in how we change objects in SQL Server, there are techniques that help: making changes in stages, favoring additive changes, and using automation in a DevOps-style workflow to limit downtime and lower the risk of each change. To me, this might be a bigger shift than the move to cloud systems. With that in mind, I'd urge you to learn techniques that avoid interrupting applications when you deploy changes. They exist, but building the skills and habits to use them well takes effort and practice.