All the Costs of Downtime

I studied economics in university, which isn’t that close to database work, though I did have to work through linear regression problems by hand. I always enjoyed mathematics, so this wasn’t a hardship, at least until I purchased a PC capable of letting me do graphs and calculations in Pascal and BASIC. Then I realized that my enjoyment wasn’t that efficient or useful, and a computer could help me get things done much faster.

Many of us work on systems that process tremendous amounts of data, something our organizations couldn’t complete without computer hardware, efficiently or not. We just wouldn’t be able to get the work done by hand. That’s the main reason why downtime is such a problem in the modern world; we can’t fall back to manual systems in many cases.

I ran across an article that discusses some of the large-scale failures in recent history (Heathrow, Delta, NYSE, Royal Bank of Scotland) due to computer system failure. Certainly, there are large financial costs and lost revenue for organizations that suffer these outages. However, there are other costs that are borne by the staffers, which don’t often make the news.

When it’s “all hands on deck” to solve a problem, other work isn’t progressing. Operations people are certainly interrupted, but developers often get asked questions or pulled into meetings to provide input, which takes them away from their existing work. Apart from the “23 minutes to get their head back in the game,” as noted in the article, can they even focus anymore? Will they keep thinking through all the possible causes, and wondering whether they actually provided the right information or all the details needed?

During a crisis, or even after, it is very hard for humans to focus on anything else. Apart from the technical details, IT staffers can have a range of emotions and thoughts. They might have sympathy for customers affected. They might worry they’re at fault and might be blamed (or terminated). They might be wondering whether they should have coded or configured something differently. Should they have tested more or accounted for more issues? They might have simple anger at others who didn’t do their job, or frustration at the failure of a piece of hardware.

Perhaps even more concerning is the load management can place on employees to get things fixed. If people work long hours, how do we ease them back into the flow of all the other daily work? As a manager, I know I’ve struggled to get people to rotate work with rest. As an employee, I struggle to even sleep if I am sent home while others are still working. I’ve had to work 100+ hour weeks, and very quickly we get into survival mode, not productive mode.

There are lots of costs to downtime apart from the financial impact. If you can’t maintain a stable environment that limits the time employees spend firefighting, you likely aren’t going to survive as an organization. Startups sometimes can do this, but often it’s from a few extremely dedicated employees who make a difference at a smaller scale. And these employees often pay the price in their personal lives with health, relationship, or other issues.

The article goes on to look at predictive analytics that might help us reduce some of the problems caused by hardware failures. I think this is likely true, as we’ve seen digital twins that simulate loads on equipment help catch issues proactively.

What do we do with software? If we don’t write well-architected software that handles the load, how do we write an analytical system that can predict failures? This seems like a level of static and dynamic code analysis that we aren’t mature enough to build.

Heck, even if we could, how hard is it for many of you to get queries tuned in a running system? I find too often there isn’t enough effort or enthusiasm from developers, management, and others to follow solid tuning advice and change the SQL. Maybe that’s too limited a view.

Perhaps the AI analysts of the future will become the consultants of the past, whose recommendations often mimic the words of the current staff, but somehow carry more weight. Maybe they’ll get more things done and changed to help us build more robust systems.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Monday Monitor Tips: The Jobs Report

A customer wanted a report they could email to their boss about jobs, something that showed failures. This isn’t hard to get in Redgate Monitor, though it is manual (for now). Here’s how to do this.

This is part of a series of posts on Redgate Monitor. Click to see the other posts.

The Estate View of Jobs

Under the Estate menu item, there is an entry for jobs.

Click this and you get to the estate view of jobs. Here I see an overview, and I can click the scale in the upper right for day/week/month.

Below this I see the failed and successful job details, though only for the most recent execution. Otherwise this would be a ton of data. It’s worth noting that the 10 jobs that failed must have succeeded on a subsequent run.

There is a button to see the details. Click Export:

This downloads an Excel sheet of my jobs. There is a summary tab of the status with some details by day.

In the job details, I see the last run status for jobs. In this case, I don’t have any failed jobs (last run). However, I downloaded this earlier in the day, when I did have a job that failed in its most recent execution. Here’s what I see in the details if I scroll across. Look at the top line, which is the failure.

I can see the reason the job failed and the date/time.

If I were grabbing this every day, I’d have a nice report of how things are going. I would like to see a more detailed version of a report, and if you want it, please email sales@red-gate.com and request it.
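
If you do grab the export every day, a small script can turn each file into a quick summary for your boss. Here is a minimal sketch in Python, not something Redgate Monitor provides. It assumes you have saved the job details tab as CSV, and the column names ("Job Name" and "Last Run Status") and status values are my guesses for illustration, so adjust them to match what your export actually contains.

```python
import csv
import io
from collections import Counter


def summarize_job_export(csv_text, name_column="Job Name",
                         status_column="Last Run Status"):
    """Count jobs by last-run status and list the jobs that failed.

    The column names are hypothetical; change them to match your export.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Normalize the status text so "Failed" and "failed" count together.
    counts = Counter(row[status_column].strip().lower() for row in rows)
    failed = [row[name_column] for row in rows
              if row[status_column].strip().lower() == "failed"]
    return counts, failed


# Made-up sample data standing in for one day's export.
sample = """Job Name,Last Run Status
Nightly Backup,Succeeded
Index Maintenance,Failed
ETL Load,Succeeded
"""

counts, failed = summarize_job_export(sample)
print(dict(counts))
print(failed)
```

Run against each day's file, the counts and the list of failed jobs could be pasted into an email or appended to a running log.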

Summary

Getting reports to share with others is important. As much as I’d just like to get people to look at the tool, I do understand having periodic reports. I used to self-generate these and store them off in a secure place. These were invaluable for auditors and kept them from asking me lots of questions.

You could do this as well, and Redgate Monitor makes it easy to get this info. Look for enhancements in this area in the future that might help you with estate level reports for your management (or auditors).

Redgate Monitor is a world-class monitoring solution for your database estate. Download a trial today and see how it can help you manage your estate more efficiently.

Remembering Phil Factor

One of the most prolific and popular authors at Simple Talk has been Phil Factor. He wrote many pieces on all aspects of database work and has probably written more articles on the Redgate Product Learning site than anyone else. He has entertained, informed, and inspired many database professionals in his many years as an author.

Phil, aka Andrew, passed away recently. This was a shock to many of us and a sad day.

Tony Davis introduced me to Phil, whom I always thought of as Andrew, many years ago when I first traveled to Redgate. Tony published a tribute to Andrew on Simple Talk and has many more fond memories of Andrew. If you ever get the chance to meet Tony, ask him for a few.

Over the years, I’ve had the chance to get to know Andrew better. He, Tony, and I would often go out for lunch when I was in Cambridge. He came to PASS a few times, and he and I had many discussions about technology and ranch life over the years. Andrew lived on a plot of land similar to mine. We both tended to build, fix, and repair things ourselves, and we often discussed our latest projects.

He also had a love of bluegrass music and wanted to come to Colorado for the Telluride festival. I’m not sure that he ever made it, though I somewhat regret not being more enthusiastic in encouraging him and offering to go with him. That isn’t my style of music, but does it matter?

As I get older, I appreciate the time I get to chat with friends and family. I cherish the opportunities to spend time with others, however long or short. These are the important things in life: the events and conversations. It’s a sad time as Andrew and a few others I’ve known have passed away in a short period of time, but I hold many happy thoughts of the times we’ve spent together.

I hope you remember to appreciate the opportunities you have to spend time with others. And in memory of Andrew, flip through his articles and pick one to read today. There are lots of great ones, and some fun ones, like the SQL Limerick.

Steve Jones

The Book of Redgate: We Value Teams

This value is something that I still hear today: our best work is done in teams. On the facing page, there is a short description of what this means.

I do think that teams are very helpful, especially when building software. Our products aren’t simple inside, and they have a lot of pieces and parts assembled to try and keep tasks simple for you.

Working in teams is good, and remembering that the hierarchy matters is good. This has helped us in the past, though I find myself reminding people that the company matters more than the team. At times, teams seem to think they shouldn’t be disturbed or altered. I’ve also seen people resistant to working as individuals at times. The team is important, but we can work separately to get things done.

I find many companies not stressing teams, and individuals not wanting to work with others. I get that sometimes it can be comfortable to work on our own, at our pace, the way we want, but I also know that teams allow us to get more done than we would individually.

I have a copy of the Book of Redgate from 2010. This was a book we produced internally about the company after 10 years in existence. At that time, I’d been there for about 3 years, and it was interesting to learn some things about the company. This series of posts looks back at the Book of Redgate 15 years later.
