I ran into this quote on the Microsoft Learn site, which I thought was a great way to think about how to administer a system: “Without a baseline, every issue encountered could be considered normal and therefore not require any additional intervention.”
When users file tickets or complain that things aren’t working well, I’ve found that, more often than not, their perception has changed more than the actual performance. I’ve been called about “slow applications” only to find that “slow” meant 30 seconds, and the person complaining wasn’t sure how long it used to take, only that today, at the end of the quarter, it felt slow. Digging into the monitoring history showed that the query had always taken at least 20s and could take over 30s. My main takeaway was that a little stress for users sometimes turns into unnecessary work for operations staff.
There certainly are times when a database query takes longer than expected, but is that because the system is overloaded or because there’s a lot more data? When was the last time this ran, and what has changed since? Are there more queries against the same objects than in the past? Even when there are real problems, without knowing how a system typically looks at this time of day, we may struggle to quickly determine where the problem lies. Without some baseline, we may not even know how to craft a good solution.
For me, maybe the best reason to know a baseline is triaging and prioritizing issues. Seeing a server at 100% CPU is one thing, but if that is a daily occurrence, I might decide another issue is more important. Especially at 2 am.
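To make that triage idea concrete, here is a minimal Python sketch of how a reading might be checked against history for the same time of day. Everything in it is an assumption for illustration: the function name `is_unusual`, the three-standard-deviation threshold, and the sample numbers are made up and are not taken from any particular monitoring product.

```python
from statistics import mean, stdev


def is_unusual(current, history, threshold=3.0):
    """Return True if `current` sits more than `threshold` sample
    standard deviations away from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past sample was identical, so any difference is a change.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical CPU % readings taken at 2 am on previous nights.
cpu_at_2am = [98.0, 100.0, 99.0, 100.0, 97.0, 100.0, 99.0]

# Hypothetical durations (seconds) for the "slow" end-of-quarter query.
query_seconds = [22.0, 27.0, 31.0, 24.0, 29.0, 33.0, 21.0]

print(is_unusual(100.0, cpu_at_2am))    # False: 100% CPU is normal at 2 am
print(is_unusual(30.0, query_seconds))  # False: 30s is within the usual range
print(is_unusual(95.0, query_seconds))  # True: 95s would be worth a look
```

A real monitoring tool will do something more sophisticated, but the principle is the same: the alert is only meaningful relative to what the system normally does at that time.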
Having a baseline for your systems is important. Build a system if you must, buy one if you can, but get monitoring set up for your systems. It will help you focus development efforts when changed code doesn’t work as expected. It also helps your operations staff respond more efficiently to future issues.