The Change Failure Rate

One of the measurements used in DevOps to determine if your team is improving is the change failure rate. This is the number of deployments that run into a problem, expressed as a ratio of the total number of deployments. The idea is to gauge the reliability, and to some extent the risk, of deploying changes in your environment.
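As a rough illustration of that ratio, here is a minimal Python sketch. The function name `change_failure_rate` and the sample numbers are my own assumptions for the example, not something taken from a particular tool or report.

```python
def change_failure_rate(failed_deployments: int, total_deployments: int) -> float:
    """Return failed deployments as a fraction of all deployments."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    if not 0 <= failed_deployments <= total_deployments:
        raise ValueError("failed_deployments must be between 0 and total_deployments")
    return failed_deployments / total_deployments


# Hypothetical numbers: 3 problem deployments out of 40 in a quarter.
rate = change_failure_rate(3, 40)
print(f"Change failure rate: {rate:.1%}")  # prints "Change failure rate: 7.5%"
```

The arithmetic is trivial. The hard part is agreeing on what counts as a "failed" deployment before you start counting.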

In the past, I’ve been very successful with deployments as a DBA or developer. Often I’ve completed database changes within the change window, and our applications ran afterward. That doesn’t mean I pressed a button and the deployment worked. On a regular basis, my expertise with SQL was needed to fix a script, re-run a process, or make some other “development change” in the production environment to ensure the entire deployment completed. While this happened in a minority of deployments, it wasn’t an uncommon experience.

Many people have had the same experience, as the State of DevOps report has shown for years. In both the application and database worlds, a deployment should mean pressing a button or running a single script, or at most following clearly documented steps that run to completion without any alteration. Anything else is a deployment failure. Not a catastrophic failure, and certainly one that many of us can recover from, but still a failure from the standpoint of being ready to deploy and having a reliable process.
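To make that stricter definition concrete, here is a small Python sketch that counts a deployment as a failure whenever it did not complete or needed any unplanned manual intervention. The `Deployment` record, its field names, and the sample history are hypothetical and invented for illustration. A real team would pull this from its own deployment logs or tickets.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    completed: bool            # did the deployment finish at all?
    manual_interventions: int  # unplanned fixes, script edits, or re-runs


def is_failure(d: Deployment) -> bool:
    """Strict rule: any unplanned intervention counts, even if the release shipped."""
    return (not d.completed) or d.manual_interventions > 0


# Hypothetical deployment history.
history = [
    Deployment("release-1", completed=True, manual_interventions=0),
    Deployment("release-2", completed=True, manual_interventions=2),
    Deployment("release-3", completed=True, manual_interventions=0),
    Deployment("release-4", completed=False, manual_interventions=1),
]

failures = [d.name for d in history if is_failure(d)]
rate = len(failures) / len(history)
print(failures, f"{rate:.0%}")  # prints "['release-2', 'release-4'] 50%"
```

Note that release-2 "succeeded" in the sense that the application ran afterward, yet it still counts as a failure here because someone had to step in twice.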

Quite a few of us have made a career of cleaning up other people’s messes in deployments, with our ability to get changes deployed and applications back up and running being a testament to our expertise and skill. That’s not a reliable process, especially when an organization is forced to depend on a Steve or a Brent or a Kendra to ensure smooth deployments. That’s a recipe for disaster, especially as all of us want to go on holiday, undisturbed by some software “emergency”.

I think this is one of the more telling metrics for a strong software development process and a reliable deployment process. If the share of deployments that succeed without intervention isn’t extremely high, in the 95-99% range (a change failure rate of roughly 1-5%), then our organizations are spending resources, especially time, on items that don’t add value. Instead, we ought to invest in moving to a DevOps-style process that allows our expertise to be used solving new problems, not cleaning up the mess of the poor development practices of others.

Steve Jones

Editor, SQLServerCentral