I’ve been in a data center when most of the servers turned off. I’ve actually heard dozens of systems power off in quick succession, and it’s a strange sound. You become so used to the white noise of numerous fans that having them turned off is a little unnerving. It’s neat when it’s a scheduled patch day and all servers cleanly shut down together. It’s an altogether different experience when there’s an unexpected issue and management sees their expensive hardware not working.
However, imagine losing your servers because of a loud noise. That’s what happened to ING Bank when a fire extinguishing test caused a number of hard drives to fail. To be fair, the loud noise was north of 130 dB, which is very loud. Since sound is really vibration, the impact on read/write heads caused numerous failures. The bank needed 10 hours to restart systems in their DR center, and managed to do so. While that might not have been what the bank officials wanted, this is a good DR test, and I hope they learned a few things that might help them fail over much more quickly in the future.
This might be a good reason to think about SSDs, which are less susceptible to vibration than the older, spinning rust drives. I’d guess that there are other issues that could affect SSDs and someone is going to discover them at an inopportune time. Already we’ve seen dramatic improvement in SSD technology, driven by numerous early issues related to writes and reliability.
Engineering facilities is hard, and there can be many unexpected issues. I’m sure the people who designed the fire suppression system weren’t concerned about the noise; they were concerned about putting out flames quickly. I’m sure that the people filling the system didn’t think a little extra pressure would matter. These seemingly innocuous decisions can cascade, which is why we practice and preach DR preparation. Not just backups, but restores and quick failover.
If it’s not your organization, it might be humorous rather than stressful, but you never know what design flaws might lurk inside your facilities. I once worked in a data center that had only about half the cooling that we expected. Why? The engineers assumed that since we worked an 8-hour day, so did the computers, which we’d turn off at night. Luckily they had built a pad into their calculations, so we were only short half the capacity rather than two-thirds.