Talk about pressure to get a software deployment right. A software install on an Airbus airplane resulted in a file containing parameters being wiped. This error apparently caused the airplane to crash when three engines cut out in flight, and four people were killed.
I don’t know if this is the final report on the crash, but the fact that this is even a possibility concerns me as a technologist. We have probably had similar mechanical failures and installation issues in the past, but there are some scary issues here with regard to software. There was a faulty software installation (yikes!), a poor architecture (not assuming more than two engines would stop), too much tolerance for software errors (the review letting this pass), and poor overall design (no alerts on the ground).
I can’t decide whether software makes issues like this more or less likely. Certainly people regularly skip checks of physical systems. It should be far easier, and more reliable, to automate checks of software systems, especially for deployments, than to do the same for complex mechanical changes. However, maybe that’s not true. Perhaps mechanics are more likely to notice a loose bolt than a misconfigured software menu. Or maybe we need a new type of mechanic who is savvy with technology.
Ultimately, I think that any software that makes changes to systems, including through deployments, needs to be double-checked by an independent process, with clear alerting on any issue, rather than relying on someone to confirm the success of a long series of steps. We also need to take the review of potential software errors very seriously and ensure that our tolerance for potential issues shrinks as the impact of those issues rises.
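
To make the idea of an independent double check a little more concrete, here is a minimal sketch in Python. It is not how any aircraft system works. It assumes a hypothetical manifest of expected SHA-256 digests, produced separately from the installer, and a hypothetical file name (`engine_params.cfg`). The function names `verify_deployment` and `require_clean_deployment` are my own inventions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_deployment(root: Path, manifest: dict[str, str]) -> list[str]:
    """Compare deployed files against an independently produced manifest.

    `manifest` (hypothetical format) maps relative file paths to the
    SHA-256 digest each file is expected to have after installation.
    Returns a list of problems; an empty list means the check passed.
    """
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems.append(f"MISSING: {rel_path}")
        elif target.stat().st_size == 0:
            # A wiped file is reported on its own, since it is easy to overlook.
            problems.append(f"EMPTY: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"MISMATCH: {rel_path}")
    return problems


def require_clean_deployment(root: Path, manifest: dict[str, str]) -> None:
    """Fail loudly instead of relying on someone to read a long install log."""
    problems = verify_deployment(root, manifest)
    if problems:
        raise RuntimeError("Deployment verification failed: " + "; ".join(problems))
```

The point of the design is that the check runs after the installer finishes and knows nothing about the installer's own idea of success. A call such as `require_clean_deployment(Path("/opt/app"), manifest)` (the path is made up) either returns quietly or stops everything with a message naming the missing, empty, or altered file, which is the kind of clear alert I would want before anything leaves the ground.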