It’s the second Tuesday of the month and time for T-SQL Tuesday again. This month I’m grateful that Kerry Tyler is hosting. I had the pleasure of sitting with Kerry and his wife at a SQL Saturday last year, before the world locked down and prevented travel. At the time I was looking for hosts and pressured a few people into agreeing to host a month.
This month Kerry wrote an interesting invite based on his passion for airplanes and flying. Mistakes and errors with planes can be catastrophic, so pilots are always revisiting issues and learning from others. He asks us to do the same this month.
Root Cause Analysis
One of the things that we talk about in technology is analyzing incidents and failures to determine a root cause. Despite the talk, relatively few places I’ve worked in actually perform a root cause analysis and produce actions that are taken into account in future protocols, processes, and documentation. Too few people perform a blameless postmortem, which is what we need more of.
And too few companies share their learnings publicly.
I can’t change that, but I can think forward.
I talk a lot about DevOps and work with companies to transform what they do. This is part of my job, but it’s also an area I’m passionate about because I’ve worked in organizations that were following the tenets of DevOps before the term existed. Here are a couple of areas where I learned from others to build better systems for the future.
Networking is something that relatively few of us deal with as technical professionals. Since DHCP became a very solid platform and wireless grew so dramatically, we don’t need to dig into networking most of the time. However, understanding how DNS and the networking protocols work is still relevant.
I was fortunate to work with a talented network engineer at one company. We were having issues with a client being able to connect to our database server, and he managed to show us how the DNS records differed between our office and theirs when working across a VPN. Since then, I’ve learned to always check the simple things first, with tools like nslookup, to verify a client can see a server.
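The same kind of check can be scripted. What follows is only a minimal Python sketch, not the method that engineer used: it asks the local resolver which addresses a server name maps to, so the list can be compared with the one a client sees from the other side of a VPN. The function names and the default port of 1433 are my own assumptions for illustration.

```python
import socket


def resolve_addresses(hostname, port=1433):
    """Return the sorted IP addresses this machine resolves for hostname.

    The port is an assumption (the SQL Server default); it does not
    change which addresses are returned.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name does not resolve from this machine at all.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def same_resolution(local_addresses, remote_addresses):
    """True when both sides resolved the name to the same set of addresses."""
    return set(local_addresses) == set(remote_addresses)
```

Run resolve_addresses on a machine in each network with the server name in question and compare the two lists with same_resolution. A mismatch points at DNS rather than at the database server.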
As a data professional, I know backups are important. I use Backblaze at home, but at work I often deal with a variety of products, even with SQL Server. While the backup process has been very consistent and stable in SQL Server, we’ve had different options emerge, as well as third-party products that sometimes deal with backups in a different way. I’ve often insisted on verifying that we can restore databases to a point in time, precisely because this is what will be required during a DR incident.
Over the years, too many people to name have helped me understand how their product or system worked. I’m grateful to those who explained exactly how a particular backup or restore process works and helped me validate the process to recover data.
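For native SQL Server backups, a point-in-time restore test boils down to a short sequence of RESTORE statements. Here is a rough Python sketch that builds that sequence for a full backup plus log backups. The function name, database name, and file paths are hypothetical, and it assumes the inputs are trusted (no quoting or escaping is done) and that no MOVE clauses are needed. Third-party backup products have their own restore steps, which this does not cover.

```python
from datetime import datetime


def point_in_time_restore_script(database, full_backup, log_backups, stop_at):
    """Build T-SQL that restores database to stop_at from native backup files.

    Assumes trusted inputs: names and paths are inserted without escaping.
    """
    stamp = stop_at.strftime("%Y-%m-%dT%H:%M:%S")
    # Restore the full backup but leave the database ready for log restores.
    lines = [
        f"RESTORE DATABASE [{database}] FROM DISK = N'{full_backup}' WITH NORECOVERY;"
    ]
    # Apply each log backup in order, stopping at the requested time.
    for log_backup in log_backups:
        lines.append(
            f"RESTORE LOG [{database}] FROM DISK = N'{log_backup}' "
            f"WITH NORECOVERY, STOPAT = '{stamp}';"
        )
    # Bring the database online once the logs have been applied.
    lines.append(f"RESTORE DATABASE [{database}] WITH RECOVERY;")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    print(
        point_in_time_restore_script(
            "SalesRestoreTest",
            r"D:\Backups\Sales_full.bak",
            [r"D:\Backups\Sales_log_1.trn", r"D:\Backups\Sales_log_2.trn"],
            datetime(2020, 7, 14, 10, 30, 0),
        )
    )
```

Running the generated script on a test server, then checking that the data matches the chosen time, is the actual verification. The script only makes the steps repeatable and easy to document.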
For the most part, I’ve just been the one pushing to ensure documentation was captured and available for others to use in the future.
I do try to give back and teach others as well. I’ve been writing about Version Control Systems, including some basics I’ve learned over the years, sometimes by reading a tutorial or blog, sometimes by making mistakes and working through the documentation. My learnings from others inform most of the content I produce to teach you, so it’s not that I’m doing anything innovative; I’m translating the lessons from others and myself for you.
We all learn from others. We stand on the shoulders of giants and of ordinary people. Remember that, continue to learn, and help others where you can.