Not many of us work in startup environments, but many of us do work with new databases created for new applications. These databases might be carefully designed, thrown together, or generated by an ORM. In any case, I find many people make decisions and write database code for today, solving the problems they see in front of them, often with little data and a single system. They might have their eyes on some new technology, and they choose a data strategy without really considering what they will need later.
Expensify appears to have avoided that trap. The company started in the financial industry, and its system had requirements for low response times, multiple locations, and detailed logging. Those requirements demanded a robust database architecture, which turned out to be helpful when the company pivoted to a new business model. Their CTO talks about some of the problems he sees with startups making database decisions, and I think many of these lessons are helpful for any organization trying to ensure its database can grow to meet its needs.
One of the most common problems I see is developers and leaders becoming enamored with new technology. There is a lot of promise in some of the platforms and designs being put forth, and some are even proving themselves in high-profile situations, though not all. Most of us, however, aren't going to be solving the same problems in the same way. As the article notes, we're not Google, and we're also not Uber, Facebook, or Spotify. Mimicking their choices because of their success doesn't necessarily map to our business model. I find no shortage of companies that struggle to adopt a new platform because they built a proof of concept and assumed the system would behave at scale the way it did with small amounts of data. That assumption breaks down later, with the moderate or large volumes they accumulate over time.
I also see companies creating complexity on the chance that they will need to handle many petabytes or exabytes of data at some point. Face it, most of us will barely deal with terabytes of data in any particular system. We ought to plan for a high-performing system at that scale, not worry about a future that likely won't come. At the same time, we aren't going to be dealing with megabytes of data either, so if your developers only test with MBs, they are going to miss problems.
I like the advice to go into your decisions with your eyes wide open. Don't copy others, and think realistically about what you will need. I believe engaging a data professional early is helpful. Developers do some amazing things when they build software, but the majority of them don't really think about the challenges of a database system. They don't consider low response times or ensure there are high availability (HA) and disaster recovery (DR) strategies, which are two separate things. They also forget about the challenges of aggregating and reporting on large amounts of data. Most humans work with a few rows of data at a time, and that is what developers do on their own systems. When you need to aggregate data, or when all your customers are generating a workload at once, a data professional can help ensure you've properly indexed your tables and planned for a demanding workload.
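The indexing problem is easy to miss precisely because a small development database hides it. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for whatever platform you actually run; the `orders` table, the `idx_orders_customer` index, and the query are all hypothetical, invented for illustration. On a developer's few rows the per-customer total returns instantly either way, but the query plan shows a full scan whose cost grows with every order until an index covers the filtered column.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN steps joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable 'detail' text.
    return " | ".join(row[-1] for row in rows)


def build_orders(conn: sqlite3.Connection, customers: int, orders_per_customer: int) -> None:
    """Create a hypothetical orders table and fill it with synthetic rows."""
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    rows = (
        (c, float(o % 100))
        for c in range(customers)
        for o in range(orders_per_customer)
    )
    conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)", rows)
    conn.commit()


# A per-customer aggregate of the kind a reporting screen might run (hypothetical).
TOTAL_SQL = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

conn = sqlite3.connect(":memory:")
build_orders(conn, customers=1_000, orders_per_customer=100)  # 100,000 rows

# Without an index, SQLite must scan the whole table to find one customer's orders.
before = query_plan(conn, TOTAL_SQL, (42,))
print("before:", before)

# A covering index on (customer_id, amount) lets SQLite answer from the index alone.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id, amount)")
after = query_plan(conn, TOTAL_SQL, (42,))
print("after: ", after)

total = conn.execute(TOTAL_SQL, (42,)).fetchone()[0]
print("total for customer 42:", total)
```

The exact plan text is SQLite-specific and varies slightly between versions; on SQL Server or another engine, the equivalent habit is reading the actual execution plan against a realistically sized copy of the data rather than timing queries on a near-empty development database.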
I do like the common-sense advice that most startups won't outgrow a relatively modest single database server. Many other applications won't either, but that doesn't mean you can put all your eggs in that one server basket. If that server dies, make sure you have multiple people who can recover it and quickly get your system running somewhere else. There are different ways to handle this, but engage someone who knows your platform and have them ensure that some of your staff, whether operations or developers, understand how HA and DR work in your environment.
Lastly, be secure. I really like the idea of always using stored procedures. I know this becomes a pain for developers, who now write code in two places, but it really helps you ensure better security. Perhaps more importantly, it lets you regularly tune one part of your code, the database side, without impacting the application side.