Is there a “best” database engine? Is there one database system that you can use for all of your needs? The question is addressed in a long, but captivating, piece at O’Reilly. The author talks about finding an HTAP database, which stands for Hybrid Transaction/Analytical Processing. In short, the ultimate database that can handle your OLTP, ODS, BI and analytical needs, while also addressing all the other types of features most of us would want.
The piece is worth a read in that it examines the requirements and features needed to achieve a high performing database system. This system doesn’t look at RDBMS over NoSQL or NewSQL systems, and does a fair job of evaluating why trade-offs are needed by many of the types of databases that have been used by companies in an effort to find a better platform. It’s almost a bit of a history of database platforms.
What I like about this piece is that they look at the differences between startup companies and enterprises. The former is bound by legacy constraints, but more importantly, their failures are just inherently less impactful. For enterprises, however, there is a certain inertia, not to mention existing skill sets that make adoption of new technologies more difficult. There’s also that pesky transaction requirement. Some domains might not need these, but in some situations transactions are really important, and as the article states, “Transaction support was useful in certain cases in spite of its overhead.”
Throughout the piece, the author seems to move back and forth in examining the advantages of moving to one platform over the other. We have certainly spent time trying to build large monolithic databases, then lots of effort to move data around to other systems. Who remembers the ODS/large data warehouse trend, which then fractured into further movement to “data marts” before it seems many people have moved back to a large single DW against which analysts can run queries? Movement of data, gluing complex ETL (or ELT) pipelines together requires a lot of resources, and often these can be somewhat brittle, prone to regular, albeit small, failures.
The nirvana is a single database platform, but I’m not sure that’s possible. All of the requirements we place on databases require some trade-offs. I don’t think anyone has found a way around the CAP Theorem, which will inherently limit the ways in which we can meet many requirements. However, I think we can “fake” the success of our platforms at times, especially since our end users don’t care how we meet their needs. Just that we meet them.
The Azure platform seems to be moving in the right direction, in my opinion. There is a separation of storage from compute, which in database terms means a separation of data storage (the storage engine) from the query engine (where compute resources are needed). Certainly this isn’t complete or clean, but already in Azure SQL Data Warehouse, I can scale up or down my compute needs separately from the storage for my data. The Data Lake allows me to store disparate types of data, without specifying a structure, and query the bits from multiple applications. Even the ability to instantly “copy” a database in Azure SQL Database could be used to “fake” the instant replication of data to another system.
Granted, there are issues with clients constantly querying databases and maintaining up-time (or switching clients to a new copy), with reconciling changing data across many clients, and certainly the management of all this data. However, as we search for ways to build the ultimate OLTP/ODS/BI/Machine Learning database platform, I’d like to remind you we don’t need to build a single engine. We just need to make our clients to think we did by providing a data source that appears to handle all their needs.