Defending the RDBMS

A few weeks ago I ran across an essay from Randolph West called Relational Databases Aren’t the Problem. This was a response to another essay that made the case that relational databases are bad for many businesses. I thought that both pieces were interesting for different reasons. I don’t believe the RDBMS is perfect, and it can certainly be hard for developers to build software that interfaces with a relational system.

The original complaint about the RDBMS is somewhat rambling and deceitful, in my opinion. It is an excellent study of how to use a few concepts to confuse and create doubt in a casual reader. If I weren’t reading closely, I might be persuaded by a number of the complaints raised about relational databases. However, in my mind, quite a few of the issues discussed aren’t problems with relational databases at all, but rather problems with poorly developed software or poor design of the entities and relationships. I find myself even more disappointed that the author hasn’t really addressed any comments, but rather just pasted a link to his followup article.

I do think that the defense from Mr. West does a good job, though it also misses some of the primary issues we struggle with in relational databases. There are gaps in the knowledge of how to build a well-performing database, both among application developers who view the database as a necessary evil and among experienced database developers who don’t regularly improve their skills and try new design techniques.

I also think that both of the pieces fail to address the issues of gathering and working with multiple rows of data. The second discussion of “doing without databases” really implements its own database management structure, which may work well, but is fraught with problems, such as the concurrency issues that arise when multiple users search and scan through data without indexes. While indexes are overhead, they are necessary, as hash buckets aren’t necessarily feasible for all the properties in a class. Also, if you end up building hash buckets for multiple properties, you’re building an index. There’s another good defense of some of the issues here.
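To make the hash bucket point concrete, here is a minimal Python sketch. The Customer class, the sample rows, and the balance_between helper are all hypothetical and are not taken from either essay. It shows that a dictionary of buckets answers equality lookups on one property, while a range question on another property needs a second, ordered structure, which is an index in everything but name.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical record type, used only for illustration.
    customer_id: int
    city: str
    balance: int


customers = [
    Customer(1, "Denver", 250),
    Customer(2, "Boston", 900),
    Customer(3, "Denver", 40),
    Customer(4, "Austin", 610),
]

# Hash buckets on one property: fast equality lookups, but no ordering.
by_city = defaultdict(list)
for c in customers:
    by_city[c.city].append(c)

# A sorted list of (key, position) pairs: an ordered secondary index
# that has to be maintained alongside the data, just like in an RDBMS.
by_balance = sorted((c.balance, i) for i, c in enumerate(customers))


def balance_between(low, high):
    """Return customers with low <= balance <= high, in balance order."""
    start = bisect.bisect_left(by_balance, (low, -1))
    end = bisect.bisect_right(by_balance, (high, len(customers)))
    return [customers[i] for _, i in by_balance[start:end]]


denver = by_city["Denver"]            # equality lookup via hash buckets
mid_range = balance_between(100, 700)  # range lookup needs the sorted index
```

Every insert, update, or delete now has to keep both by_city and by_balance in step with the data, and do so safely under concurrent access, which is the work a database engine already does for you.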

I do think that keeping more data in memory and synchronizing access to structures sounds great, but scaling that out to multiple systems and ensuring consistency at high volumes, not to mention the potential loss of data from crashes, are real problems. Having a write-ahead log in SQL Server does a wonderful job of ensuring we can handle redo/undo on system restart. The method presented doesn’t necessarily ensure this, though perhaps accepting some data loss from high concurrency changes is OK for many applications.
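For readers who haven’t looked at how a write-ahead log protects in-memory data, the idea can be sketched in a few lines of Python. This is a toy under loud assumptions: TinyStore and its JSON-lines log format are hypothetical, it handles redo only (no undo, transactions, or checkpoints), and it is in no way how SQL Server implements its log. Each change is forced to disk before memory is touched, and on restart the log is replayed to rebuild the in-memory state.

```python
import json
import os


class TinyStore:
    """Hypothetical in-memory key-value store with a redo-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8", newline="\n")

    def _replay(self):
        # Redo: reapply every complete logged change, in order.
        if not os.path.exists(self.log_path):
            return
        good_end = 0
        with open(self.log_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # torn final write from a crash
                try:
                    record = json.loads(raw)
                except ValueError:
                    break  # unreadable record; stop replaying here
                self.data[record["key"]] = record["value"]
                good_end += len(raw)
        # Drop any torn tail so new records start on a clean line.
        with open(self.log_path, "r+b") as f:
            f.truncate(good_end)

    def put(self, key, value):
        # Write ahead: the change is on disk before memory is changed.
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value

    def close(self):
        self._log.close()
```

Even this toy pays for an fsync on every change, and it still has nothing to say about multiple writers, rollbacks, or log growth. Those are the parts that take years to get right in a real engine.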

I will say that the idea of all data in memory is interesting. I had to stop and think about how many databases really have more than 1TB of data. If we throw out indexes, does 1TB cover most data stores? I bet it does, though that doesn’t mean that there aren’t issues with using in-memory array structures, especially with widely varying data sizes.

Would I use an in-memory data structure for software? It’s tempting, but honestly, I wouldn’t. The value of data is too high, and the potential for problems from poorly implemented ACID control structures is too great. Plenty of issues have been found with different RDBMSs over the years, and even some in NoSQL systems. Thinking that I could avoid any issues and protect data is something I wouldn’t even try. After all, if there is some error, I’d prefer it come from a system that many people use, rather than one I tried to emulate for no good reason.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (6.2MB) podcast or subscribe to the feed at iTunes and Libsyn.

About way0utwest

Editor, SQLServerCentral

4 Responses to Defending the RDBMS

  1. I agree, there are too many people throwing out relational db’s without thinking properly about the consequences. To me the major issue with relational db’s is the scaling issue, and to me the answer to that problem is to make them scale properly over multiple servers rather than throwing out the baby with the bath water. nudb was the first db I saw to do this properly, and now AWS is throwing itself in the ring with Aurora Serverless. Not only do you not have to manage the database server yourself, you don’t have to sale it out yourself, and you don’t pay for uptime, only for usage! I think this is really awesome, and am about to migrate a database for a system I’m working on to this from SQL Server. Now if only their syntax wasn’t MySQL but Transact-SQL, then it would be even easier to migrate. To me this is what the current vendors need to do to make people relook at relational databases for structured data.

  2. sorry thats “scale it out yourself” not “sale it out yourself” 🙂

    • way0utwest says:

      The scaling issue is definitely a problem, though I think it’s a fairly small problem in most systems. Often we have poorly written code and poorly constructed indexes, so our hardware can’t handle the load that it could if we had built the system better.

      To be sure there are features being added to try and improve scaling, but they do have a ways to go. The complexity is still too high.

  3. JRStern says:

    Note that even relational databases like SQL Server can implement some or all tables in memory. For large data, some kind of indexing is always needed to get good performance. Generally a hash does not provide the index-sequential groupings that indexes do, and indexes optimized for memory structures are pretty nearly as fast as hashes anyway.

    The only defense that relational databases really need is in regard to scaling. SQL Server starts to break down above a couple of terabytes or above about 100m rows/table because the optimizer becomes confused about granularity, and cached plans are less likely to be universally efficient. However even issues like this are not with the basic *concept* of relational databases and new (and old!) software and hardware approaches are available to address these issues.

    The limiting factor is that The World is complex, and as applications become more ambitious, perhaps the relational model lacks semantics that scale. But then, I don’t know of anything better!
