A Case for Document Storage

One of the challenges for both database developers and administrators is doing more, often with less. Many companies continue to grow their database estate, both in width with more platforms, and in depth with more instances of the platforms they have.  Some companies will look to shrink their staff, especially when adopting a cloud platform, while others may add more databases, but not increase staffing to match the additional load.

In either case, what many have found over the years is that the cost of labor is high. Both for developers that write code against databases, and administrators that manage those platforms. While licensing can seem to be a large number, compared to the cost of labor, it isn’t usually a significant number.

Often it seems administrators would prefer more of the same database platform. Developers often seem to ask for new types of database platforms, often some type of NoSQL data store. I ran across an article that makes a case for adding in document storage data stores to your environment, instead of just choosing am RDBMS. Labor is one of the big reasons for doing this. The other one is that for a given workload, the hardware cost is lower.

The article opens talking about the object/relational mapping problems. There is some truth to the time and effort to map an object in an application to a table (or set of tables) in an RDBMS. There is some knowledge required to do this, but I also think it’s an important skill for many developers. The same type of object mapping to a serialized JSON document is shown as being easier, and it is.

However, if you add or change your object, the application code to handle the document from the data store gets complex. Over time, you will have lots of “new” fields that don’t exist in older documents. How do you handle those? It’s not hard, but labor is required to write this code. And this code has to be maintained over time.

The other argument is that less hardware is needed, made by noting all the data you may need can be co-located with your object. This is what we would call denormalization in an RDBMS and leads to data duplication? Whether that is a problem or not depends on the amount of duplication. Certainly the structure of an application that often works to send or retrieve singleton rows is easier in a document database.

However, non trivial queries, which the author postulates are hard to write for developers, are likely hard to run for a document database. The load of querying across lots of rows, or updating them, is much higher in a document database. Depending on how often you update data, this can be an issue, and require more hardware.

Which is better? The classic “it depends” applies here. Database modeling is important in both cases. As I’ve worked with people that move to NoSQL databases, I find they struggle to model in that world as much as many of us struggle to model in the RDBMS world. I also find that a NoSQL database often is going to require some sort of data warehouse or other structure that is built for reporting across documents.

I’m not against the various types of NoSQL databases, but I also don’t think they are a panacea of any sort that magically makes building and operating an application easier.

Steve Jones

Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.

About way0utwest

Editor, SQLServerCentral
This entry was posted in Editorial and tagged , . Bookmark the permalink.

4 Responses to A Case for Document Storage

  1. Like anything else, NoSQL DB’s are a tool and depending on the job you pick the right tool. I use to work at a software company with 11’s of developers both in the US and India and what I found to be teh common reason for a NoSQL DB wasn’t because it was the right tool but because as traditional programmers who thing procedurally, the relational structure of SQL is very difficult for them to learn. I’ve seen numerous times where they’d use a cursor because it works procedurally.

    When working there the vendor would hold an annual conference for our clients ( typically around a few thousand) and I did a session called Choosing The Right Tool where I reviewed the various custom reporting options ( Crystal Reports, etc.) and how no one tool was the right one for every type of report.

    Like

  2. way0utwest says:

    11s? Quite a staff there 🙂

    It is just a tool. If you want to run a warehouse and move data over for reporting, document stores work well. They are very fast, and if you primarily insert or update just one singleton, I think they can be a great choice.

    As you noted, you need to pick the right tool for the job.

    Like

  3. Maybe I’m the weird odd one out on this but I don’t see SQL as some difficult or verbose language as this article describes it to be. To me SQL is a very logical and easy to use language and I have learned and worked with both object oriented programming in C++ and procedural languages like VB. The author is right about SQL not being the right tool for documents but it wasn’t designed/created to store large objects like that but text, dates & numbers in a relational normalized way. Just as the OODB sounds like the better tool for the documents I doubt it would be the better tool for accounting/banking data.

    Like

    • way0utwest says:

      I think there are two things here. First, SQL as a language is both simple (few keywords) and complex in how you put things together. I think beyond simple 1-2 table joins and INSERT/UPDATE/DELETEs from one table, it is asking someone to learn a very different type of language. I do agree it’s not too hard, but I also get that it’s a place where lots of people make silly mistakes, especially with joins.

      Second, the OR mapping, going from C# properties to a list in SQL and then back is a time sync. We don’t make it easy. Like I wish that a compiler would look at what data is used and then alter queries to not SELECT *, but select a, b (and not c). Wishful thinking, but it’s there.

      An OODB can work well for banking or any data, when it’s a single row. When we are trying to work across rows, or more complex multi-row transactions, then it can be poor. The problem with OODBs and banks is that if we’re trying to prevent two withdrawals from the same account in different places, you need really good modeling, and some of the scale out/eventual consistency stuff won’t work.

      Liked by 1 person

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.