Where’s the schema?

Over the last few years, I’ve read many articles and listened to quite a few talks that discuss the advantages of NoSQL databases. I’ll admit that I’m often skeptical that the advantages of other data stores overcome the disadvantages of a relational system, but I try to keep an open mind. I do appreciate that there are some benefits to using another data store in certain situations.

One of the talks I heard recently discussed the fact that in many of these stores, we can add data in a “schemaless” fashion. The data is stored in a flexible format that allows the developer to quickly capture the data they are using and retrieve it, without requiring up-front design work to build a particular format.

That had me pondering the question of whether or not there really are schemaless data structures. If a developer (or whatever SDK or framework they use) looks to persist some data, clearly there is a format of sorts, which means there is a schema. That schema might not be transferred or persisted in the data store, but there is some schema they expect, both on storage and retrieval. Whether this is JSON, XML, some proprietary structure, or something else, there’s a known structure that the developer uses to work with the data.
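As a minimal sketch of that point, assume a hypothetical customer record stored as JSON. The `Customer` class, its fields, and the `save` and `load` helpers are invented for illustration and are not tied to any particular data store. The store never sees a schema, yet the code that writes and reads the document depends on one:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Customer:
    # Hypothetical fields. This class is the schema the data store never sees.
    name: str
    email: str


def save(customer: Customer) -> str:
    # Serialize to the JSON text a document store would persist.
    return json.dumps(asdict(customer))


def load(raw: str) -> Customer:
    # Deserialization only works when the document has the expected shape.
    data = json.loads(raw)
    return Customer(name=data["name"], email=data["email"])


stored = save(Customer(name="Ada", email="ada@example.com"))
print(load(stored))

try:
    load('{"name": "Bob"}')  # a document written without an email
except KeyError as missing:
    print("document does not match the expected structure:", missing)
```

The second call fails because the structure the reader expects is real, even though nothing in the stored document declares it.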

Is there really schema-less data? I tend to think no. All of the data we have contains some schema. That schema might vary from row to row, which is often what developers like when building applications. There is, however, a structure. The developer knows it, and must serialize and deserialize the data, or depend on some library like ADO.NET to do so. This often appears to a developer to be a lower barrier to entry. There’s less complexity and often no need to map the object-like structure of properties to some relational schema and make decisions on sizes.

That’s not completely true, as the schema of the data still exists and must be persisted in the application. There is code that must handle the various values stored in some hierarchical fashion. If this changes over time, as values are added, then the application must deal with the missing values in older properties or arrays. If items are removed in the application, then would older sets of data just disappear? Perhaps, but the developer must make a decision, which may have implications for users of their application. This doesn’t even deal with the issues of aggregation and reporting, which might force other systems to implement the same schemas and business logic. Those rules and specifications don’t easily transfer from one application to another, especially when different teams or developers are involved.
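Here is a rough sketch of what that handling might look like, assuming hypothetical order documents in which a `currency` value was added later and a `coupon` value was removed from the application. The field names and the `"USD"` default are assumptions made for the example, not taken from any real system:

```python
import json


def read_order(raw: str) -> dict:
    """Normalize an order document, whichever version of the app wrote it."""
    doc = json.loads(raw)
    return {
        "id": doc["id"],
        "total": doc["total"],
        # Added later: older documents lack it, so the application
        # has to decide on a default (an assumption in this sketch).
        "currency": doc.get("currency", "USD"),
        # "coupon" was removed from the application, so any value still
        # present in older documents is silently dropped here.
    }


def totals_by_currency(raw_docs: list) -> dict:
    # A reporting job has to apply the same rules to get consistent numbers.
    totals = {}
    for raw in raw_docs:
        order = read_order(raw)
        totals[order["currency"]] = totals.get(order["currency"], 0) + order["total"]
    return totals


old_doc = '{"id": 1, "total": 10.0, "coupon": "SPRING"}'
new_doc = '{"id": 2, "total": 5.5, "currency": "EUR"}'
print(read_order(old_doc))
print(totals_by_currency([old_doc, new_doc]))
```

Both the default and the dropped value are decisions about the schema. They live in application code, so any other system that reads the same documents has to repeat them.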

There’s always a schema, and the rules have to be implemented up front, or later on. Whether you use an RDBMS or a NoSQL store, you are going to be dealing with a schema. The question is whether you want to deal with it in a central location or in every application. I lean towards the former, but you might prefer the latter. Neither is wrong, but you should be sure you understand all the advantages and disadvantages of your choice.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (4.1MB) podcast or subscribe to the feed at iTunes and Libsyn.
