In a scalability discussion, I saw this comment: Companies like Google or Facebook manage a lot of data, but it’s not held to the same degree of scrutiny. For example, if Facebook dropped 1 out of 1,000 random guestbook posts, would anyone notice? At the end of the day, would they even care enough for it to make national headline news?
How many companies would accept 1 out of every 1,000 data entry rows being randomly dropped? Or an update that didn’t take? Most management in companies I’ve worked for wouldn’t even want to think about accepting that level of data loss.
Ultimately I think this points out the difference between relational databases and some of the non-RDBMS platforms that can accept some data loss. Even Google, as amazing as its results are and with lots of redundancy, isn’t held to some high standard of data integrity. If two of us search for the same term at the same time and get different results, is that an issue? Or to put it another way, if the CFO and CEO both run reports at the same time, can they differ in their results?
For most of us, the answer is that the results cannot differ. While I think most of the NoSQL and other non-RDBMS architectures have a lot of effort put into ensuring that data gets hardened on a node when it is updated, there can be a lack of consistency between nodes. A node could lag behind others or even fail before synchronization with other nodes. That is a concern in any system that looks to scale out to a large number of servers, and an even larger concern for data whose integrity is critical.
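To make the lagging-node scenario concrete, here is a minimal sketch in Python. It is a toy model, not how any particular NoSQL product works: the `Replica` and `ToyCluster` classes, the `sync()` step, and the `q3_revenue` key are all hypothetical names invented for illustration. It only shows how two readers hitting different nodes at the same moment can see different values until replication catches up.

```python
class Replica:
    """A toy storage node: a name plus an in-memory key/value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ToyCluster:
    """Hypothetical two-node cluster with asynchronous replication.

    Writes are applied to the primary immediately; the secondary only
    sees them when sync() runs.
    """

    def __init__(self):
        self.primary = Replica("primary")
        self.secondary = Replica("secondary")
        self.pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def sync(self):
        # Replay outstanding writes on the secondary, in order.
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()

    def read(self, key, node):
        return node.data.get(key)


cluster = ToyCluster()
cluster.write("q3_revenue", 100)
cluster.sync()                      # both nodes agree on 100
cluster.write("q3_revenue", 125)    # update not yet replicated

# Two reports run at the same moment against different nodes.
ceo_view = cluster.read("q3_revenue", cluster.primary)
cfo_view = cluster.read("q3_revenue", cluster.secondary)
reports_differ = ceo_view != cfo_view

# Once replication catches up, the nodes converge again.
cluster.sync()
after_sync_match = (
    cluster.read("q3_revenue", cluster.primary)
    == cluster.read("q3_revenue", cluster.secondary)
)
```

In this sketch, if the primary were lost before `sync()` ran, the update held only in `pending` would be gone with it, which is the "fail before synchronization" case described above.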
An amazing level of thought has gone into SQL Server to provide extremely high levels of data integrity. Every time I think I’ve found a problem or hole in the product, it seems someone at SQLskills explains the reason behind the architecture. The answer usually makes perfect sense to me and has me wondering what else I will learn in one of their Immersion training weeks. Hopefully I’ll get to one soon.
There are definitely places where you might accept dropped rows: information published on intranets, an application recording vacation requests, and any number of other small, non-critical systems. SQL Server is not a good fit for all database applications, but for those that use it, you can be sure that none of your rows will be dropped.