I wrote an editorial a few years ago about a zettabyte (ZB) of data being created in the world in a single year. That was 2012, and the prediction seemed crazy. At the time I was doing some work with full text search, FILESTREAM, and FileTable, and wanted to build a presentation on those topics. I did some research and found some neat facts, which I incorporated into the presentation. Back then, one estimate was that we'd have 50x that much data by 2020. I also wrote that I was carrying 3.5GB in my pocket.
Those values seem crazy now. Crazy small, that is. At a recent event, I was carrying 1.6TB in my pockets. My pockets, not my laptop bag: 64GB in my phone, a 512GB mSATA drive, and a 1TB mSATA drive. Those electronics were smaller, in both size and mass, than my first 10MB hard drive. What's funny is that there was only another 1.5TB in my bag, between my laptop and an external 2.5″ drive.
Between the growth of storage and the Internet of Things (IoT), estimates of data growth keep rising. At the recent SQL Nexus conference, part of the keynote was given by Dr. Troels Petersen, a physicist with the Niels Bohr Institute who works with the Large Hadron Collider. The work there gives big data a new meaning. Dr. Petersen noted that in his work with the ATLAS detector, they can generate 1 PB/s of data.
That makes the few TB I carry around seem puny. There were two other really interesting items from the keynote. One is that computers cannot process data at that rate, so hardware sensors must decide which data to capture and store for analysis. The other is that much of the data will never really be analyzed, and can't easily be analyzed directly. Instead, algorithms and pattern matching determine which data is good and should be used. They know there are errors in the data, so the trick is to use computers to find the good data.
The world is a little different for those of us who deal with customers and orders instead of particles and uncertainties, but we are still seeing lots of data growth. Our challenge is to find ways to better manage this growth while still making data available and useful for our users.