One of the talks I give on SQL Server deals with unstructured data. I start the talk by looking at the scales of data we deal with, and I was amazed by the research I did into how much data we humans have created. What’s even more interesting is that the growth is outpacing predictions made just a few years ago.
When I started working with computers, we talked about KB of data: thousands of characters. That’s an amount of data we humans can easily comprehend. In fact, we used to talk about floppy disks and the number of average-sized books that could be stored in KB, or single-digit MB. As humans, we can comprehend that scale. Most of us have seen hundreds or thousands of books in a library.
When we move to GB, things get harder, though at 4GB for a DVD, many of us can conceive what multiple GBs mean. However, terabytes? Can we conceive that scale of data? Sure. A TB is about 40 Blu-ray discs. While we might not appreciate how much data that is, we can picture it.
A PB? That’s roughly 41,000 Blu-ray discs. I can’t even conceive of what that looks like, much less imagine a billion MB-sized pictures. That’s a scale with no reference. However, as humans, we will create multiple exabytes of data this year. As individuals working with data, few of us will work with EB in our organizations, but some of us will. I read recently that PayPal regularly processes 1.1PB of data. Regularly processes, not just holds in cold storage.
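The Blu-ray and photo comparisons above are easy to check with back-of-the-envelope arithmetic. A quick sketch, assuming a single-layer Blu-ray disc holds 25 GB and using decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB):

```python
# Sanity-check the storage-scale comparisons.
# Assumption: one single-layer Blu-ray disc = 25 GB, decimal units.
BLURAY_GB = 25

discs_per_tb = 1_000 // BLURAY_GB        # discs needed to hold 1 TB
discs_per_pb = 1_000_000 // BLURAY_GB    # discs needed to hold 1 PB
photos_per_pb = 10**15 // 10**6          # 1-MB pictures that fit in 1 PB

print(f"1 TB ≈ {discs_per_tb} Blu-ray discs")
print(f"1 PB ≈ {discs_per_pb:,} Blu-ray discs")
print(f"1 PB ≈ {photos_per_pb:,} one-MB pictures")
```

With decimal units this gives 40 discs per TB and 40,000 per PB; using binary units (1 TB = 1,024 GB) nudges the PB figure up toward the ~41,000 quoted above. Either way, a PB is a billion 1-MB photos.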
We have zettabytes and yottabytes, but who could possibly conceive of what those mean? There’s no frame of reference I can imagine, though that may change. I expect we will become used to PB at some point, just as a TB is no big deal right now. In fact, I really think I’ll see a TB on my phone before the end of this decade.