Getting Big Data

Many of us work with data every day. Often that's at work, but plenty of us also look at data outside our jobs. The entries in the recent Power BI report contest show that our colleagues explore all sorts of data. What interests me most is the sheer variety of data people chose to analyze, from sports to government to healthcare to airlines.

I encourage you to get a set of data and work with it. I've often just created my own data when building demos, but there are so many interesting sets available that I suspect most of you could find one worth experimenting with. After all, knowing something about the data matters when you decide how to analyze or visualize the information it contains.

One question people often ask me is where to find data: how can you get a set that you're actually interested in? What's amazing is that there are all sorts of sources out there, many of them free. I ran across a list of data sets at Forbes that contains quite a few you might want to play with. I've used some of these in the past, and I'm currently downloading various machine learning sets from UCI to experiment with.

Certainly some of these data sources charge money if you want specific information, and that's fair: it takes resources to compile data, so a nominal charge makes sense. However, most of these sources will let you download large sets to play with if you choose. You might need some PowerShell, Python, or other scripting skills to fetch lots of files, but that's a good excuse to learn another skill.
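
If you've never scripted a bulk download, here's a minimal Python sketch of the idea using only the standard library. It assumes each URL ends in a usable file name; the helpers download_all and filename_from_url are hypothetical names I made up for illustration, not part of any data source's tooling, and you'd supply the URLs yourself.

```python
import os
import posixpath
import shutil
import urllib.parse
import urllib.request


def filename_from_url(url):
    """Derive a local file name from the last segment of a URL's path.

    Assumption: the URL path ends in a meaningful file name.
    """
    path = urllib.parse.urlparse(url).path
    name = posixpath.basename(urllib.parse.unquote(path))
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return name


def download_all(urls, dest_dir):
    """Download each URL into dest_dir, skipping files already present.

    Returns the local paths in the same order as urls.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        target = os.path.join(dest_dir, filename_from_url(url))
        if not os.path.exists(target):
            partial = target + ".part"
            # Stream to disk so large files are never held fully in memory.
            with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
                shutil.copyfileobj(response, out)
            # Rename only after a complete download, so an interrupted run
            # leaves no finished-looking file and is retried next time.
            os.replace(partial, target)
        paths.append(target)
    return paths
```

Skipping files that already exist means you can rerun the script after a dropped connection without fetching everything again, which matters when a set is split across dozens of large files.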

Be aware that many of these sets are CSV files, so you'll need to work on your ETL skills as well if you want to load them into a SQL Server database for fun. Yet another excuse to learn. If you'd rather not, you could use them as-is in your very own Power BI dashboard. To help you along, I've started my own list of data sets at SQLServerCentral.
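
As a starting point for that ETL work, here's a minimal Python sketch that reads a CSV file with a header row and inserts its rows into an existing table through a DB-API connection. It assumes the table already exists with columns named exactly like the CSV header, and that the driver uses ? placeholders, as both sqlite3 and pyodbc do. load_csv is a hypothetical helper name, and every value is passed in as text, so real data will usually need type conversion and cleanup first.

```python
import csv


def load_csv(conn, csv_path, table, batch_size=1000):
    """Insert the rows of a headered CSV file into an existing table.

    conn is assumed to be a DB-API 2.0 connection whose driver uses '?'
    placeholders. Column names come straight from the CSV header and are
    neither quoted nor validated, so they must be simple identifiers.
    Returns the number of rows inserted.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(header)
        placeholders = ", ".join("?" for _ in header)
        sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"

        cur = conn.cursor()
        count = 0
        batch = []
        for row in reader:
            if not row:  # csv.reader yields [] for blank lines; skip them
                continue
            batch.append(row)
            if len(batch) >= batch_size:
                # Send rows in batches rather than one round trip per row.
                cur.executemany(sql, batch)
                count += len(batch)
                batch = []
        if batch:
            cur.executemany(sql, batch)
            count += len(batch)
    conn.commit()
    return count
```

To try it against SQL Server, you'd pass in a connection from pyodbc.connect(). For very large files, SQL Server's own BULK INSERT statement or the bcp utility will usually be faster than inserts sent from a script.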

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.9 MB) podcast or subscribe to the feed at iTunes and LibSyn.

About way0utwest

Editor, SQLServerCentral
This entry was posted in Editorial.