Baseball is an interesting game from a data perspective, with lots of numbers being tracked and more generated every year. I used to keep a sample baseball database for demos, since it was fun to run various numerical queries on the data. I should set that back up.
Recently I saw that an analysis site, FanGraphs, was adopting a cloud version of MariaDB. They gather a lot of statistics, more than most places publish. Not only do they have the various aggregates from games, but they also track things like the velocity of pitches thrown. Add that to odds, projections, and more, and this is a lot of data.
For a fan, that’s a lot. For database people, maybe not so much. They are projecting a million records each season for pitches, which might be the largest data set. Even with 100 years of baseball, that works out to 100 million rows, which isn’t that large for a database. There can be, however, lots of queries on this data.
The founder used to manage the database himself, and has been on MariaDB for a long time. He started on Windows, and has been looking to outsource some of the administration. They left dedicated servers to move to a vendor running on the Google Cloud Platform (GCP). Now they are looking at data warehousing and other options to continue with deeper data analysis.
The move to the cloud, removing some of the headaches and hassles of managing servers, is something many executives think about. Certainly it isn’t cheaper than buying your own machines, but with the cost of people, benefits, and the inflexibility of being limited by past decisions, I get why companies do this, especially those that aren’t so focused on the technology, but are more interested in what technology can enable them to do.
Many of us might find opportunities to work with data at a company like this, without the need to actually manage the systems. I can imagine that for a data analyst or developer who enjoys baseball, this might be a fun type of challenge: learning to apply technology at a company that doesn’t really care about the software itself, but wants to use it to build something new and exciting for their customers.