There is an incredible amount of data in the world, and all that data is changing the way industries work. That’s the opening to a keynote talk from Jim McHugh at the O’Reilly Artificial Intelligence conference. The talk is short, at 12 minutes, and interesting to listen to as Mr. McHugh looks at autonomous cars and healthcare and talks about the impact of artificial intelligence on advancing these industries. There are examples showing how data and AI systems are already being used to change the way the transportation and medical fields work.
Whether you want to see more robot help in our world or not, I suspect some level of this is coming, and it’s being driven by data. We have more and more data, and as companies have success in analyzing this data with various types of AI and machine learning systems, there is pressure for other companies to join the trend and build their own systems. We certainly see that in Microsoft’s push for R Services in SQL Server. At the recent Data Science Summit, there was a demo in the keynote (around 17:00) of over 1 million classification queries per second running inside SQL Server. You can even try this yourself for free on SQL Server 2016 Developer Edition.
I’m sure that a few of you will start to get more complex analysis projects inside your organization. Maybe you’ll help develop some sort of prototype, or maybe you’ll just be responsible for helping get the data to the data scientists. I’m also sure that some of you won’t be thrilled with the results. After all, throwing a bunch of data at a few algorithms and expecting rapid development isn’t likely to work well.
At least not the first time.
One of the things I’ve heard from many people as I study data science, machine learning, and related topics is that this isn’t a simple process. Building a useful and successful machine learning system requires experimentation, and really, ongoing experimentation, as you examine, clean, discard, and make decisions on your data. In fact, the data preparation might be the most difficult and time-consuming part of the process. That’s great, since many of us are the people who will work with the data, but it’s bad in that our management might not have the patience to experiment, evaluate, and re-tune their systems, much less wait for data to be well prepared.
I do have high hopes that machine learning and artificial intelligence will help with many complex problems in the future. I’m glad that companies are experimenting, and I think it’s great that so many data professionals are getting excited by the possibilities. Remember that this field is hard and requires lots of work. Keep learning and growing your skills, and above all, remember that the scoring against your data is likely to look more like a baseball game than a bowling match. A 30% success rate might be amazing, and those perfect games are likely very close to impossible.