Machine Learning in the Database

When SQL Server added the ability to execute R code, the decision seemed to split the customer base into two groups. One group was impressed and thought the idea of executing R code to analyze data in the database was a good idea. They were excited and impressed by the loan classification demo. If you haven’t read about this or seen the demo, it’s very interesting, and it’s something you might take a few minutes to read or watch it.

The other group of customers felt this was a poor use of CPU cycles for a very expensive SQL Server CPU license. Running a complex analysis, training models, and other functions commonly associated with R scripts aren’t a good use of scarce resources. They would rather have R code execute on a separate server, much like any large messaging workload might be better served by a service such as AWS’ Simple Queue Service rather than Service Broker.

I tend to be in the first group, as is Dr. Low. He writes that there is a place where Machine Learning Services (MLS), with both R and Python, are a good use of resources. Not in all cases, and certainly not for all work. The difficult parts of training models and doing the hard work of coming up with new ways to perform an analysis is definitely better left to workstations and data scientists. Those actions might not be worth the resources they take.

Once the models are trained, however, the executable load of submitting parameters to a model and getting a prediction is small. SQL Server allows us to load pre-trained models into the database and just call them as needed. Plus, the R models run in a multi-threaded fashion, unlike the single threaded execution in clients such as R Studio.

As with any feature of SQL Server, it’s important to test and evaluate the real world impact of new code on production sized workloads. Not only will you want to measure the load of your model execution, but you should also measure any changes in your existing workload with the additional R or Python code load. While I wouldn’t prevent the use of MLS in SQL Server, just like SQL CLR code, I would be careful about introducing without extensive testing, including dark deployments and simulated loads.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.2MB) podcast or subscribe to the feed at iTunes and Libsyn.

This entry was posted in Editorial and tagged machine learning. Bookmark the permalink.

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30