I know most of you don’t work with the R language. In fact, plenty of you might not know anything about R beyond a cursory understanding of it as some sort of data analysis language. If you want to know more, here’s what the R Project is.
Microsoft wants you to use R Services in SQL Server, or the R Server product available as a standalone system. However, I saw someone ask why anyone would run their R scripts inside SQL Server, since those are expensive CPU cycles to burn on analysis. Someone else noted that Microsoft loves your licensing dollars, so its push to use R Services is perhaps a little self-serving.
Pushing the intelligence to the data makes sense. Isn’t that what we do with large data warehousing queries or SSAS cubes? We’re trying to get the analysis done at scale without having to move the data elsewhere, especially considering we’ve (usually) already moved the data once in some sort of ETL (or ELT) process. Gaining insights from our ever-increasing volumes of data requires computational cycles somewhere.
What’s the alternative? Large queries that pull data to some client? I think that’s fine, and it might even be the better choice, since simple queries that pull data don’t burn as many CPU cycles as queries that perform analysis. I certainly understand that the licensed CPU cycles on a SQL Server instance are expensive, and we want to be careful how they are used. Adding complex R scripts might not be the best use of our licensing dollars. On the other hand, if I can perform analysis more quickly, and that analysis is more useful, then perhaps I can eliminate some of the other ad hoc queries analysts want to run on my database.
Ultimately, I think that R Services makes some sense in SQL Server, but not as a place to experiment. I would suggest that the R client is the way to experiment, preferably on a copy of the data, which allows someone to build scripts and determine whether there is insight to be gained from a particular set of data. Build a Proof of Concept (POC), and only deploy it to a SQL Server if you find it provides value.
And if you do so, continue to experiment. The R script you run today might not be as useful in six months, as your application, database, and business evolve. Data analysis isn’t a set-it-and-forget-it task, but rather an ongoing, iterative process.