Over the last few years I’ve been trying to solve simple problems in Python and PowerShell, in addition to T-SQL, as a way to keep improving my skills. It’s an interesting challenge at times, especially when I need to use features or functions I haven’t worked with before.
I also do some light work in R, as I need to build Questions of the Day for SQLServerCentral, and I try to alternate Python and R questions every week. This has pushed me to dig in and learn more about the language and how to manipulate data.
Recently I was reading an essay from a consultant who works with clients using both R and Python. The piece talks about the differences between the two languages and how each works to solve business problems. If you don’t want to read the entire thing, the comparison starts with the simple “you need both”, though there is more to the story.
The most interesting part of this for me was the author’s note that while these are good languages, in different ways, for data analysis, they aren’t great for data preparation, and SQL is still required, whether in a database like SQL Server or on a platform like Apache Spark. Part of the reason is that R and Python aren’t very efficient, and as we work with more data and larger workloads, efficiency matters.
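A minimal sketch of that division of labor, with the preparation done in SQL and only the summary handed to Python. Everything here is assumed for illustration: the `sales` table, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for SQL Server or Spark.

```python
import sqlite3
import statistics


def build_demo_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [
        ("East", 100.0),
        ("East", 50.0),
        ("West", 200.0),
        ("West", None),  # a missing value that the SQL step filters out
        ("North", 25.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def prepare_in_sql(conn):
    """Filter and aggregate inside the database; return one row per region."""
    query = """
        SELECT region, SUM(amount) AS total
        FROM sales
        WHERE amount IS NOT NULL
        GROUP BY region
        ORDER BY region
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
totals = prepare_in_sql(conn)

# The analysis step in Python only sees the already-prepared summary rows.
mean_total = statistics.mean([total for _, total in totals])
print(totals, mean_total)
```

The point of the sketch is only where the work happens: the filtering and grouping run in the database engine, and Python receives a handful of summary rows rather than the raw data.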
The other part of the piece I liked was the note that we need to collaborate, and that our work needs to be reproducible by others. I love having Git for moving code around and keeping configuration files in a repository of some sort. It has certainly helped me take advantage of bits that others have written and easily reproduce their work on my system.
While some of us work with just SQL, I expect that we will get involved with other parts of projects and may need to help troubleshoot or improve code. I find both of these languages interesting and a nice complement to each other. I’ve also learned there are places where I much prefer one over the other, especially with some of the Advent of Code problems. Some are simple in SQL, but others are much more suited to Python. I haven’t tried them in R, but I bet some of them would be well suited to that environment.
If you have tried either, or have a preference, let us know. What are the advantages or disadvantages of each when you are working in a business?