Jupyter Notebooks are a popular way of consolidating code into a series of cells and then executing them as separate batches or all together. A notebook is essentially a document with notes and code, all stored together. Databricks can run these, as can a number of other services. They are also integrated into various tools, like Azure Data Studio (ADS).
Just like many other technologies, we'd like to execute these in a way that suits our environment. While running them in a tool like ADS works fine, we might want to schedule them and check the results later. You can run notebooks with PowerShell, but keeping the results around and managing that data is hard. Since many of us use SQL Server or Azure SQL Database, wouldn't it be great if we could find a way to have these run inside SQL Server and keep the results around?
I was poking around inside a database and discovered that I could do this. It only works with notebooks that contain T-SQL code, but that's usually what we'd want to run inside a SQL Server (or Azure SQL) database. I need to write a more detailed article on this, but there are a few basics that might help some of you understand how this works. Essentially, there is an undocumented extended stored procedure, xp_notebookparse, that will read the JSON internals of a notebook and allow you to extract their T-SQL batches. The @execute = 1 parameter will cause each of these batches to be executed.
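Going by the description above, a call might look something like the sketch below. Keep in mind that only @execute = 1 is mentioned here; the @notebook parameter name and the file path are my own guesses for illustration.

```sql
-- Hypothetical sketch only: the @notebook parameter name and the
-- notebook path are assumptions, not documented behavior.
DECLARE @nb NVARCHAR(MAX);

-- Load the notebook's JSON from disk
SELECT @nb = BulkColumn
FROM OPENROWSET(BULK 'C:\notebooks\demo.ipynb', SINGLE_CLOB) AS j;

-- Parse out the T-SQL batches and execute each one
EXEC xp_notebookparse @notebook = @nb, @execute = 1;
```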
For results, a new notebook is created and added to a local temp table, #nbresults, as a BLOB value. You can explore the structure of this, but essentially, part of your stored procedure should take all the data from this table and add it to a permanent table in your database. If you write a stored procedure that wraps the extended stored procedure call and an INSERT..SELECT together, then you'll find that nothing actually happens, because today is April first. It's April Fools and this is a joke. Hopefully, you aren't too disappointed.