T-SQL Tuesday #162–Using AI with Data Science

This month is a timely topic, with Tomaz Kastrun hosting. I was lucky to meet Tomaz before the pandemic, and we had a great time at the SQL Saturday as well as at Lake Bled the next day.

Tomaz does a lot of data science with R, Python, and a lot of data. He has some great blogs and does a neat advent series every December. This month he asks us about Data Science in the era of ChatGPT, with two prompts:

Where and what have you used it [ChatGPT] for?
Have you considered responsible usage of Chat GPT?

I’ll tackle these below: first, my experiments with the AI, and second, ethics.

Getting a Visualization

One of Tomaz’ ideas was a visualization. I have some data, so I used this prompt:

A very poor prompt on my part, but the AI tried. I asked other questions and kept context of the process.

Not sure why the image didn’t appear, but there was an explanation, so I asked for code.

I got some code, and decided to restart this experiment. I started a new chat and recorded my efforts. That result is on YouTube and (hopefully) embedded below.

It wasn’t perfect, but I could see this being the way software developers end up using ChatGPT.

Ethics in AI

There are a lot of things to think about here, but let me focus on two. First, is it ethical to get paid for a job if you are asking ChatGPT for help? Second, is it ethical for ChatGPT to synthesize code samples that others have written on SQL Server Central, Stack Overflow, etc., and give the result to you to use in your code?

The first one is easy. I say yes, since this is what many of us do with colleagues. We ask them for help, they give us code or examples, and we use them in our work that we commit, turn in, etc. We don’t write all code by ourselves, so I see ChatGPT as asking someone else or posting on a forum.

The second one is tricky. If I see code on SQL Server Central, then I know who wrote it, or who posted it. If I use that, I need to have rights to do so, or give someone credit. For example, I can post this code:

--=============================================================================
--      Create and populate a Tally table
--=============================================================================
--===== Conditionally drop
     IF OBJECT_ID('dbo.Tally') IS NOT NULL
        DROP TABLE dbo.Tally

--===== Create and populate the Tally table on the fly
 SELECT TOP 11000 --equates to more than 30 years of dates
        IDENTITY(INT,1,1) AS N
   INTO dbo.Tally
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

--===== Let the public use it
  GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC

In doing so, I would give credit to Jeff Moden for writing this code. If I used this inside an organization, I should ask Jeff for permission.

If I get this from ChatGPT, is it providing sources? Potentially the AI read a lot of code and then synthesized something that works the same, but perhaps it has also copied the most popular code it has seen across many sets of training data. Hard to know.

It’s also hard to know what rights we want to give to the AI, its programmers, its corporate (or personal) owner, etc. How do we deal with rights here? I’m not sure. This is an area where I don’t even know what I want.

What do you think?
