Learning From Breakage

I’ve had the fortunate, or maybe unfortunate, experience of being thrown into a few jobs with no training. At a couple of my bartending jobs, I had to start working without any training, calling someone over to help run the ordering machine while I made and served drinks. I slowly learned how things worked during that first shift, so I was ready to work on my own the second night. I had a similar experience at a tech job, starting as the lead DBA/IT Manager in a crisis, having to solve problems after asking others how things were supposed to work. I ended up fixing a bit of code, adjusting networking, and directing others on my first day.

When we have a crisis, we often learn a lot from the situation. I’ve been through crashed upgrades, virus breakouts, hardware failures, and more in my career. While each was stressful and often not enjoyable, I learned a lot each time and came through the incident a more capable developer/DBA/whatever. When we work through a tough time, we are often better equipped for the next time something goes wrong.

I ran across a great piece arguing that you never really know a system until you’ve broken one. It’s by Tim O’Brien, a software architect who has learned a lot about databases from failure. In fact, I love his interview question for data professionals: “Tell me about the worst database schema you ever created. What did it teach you to avoid?” I’ve certainly learned a few things over time from my schema designs, but those are stories for another piece.

The piece draws parallels to today’s use of GenAI technology and to vibe coders who highlight their successes in posts without discussing the problems. I do believe AI technology is going to make a lot of things easier (and faster) to build, and then to fix when they break. And they are going to break, partly because the AI might not do a great job, and partly because we might not direct it well enough. Clear communication is key when working with AI.

I’ve started to build some skills with AI, but as I try to tackle more complex tasks or scale up my work, I realize that I often don’t know enough about either the problem or AI technology, and I’m going to make mistakes. I’m going to break things and then have to fix them, or more likely, learn how to get the AI to reduce the number of broken things in some way before I have to take over.

And learning to take over might be the number one skill with AI tech, but that’s something that you will only learn from the AI not working well for you in a variety of situations.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial

Who’s the Winningest Coach (with AI Help)

I was listening to the radio the other day and the hosts were discussing the NFL playoffs in 2026. Someone mentioned the winningest coach was Nick Sirianni of the Philadelphia Eagles and no one wanted to face the Eagles. I was wondering if the first part of that was true. I used Claude and SQL Prompt AI to help.

Note, I enjoyed watching the Eagles lose in the first round to the 49ers. As a lifelong Cowboys fan, that was great. However, I am impressed with Jalen Hurts and was glad to see him win last year.

This is part of a series of posts on SQL Prompt; you can see all of them under that tag. It’s also part of a series of experiments with AI systems.

Getting the Data

First I had to find data. I found this page of coaching history at Pro Football Reference, which lists coaches; however, I wanted to compare the first few years of their careers, not their totals.

I decided to see if Claude could help get some data. I started with a query: This page has a list of NFL coaches: https://www.pro-football-reference.com/coaches/

I want to loop through each coach and get the details of their career to find the team they coached and the year, returning this in a CSV that has team, year, and coach name. Can you write a python script to do this?

This kind of worked. I got a script, and it ran, but there were some errors. PFR doesn’t like scrapers and they want to protect their data. I get it.
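The first step such a script takes, before PFR starts blocking it, is pulling the coach links off the index page. Here is a stripped-down, hypothetical sketch of just that step, using only the standard library’s html.parser (the generated script used third-party modules, and the /coaches/ link pattern is my assumption about PFR’s markup, not something verified from the real page):

```python
from html.parser import HTMLParser


class CoachLinkParser(HTMLParser):
    """Collect hrefs that look like individual coach pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumed pattern: coach pages live under /coaches/ and end in .htm
        if href.startswith("/coaches/") and href.endswith(".htm"):
            self.links.append(href)


def extract_coach_links(html: str) -> list:
    """Return the coach-page hrefs found in a chunk of index-page HTML."""
    parser = CoachLinkParser()
    parser.feed(html)
    return parser.links
```

A real version would then fetch each of those links in a loop, which is exactly where the blocking kicked in.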

2026-01_0117

I told the AI there was an error and it helped me get the Chrome driver and Selenium module to drive a real browser from automation. I commented out the “headless” part so that I could see it working (a bit).

2026-01_0118

This kind of worked, but not really. I got the coaching list and could see the browser going through each coach’s page, as well as the CLI output, but there were lots of errors. PFR does a good job of blocking this.

2026-01_0119

What was amazing is that the script in my repo is something that would have taken me hours of messing with different modules and debugging issues. This was literally about 30 minutes of multiple tries before I gave up for the night.

The next day I decided to give in and just grab data for some coaches that I know have won Super Bowls and had success early in their careers (sorry, Andy Reid). I went to each page, clicked the “CSV export” item, and copy/pasted the data into a file. I then asked the Copilot AI for help. Each set of data was nice, and the file was named for the coach, but the coach’s name wasn’t in the data itself. So I let Copilot edit it.
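The edit Copilot made can be approximated in a few lines: derive the coach’s name from the filename and prepend it as a column on every row. This is my sketch of the idea, not Copilot’s actual change, and it assumes the filename convention I used (NickSirianni.csv and so on); the function name is mine:

```python
import csv
import os


def add_coach_column(path: str) -> None:
    """Rewrite a per-coach CSV in place, adding a leading 'coach' column
    derived from the filename (e.g. NickSirianni.csv -> NickSirianni)."""
    coach = os.path.splitext(os.path.basename(path))[0]
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        return
    # Header gets the new column name; every data row gets the coach's name
    out = [["coach"] + rows[0]] + [[coach] + r for r in rows[1:]]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(out)
```

Run against a folder of exports, this puts the coach into every row so the files can be stacked into one staging table.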

2026-01_0196

Admittedly this wasn’t automated, but by the time I’d created a new file for a new coach, Copilot had cleaned up my old one. With this in mind, I had data for 6 coaches stored in CSV files.

Back to Claude. There’s no reason not to use Copilot, but I like Claude. I asked it to give me an import script.

2026-01_0196

It did, and I saved it (coachimport.py). I ran this, but it errored out. Since this was a file in my repo, I moved back to Copilot, which has access to the files. I asked it to fix the errors. This was the prompt (everything after the colon was pasted in): I am getting this error in the script: Error importing rawdata\NickSirianni.csv: (‘42000’, ‘[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 24 (“”): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)’)

It started to examine the files.

2026-01_0200

Then it found a problem and started fixing things. Since this is an import and I’ll stage the data, I can let it run and check the results.
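The usual fix for that TDS float error is to coerce anything non-numeric (empty strings, “N/A”, stray text) to NULL before the insert, since pyodbc sends Python’s None as NULL. Here is a sketch of that kind of cleanup — my reconstruction, not Copilot’s actual change, and the helper names are mine:

```python
def to_float_or_none(value):
    """Return the value as a float, or None (NULL) if it isn't numeric.
    Sending None instead of a bad string avoids the invalid-float TDS error."""
    if value is None:
        return None
    s = str(value).strip()
    if s == "":
        return None
    try:
        return float(s)
    except ValueError:
        return None


def clean_row(row, numeric_columns):
    """Apply the coercion to the numeric columns of a CSV row (a dict)."""
    return {k: (to_float_or_none(v) if k in numeric_columns else v)
            for k, v in row.items()}
```

Each CSV row gets run through clean_row before being bound as parameters, so the driver never sees an unparseable value in a float column.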

2026-01_0201

While this was running, I also used SSMS and the Flat File Import wizard to pull in one file from the subfolder to see how well it worked. I basically clicked next, next, next through this, which was good enough.

In the time it took me to import 1 file, Copilot had finished and imported all the data. I could query the staging data and see all my rows. A quick check of the different files showed me all rows had imported.

2026-01_0203

This was with the free Claude web tier and a paid corporate Copilot account.

Now, what do we learn?

The Analysis

Since I had the data in a table, I could use SQL Prompt AI to help analyze the data. I started with a simple query:

SELECT coach, SUM(w) OVER (PARTITION BY coach)
FROM dbo.Coach_Staging

I could do more, but I’m playing with AI tech, so I then added a prompt: help me analyze this data by comparing each coaches first 5 years with wins, losses and ties and win percentage.

I got this code:

SELECT coach,
       SUM(W) AS TotalWins,
       SUM(L) AS TotalLosses,
       SUM(T) AS TotalTies,
       CAST(SUM(W) AS FLOAT) / NULLIF(SUM(W) + SUM(L) + SUM(T), 0) AS WinPercentage,
       AVG(CAST(Rank AS FLOAT)) AS AverageRank
FROM
(
    SELECT coach,
           Year,
           W,
           L,
           T,
           Rank,
           ROW_NUMBER() OVER (PARTITION BY coach ORDER BY Year) AS YearNum
    FROM dbo.Coach_Staging
) AS CoachYears
WHERE YearNum <= 5
GROUP BY coach
That’s about what I would have written. I was about to start adding the ROW_NUMBER myself when I realized Prompt could help me. I additionally asked for an ORDER BY and an average rank, and ended up with a query that made sense.

The takeaway for me was that I could have written the code, but in a few seconds, SQL Prompt gave me a query I could use. This was far faster than I could have written it, even with SQL Prompt’s IntelliSense.

The Results

Well, is Sirianni the winningest coach through his first five years? Here’s what I see.

2026-01_0210

By win percentage, Don Shula is better, but he coached fewer games. Nick Sirianni has more wins (and losses) and a slightly lower percentage, but a higher average rank. I think it’s fair to say he tops the list, though it depends on whether you look at wins or percentage.

It was also surprising to see Mike Tomlin, who recently left the Pittsburgh Steelers, coming in third.

This was an interesting analysis that went much quicker than similar ones I’ve done in the past. AI is incredibly helpful here, and as I think about all the similar queries people have asked for help with over the years, I can see how AI will be very useful over time.

Of course, with larger data sets, you’ll want to verify the queries are working with other checks of the data, and you’ll likely want some automated tests on small sets to verify any changes to algorithms still return the correct results.
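One way to do that kind of small-set check is to run the same first-five-years logic against a handful of hand-built rows where you already know the answer. Here’s a sketch using sqlite3 in place of SQL Server, so the query is close to, but not exactly, the T-SQL above; the coaches and numbers are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Coach_Staging (coach TEXT, Year INT, W INT, L INT, T INT)")

# Hand-built rows: CoachA has 6 seasons, so the 6th must be excluded
rows = [("CoachA", y, 10, 6, 1) for y in range(2000, 2006)] + \
       [("CoachB", y, 8, 9, 0) for y in range(2010, 2013)]
conn.executemany("INSERT INTO Coach_Staging VALUES (?,?,?,?,?)", rows)

query = """
SELECT coach,
       SUM(W) AS TotalWins,
       CAST(SUM(W) AS REAL) / (SUM(W) + SUM(L) + SUM(T)) AS WinPct
FROM (
    SELECT coach, W, L, T,
           ROW_NUMBER() OVER (PARTITION BY coach ORDER BY Year) AS YearNum
    FROM Coach_Staging
)
WHERE YearNum <= 5
GROUP BY coach
ORDER BY coach
"""
results = {coach: (wins, pct) for coach, wins, pct in conn.execute(query)}
# CoachA should show 50 wins (5 of 6 seasons counted), not 60
```

If a change to the query ever starts counting a sixth season or double-counting rows, a check like this catches it immediately, without needing the full data set.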

If you haven’t tried SQL Prompt, download the eval and give it a try. I think you’ll find this is one of the best tools to increase your productivity writing SQL.

Posted in Blog

Learning from Mistakes: T-SQL Tuesday #194

We’re a week late, and once again it’s my fault. I was still coming out of the holidays and forgot to check in with my host. Luckily, Louis Davidson (who had February) agreed to go early. He has a nice invite, and I am glad to answer it.

This is the monthly blog party on something SQL Server/T-SQL/etc. related. I have about half of 2026 covered, but if you would like to host, I’d love to have you. Ping me on X/LinkedIn/BlueSky.

A Mistake

Since we aim for T-SQL, I decided to pick something I’ve done a number of times in T-SQL, and sometimes still get wrong. However, a little testing has helped me (mostly) keep this from getting to production.

Always have testing in place.

I am good at T-SQL, but not amazing. I learn things from others all the time, and these days I take help from AIs, though I test and double-check what they do.

One of the places I’ve struggled is with outer joins, usually left/right outer joins where I’m trying to get a list of things from a join, but filter out some of the missing items. Here’s an example from Northwind. I want a list of customers joined to orders, but filtered to those who haven’t been charged freight. There’s likely some business reason, but it escapes me now.

If I run this query, I get lots of stuff.

2026-01_0109

That doesn’t seem right. If I check, I see this:

2026-01_0111

What’s the problem here? Well, the main issue is one I keep repeating; fortunately, I catch it. If I move the Freight IS NULL to the WHERE clause instead of the ON clause, it works. You can see this below.

2026-01_0112
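The difference is easy to demonstrate on a toy pair of tables. A predicate in the ON clause of a LEFT JOIN only controls which rows match; unmatched customers still come back, padded with NULLs, so nothing gets filtered out. The same predicate in WHERE filters the joined result. A sketch using sqlite3 with made-up customers and orders, not Northwind itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT);
CREATE TABLE Orders (CustomerID TEXT, Freight REAL);
INSERT INTO Customers VALUES ('ALFKI'), ('BONAP'), ('CACTU');
INSERT INTO Orders VALUES ('ALFKI', 29.46), ('BONAP', NULL);
-- CACTU has no orders at all
""")

# Predicate in ON: the condition fails for ALFKI, but a LEFT JOIN keeps
# every customer anyway, so all 3 rows come back -- too much data.
in_on = conn.execute("""
    SELECT DISTINCT c.CustomerID
    FROM Customers c
        LEFT JOIN Orders o
            ON o.CustomerID = c.CustomerID AND o.Freight IS NULL
""").fetchall()

# Predicate in WHERE: applied after the join, so only customers whose
# joined row has a NULL Freight (no order, or an order with NULL freight)
# survive.
in_where = conn.execute("""
    SELECT DISTINCT c.CustomerID
    FROM Customers c
        LEFT JOIN Orders o
            ON o.CustomerID = c.CustomerID
    WHERE o.Freight IS NULL
""").fetchall()
```

The first query returns all three customers; the second returns only BONAP and CACTU, which is the result I actually wanted.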

If I see too much data, which can be hard to catch in large result sets, I can ask Prompt AI.

2026-01_0113

I get the response I’d expect from most AIs.

2026-01_0114

How do I test for this? Well, the best way is to have test coverage for queries. For example, I might build a test like this:

EXEC tsqlt.NewTestClass @ClassName = N'QueryTests';
GO

CREATE OR ALTER PROCEDURE [QueryTests].[TestCustomersWithoutOrders]
AS
BEGIN
    -- Arrange
    -- Create a table variable to hold expected results
    DECLARE @Expected TABLE
    (
        CustomerID nchar(5)
    );

    -- Insert the expected result: customers with no orders
    INSERT INTO @Expected
    SELECT CustomerID
    FROM dbo.Customers
    WHERE CustomerID NOT IN
    (
        SELECT DISTINCT CustomerID FROM dbo.Orders WHERE CustomerID IS NOT NULL
    );

    -- Act
    -- Create a table variable to hold actual results
    DECLARE @Actual TABLE
    (
        CustomerID nchar(5)
    );

    -- This is the query being tested
    INSERT INTO @Actual
    SELECT DISTINCT
           Customers.CustomerID
    FROM dbo.Customers
        LEFT OUTER JOIN dbo.Orders
            ON Orders.CustomerID = Customers.CustomerID
    WHERE Orders.CustomerID IS NULL;

    -- Assert
    -- Check that we have exactly 2 results
    DECLARE @ActualCount INT =
    (
        SELECT COUNT(*) FROM @Actual
    );

    IF @ActualCount <> 2
    BEGIN
        EXEC tSQLt.Fail 'Expected exactly 2 customers without orders, but got ',
                        @ActualCount;
        RETURN;
    END;

    -- Check that we got the expected customers
    IF EXISTS
    (
        SELECT 1
        FROM @Expected e
        WHERE NOT EXISTS
        (
            SELECT 1 FROM @Actual a WHERE a.CustomerID = e.CustomerID
        )
    )
    OR EXISTS
    (
        SELECT 1
        FROM @Actual a
        WHERE NOT EXISTS
        (
            SELECT 1 FROM @Expected e WHERE e.CustomerID = a.CustomerID
        )
    )
    BEGIN
        EXEC tSQLt.Fail 'The actual set of customers without orders does not match the expected set.';
    END;
END;

That’s a lot of code, but I can see it works. I get two customers back, which is what I expect. The query being tested is in the Act section above. If I run the test, it passes.

2026-01_0115

If I change those lines to put the filter in the ON clause (and remove WHERE), it fails.

2026-01_0116

Ideally I’d have this query in a stored procedure so I can change/tune it, compare plans, run tests easily, etc.

This is a mistake I still make at times, albeit rarely. Now I write tests to look for it. Maybe that’s the thing I’ve learned the most: have tests for my code.

Posted in Blog

Republish: Representative Data Challenges

It’s a holiday in the US, though I’m in the UK, having arrived this morning and heading to work. I’ve got a long day ahead, and a long week as I spend a few days here and then head home on Thursday.

In the meantime, you can re-read Representative Data Challenges. I wish I’d said more about this, especially the idea that so many bugs in software systems are data related because we haven’t tested on representative data.

Posted in Editorial