Sparse Columns Can Use More Space: #SQLNewBlogger

I saw this in a question submitted at SQLServerCentral, and wasn’t sure it was correct, but when I checked, I was surprised. If you designate columns as sparse, but much of the data in them isn’t NULL, you can actually use more space.

This post looks at how things are stored and the impact if much of your data isn’t null.

Another simple post from me that hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

Setting Up

Let’s create a couple of tables that are the same, but with sparse columns for one of them.

CREATE TABLE [dbo].[NoSparseColumnTest](
     [ID] [int] NOT NULL,
     [CustomerID] [int] NULL,
     [TrackingDate] [datetime] NULL,
     [SomeFlag] [tinyint] NULL,
     [aNumber] [numeric](38, 4) NULL,
  CONSTRAINT [NoSparseColumnsPK] PRIMARY KEY CLUSTERED 
(
     [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[SparseColumnTest](
     [ID] [int] NOT NULL,
     [CustomerID] [int] NULL,
     [TrackingDate] [datetime] SPARSE  NULL,
     [SomeFlag] [tinyint] SPARSE  NULL,
     [aNumber] [numeric](38, 4) SPARSE  NULL,
  CONSTRAINT [SparseColumnPK] PRIMARY KEY CLUSTERED 
(
     [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

Once we had these, I used Claude to help me fill them with data. That’s coming in another post, but I uploaded the script here. This version is for the SparseColumnTest table, where I replaced the SELECT on line 59 with NULL values. For the NoSparse table, that SELECT generated random data.

If I select data from the tables and count rows, I see 1,000,000 rows in each. However, the sparse table contains all NULL values in those three columns.
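A quick check like this (using the table and column names above) confirms the row counts and that the sparse columns really are entirely NULL:

```sql
-- Row counts in each table
SELECT COUNT(*) AS NoSparseRows FROM dbo.NoSparseColumnTest;
SELECT COUNT(*) AS SparseRows   FROM dbo.SparseColumnTest;

-- Confirm the sparse columns hold no data at all
SELECT COUNT(*) AS NonNullValues
  FROM dbo.SparseColumnTest
 WHERE TrackingDate IS NOT NULL
    OR SomeFlag     IS NOT NULL
    OR aNumber      IS NOT NULL;
```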

2025-09_0228

Checking the Sizes

I can use sp_spaceused to check sizes. The results of running this are below, but here is the summary:

  • NoSparse Columns – 42MB and 168KB for the index
  • Sparse Columns – 16MB and 72KB for the index

A good set of savings. Here is the raw data:

2025-09_0229
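For reference, the numbers above come from calls like these:

```sql
EXEC sp_spaceused N'dbo.NoSparseColumnTest';
EXEC sp_spaceused N'dbo.SparseColumnTest';
```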

Adding Sparse Data

I’m going to update 10% of the rows to be not null in different columns. Not 10% total, but a random 10% amongst all the columns. Again, Claude gave me a script to do this and I have run it. This is the SparseTest_UpdateData.sql in the zip file above.

After running this, I have 900,000 NULLs in TrackingDate, as well as in the other columns. You can see the counts below, and a sample of the data.

2025-09_0230
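I used Claude’s script for this, but a minimal sketch of the idea, updating a random ~10% of rows per column independently, might look like this (the values generated here are illustrative, not the exact script):

```sql
-- Each column gets its own random ~10% of rows set to a non-NULL value
UPDATE dbo.SparseColumnTest
   SET TrackingDate = DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 3650, '2015-01-01')
 WHERE ABS(CHECKSUM(NEWID())) % 10 = 0;

UPDATE dbo.SparseColumnTest
   SET SomeFlag = ABS(CHECKSUM(NEWID())) % 256
 WHERE ABS(CHECKSUM(NEWID())) % 10 = 0;

UPDATE dbo.SparseColumnTest
   SET aNumber = CAST(ABS(CHECKSUM(NEWID())) % 100000 AS numeric(38, 4)) / 100
 WHERE ABS(CHECKSUM(NEWID())) % 10 = 0;
```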

If we re-run the size comparison, it’s changed. Now I have:

  • NoSparse Columns – 42MB and 168KB for the index
  • Sparse Columns – 33.7MB and 88KB for the index

Not bad, and still savings.

Let’s re-run the update script, this time aiming not for 10% updates, but 65%. That gets me down to only 315K NULL values in the table, meaning roughly 70% of my sparse column values now contain data. My sizes now are:

  • NoSparse Columns – 42MB and 168KB for the index
  • Sparse Columns – 67MB and 192KB for the index

My sparse columns now use more space than my regular columns.

Beware of using the sparse option unless you truly have sparse data. I didn’t test to find out exactly where the tipping point is, but I’d guess it’s less than 50% of the data being populated.

SQL New Blogger

This is another post in my series that tries to inspire you to blog. It’s a simple post looking at a concept that not a lot of people might know, but which might trigger a question in an interview. That’s why you blog: you share knowledge, but you also build your brand and give interviewers a reason to ask you questions about your blog.

This post took a little longer, about 30 minutes to write, though the AI made actually generating the data for my tables quicker. There were a few errors, which I’ll document separately, but pasting each error back in got the GenAI to fix things.

This post showed me testing something I was wondering about. In a quick set of tests, I learned that I need to be careful if I use the sparse option. You could showcase this yourself: update in 10% increments (or less) and keep testing sizes until you find where the tipping point is. Bonus if you use a column from an actual table in your system.
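If you want to try that experiment, a rough sketch (using the table above; the 10%-per-pass logic and temp table are my own illustration) is to update in increments and capture the sp_spaceused output after each pass:

```sql
-- Record the table size after each incremental update pass
CREATE TABLE #sizes (pass int, name sysname, rows varchar(20),
                     reserved varchar(20), data varchar(20),
                     index_size varchar(20), unused varchar(20));

DECLARE @pass int = 1;
WHILE @pass <= 10
BEGIN
    -- Fill roughly 10% of the remaining NULLs with data
    UPDATE dbo.SparseColumnTest
       SET SomeFlag = 1
     WHERE SomeFlag IS NULL
       AND ABS(CHECKSUM(NEWID())) % 10 = 0;

    INSERT #sizes (name, rows, reserved, data, index_size, unused)
    EXEC sp_spaceused N'dbo.SparseColumnTest';

    UPDATE #sizes SET pass = @pass WHERE pass IS NULL;
    SET @pass += 1;
END

SELECT * FROM #sizes ORDER BY pass;
```

Comparing the reserved/data columns across passes should show where the sparse storage overhead starts to outweigh the savings.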

https://learn.microsoft.com/en-us/sql/relational-databases/tables/use-sparse-columns?view=sql-server-ver17

Posted in Blog | 3 Comments

T-SQL Tuesday #190–Mastering a New Technical Skill

It’s time for T-SQL Tuesday again and this time Todd Kleinhans has a great invitation that is near and dear to my heart: mastering a new or existing technical skill. That’s been a lot of what I try to inspire people to do at SQL Server Central.

Make a plan and start learning. Then respond to Todd’s invitation: write down your plan and share it. Start a blog, use LinkedIn, whatever. Spread the word on socials as well.

If you want to host, I’m always looking for hosts for T-SQL Tuesday. Ping me on Twitter/X, BlueSky, or LinkedIn.

Mastering a New Tech Skill

Like Todd, I’m interested in AI, and I think it will dramatically change the world in the coming years. I also think it’s a technical skill that is important to learn. I wrote about this a bit in last month’s post.

How do I work with a GenAI model and improve my technical skills? The easy answer is: more and more practice. I’ve been having more conversations with Claude, usually looking for ways to help me solve a problem or write code, and then asking the GenAI to explain things.

However, I wrote a while ago about an experiment in helping someone else learn something: Can an AI Help Me Find a Job?. For me, I’ve been looking a bit more at Databricks, as I hear about it from clients all the time. I wanted to gain some skill here, so I decided to ask Claude to help me.

2025-09_0225

I got a good outline of things to do across a few months. I then asked for references and got some:

2025-09_0226

The next stage for me is to start embarking on this journey a few nights a week and learn some things that might help me both in my job, and potentially in a future position if I need one.

My Complete Outline from Claude

Here’s the end result, with links.

Databricks Learning Outline with Resources

Phase 1: Foundations (1-2 weeks)

Understanding the Basics

  • What is Databricks and why it’s used
  • Core concepts: clusters, notebooks, workspaces, and Apache Spark
  • Databricks architecture and modern data stack integration

Key Resources:

Phase 2: Getting Started (2-3 weeks)

Hands-on Basics

  • Navigating workspace interface
  • Creating and managing clusters
  • Working with notebooks
  • Basic data import methods

Key Resources:

Phase 3: Data Analysis Fundamentals (3-4 weeks)

Core Analytics Skills

  • Data exploration and cleaning
  • Working with different data formats
  • Basic SQL and PySpark operations

Key Resources:

Phase 4: Intermediate Techniques (4-5 weeks)

Advanced Analytics

  • Complex transformations and ETL
  • Streaming data and machine learning
  • Performance optimization

Key Resources:

Phase 5: Production and Best Practices (2-3 weeks)

Professional Development

  • Job scheduling and workflows
  • Security and monitoring
  • Integration patterns

Key Resources:

Additional Learning Platforms:
Certification Path:
Community and Support:
Getting Started Steps:
  1. Sign up for Databricks Free Edition: Visit the Databricks Free Edition signup page and pick your preferred signup method (Databricks Free Edition | Databricks on AWS)
  2. Access free training: Free unlimited access to training content on Databricks Academy provides on-demand learning experiences for all skill levels (Databricks Launches Free Edition and Announces $100 Million Investment to Develop the Next Generation of Data and AI Talent – Databricks)
  3. Join the community: Connect with other learners and experts
  4. Start with sample datasets: Practice with built-in datasets before using your own data

The great news is that all self-paced training across AI, data engineering, and more is now free for learners (Databricks Training & Certification Programs | Databricks), making it easier than ever to get started with Databricks!

Posted in Blog | 1 Comment

Getting Started with the MSSQL AI Agent in VS Code

Recently I was working in VS Code and saw a walkthrough for the new Copilot chat features. I decided to give those a try to get some information from my SQL Server instance.

This post walks through a few things I did with this GenAI agent. There is a video walkthrough at the end.

Note: I have Copilot access in VS Code as part of my employer’s benefits.

This is part of a series of experiments with AI systems.

First Steps

When I start VS Code, I see something like this.

2025-08_0105

One of the walkthroughs recently mentioned Copilot. If I click the “More” at the bottom right, I’ll get this image. You might see something different, but I’d expect you have a Copilot walkthrough if you can use Copilot. I chose the 4th one down (where the mouse pointer is).

2025-08_0106

This opened a Copilot pane. There were a few items, and you can see on the left in the image below, some have checkmarks. I’d explored these before.

2025-08_0107

If I scroll up, I see the one I wanted, which was “chat about your code”. I picked this one, and clicking the blue “Chat with Copilot” button opened a blade to the left.

2025-08_0109

I had read there are these @ agents (look up at the right side) and decided to type “@”. I saw a list of things.

2025-08_0110

Lots of places to work, but I chose the @mssql agent, since this is where I tend to work. In the lower pane, I typed a question.

2025-08_0111

Above this (still in the left blade), I got a response.

2025-08_0112

Below this, I got some text and code explaining how to access a list of databases on various platforms. I’m not sure why MySQL is first, but I assume the list is alphabetical. For SQL Server, I saw this. It’s a reasonable answer, with some help on how to execute it.
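For SQL Server, a query along these lines is what I’d expect in an answer like that:

```sql
-- List the databases on the instance
SELECT name, database_id, state_desc
  FROM sys.databases
 ORDER BY name;
```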

2025-08_0113

I then decided to connect to my local instance. I have the MSSQL extension, so I clicked that and got a connection.

2025-08_0114

Rerunning that query produced the same response. However, when I opened a query window, I got different results. Note the little database icon on the left, below my prompt, with “Untitled-1” next to it. This is the context, which I also saw added to the lower prompt box, just above where I would enter a prompt.

2025-08_0115

However, this didn’t work.  After a few minutes,  I got this.

2025-08_0116

and this. The LLM is trying, but can’t seem to get a query to run.

2025-08_0119

I then decided to move on.

Getting Results Back from Questions

This isn’t really the type of thing I’d do, but I decided to try to get some info from a database. The database above isn’t that interesting, so I switched to asking the model some questions. Here’s the first one, where I don’t remember the exact table name, but I ask anyway.

2025-08_0120

It queried the database and found there isn’t a player table. However, it continued to look and found dbo.players.

2025-08_0121

Even better, once it has the answer, it also provides a little more info. Maybe good, maybe bad. This reminds me of talking with a person that gives me more information than I asked for.

2025-08_0122

I try something else. Let’s get some metadata, since I clearly don’t remember what’s in this database.

2025-08_0125

I get a nice response, with some guesses about what information is contained inside these tables.

2025-08_0126

OK, can I query for information? I’ve always been a bit more of a hitter than a pitcher, so I’ll ask a question. This isn’t asking it to join specific tables, just to get me an answer.

2025-08_0127

It worked, though to be fair, I tabbed over to SSMS and wrote this query in about the same time (with SQL Prompt) as the Copilot agent took. Cool to see, as I’d forgotten Thome and Vlad were up there.

2025-08_0134

While I got the answer, I didn’t get the query. I asked for it and got it, with an apology.

2025-08_0128

I’ll do something else: who played the longest? That might be a somewhat funny query to write for a quick answer. I’d have to join a few tables and look for a sum.

2025-08_0130

It likely remembered I wanted the query, so that was included, with an explanation. However, it only looked at the batting table.

I asked other questions about fielding and pitching and got those answers (Nolan Ryan, 27 years with fielding stats and pitching stats). So I asked that:

2025-08_0151

Below this, I see the two players who tied, which Copilot noted.

2025-08_0152

The code provided only returned one player. I checked, which is something you should always do. I asked if I could get better code. I got a few options, and I liked the RANK one, so I tried it and it worked.

2025-08_0180
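For reference, a RANK-based version would look something like this. This is my sketch, assuming a Lahman-style dbo.Batting table with playerID and yearID columns, which may not match the exact schema or code Copilot produced:

```sql
-- Career length per player, then rank so ties all surface
WITH CareerLength AS (
    SELECT playerID,
           COUNT(DISTINCT yearID) AS YearsPlayed
      FROM dbo.Batting
     GROUP BY playerID
),
Ranked AS (
    SELECT playerID, YearsPlayed,
           RANK() OVER (ORDER BY YearsPlayed DESC) AS rk
      FROM CareerLength
)
SELECT playerID, YearsPlayed
  FROM Ranked
 WHERE rk = 1;   -- RANK returns every tied player, not just one
```

This is the design choice that matters here: a TOP (1) query silently drops ties, while RANK (or TOP (1) WITH TIES) keeps them.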

Slightly annoying, but when I think about this type of conversation with someone else, especially a junior dev, I might have the same results and iterate this way.

At this point I also asked about databases again, and I got a result. Maybe I needed a query to run first? I’m not sure why this works now.

2025-08_0133

Summary

This was an interesting set of things I could get done with this agent in VS Code. It’s not amazing, but it was helpful. I could tackle some light coding or database query tasks while also handling other work. In this case I wasn’t in the zone, trying to code or decode a database schema. Instead, I had a few things to try, and I let the agent work while I tabbed over to close out some emails and chats.

In a job where I might need to find info in an unfamiliar database, this could be helpful in getting things done, though it’s hard to know if it would be slower than if I were focused on this the whole time. The agent can find some info without me, but it also failed in a few cases. I started to try to get other things done when I noticed delays in responding and some hung queries.

Learning to use an AI agent to help you is a skill, and it’s one that takes time to develop.

I’ll look at some more practical tasks in the next post.

Video Walkthrough

Here’s a video walkthrough of most of the stuff in this post. It differs slightly, as working with LLMs is not deterministic.

Posted in Blog | 2 Comments

Requiring Technical Debt Payments

I was working with a customer recently that is trying to improve their processes. This was a large company, over 100,000 employees, though most of them aren’t in the technology area. However, across many divisions and groups, there are a lot of developers and operations personnel who have tended to work in silos, managing their own applications and systems in disparate ways.

In other words, doing software development the way most companies do it.

I had been working with one group to streamline and standardize some of their software practices to implement more of a DevOps flow to smoothly build, operate, and update their systems. They’ve had some success and other groups noticed that this set of teams is very efficient. They aren’t DevOps like a lot of the articles you read. They still have development and operations, but the groups work closely to ensure efficiency.

They started to get requests to onboard other teams into their flow, as the management of this group has been advertising their success. Other groups want to implement Continuous Integration, get database unit testing and static code analysis set up, implement gates for approval, and more. The Operations team manages most of this and is happy to help other groups.

But

They require some things to be in place, some of which involve cleaning up technical debt. Not all debt, but certain items that create additional risk or instability. They don’t want to onboard a codebase that is difficult to manage. A lot of this debt isn’t difficult to fix, but they want some good coding practices implemented. They require integrated security or a waiver from InfoSec. They want explicit index names, not system-generated ones. They want permissions granted to roles, not users. Not big things, but the little items that, when neglected, make a system less maintainable and understandable.
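To illustrate the kinds of requirements described above (these are my own examples with hypothetical object names, not the customer’s code):

```sql
-- Explicit constraint and index names, not system-generated ones
ALTER TABLE dbo.Orders
  ADD CONSTRAINT OrdersPK PRIMARY KEY CLUSTERED (OrderID);

CREATE NONCLUSTERED INDEX Orders_CustomerID_IDX
    ON dbo.Orders (CustomerID);

-- Permissions granted to a role, not to individual users
CREATE ROLE OrderProcessing;
GRANT SELECT, INSERT, UPDATE ON dbo.Orders TO OrderProcessing;
ALTER ROLE OrderProcessing ADD MEMBER SomeAppUser;
```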

The same things a lot of us let creep into our codebase over time.

On one hand, I thought this was an idea that would slow adoption and allow many groups to continue operating inefficiently; they simply won’t clean up their code. On the other hand, this might be the lever that helps create a better-run environment across the organization. It might help smooth their upgrade cycles, let staff move between projects, and more importantly, reduce the overhead of communication and work between teams.

I don’t know how this will work over time, but I am interested to see what happens.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Comments Off on Requiring Technical Debt Payments