A New Word: Waldosia

waldosia– n.  a condition in which you keep scanning faces in a crowd looking for a specific person who would have no reason to be there, as if your brain is checking to see whether they’re still in your life, subconsciously patting its emotional pockets before it leaves for the day.

This is an interesting word to me, and I find myself in waldosia at volleyball tournaments looking for past players. I think it’s a shadow of a desire to see a player I’ve coached in the past and hoping they had made time to come.

This happened recently as I was up in Greeley, CO at a tournament. I have a few former players at CSU, which isn’t far. I somehow hoped I’d see a former player, but they would have no reason to leave college and come, especially early on a weekend morning.

I probably have waldosia at SQL Saturday and other events, sometimes looking (hoping) for a friend that might pop by the event, even if I know they might have moved on in their career.

From the Dictionary of Obscure Sorrows

Posted in Blog | Tagged , | Comments Off on A New Word: Waldosia

A Couple Quick GENERATE_SERIES Tests

I had someone reach out about generate_series() recently, saying they hadn’t realized this was a new feature in SQL Server 2022. They were wondering if it was better than using a tally table.

I didn’t want to do an exhaustive test, but I thought I’d take a minute and try a couple simple things just to see.

A First Test

The first thing was to just generate a million numbers. Rather than just get the numbers. I decided to use a quick DATEADD() to create a list of calendar dates. Here’s the code:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Create a Tally table with 1 million numbers
WITH Tally (n)
AS ( SELECT TOP (1000000)
             ROW_NUMBER () OVER (ORDER BY
                                   (SELECT NULL)) AS Number
      FROM
        master.dbo.spt_values a
        CROSS JOIN master.dbo.spt_values b)
SELECT DATEADD(DAY, n, GETDATE())
  FROM tally

SELECT DATEADD( DAY, value, GETDATE()) FROM GENERATE_SERIES(1, 1000000, 1)

Since this does read from tables, I ran it twice. The first time, the tally table took 243ms, so I re-ran it and saw this drop to 172ms. The results were consistent for Generate_series, which was 110ms.

2025-02_0343

A Second Test

I grabbed Jeff Moden’s code for random numbers and adjusted a second query to use GENERATE_SERIES(). The code is below.

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

--===== Declare some obviously named variables
DECLARE @NumberOfRows INT,
@StartValue   INT,
@EndValue     INT,
@Range        INT
;
--===== Preset the variables to known values
SELECT @NumberOfRows = 1000000,
@StartValue   = 400,
@EndValue     = 500,
@Range        = @EndValue - @StartValue + 1
;
--===== Conditionally drop the test table to make reruns easier in SSMS
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
DROP TABLE #SomeTestTable
;
--===== Create the test table with "random constrained" integers and floats
-- within the parameters identified in the variables above.
SELECT TOP (@NumberOfRows)
SomeRandomInteger =  ABS(CHECKSUM(NEWID())) % @Range + @StartValue,
SomeRandomFloat   = RAND(CHECKSUM(NEWID())) * @Range + @StartValue
INTO #SomeTestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2

SELECT TOP (@NumberOfRows)
SomeRandomInteger =  ABS(CHECKSUM(NEWID())) % @Range + @StartValue,
SomeRandomFloat   = RAND(CHECKSUM(NEWID())) * @Range + @StartValue
INTO #SomeTestTable2
FROM GENERATE_SERIES(1, @NumberOfRows, 1)

When I ran this, I see these results:

2025-02_0344

Execution times are close. Slightly faster with GENERATE_SERIES(), but fairly consistent across runs. In running this 10 times, there were 3 runs where the tally table was faster, and once just under 300ms. A few times the time was the same, but always within 15-16ms. Not sure that means much.

This isn’t a really exhaustive test, and don’t take this as a recommendation either way for your code. Test how they both work in your system, and certainly think about the impact of storing a tally table vs. generating one on the fly vs the GENERATE_SERIES().

However, it seems that GENERATE_SERIES() is worth looking at if you are on SQL Server 2022 or later.

Posted in Blog | Tagged , | 2 Comments

The Job Outlook for Database Professionals

This is a strange time for many technology professionals and their employment. It seems many organizations have open positions and are struggling to fill those slots with qualified candidates (one look at this). At the same time, many other organizations have followed the large technology firms (FAANG, MSFT) and laid off large numbers of their staff in the last year.

At the same time, with the hype and rapid growth of GenAI systems, we have companies that are looking to machines to fill some of their labor needs. Salesforce noted that they did not expect to hire more software engineers in 2025 as their plan was to use AI tech to help them write enough code. They’ve seen productivity gains with AI tech, so they aren’t adding more developers. They still need human developers, just not more of them.

While thinking about the labor market, I ran across this report from the Bureau of Labor Statistics in the US, which tries to project jobs across many different industries in the future. They typically are careful and conservative, and while they show growth in the need for database administrators and database architects (and software developers) they also admit they can’t quite tell what impact GenAI tech will have on the labor market. They do think there will be an impact, but for now they have the need for data professionals growing faster than average.

Why is it hard to get an IT job? I know plenty of people looking for work that have struggled to find a position that suits them. I think there are a few things at work here. First, many companies have been burned in the past with unqualified candidates. We know hiring is hard, and we will make mistakes with some hires. With the focus on cloud, new technologies, DevOps, and more, companies want to hire skilled people who can hit the ground running.  I also think the supply of workers is high and companies are more choosy about who they hire.

Lots of companies are also investing in their own workforces, upskilling and re-skilling them in new technology. It’s often easier to train someone whose work ethic, soft skills, and other factors are known than hire someone new who might or might not be a good cultural fit. I know that I often see budgets for conferences and classes are low, but there are plenty of companies still sending people to classes, conferences, or providing opportunities to learn.

If you want a new job, or you’re struggling to find one, I can’t stress enough the importance of working on your own skills and showing prospective employers that you are investing in yourself. Take stock of your skills and work on projects, experiments, build things, blog about them, and show companies that you have the skills they need.

Don’t forget your soft skills as well if you’re investing in yourself. Learn to communicate well, both written and verbal, as that can often set you apart from other candidates. As someone that has interviewed others, those skills sometimes are more impactful than your technical ones.

Invest in yourself now, and you’ll be ready for the challenges you face in the future.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Tagged | Comments Off on The Job Outlook for Database Professionals

A Poor Data Model

Recently there was some online complaints about social security numbers (SSNs) in the US being duplicated and re-used by individuals. This is really political gamesmanship, so ignore the political part. Just know that social security numbers appear to be one of the contenders used in many data models.

I found a good piece about how SSNs aren’t unique, and have a mess of problems. Despite this, many people seem to want to use SSNs as a primary or alternate key in their database systems. They also aren’t well secured in many systems, even though we should consider this sensitive PII data.

As we have moved many analog systems to digital ones, we often find that our assumptions about the rules governing data aren’t that well defined. I’ve worked in many systems where data elements were assumed to be unique in some way, but they actually weren’t. I’ve seen invoices, POs, and other “unique numbers” actually duplicated because of simple mistakes by humans. When we try to enforce uniqueness in a database, we end up with problems. Often we actually need to drop keys and make exceptions because the data must be entered.

This has led many people to not create unique constraints or even foreign keys in their systems, and I understand why they don’t. There are real problems when we assume the real world has the same strict structures we implement in code. I’ve seen systems go online and then FKs removed because of poor data quality. It might be a mistake, but it’s also a reality when we find there is existing child data without a parent. We might create a pseudo-parent at times, but we might also decide not to do so if that creates other problems.

I suspect over time the real world will migrate some of their problematic keys to something more robust. However, some that are used widely in older systems, like SSNs, are unlikely to change in my lifetime. There are far too many places where this is in use and I am not sure that there is any consensus to undertake the massive amount of work to implement something else. I suspect even trying to add digits to the value is a task we’ll put off indefinitely.

Be careful of using natural keys in your data models unless you are sure they are really a natural key. To me, I’m better off with some surrogate key in the event that my “natural key” turns out to not have the uniqueness I expected.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Tagged | 2 Comments