The Data Model Matters

I ran across a statement that seems exciting to me as someone that has written a lot of code in their career. It said: “Many of the “modern” software practices of the last decade were early adaptations to this shift, even if we didn’t articulate them that way. Immutable infrastructure. Stateless services. Containers. Blue-green deployments. Infrastructure as code. These ideas all share a common premise: never fix a running thing. Replace it.”

These are a few sentences in this piece on the death and rebirth of programming. That’s how a lot of software developers have viewed the world during the last decade and we’ve seen a lot of software advances in that time. The very successful developers and teams, who often speak at conferences and publish papers have adopted many of these practices. Serverless, containers, lots of tests allowing continuous deployment of new objects into complex environments that scale to levels many of us never thought possible. These are the very high performances talked about in the State of DevOps report every year.

At the same time, many people reading about these successes and trying to emulate them struggle. So many customers I know want to use containers, but struggle. Many teams lose control over serverless functions and stateless systems, having issues with immutable infrastructure. They revert, or often combine, older ways of building and deploying software with some of the techniques they read about.

If they struggle with stateless systems, it’s no wonder they struggle with the really, really important stateful ones: the databases.

Databases are state machines. We evolve and grow them. NoSQL systems were developed to try and deal with some of the scale issues with relational systems, but they often push the immediate problems of concurrency and efficiency to the side, invoking eventual consistency and redundant data models that keep multiple copies of data around for quick access. They also defer one of the strengths of relational systems, aggregating lots data, to another system, usually a data warehouse, data lake, or some other architecture.

That works great, though it comes at the cost of more compute, more latency to develop and produce those aggregations, and more cost to store all that data in yet another place. That’s not to disparage those designs. They work well and handle workloads most relational systems couldn’t manage.

However that brings to mind two things. One, perhaps that easy and instant aggregation isn’t as important as we think. After all, often companies at that size never have a view of all their data. It’s changing too often, yet they are successful. Secondly, if you don’t have the funding to manage that complexity (both in machine and human resources), perhaps you ought to focus on what is important in this age of cheap code changing often.

Build a strong data model and write efficient SQL Code.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Tagged | Leave a comment

A New Word: La Guadière

la guadière – n. a glint of goodness you notice in something that you wouldn’t expect, which is often only detectable by sloshing them back and forth in your mind until everything dark and gray and common falls away, leaving something shining at the bottom of the pan – a rare element hidden deep in the bedrock, that must’ve washed there by a storm somewhere upstream.

It is easy at first glance when something sad, distasteful, or otherwise negative occurs to get upset or see a problem. When you think about it, sometimes you feel like there is something good. That blessing-in-disguise that is revealed after your initial reaction fades and you have a moment to consider the situation.

I have often felt that I react to something and think about the negative. That seems like a very human reaction. For example, a customer doesn’t want to spend time changing a process, even though it’s broken. I only think about the negative.

However, if you pause a moment and twist the situation from your view to someone else’s, then la guadiere comes out. A broken process, but a known one, isn’t the worst thing in the short term. It allows other work to get done, people are familiar, and it can be lower stress, even if it requires more effort.

The same thing if you have an issue. A flat tire, broken internet, a broken horse feeder. There’s some la guadiere in there. A flat might let me know my tires need replacing and they’re dangerous, or maybe I haven’t cleaned up around the garage properly. Internet teaches me to prepare more in advance (or take a break). Broken horse feeders let me catch up on some maintenance and maybe improve something before it gets much worse.

Find the good in life, even when it’s bad.

From the Dictionary of Obscure Sorrows

Posted in Blog | Tagged , | Leave a comment

Over or Under Provisioned

Lots of people move to the cloud; it’s common. In fact, it’s very common to hear customers who are being asked to migrate their workloads to a cloud vendor for a variety of reasons. You might not agree, but often there is some reason to move to the cloud. Sometimes it’s even moving from one cloud to another, just because one of the big three (AWS, Azure, GCP) seems more attractive this year than the one from last year.

When you move, do you size your system for the peak? 80% of the peak? Perhaps there is another goal for which you design. Do you worry about ever being under-provisioned and letting customers have a slower system? Or do you ensure you never hit the peak, which increases costs?

Auto-scaling can help, but it doesn’t seem to have worked as well for database systems as it does for serverless functions or other types of workloads. Compute is much easier to scale than stateful database systems that need CPUs and RAM ready on a particular system instantly. In fact, the “serverless” Azure SQL database is attractive to me more for it’s ability to scale the CPUs and RAM more than the on/off capability.

I was in a discussion recently with a number of data professionals who tend to over-provision a bit, mostly because their companies are willing to. It saves them headaches and phone calls (more angry texts these days), but it also means developers aren’t incentivized to optimize any queries. Unless there is a way to determine that the aggregate of all queries could lower the size of the resource provisioned, no one wants to fix any poorly running code.

That was interesting to me, as I’d think we’d want to optimize code as we pay every month, but the reality is that we pay every month for a level of resources. Coming in under that level is all that’s important. If we use 99% of those resources or 25%, we pay the same amount. It’s like saying we want to watch a movie every night because we pay for Netflix/Apple TV/etc. We’ve spent the money, so whether we watch 1 a week or 5 a week, there’s no point in optimizing our time to get value out of the subscription.

In the PaaS world, that might change, but often we’re still purchasing a tier of resources, not paying for each query. Until we need to raise that tier, no one worries about efficiency. If we can’t prove a lower tier would work with better code, no one cares.

It’s a little sad, but perhaps some future version of monitoring that can spin up a digital twin, optimize some code, and model a lower tier will take hold among all the performance tuners and monitoring vendors. Maybe with a few Claude code tokens, one of you will solve that problem.

For now, I still think it’s worth trying to optimize code, especially if an AI can give you suggestions and prove things run quicker in a test environment. If the cost of code is getting lower, then why not extend those savings to SQL code?

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Tagged | 4 Comments

UNION vs UNIONALL: #SQLNewBlogger

While writing another post I realized my UNION query didn’t work as one might initiall expect, so I decided a short post was worth writing. This is based on a previous post on QUOTENME().

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

Missing a Row

When I ran this code, I got only a single row. There’s a UNION here, so why? One would expect two rows from these queries.

2026-05_0287

Let’s change to UNION ALL. Now we see this:

2026-05_0288

You can likely spot the reason, but it’s because both rows in the result are the same. In this cse, UNION is designed to remove duplicates. In the docs, it explicitly says

  • UNION ALL – Includes duplicates
  • UNION Excludes duplicates

We can see this in this examples I’ve got this code that gives me two virtual tables of numbers, some of which are duplicate:

WITH myTally(n)
AS
(SELECT n 
 FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) a(n)
)
, myTally2(n)
AS
(SELECT n 
 FROM (VALUES (1), (20), (3), (40), (5), (60), (7), (08), (9), (100)) b(n)
)
SELECT n
FROM myTally
UNION 
SELECT n
 FROM myTally2

When I run the query, with UNION, I see these results, 14 rows:

2026-05_0289

If I change to UNION ALL, 20 results.

2026-05_0290

Use UNION when you want unique things. UNION ALL if you need to see ALL The Rows.

SQL New Blogger

This post was about 8 minutes spent after I finished the other post. It is a quick expansion on something I saw in another post, it has a separate focus, and it shows I’ve realized something and built on previous work.

You can showcase these skills.

Posted in Blog | Tagged , , | 1 Comment