Quickly Creating a tsqlt Test for a Query

One of the things that I feel is important to building better software is testing your code. It should be easy and simple to test code and determine if a chance or enhancement has broken anything.

This usually requires a testing framework. I like tsqlt, which a friend wrote and my employer supports. This framework is easy, I think, to use for tests, but some people feel differently. This post will show me creating a test in < 5 minutes, based on what someone else wrote to ask for help.

The Scenario

In this post, a user was asking about how to write a query that looks to create distinct results. They posted this code, which isn’t far from the type of thing I’ve had other developers send me inside a company asking for help:

DROP TABLE IF EXISTS #BASKET;
DROP TABLE IF EXISTS #ARRIVED;
DROP TABLE IF EXISTS #PREPARED;
DROP TABLE IF EXISTS #WantedResult;

CREATE TABLE #BASKET
(
BASKET_ID nvarchar(15) PRIMARY KEY NOT NULL
)
;
CREATE TABLE #ARRIVED
(
BASKET_ID nvarchar(15) NOT NULL
,ITEM_ID nvarchar(15) NOT NULL
)
CREATE TABLE #PREPARED
(
BASKET_ID nvarchar(15) NOT NULL
,ITEM_ID nvarchar(15) NOT NULL
)

INSERT INTO #BASKET
(BASKET_ID)
SELECT 'A'
UNION ALL SELECT 'B'
UNION ALL SELECT 'C'
UNION ALL SELECT 'D'
;

INSERT INTO #ARRIVED
(BASKET_ID, ITEM_ID)
SELECT 'A','INO 1'
UNION ALL SELECT 'A','INO 2'
UNION ALL SELECT 'B','INO 3'
UNION ALL SELECT 'D','INO 8'

;
INSERT INTO #PREPARED
(BASKET_ID, ITEM_ID)
SELECT 'A','INO 111'
UNION ALL SELECT 'B','INO 44'
UNION ALL SELECT 'B','INO 55'
UNION ALL SELECT 'B','INO 66'
UNION ALL SELECT 'C','INO 170'
;
/* Gives duplicates INO 111, INO 3 */
SELECT
BASKET.BASKET_ID
,ARRIVED.ITEM_ID IN_ID
,PREPARED.ITEM_ID OUT_ID
FROM #BASKET BASKET
LEFT JOIN #ARRIVED ARRIVED
ON BASKET.BASKET_ID=ARRIVED.BASKET_ID
LEFT JOIN #PREPARED PREPARED
ON BASKET.BASKET_ID=PREPARED.BASKET_ID
ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID
;


CREATE TABLE #WantedResult
(
BASKET_ID nvarchar(15)
,IN_ID nvarchar(15)
,OUT_ID nvarchar(15)
,SORT_NBR int
)
INSERT INTO #WantedResult (BASKET_ID,IN_ID,OUT_ID,SORT_NBR)
SELECT 'A' BASKET_ID,'INO 1' IN_ID,'INO 111' OUT_ID,1 SORT_NBR
UNION ALL SELECT 'A' BASKET_ID,'INO 2' IN_ID,NULL OUT_ID,2 SORT_NBR
UNION ALL SELECT 'B' BASKET_ID,NULL,'INO 55' OUT_ID,4 SORT_NBR
UNION ALL SELECT 'B' BASKET_ID,NULL,'INO 66' OUT_ID,5 SORT_NBR
UNION ALL SELECT 'B' BASKET_ID,'INO 3' IN_ID,'INO 44' OUT_ID,3 SORT_NBR
UNION ALL SELECT 'C' BASKET_ID,NULL IN_ID,'INO 170' OUT_ID,6 SORT_NBR
UNION ALL SELECT 'D' BASKET_ID,'INO 8' IN_ID,NULL OUT_ID,7 SORT_NBR
;
SElECT BASKET_ID,IN_ID,OUT_ID,SORT_NBR
FROM #WantedResult
ORDER BY BASKET_ID,SORT_NBR

Now if I run this, I see a query that doesn’t work, and I have results in a table. I could query #WantedResult and compare that with my query, but that’s not great, and it doesn’t persist this over time.

Let’s make this better.

Setup

I’ll assume you have tsqlt installed. If you don’t, read this. If I paste that code into my SSMS, I can do a couple things. First, I’ll remove the temps, because I want to be able to test this over time. I’ll assume this is something I got from another developer, and I might need to check their tables. If these were tables in a dev database I used, I wouldn’t run this part. I wouldn’t run any, I’d keep my test data.

CREATE TABLE BASKET
(
BASKET_ID nvarchar(15) PRIMARY KEY NOT NULL
)
;
CREATE TABLE ARRIVED
(
BASKET_ID nvarchar(15) NOT NULL
,ITEM_ID nvarchar(15) NOT NULL
)
CREATE TABLE PREPARED
(
BASKET_ID nvarchar(15) NOT NULL
,ITEM_ID nvarchar(15) NOT NULL
)

INSERT INTO BASKET
(BASKET_ID)
SELECT 'A'
UNION ALL SELECT 'B'
UNION ALL SELECT 'C'
UNION ALL SELECT 'D'
;

INSERT INTO ARRIVED
(BASKET_ID, ITEM_ID)
SELECT 'A','INO 1'
UNION ALL SELECT 'A','INO 2'
UNION ALL SELECT 'B','INO 3'
UNION ALL SELECT 'D','INO 8'

;
INSERT INTO PREPARED
(BASKET_ID, ITEM_ID)
SELECT 'A','INO 111'
UNION ALL SELECT 'B','INO 44'
UNION ALL SELECT 'B','INO 55'
UNION ALL SELECT 'B','INO 66'
UNION ALL SELECT 'C','INO 170'
;

Once I have that, I’m ready to build a test.

Building a Test

The first thing you need is a test class. Over time, this class likely would exist in a project for my database. However, since this is a new one for me, I’ll create a class with this.

EXEC tsqlt.NewTestClass @ClassName = N'QueryTests' -- nvarchar(max)
GO

Now I build the test, which is a stored procedure. The test class above is a schema, and inside my proc, I’ll paste the first script above. I will then alter a few things. First, I’ll change each CREATE TABLE to a call to tsqlt.faketable. You can see this edit below.

2023-07-31 13_29_47-creating_a_test.sql - ARISTOTLE.sandbox (ARISTOTLE_Steve (64)) - Microsoft SQL S

Below this I have inserts. In this case, I have the test data the other dev has, so I’ll just remove the # from each statement and insert into a real table.

2023-07-31 13_30_21-creating_a_test.sql - ARISTOTLE.sandbox (ARISTOTLE_Steve (64)) - Microsoft SQL S

Below this, I’ll leave the CREATE TABLE #WantedResult alone. This is a good place for me expected results. However, I’ll also copy this create statement and edit it to create a new table. You can see this below.

2023-07-31 13_31_23-creating_a_test.sql - ARISTOTLE.sandbox (ARISTOTLE_Steve (64)) - Microsoft SQL S

The format for a test is:

  • assemble
  • act
  • assert

The assemble is above. Now let’s act. I’ll trop a fake stored procedure here for the query. I would prefer the devs use procs, but I can’t always get what I want. However, this does encourage them to start writing procs. Here’s the entire act:

2023-07-31 13_33_18-creating_a_test.sql - ARISTOTLE.sandbox (ARISTOTLE_Steve (64)) - Microsoft SQL S

This runs the procs and puts the results in my temp table.

Now I need to check this. I have the expected results in #WantedResults and the query stuff in #QueryResult. We can assert these are equal with tsqot.AssertEqualsTable. Here’s that code. The message and failmsg are whatever I want.

2023-07-31 13_34_26-creating_a_test.sql - ARISTOTLE.sandbox (ARISTOTLE_Steve (64)) - Microsoft SQL S

Below the assert, you can see the end of the test procedure with GO and then the call to run this. Here is the entire code for the test:

CREATE OR ALTER PROCEDURE QueryTests.[test basket query]
as
-- assemble
EXEC tsqlt.FakeTable
   @TableName = N'BASKET'        -- nvarchar(max)
, @SchemaName = N'dbo'       -- nvarchar(max)
EXEC tsqlt.FakeTable
   @TableName = N'ARRIVED'        -- nvarchar(max)
, @SchemaName = N'dbo'       -- nvarchar(max)
EXEC tsqlt.FakeTable
   @TableName = N'PREPARED'        -- nvarchar(max)
, @SchemaName = N'dbo'       -- nvarchar(max)

INSERT INTO BASKET
(BASKET_ID)
SELECT 'A'
UNION ALL SELECT 'B'
UNION ALL SELECT 'C'
UNION ALL SELECT 'D'
;

INSERT INTO ARRIVED
(BASKET_ID, ITEM_ID)
SELECT 'A','INO 1'
UNION ALL SELECT 'A','INO 2'
UNION ALL SELECT 'B','INO 3'
UNION ALL SELECT 'D','INO 8'

;
INSERT INTO PREPARED
(BASKET_ID, ITEM_ID)
SELECT 'A','INO 111'
UNION ALL SELECT 'B','INO 44'
UNION ALL SELECT 'B','INO 55'
UNION ALL SELECT 'B','INO 66'
UNION ALL SELECT 'C','INO 170'
;

CREATE TABLE #WantedResult
(
BASKET_ID nvarchar(15)
,IN_ID nvarchar(15)
,OUT_ID nvarchar(15)
,SORT_NBR int
)
INSERT INTO #WantedResult (BASKET_ID,IN_ID,OUT_ID,SORT_NBR)
SELECT 'A' BASKET_ID,'INO 1' IN_ID,'INO 111' OUT_ID,1 SORT_NBR
UNION ALL SELECT 'A' BASKET_ID,'INO 2' IN_ID,NULL OUT_ID,2 SORT_NBR
UNION ALL SELECT 'B' BASKET_ID,NULL,'INO 55' OUT_ID,4 SORT_NBR
UNION ALL SELECT 'B' BASKET_ID,NULL,'INO 66' OUT_ID,5 SORT_NBR
UNION ALL SELECT 'B' BASKET_ID,'INO 3' IN_ID,'INO 44' OUT_ID,3 SORT_NBR
UNION ALL SELECT 'C' BASKET_ID,NULL IN_ID,'INO 170' OUT_ID,6 SORT_NBR
UNION ALL SELECT 'D' BASKET_ID,'INO 8' IN_ID,NULL OUT_ID,7 SORT_NBR
;

CREATE TABLE #QueryResult
(
BASKET_ID nvarchar(15)
,IN_ID nvarchar(15)
,OUT_ID nvarchar(15)
,SORT_NBR int
)


-- act
INSERT #QueryResult
   EXEC dbo.BasketQuery

-- assert
EXEC tsqlt.AssertEqualsTable
   @Expected = N'#WantedResult' -- nvarchar(max)
, @Actual = N'#QueryResult'   -- nvarchar(max)
, @Message = N'query result works'  -- nvarchar(max)
, @FailMsg = N'bad query'  -- nvarchar(max)
GO

Before we run the test, we need a proc. I’m going to take the first solution from the thread linked and drop it inside a proc call like this:

CREATE OR ALTER PROCEDURE dbo.BasketQuery
AS
;WITH cte_results AS (
SELECT
BASKET.BASKET_ID
,ARRIVED.ITEM_ID IN_ID
,PREPARED.ITEM_ID OUT_ID
,ROW_NUMBER() OVER(ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID) AS row_num
FROM BASKET BASKET
LEFT JOIN ARRIVED ARRIVED
ON BASKET.BASKET_ID=ARRIVED.BASKET_ID
LEFT JOIN PREPARED PREPARED
ON BASKET.BASKET_ID=PREPARED.BASKET_ID
--ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID
)
SELECT
c1.BASKET_ID,
CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.IN_ID = c1.IN_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.IN_ID END,
CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.OUT_ID = c1.OUT_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.OUT_ID END
FROM cte_results c1
ORDER BY c1.row_num
GO

When I call my proc, I see this:

2023-07-31 13_36_53-creating_a_test.sql - ARISTOTLE.sandbox (ARISTOTLE_Steve (64))_ - Microsoft SQL

My test failed. Why? The query doesn’t have the right number of columns for the result. Easy for a developer to miss. Easy to catch with a real testing framework.

I can alter the proc by adding a 4th column to the results, returning the row number. When I do that, my code for the final select is:

SELECT
c1.BASKET_ID,
CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.IN_ID = c1.IN_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.IN_ID END,
CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.OUT_ID = c1.OUT_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.OUT_ID END
, c1.row_num AS sort_nbr
FROM cte_results c1
ORDER BY c1.row_num

Now if I run my test, I see this:

2023-07-31 13_39_02-creating_a_test.sql - ARISTOTLE.sandbox (ARISTOTLE_Steve (64)) - Microsoft SQL S

It worked. The query has the right results, at least according to the original developer. I might double check their sample data, but regardless of how I change my own dev database, or if the data changes, this test will continue to run with this set of data.

Testing a Refactor

In the thread, there was a second query. I can test that easily as well. All I’ll do is run this code:

CREATE OR ALTER PROCEDURE dbo.BasketQuery
AS
SELECT b.BASKET_ID
  , CASE WHEN ROW_NUMBER() OVER(PARTITION BY b.BASKET_ID, a.ITEM_ID ORDER BY p.ITEM_ID) = 1 THEN a.ITEM_ID END AS IN_ID
  , CASE WHEN ROW_NUMBER() OVER(PARTITION BY b.BASKET_ID, p.ITEM_ID ORDER BY a.ITEM_ID) = 1 THEN p.ITEM_ID END AS OUT_ID
      , ROW_NUMBER() OVER(ORDER BY B.BASKET_ID, a.ITEM_ID, p.ITEM_ID) AS Sort_Nbr
FROM BASKET AS b
LEFT OUTER JOIN ARRIVED AS a
      ON b.BASKET_ID = a.BASKET_ID
LEFT OUTER JOIN PREPARED AS p
     ON b.BASKET_ID = p.BASKET_ID
GO

This is Drew’s solution, where I just replaced one query in the proc with another. After I do that, I just run this again, no test code changes.

EXEC tsqlt.run 'QueryTests.[test basket query]'
GO

I get the same result as the previous test: success.

Adding Other Cases

If I wanted to test other test cases, I could do one of two things here. First, I could also the test data that is assembled in the top of the test and then alter the inserts for the #WantedResult table.

The other option is copy this entire test, give it a different name, and then add different test data that might test specific things. Like, what if I’m missing data in a column or even a table. Whatever crazy cases I get from customers in the live database, I can build tests for those.

Performance

I don’t have a good way to programmatically test performance, but if I were capturing this code in a VCS, I could easily see two versions of the proc. I’d do something like this:

CREATE OR ALTER PROCEDURE dbo.BasketQuery
AS
;WITH cte_results AS (
SELECT
BASKET.BASKET_ID
,ARRIVED.ITEM_ID IN_ID
,PREPARED.ITEM_ID OUT_ID
,ROW_NUMBER() OVER(ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID) AS row_num
FROM BASKET BASKET
LEFT JOIN ARRIVED ARRIVED
ON BASKET.BASKET_ID=ARRIVED.BASKET_ID
LEFT JOIN PREPARED PREPARED
ON BASKET.BASKET_ID=PREPARED.BASKET_ID
--ORDER BY BASKET.BASKET_ID,ARRIVED.BASKET_ID,PREPARED.BASKET_ID
)
SELECT
c1.BASKET_ID,
CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.IN_ID = c1.IN_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.IN_ID END,
CASE WHEN EXISTS(SELECT 1 FROM cte_results c2 WHERE c2.BASKET_ID = c1.BASKET_ID AND c2.OUT_ID = c1.OUT_ID AND c2.row_num < c1.row_num) THEN NULL ELSE c1.OUT_ID END
, c1.row_num AS sort_nbr
FROM cte_results c1
ORDER BY c1.row_num
GO
SET STATISTICS IO ON
EXEC dbo.BasketQuery
SET STATISTICS IO OFF
GO
CREATE OR ALTER PROCEDURE dbo.BasketQuery
AS
SELECT b.BASKET_ID
  , CASE WHEN ROW_NUMBER() OVER(PARTITION BY b.BASKET_ID, a.ITEM_ID ORDER BY p.ITEM_ID) = 1 THEN a.ITEM_ID END AS IN_ID
  , CASE WHEN ROW_NUMBER() OVER(PARTITION BY b.BASKET_ID, p.ITEM_ID ORDER BY a.ITEM_ID) = 1 THEN p.ITEM_ID END AS OUT_ID
      , ROW_NUMBER() OVER(ORDER BY B.BASKET_ID, a.ITEM_ID, p.ITEM_ID) AS Sort_Nbr
FROM BASKET AS b
LEFT OUTER JOIN ARRIVED AS a
      ON b.BASKET_ID = a.BASKET_ID
LEFT OUTER JOIN PREPARED AS p
     ON b.BASKET_ID = p.BASKET_ID
GO
SET STATISTICS IO ON
EXEC dbo.BasketQuery
SET STATISTICS IO OFF

When I run this, I see these results:

2023-07-31 13_46_49-creating_a_test.sql - ARISTOTLE.sandbox (ARISTOTLE_Steve (64))_ - Microsoft SQL

Clearly query 2 runs more efficiently than query 1. That’s the one that needs to be submitted to the VCS for deployment. Of course, I’d ensure I’d run the test to be sure my query works, and then the CI process would re-run the test because it can’t trust me.

And it shouldn’t.

Summary

It took my much longer to write this post, especially with picking code formatting and shooting images, than it did to write the test. Literally within 5 minutes I’d pasted the code from the thread twice and use it to create real tables and the test as shown above. I verified both queries within 5 minutes.

Using a known process to test is good. It helps your developers. Getting set up with test data, and then learning to re-use some of those Assemble structures will help your team get up to speed and spread the load of creating data. This also forces developers to think logically through what they’re testing and not just vaguely looking at results and thinking they’re correct.

This also means as we discover holes we haven’t tested, or we want to refactor and enhance our code, we prevent regressions.

Give tsqlt a try and see if it can help you.

Posted in Blog | Tagged , , , | 2 Comments

Power BI and Git

When Power BI was released, I was working with customers to try and adopt DevOps for their database code. At that time, I thought the Microsoft developers had made an extremely poor decision with the PBIX format in that it wasn’t well suited for version control. In 2011, when it was first released, we not only had decades of knowledge about version control and the power of DevOps, but we’d also been living with the extremely poor format of code in Integration Services packages (among other data tools), that were poorly suited for understanding in version control systems.

Power BI has grown, and with the Fabric announcements, there are integrations with git for your workspace. There is also a Power BI Developer mode that saves your project in a folder structure rather than one file, which is better. I’m not sure it’s great, especially if it doesn’t enforce text in a way that ensures a developer can easily see changes between settings and understand them. I can certainly diff the XML in two SSIS packages, but that doesn’t mean that it’s easy to understand what has changed.

I think all modern tools used to build software, including the various reporting systems, need to be built with a pipeline and team development in mind, and set up to ensure that multiple developers can review code in some sort of Git-pull-request format. For those tools that use GUI formatting, I’d hope they separate out GUI elements from others, as no one needs to pull request a change that moves a visual box on screen left 2 pixels. However, if you change the OnError response or formula inside the box, that should be easy to see in code.

This isn’t a simple concept to design for, especially for software that is based on a novel idea and is often built by a small team. However, periodic review of how the software stores changes and how a less experienced developer might review those changes is important to ensuring that our tools scale to the modern needs of distributed team development where our collaboration is often through protocols like pull requests.

Steve Jones

Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.

Posted in Editorial | Tagged , | Comments Off on Power BI and Git

Friday Flyway Tips: More Config Options

Working with various Flyway configuration options used to be a pain since they were either CLI parameters or in a text files. We’ve made editing these easier in Flyway 6.5.4.

I’ve been working with Flyway Desktop for work more and more as we transition from older SSMS plugins to the standalone tool. This series looks at some tips I’ve gotten along the way.

All the Options

Maybe not all, but most and certainly more.

In the Schema Model tab, there is a button that says “Static data & comparisons”. I don’t love the UX, but there’s a lot of stuff to show here, and this is really relevant here, which is where we start using these options: when we get the object definition.

2023-08-10 15_41_20-Flyway Desktop

If you click this, you’ll see a dialog pop up, with any tables where you are tracking static (or reference/lookup) data. There are two other tabs: comparison options and data options.

2023-08-10 15_41_28-Flyway Desktop

Clicking comparison options shows you the options for schema comparison. Essentially the options for SQL Compare (for SQL Server), Schema Compare for Oracle (for Oracle), or Redgate Compare (everything else). You can toggle these on or off by clicking checkboxes.

2023-08-10 15_41_36-Flyway Desktop

Likewise, there are the static data comparison options, equivalent to project settings in Data Compare.

2023-08-10 15_41_48-Flyway Desktop

There are still a lot of options, and these can be confusing. Knowing what is set and choosing those isn’t simple, but it is easier than trying to edit these in a config file.

Try it out today. If you haven’t worked with Flyway Desktop, download it today. There is a free version that organizes migrations and paid versions with many more features.

Video Walkthrough

I made a quick video showing this as well. You can watch it below, or check out all the Flyway videos I’ve added:

Posted in Blog | Tagged , , , | Comments Off on Friday Flyway Tips: More Config Options

Removing Weak Security from SQL Server

I was checking some arguments in the RESTORE command for SQL Server and saw that the MEDIAPASSWORD option was deprecated and marked as being removed at some point. That made sense, and I assumed that PASSWORD was the option to be used moving forward. However, that option is also marked as deprecated.

What should we do?

Well, we need to better secure our backup files for sure. Disk encryption and limits on AD/directory permissions ought to be set. Of course, we need to use care when handling or moving these files, especially when they cross the secure boundary in production systems to other dev/test/etc. networks.

The security section of the document explains the reasoning here. The protection provided is weak and isn’t intended to protect data. You can still read data in the backup file. This is mainly to prevent an incorrect restore when using tools, meaning the human picking the wrong file. This isn’t to protect your data.

I suspect most people dealing with SQL Server backups that use either of these options don’t know this. They think the password secures their data. I know because I’ve seen people use this to send a backup file through email or file transfer to another party. However, if you’ve ever opened a backup file in XVI32 or another editor, you will see that your data is in plain text. If you’ve never done that, give it a try today and search for strings that you know are stored in the database.

Some security is better than no security and layers of security that build on each other are useful. However, depending on weak security isn’t good. It leads people to count on something that doesn’t work and ignore more serious issues.

I’m glad that Microsoft is (supposedly) removing these options. I understand that backwards compatibility and preventing existing scripts from failing are important. At the same time, we need to move forward. I’d like these password items to become a no-op, and not cause errors, but I would hope that their use in scripts would also generate a message to the user that these options don’t work and need to remove removed. Perhaps with a direct message and a note in the error log, we’d start to see people embracing other security practices that will provide more protection.

Steve Jones

Listen to the podcast at Libsyn, Stitcher, Spotify, or iTunes.

Posted in Editorial | Tagged | Comments Off on Removing Weak Security from SQL Server