Picking a Starting Table in Test Data Manager

I wrote about getting the Redgate Test Data Manager set up in 10 minutes before, and a follow-up post on using your own backup. One thing I didn’t show was that my own database had no FKs, so the subsetting didn’t quite work as I wanted.

This post shows how to correct that by adding starting tables for the subsetter to work from, customizing your setup.

This is part of a series of posts on TDM. Check out the tag for other posts.

The Setup

When I ran the subsetter PoC with my own backup, I showed this screen comparing the size of the dbo.players table before and after.

2025-01_0174

Here’s what was missing. Let’s look at the counts from all tables. This seems OK: roughly 10% for most tables, with the smaller ones being the exception.
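To check those counts across every table at once, I use a quick query against the catalog views rather than counting each table by hand. This is just a sketch; sys.partitions gives approximate row counts, which is fine for this purpose:

```sql
-- Approximate row counts for all user tables, without scanning them
SELECT s.name      AS schemaname,
       t.name      AS tablename,
       SUM(p.rows) AS rowcnt
FROM sys.tables AS t
INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
INNER JOIN sys.partitions AS p ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)   -- heap or clustered index only, to avoid double counting
GROUP BY s.name, t.name
ORDER BY rowcnt DESC;
```

Run this against both the source and the subset databases to compare the before and after sizes.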

However, if I look at the subset and my LahmanID = 11, I see the player, but no batting stats.

2025-01_0179

That’s not useful. We got a random 10% from these tables, and my data isn’t intact. In a larger database, I might miss that I had incomplete, unmatched data across tables and write reports or queries that seemed to work, but really didn’t.
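A quick way to see the broken relationship is to query for my player row and then for its child rows in the subset. This is a sketch against my copy of the schema; the table and column names (dbo.batting, playerID, LahmanID) are from my database and may differ in yours:

```sql
-- Sketch: verify the player survived the subset but the child rows did not.
-- Table and column names are assumptions from my schema; adjust for yours.
SELECT p.playerID, p.nameFirst, p.nameLast
FROM dbo.players AS p
WHERE p.LahmanID = 11;

-- With no FK to follow, the subsetter has no reason to keep these rows
SELECT COUNT(*) AS battingrows
FROM dbo.batting AS b
INNER JOIN dbo.players AS p ON p.playerID = b.playerID
WHERE p.LahmanID = 11;
```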

Let’s fix this.

Customizing the Subset

We can customize the way the TDM tools work by adding in things we know that can’t be detected automatically, like FKs between tables that aren’t declared. In this case, I want to add in a relationship.

Note: the best place to do this is in a settings file, which I’ll customize for my purposes.

If I look in the TDM-AutoMasklet repo, there’s a settings file already set up that grabs certain orders for Northwind.

2025-01_0180

I’ll change this as follows to grab just the players born in CA. I have no idea how many players that is, but let’s filter on it.

2025-01_0191

Let’s start the TDM-AutoMasklet script and run through the subset. To use my settings file, I’ll alter line 20 in the file:

2025-01_0182

Since I added my file to another repo, I’ll put in the full path:

2025-01_0183

I’ll run the PoC until I get to the subset section. There I’ll stop, but first I’ll copy the subsetter command, which is highlighted below.

2025-01_0185

This command is this (broken into lines for clarity):

rgsubset run --database-engine=sqlserver
  --source-connection-string="server=localhost;database=BB_FullRestore;Trusted_Connection=yes;TrustServerCertificate=yes"
  --target-connection-string="server=localhost;database=BB_Subset;Trusted_Connection=yes;TrustServerCertificate=yes"
  --target-database-write-mode Overwrite

However, as of Jan 15, this is missing the options file. I’ll add that to the command, which will now look like this:

rgsubset run --database-engine=sqlserver
  --source-connection-string="server=localhost;database=BB_FullRestore;Trusted_Connection=yes;TrustServerCertificate=yes"
  --target-connection-string="server=localhost;database=BB_Subset;Trusted_Connection=yes;TrustServerCertificate=yes"
  --target-database-write-mode Overwrite
  --options-file E:\Documents\git\TDM-Demos\rgsubset-options-bb.json

When I run this, I see:

2025-01_0187

If I now look at rowcounts, I see this in the original (left) and subset (right) databases. Note that rather than roughly 10% of most tables, I now see 1,800 players, but almost no rows in the other tables.

2025-01_0188

This happens because I’ve selected a starting table, a parent, but since I don’t have declared relationships as FKs, the subsetter essentially had no idea where to go. It filtered based on dbo.players.birthState = ‘CA’, but that’s it.

I can add a second starting table, teams, as well. When I do that, I have this in my options file. Note that I’ve filtered on the years after 2000. If I don’t filter, all the rows are passed through; I think this is because it’s a fairly small table and I haven’t specified a target size.

2025-01_0193

When I run this, it’s again quick and I see only 360 rows moved over in the dbo.teams table.

2025-01_0194

I could add other starting tables if I had different parts of my database with unrelated entities. However, in this case, most of these tables are related, just not with explicit DRI.

This post has shown a way to start controlling the subsetting. In the next post, I’ll look at adding in the manual relationships.

Give TDM a try today from the repo and a trial, or contact one of our reps and get moving with help from our sales engineers.

Video Walkthrough

Check out a video of me demoing this below:


T-SQL Tuesday #183 Invite: Tracking Permissions

It’s time for T-SQL Tuesday again and this month I’m hosting. I realized that I didn’t host in 2024 and since I run the thing, I ought to be a part of it one of these days.

So, this is that month. The format: the invite comes out today, and you have a week to write a post. Please post it on your blog and link back to this post, or leave a comment here with the link to yours. That way I can summarize in a few weeks.

With that, the invite…

Managing Database Permissions

One of the things I see lots of people struggle with is the database permissions for their logins/users. DBAs or Ops people are always getting requests to add people, rarely requests to remove people, and not often enough, requests to audit who has permissions.

Lots of systems that live for years have people with too many permissions but no one knows this.

It could be SQL Server, Oracle, CosmosDB, RDS, PlanetScale, etc. Well, not that last one; they haven’t been around for years, have they? Actually, they have, since 2018, so maybe people do have this issue in MySQL.

Ultimately, we have to audit or understand permissions at some point. Or we need to update permissions. This could be a developer task (as they add/change objects) or it could be strictly Operations. I’ve seen both.

This month, I’m wondering how you track permissions, which I assume involves some code. Show us a good way to do any of these:

  • check permissions
  • update permissions
  • add new logins/users across systems
  • remove people
  • report on permissions at various levels (user, object, database, etc.)
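As a starting point for the last item, here’s the sort of query I might begin with on SQL Server. It’s only a sketch using the catalog views, reporting explicit database- and object-level grants per principal:

```sql
-- Report explicit permissions per database principal.
-- class 0 = database-level, class 1 = object/column-level grants.
SELECT pr.name      AS principalname,
       pr.type_desc AS principaltype,
       pe.permission_name,
       pe.state_desc,               -- GRANT, DENY, etc.
       COALESCE(OBJECT_NAME(pe.major_id), 'DATABASE') AS securable
FROM sys.database_permissions AS pe
INNER JOIN sys.database_principals AS pr
        ON pr.principal_id = pe.grantee_principal_id
WHERE pe.class IN (0, 1)
  AND pr.name NOT IN ('public', 'guest')
ORDER BY pr.name, securable;
```

This doesn’t cover role membership or server-level permissions, which is exactly the kind of gap I hope your posts will fill in.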

You should publish your post on Feb 11, 2025 UTC. Link back to this post with a trackback/pingback or a comment.

Looking forward to what you come up with.


Declaring a Complex PK in a CREATE TABLE: #SQLNewBlogger

Recently I was talking with someone who had not named any of the primary keys (PKs) in their database. They used system-generated names, and when they ran comparisons, they got all sorts of drops and creates they didn’t expect. This post shows how easy it is to declare PKs with names, even complex ones.

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

The Scenario

I’m going to use a schema that I have for baseball data. It illustrates the point well, I think.

For my friend, imagine a table like this one:

CREATE TABLE [dbo].[salaries](
     [yearID] [int] NOT NULL,
     [teamID] [varchar](3) NOT NULL,
     [lgID] [varchar](2) NOT NULL,
     [playerID] [varchar](9) NOT NULL,
     [salary] [int] NULL,
PRIMARY KEY CLUSTERED 
(
     [yearID] ASC,
     [teamID] ASC,
     [lgID] ASC,
     [playerID] ASC
)
)

If I create this table, the system will decide what the PK is. In fact, after I run this, I can see in sys.objects that the name is some random code.

2025-01_0258
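If you’d rather not scan sys.objects by eye, sys.key_constraints shows the generated name directly. A quick sketch:

```sql
-- Look up the name SQL Server generated for the salaries primary key
SELECT kc.name, kc.is_system_named
FROM sys.key_constraints AS kc
WHERE kc.parent_object_id = OBJECT_ID('dbo.salaries')
  AND kc.[type] = 'PK';
```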

What’s worse is that if this were an FK and I were dropping constraints by name in a script, I’d have issues on other systems, since this name is different on each one.
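To see how widespread the problem is in a database, the is_system_named column flags every constraint name the system generated. A sketch:

```sql
-- Every key constraint with a system-generated name, i.e. a name
-- that will differ on every system where the script is run
SELECT OBJECT_SCHEMA_NAME(kc.parent_object_id) AS schemaname,
       OBJECT_NAME(kc.parent_object_id)        AS tablename,
       kc.name                                 AS constraintname,
       kc.type_desc
FROM sys.key_constraints AS kc
WHERE kc.is_system_named = 1
ORDER BY schemaname, tablename;
```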

The better solution is to make a simple change. Before the PRIMARY KEY keywords, I can add CONSTRAINT and a name, like this:

CREATE TABLE [dbo].[salaries](
     [yearID] [int] NOT NULL,
     [teamID] [varchar](3) NOT NULL,
     [lgID] [varchar](2) NOT NULL,
     [playerID] [varchar](9) NOT NULL,
     [salary] [int] NULL,
CONSTRAINT salariesPK PRIMARY KEY CLUSTERED 
(
     [yearID] ASC,
     [teamID] ASC,
     [lgID] ASC,
     [playerID] ASC
)
)

If I run this, then I have a better named PK.

2025-01_0259

Be explicit in your work. It makes for better code and easier, repeatable, reliable deployments.

SQL New Blogger

This is a simple thing, a code smell, but one that wastes DBA and developer time. This post shows a simple thing you can do to have better code. This took me about 10 minutes and you could do something similar.

Write about this, and maybe someone will ask you how to do it in an interview.


Words vs Data

I would guess that most of you reading this are very comfortable looking at data for insights and answers. You might even prefer to provide a result set instead of a picture or chart to a user when they are asking for help with data analysis. However, do you add any words to your analysis to help? Any descriptions, summaries, or conclusions that could be drawn from the data or the picture?

I ran across a blog asking about the right ratio of words to data. The post uses the childhood story of Goldilocks and the Three Bears. Many of you might know the story and have drawn your own conclusions of what the story shows or means. If you read this post, you will find a very different interpretation. While some of you may not think that’s a valid interpretation, it’s possible that some thought that when they first heard the story.

The point of the post is that we can provide data and pictures, but others might interpret things differently. Each of us has our own point of view, our own experiences, and our mood. That last one might lead us to focus on a piece of data or a part of the picture that the author didn’t intend for us to focus on, or didn’t think was relevant. Without any sort of guidance on the narration from the author, we don’t know how closely our interpretation matches theirs.

Many of us have certainly seen others spin data, especially aggregates and statistics, to suit a narrative. However, the idea of providing some narrative isn’t to hide or mislead, but rather give context to what you see in the report. As the blog notes, don’t leave their interpretation to chance. Give them a “well-crafted, objectively reasonable narrative that is supported by your data.”

Or, if you don’t have one, let them know that and ask them to send you one back showing what they see or what they expect.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.
