Adding Manual Relationships Between Tables in the TDM Subsetter

I wrote about getting the Redgate Test Data Manager set up in 10 minutes before, and a follow up post on using your own backup. One of the things I didn’t show from my own database was that it had no FKs, so the subsetting didn’t quite work as I wanted.

A previous post showed how add starting tables for the subsetter to look at, however that didn’t get me a good data set for testing. This post continues looking at the subsetter by adding manual relationships to our configuration.

This is part of a series of posts on TDM. Check out the tag for other posts.

Declaring a Relationship in the Options File

In my previous post, I’d picked a starting table and had reduced the dbo.players table from 16564 to 1800. However, I only had player information. If I query my subset database, I see there is a player, but I have no batting statistics for this player.

2025-01_0196

This is because my table has no declared FKs in it. If I check the dbo.batting table, I can see only a PK.

2025-01_0197

Let’s fix this.

Declaring Manual FK Relationships

In the options file documentation, there is a section that notes manual relationships can be declared with a key called “manualRelationships”. If I copy/paste the example section into my options file, I’ll see this:

2025-01_0198

I don’t have a SourceTest table, so let me edit things. I’ll set a relationship between dbo.players.playerID and dbo.batting.playerID. This gives me the following in my options file.

2025-01_0199

Before I run my subsetter, here are the row counts by table.

2025-01_0200

I’ll re-run this subset command, which includes my option file at the end.

rgsubset run --database-engine=sqlserver --source-connection-string="server=localhost;database=BB_FullRestore;Trusted_Connection=yes;TrustServerCertificate=yes" --target-connection-string="server=localhost;database=BB_Subset;Trusted_Connection=yes;TrustServerCertificate=yes" --target-database-write-mode Overwrite --options-file E:\Documents\git\TDM-Demos\rgsubset-options-bb.json

When I do that, I know see these row counts. Note I now have batting rows.

2025-01_0201

My player query won’t work, so I still need to declare another relationship with the dbo.teams table. That is shown below:

2025-01_0202

I can re-run the same command above, and then I see this set of rowcounts (original on left, subset on right).

2025-01_0203

There is teams data, and if I re-run my queries from the top, I can see stats now.

2025-01_0204

Now I have a dataset that I can perform development work with in terms of players, teams, and batting.

I can also add more relationships as needed, for example, I’ll add this section to include pitching, batting post, and fielding. Here’s my complete options file:

{
  "jsonSchemaVersion": 1,
  "startingTables": [
    { 
      "table":
      {
        "schema": "dbo",
        "name": "players"
      },
      "filterClause": "birthState = 'CA'"
    }
  ],
  "manualRelationships": [
    {
      "sourceTable": 
        { 
          "schema": "dbo", 
          "name": "players"
        },
      "sourceColumns": [ "playerID" ],
      "targetTable": 
        { 
          "schema": "dbo", 
          "name": "batting" 
        },
      "targetColumns": [ "playerID" ]
     },
     {
        "sourceTable": 
          { 
            "schema": "dbo",
 
            "name": "batting"
          },
        "sourceColumns": [ "teamID", "yearID", "lgID" ],
        "targetTable": 
          { 
            "schema": "dbo", 
            "name": "teams" 
          },
        "targetColumns": [ "teamID", "yearID", "lgID" ]
       },
       {
          "sourceTable": 
            { 
              "schema": "dbo", 
              "name": "players"
            },
          "sourceColumns": [ "playerID" ],
          "targetTable": 
            { 
              "schema": "dbo", 
              "name": "battingpost" 
            },
          "targetColumns": [ "playerID"]
         },
         {
            "sourceTable": 
              { 
                "schema": "dbo", 
                "name": "players"
              },
            "sourceColumns": [ "playerID" ],
            "targetTable": 
              { 
                "schema": "dbo", 
                "name": "pitching" 
              },
            "targetColumns": [ "playerID"]
           },
           {
              "sourceTable": 
                { 
                  "schema": "dbo", 
                  "name": "players"
                },
              "sourceColumns": [ "playerID" ],
              "targetTable": 
                { 
                  "schema": "dbo",
 
                  "name": "fielding"
                },
              "targetColumns": [ "playerID"]
             }
  ]
}

After re-running the subsetter, I have these row counts. Note there are rows in all the tables defined in the options file.

2025-01_0205

I can keep adding in more tables as needed to ensure the subsetter can walk down the data relationships I need in my database to produce a useable dev/test dataset that’s smaller than production.

TDM can help your devs build better software and with the subsetter, this can create lots of agility to ensure the data you need to accurately build this software is available.

Give TDM a try today from the repo and a trial, or contact one of our reps and get moving with help from our sales engineers.

Video Walkthrough

Check out a video of my demoing this below:

Posted in Blog | Tagged , , , | Comments Off on Adding Manual Relationships Between Tables in the TDM Subsetter

Creating a GitHub Actions Self-Hosted Runner

I had to demo the Flyway Autopilot system recently and created a GitHub Actions runner as a part of that. This post documents how this went.

First, if you go to the settings in a repo and click the Actions area, you see a Runners item. Click that. Notice I have no runners.

2025-01_0235

In the upper right corner, I can click a button to create one.

2025-01_0236

This gives me the instructions to get a new one. Note, these are PowerShell commands, and the first command doesn’t quite work right. Still, this is what I need.

2025-01_0237

I opened a CMD window and stared running these. Note, I need to repeat the change directory.

2025-01_0238

Now start PowerShell as the next commands are PoSh ones. When I copy the next command, it starts downloading a zip file. As of this writing, this is a 600-ish MB file.

2025-01_0239

Once this is done, you can run the next commands, which unzip and configure this. For the config, I just hit enter as I don’t have multiple groups or tags, and I leave it named as my machine. I also don’t bother to run this as a service.

The last command runs the runner agent.

2025-01_0240

If I go back to the Runner screen in Settings, I see I have an idle agent set up.

2025-01_0241

And that’s it. If I pick one of my workflows in the Actions tab, I can run it and I’ll see the job started in my runner folder. Here are my actions with the Run workflow button on the right.

2025-02_0284

If I click this, I see the job start in the CLI.

2025-02_0285

If I get back to the Actions, I’ll see things in progress. As you can see, I was slow here.

2025-02_0283

That’s about it. Now I can run local automations in my repo that connect to things like local databases, which can be handy.

Video Walkthrough

I’ve got a video of this process if you want to watch it.

Posted in Blog | Tagged , , | 2 Comments

Extended Event Comfort

I saw an interesting thread recently in the SQL Server Community Slack where someone posted about extended events (XE). They were asking about whether XE would have a problem with a situation. The problem wasn’t so interesting, but a quote from one of the responders was. The quote was:

The best time to have learned Extended Events was ten years ago. The second best time is today.

I love that, and I tend to agree. If you need to trace what is happening inside your SQL Server, you need to learn how to capture information with Extended Events. That’s the best way to dig into the details of how queries affect your system.

It’s also hard. I know that whenever I need to use it, which is rare, I have to dig through some articles and docs to understand what thing I need to do. Even having some scripts hasn’t helped because it’s a sufficiently complex system that unless I use it regularly, I forget how all the filters, targets, events, etc. work.

On one hand, I think it’s amazing, and on the other, it’s too hard to use. Even when I try the Extended Events profiler, it’s so different from Profiler that I find myself getting frustrated at times trying to dig through the information.

I am curious how many of you think XE is easy to configure and if you use it often. What are the places it works well? For those of you that don’t use XE or haven’t learned, why not? Do you not have to trace what’s happening with queries in some detail? Or do you have another way that you dive deep into your system? Or do you not have the need?

If you do want to learn more, we have a short Stairway Series on Extended Events to help you get started, as well as a few other articles. If you’re an expert, we’d love a few more on using XE in specific situations.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Tagged , | 4 Comments

No End to Programmers

There’s an interesting piece at O’Reilly this week titled The End of Programming as We Know It. That’s actually a good title, but it doesn’t mean the end to programmers, developers, software engineers, of whatever people call themselves.

The piece looks back at history, starting with physically connecting circuits to program and moving to switch flippers, cards, compiled languages, and on to the web and mobile systems. Each of this brought more and more programmers into the industry because there’s no end to the software needs of the world and it’s hard to write good code.

I like that the web was seen as the end of programming as anyone could easily build a minimal application to share things. Even frameworks like WordPress let novices create applications that do all sorts of useful things. However, we still need programmers. Look around at how many WordPress consultants are willing to help you.

That’s because there’s a difference between configuring something off a shelf (or from a digital store) and actually having a working application for your situation. There’s no end to what people want to build, and really, they need someone to build it for them. Or at least modify, improve, or re-implement the proof of concept they created.

The world of AI LLMs, and chat-oriented programming, is no different. Anyone can ask for an app and get a reasonable prototype. However, I don’t think this means less programmers. While more people will build more PoCs or prototypes, that also means more people will need a professional to clean up their system and make it work, scale, and perform well. And probably help secure it better.

Programmers (developers, software engineers, etc.) will scale themselves with AI tech, as they’ll get the AI to do some scaffolding and initial work that they clean up. I’m sure the very best will build RAG AI systems that generate the type of code they want with a little training and input. I could see millions of AI assistants that help write basis code outlines, write tests, check for standards and code smells, and help the human shift left, catching small, simple, silly mistakes.

I think we’ll have more people producing software in the future, but I also think that those who know their industry well will be in more demand. They’ll recognize issues in AI-generated code, they’ll guide AIs better, and they’ll communicate more clearly with AIs. They’ll get their pick of the best jobs, and they might be better compensated.

Things they likely do today with their fellow humans.

Steve Jones

Posted in Editorial | Tagged , | Comments Off on No End to Programmers