redesis– n. a feeling of queasiness while offering someone advice, knowing they might well face a totally different set of constraints and capabilities, any of which might propel them to a wildly different outcome – which makes you wonder if all of your hard-earned wisdom is fundamentally nontransferable, like handing someone a gift card in your name that probably expired years ago.
These days, as I speak with others as a coach or just a community peer, I feel some redesis about any advice I give that isn't tightly bound to a specific situation. Any general advice has me wondering whether the things that worked for me will work for others.
I recognize that the way each of us sees the world varies with our situation. I think EVs are great and convenient, but I'm in a position where I can afford one, along with a charging station in my house. I don't worry about the price of eggs or bacon, but that's because I'm lucky. I don't worry about health insurance because I'm healthy now.
The advice I give people about their careers is always tempered by my redesis. The things that worked for me might not work for others, so I'm careful to explain the situations where something worked for me rather than give the advice without context.
The Solutions Engineers at Redgate recently released an Introduction to Redgate Flyway Autopilot course on Redgate University. They've been working on this for quite some time to help people get started with Flyway in their own environments. The process has gotten smooth and slick, so in this post and video I'm going to set it up in 10 minutes, but with a twist: I'm using my own schema to show you how easy this is.
I’ve been working with Flyway Desktop for work more and more as we transition from older SSMS plugins to the standalone tool. This series looks at some tips I’ve gotten along the way.
You'll need a SQL Server or PostgreSQL server (I'll use SQL Server).
You will also need a Redgate token and a local runner. I won't detail those steps here; I'll cover them in another post, and the video also skips them.
Creating a Repository
I'm working in GitHub, but you can do this in Azure DevOps. Other platforms work as well, but they aren't covered in the course. The main thing is to go to the official Redgate repo at: https://github.com/red-gate/Flyway-AutoPilot-FastTrack
This brings you to this site:
From there, don't clone or fork the repo; instead, use it as a template. The "Use this template" button is in the upper right corner.
When you click this, you get the Create a new repository page, which looks like this. If you're familiar with GitHub, it looks like any other new repo. Give it a name, which must be unique in your org. I used FWAutopilot, as I already have a public "Autopilot" repo.
You can make this public or private, but just be aware of this from the standpoint of your org, especially if you add internal schemas. You can also set a description.
Once this is created, you’ll see the repo in your org. Here’s my Autopilot repo:
This is a copy of the template all set up. Now, on to Flyway Desktop.
Creating a Project
In Flyway Desktop, I’ll click the drop down by Open Project and select Open from Version Control.
Here I’ll paste in the URL of my repo. I also check that the folder for the local clone is valid. In my case, I tend to put things in Documents/Git, but you might have your own standard.
Once this clones down, you can see a repo in the path above that looks like the online repo. Flyway Desktop will also refresh the schema, which should give you this error.
This is because the databases don’t exist. As you can see, I’ve filtered to databases with “auto” in the name and I have nothing.
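If you'd rather check from a query window than the filter box, a quick look at sys.databases confirms nothing matches yet (a simple T-SQL check, assuming a default instance):

```sql
-- List any databases with "auto" in the name; before running the
-- create script, this should return no rows.
SELECT name
FROM sys.databases
WHERE name LIKE '%auto%';
```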
If I use File | Open in SSMS, I can go into the repo and into the Scripts folder, where I see this:
I want to open the CreateAutoPilotDatabases.sql script, which looks like what you see below. This creates 5 databases and adds schema objects to one of them. The goal is for Autopilot to use Database DevOps and Flyway to migrate these changes to other databases.
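In rough outline, the script does something like the following. This is only a sketch: the actual database names, the fifth database, and the sample objects come from the repo's CreateAutoPilotDatabases.sql, which is authoritative.

```sql
-- Illustrative sketch only: create the databases used in the
-- pipeline, then add sample objects to the dev database.
-- See the repo's script for the real names and objects.
CREATE DATABASE AutopilotDev;
CREATE DATABASE AutopilotTest;
CREATE DATABASE AutopilotProd;
CREATE DATABASE AutopilotShadow;  -- used by Flyway to verify scripts
GO
USE AutopilotDev;
GO
CREATE TABLE dbo.SampleTable
(
    id   INT PRIMARY KEY,
    name VARCHAR(50)
);
```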
Run this, and I see different databases. I’ve refreshed things, and you can see Prod has nothing but Dev has objects.
Now that I have a database, let's refresh Flyway Desktop. I see no changes.
Note: If you aren’t doing this on your localhost instance, then you can edit the connections to the dev database (and other databases).
Adding Our Own Schema
Don’t start deleting schema objects yet, but you can add your own. I’ll do that. I have a script that contains a schema for baseball data. The beginning is shown below, but I’ll run this in my AutopilotDev database.
After I do this, I’ll refresh Flyway Desktop again and I see my tables. This is a partial list as the full list scrolls off the screen.
I’ll select all these from the checkbox at the top next to Object Name and then click “Save to project” in the upper right. This writes the CREATE scripts for all objects to the schema model, as you can see below in the update message.
The next step (shown at the bottom) is to generate a migration script. I’ll click that.
I get the screen below, which shows me all the changes that have been made to objects. I can select one or more of these to put into a migration script (deployment script). If I don’t select them all, then I will see those I haven’t selected re-appear here and I can add them to a different script. To keep this simple, I’ve selected them all.
When I click “Generate script” in the upper right, Flyway will create a script containing all the objects I’ve selected with the appropriate create/alter/drop code inside. Here is my one large script. You can see the start of the script below. If we scrolled, we’d see the CREATE for all the tables in here.
This project is configured to automatically generate an undo script, so below the above part, there is the undo script. Again, this is just the beginning of the script. However, you can see before we drop tables, we need to remove constraints.
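For reference, Flyway pairs each versioned migration with its undo by filename prefix: V for the deploy script and U for the matching undo, sharing the same version number. In the repo's migrations folder, you'll see something like this (names illustrative; the exact version and description in your folder will differ):

```
migrations/
  V002_20250115xxxx__baseball_schema.sql   -- versioned migration
  U002_20250115xxxx__baseball_schema.sql   -- matching undo script
```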
Once I’m happy, I can click save and this is written as a migration script (and an undo script).
I can click Verify, which essentially runs these scripts against my shadow database, but I don’t do that if I haven’t altered the scripts. You can if you want.
Now that we’ve made some changes, let’s commit those. On the right side of Flyway Desktop is the VCS blade. You can see I have 28 changes in my repo.
If I click the “28”, this opens to the commit tab. I can also click the arrow at the top and select the commit tab. In here, I see my changes and I can include all of them or some of them and write a commit message.
I’ve selected them all and written a message, so I’ll click the drop down by commit and select the combined Commit and Push.
If I check my repo, I see this commit included. You can see this altered the schema-model and migrations folders.
Now we need to keep this Database DevOps flow going and deploy our code.
Setting Up Runners
If I check the Actions tab in my repo, I see there are two workflows configured. They are the same, but one works for Windows and one for Linux. I don’t have any runs yet and I haven’t configured a runner.
If I go to Settings in my repo and the Actions | Runners area, I'll see this. Runners are the agents that execute your code. In this case, I need to set up a new one. I'll detail that in another post.
Once I have a runner set up, I should see something like this:
Adding Secrets
Flyway is a licensed product, so I need to tell the runner that it is licensed to use the product. If you don’t have a license, this system can get you a 28-day trial, but if you have one, you can just use that.
Note: I’ve deleted this token, so this code doesn’t work.
Don’t close this, but open a new browser tab for your repo. Go to the Secrets | Actions section under Settings. You should see this. Click New repository secret.
The documentation notes you need to add two secrets: FLYWAY_TOKEN and FLYWAY_EMAIL. These are essentially secret variables picked up by the automation. When I click new, I add the email like this.
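Behind the scenes, the workflow can read these with GitHub's standard secrets syntax. A minimal, illustrative fragment is below; the actual workflow files in the template repo are authoritative.

```yaml
# Illustrative only: expose the two repository secrets to the
# Flyway steps as environment values.
env:
  FLYWAY_EMAIL: ${{ secrets.FLYWAY_EMAIL }}
  FLYWAY_TOKEN: ${{ secrets.FLYWAY_TOKEN }}
```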
I added the token in the same way, pasting in the token from the portal. When I finished, I see two secrets.
Run the Automation
Check your production database (and the test one). There should be no objects, which is what we saw above.
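A quick way to confirm the target is empty (assuming the database names from the create script):

```sql
-- Expect zero user tables in prod before the first deployment runs.
SELECT COUNT(*) AS user_tables
FROM AutopilotProd.sys.tables;
```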
Now, go to the Actions section of your repo. Click the Windows workflow on the left (or Linux if you used that).
Now on the right, click Run workflow, and then Run again in the pop up.
In a minute, your web page should show this running with a yellow circle before the name.
Your CLI window should look like this as well, with a job running.
If you click the name of the run on the web page, you should then see the three tasks. Here my build completed before I could get the screenshot, but yours likely shows yellow on the build database.
If I click any of the tasks, I'll see the logging as they run. In the shot below, I've clicked the running prod deploy, as that was running when I was ready for the screenshot.
The output scrolls along and can be hard to follow, but after any of these completes, you can click on it and see the task outline. Each of the items below can be expanded by clicking the angle bracket. You can see I've expanded the Migrate Test DB task.
However, most of the time we assume we have a repeatable, reliable execution of our migration, so we don't need to watch the logs. The proof is in checking the databases.
Here is my refreshed AutoPilotTest database.
And here is the AutopilotProd database.
I moved code from Dev -> VCS -> Test -> Prod without manually executing it anywhere past Dev. This is the way changes should be made, so they are tested before they hit prod.
We should also have feature branches, PRs, and more, but that’s beyond the 10 minutes to get started. From here, I could easily make other changes in dev and get them deployed by clicking a button in GitHub.
Summary
This process took me ten minutes. To be fair, I'd tested it a few times, but knowing what the docs require, it took me ten minutes, as I show in the video below.
Flyway is an incredible way of deploying changes from one database to another, and now includes both migration-based and state-based deployments. You get the flexibility you need to control database changes in your environment. If you’ve never used it, give it a try today. It works for SQL Server, Oracle, PostgreSQL and nearly 50 other platforms.
Most of you reading this have likely seen a data center, and maybe sat in one while connecting to a database, but how many of you have had to try and do some of the work to set one up? It’s not something most data professionals do, but since we often have some of the most expensive boxes in one, I thought it would be interesting to discuss what goes into getting one running.
There’s a blog post that starts to cover what it takes to set up a data center. I found it interesting to read as I’ve been a part of the group choosing a space a few times, and a lot of the issues discussed in the blog were things I dealt with or experienced. Both with large and small companies, I’ve helped to spec out and decide what we need from a colocation facility or our own room in an office. In smaller companies, I’ve had some of the “what-the-duck” moments outlined in the post when I tried to rack and connect systems, only to find some issue with power, connections, etc.
When I read that post and think about the months of getting everything set up, it's hard to see this as more cost-effective than the cloud. Certainly setting up racks and servers is easy. Getting them powered, arranging hot and cold aisles, and deciding what gets connected where are relatively simple things. However, making decisions on networking, routing, and other complex tasks can take specialized expertise. For a single company, finding, hiring, and retaining someone who knows this stuff well is hard. Startups might get someone excited, but the really good people who can design this stuff are often working at a cloud company of some sort.
Maybe those people are tired of the FAANG/Microsoft world and want to come to your startup, but for how long? And how do you replace them? Lots of companies found maintaining a data center to be more expensive than it was worth.
I suppose a similar situation in the database world is setting up a complex system, perhaps something like a new Firebolt warehouse. If the person who did that leaves, can the rest of the company fill their shoes? Do you know how Firebolt works? Would it be better to use a more popular system, like Snowflake or Databricks, with lots of other people writing articles and blogs about how things work? I think it would be, if for no other reason than there are other models to follow and more people who can give you guidance.
A data center, whether a private one or a cell in a large cloud provider’s footprint, is a complex beast. The days of using a closet with servers on a bookshelf are likely gone for most commercial enterprises. Even SQL Server Central moved on to hosted systems. We used to rent a rack in the early 2000s in a co-location facility, one I enjoyed visiting, as long as the visits were few and far between. Now I like connecting to EC2, which is much simpler and easier.
When I ran the subsetter PoC with my own backup, I showed this screen comparing the size of the dbo.players table before and after.
Here's what was missing. Let's look at the counts from all tables. This seems OK, 10%-ish for most tables, with the smaller tables being the exception.
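If you want to compare counts yourself, a per-table rowcount from the system views works well (run this against each database):

```sql
-- Approximate row counts per table from partition metadata;
-- index_id 0 (heap) or 1 (clustered index) covers the base rows.
SELECT s.name AS schema_name,
       t.name AS table_name,
       SUM(p.rows) AS row_count
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.partitions p ON p.object_id = t.object_id
                     AND p.index_id IN (0, 1)
GROUP BY s.name, t.name
ORDER BY s.name, t.name;
```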
However, if I look at the subset and my LahmanID = 11, I see the player, but no batting stats.
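You can check this for any player with a left join. The batting table and column names here are from my baseball schema, so adjust for yours:

```sql
-- A player row with zero child rows means the subset broke the
-- (undeclared) relationship between players and batting stats.
SELECT p.lahmanID,
       COUNT(b.lahmanID) AS batting_rows
FROM dbo.players p
LEFT JOIN dbo.batting b ON b.lahmanID = p.lahmanID
WHERE p.lahmanID = 11
GROUP BY p.lahmanID;
```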
That’s not useful. We got a random 10% from these tables, and my data isn’t intact. In a larger database, I might miss that I had incomplete, unmatched data across tables and write reports or queries that seemed to work, but really didn’t.
Let’s fix this.
Customizing the Subset
We have the ability to customize the way the TDM tools work by adding things we know that can't be detected, like FKs between tables that aren't declared. In this case, I want to add in a relationship.
Note: the best place to do this is in a settings file, which I’ll customize for my purposes.
If I look in the TDM-Automasklet repo, there’s a settings file already setup that grabs certain orders for Northwind.
I’ll change this as follows to just grab all players born in CA. I have no idea how many this is, but let’s filter on that.
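If you're curious how many rows that filter will pull before running the subset, the same predicate works in a plain query against the source:

```sql
-- Preview the starting-table filter: how many players were born in CA?
SELECT COUNT(*) AS ca_players
FROM dbo.players
WHERE birthState = 'CA';
```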
Let’s start the TDM-AutoMasklet and just run through the subset. To use my settings file, I’ll alter this line (20) in the file:
Since I added my file to another repo, I’ll put in the full path:
I’ll run the PoC and I get to the subset section. I’ll stop, but I’ll copy the subsetter command, which is highlighted below.
This command is this (broken into lines for clarity):

rgsubset run
  --database-engine=sqlserver
  --source-connection-string="server=localhost;database=BB_FullRestore;Trusted_Connection=yes;TrustServerCertificate=yes"
  --target-connection-string="server=localhost;database=BB_Subset;Trusted_Connection=yes;TrustServerCertificate=yes"
  --target-database-write-mode Overwrite
However, as of Jan 15, this is missing the options file. I’ll add that to the command, which will now look like this:
rgsubset run
  --database-engine=sqlserver
  --source-connection-string="server=localhost;database=BB_FullRestore;Trusted_Connection=yes;TrustServerCertificate=yes"
  --target-connection-string="server=localhost;database=BB_Subset;Trusted_Connection=yes;TrustServerCertificate=yes"
  --target-database-write-mode Overwrite
  --options-file E:\Documents\git\TDM-Demos\rgsubset-options-bb.json
When I run this, the subset executes quickly.
If I now look at rowcounts, I see this in the original (left) and subset (right) databases. Note that rather than 10% of most tables, I now see 1800 players, but almost no rows in the other tables.
This happens because I've selected a starting table, a parent, but since I don't have relationships declared as FKs, the subsetter essentially had no idea where to go. It filtered based on dbo.players.birthState = 'CA', but that's it.
I can add a second starting table, teams, as well. When I do that, I have this in my options file. Note, I’ve filtered on the years after 2000. If I don’t filter, I get all the values passed through. I think this is because this is a fairly small table and I haven’t specified a target size.
When I run this, it’s again quick and I see only 360 rows moved over in the dbo.teams table.
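You can sanity-check that 360 against the source with the same filter I put in the options file. The year column name here is from my Lahman-style schema, so adjust if yours differs:

```sql
-- The subset's teams count should match the filtered source count.
SELECT COUNT(*) AS teams_since_2000
FROM dbo.teams
WHERE yearID > 2000;
```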
I could add other starting tables if I had different parts of my database with unrelated entities. However, in this case, most of these tables are related, just not with explicit DRI.
This post has shown a way to start controlling the subsetting. In the next post, I’ll look at adding in the manual relationships.
Give TDM a try today from the repo and a trial, or contact one of our reps and get moving with help from our sales engineers.
Video Walkthrough
Check out a video of me demoing this below:
Posted in Blog | Tagged Redgate, syndicated, TDM | Comments Off on Picking a Starting Table in Test Data Manager