Stretch Database is finally going away. It was deprecated in SQL Server 2022 on Nov 16, 2022, and effective Jul 9, 2024, the supporting Azure service is retired. I saw this in an announcement on Jul 3, though I hope anyone using this service has been seeing lots of reminders over the last couple of years. I know I’m getting MySQL retirement notices for one of my services and need to migrate some workloads this month.
If you tried this service, you might have realized that the pricing didn’t make sense for most of us. If you hadn’t tried it, it worked by moving some of the data in your tables into Azure, where it could be queried if needed. It was an interesting idea, though most of us would have wanted this to work between two SQL Server instances, not between SQL Server on-premises and Azure.
In any case, if you’re on an older version of SQL Server, the recommendation is that you bring your data back on-premises. If you have SQL Server 2022, they recommend CETAS (CREATE EXTERNAL TABLE AS SELECT), which lets you query data in Azure storage. This lets you put some data in text formats and query it as needed, reducing the use of expensive relational storage disks. Parquet is the recommended format here. There’s also a weird mention of Fabric in the announcement, which doesn’t seem to fit with the objective of the rest of the article.
I haven’t worked with this enough to know how well this performs or what patterns might fit here. I do know that reducing the queries from clients, especially SELECT * and unbounded queries across all your data, helps. If your clients always want to query all data in a table, then nothing works well. Building systems that query hot data by default and filter out warm/cold data is always helpful. Having some good indexing is also important.
I don’t know of anyone who uses Stretch Database, so I’m not sure how many people are affected here, but I hope they knew about this before Jul 9.
Test Data Manager (TDM) is a suite of products from Redgate that makes it easy to build dev and test databases in seconds. It’s a nice rewrite of a number of pieces of technology that we have sold for years, and it was launched at the PASS Data Community Summit in 2023.
I’ve been working with a few customers and sales engineers as they evaluate the fit for TDM in their environment. TDM is great once it’s running and can create a lot of agility for development teams as well as help them build better tested, higher quality software.
One of the challenges we’ve found is that the setup can be complex, and the knowledge required to get up to speed is high. There are a lot of moving parts to get this working in a way that makes it seem simple for the end users (usually developers).
In the spirit of ingeniously simple software, one of our engineers, Alex Yates, built a mini-PoC system using PowerShell that’s available. This post shows how you can get started to demo subsetting and masking in less than 10 minutes.
You really need only a few things, and the tool does most of the work:
dbatools
the Redgate tools (and a Redgate account)
git
an existing C:\temp folder
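If you want a quick sanity check before you start, you can run something like this from Git Bash (this check is my own addition, not part of the tool; /c/temp is the Git Bash spelling of C:\temp):

```shell
# Verify git is on the path and C:\temp exists, creating it if needed
command -v git >/dev/null && echo "git: OK" || echo "git: missing"
mkdir -p /c/temp 2>/dev/null && echo "C:\temp: OK" || echo "C:\temp: could not create"
```

dbatools and the Redgate tools are handled by the script itself, so they don’t need checking here.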
If you have dbatools installed, we import the module; if not, we download and install it for you. For the tools, we’ll download them and start a trial, or you can get these tools and a license from your account executive. These tools run on various platforms, so ping your rep or sales@red-gate.com.
You should already have git installed, and C:\temp should be there.
Git makes this super simple. Just clone this down.
Configuration
There isn’t much to configure. In the repo, open run-auto-masklet.ps1 and look at the first 15 lines. These are where you might change things.
The local instance is set in line 2. If you have a named instance, use that. I’ve included a config file below that I used on a named instance.
That’s it. The repo includes a copy of Northwind in an install script to create the full-sized database as Northwind_FullRestore. The subsetter will then move a portion of data to Northwind_Subset. If you want to change these names, you can do that.
Running the Tool
This tool can run as a normal user, but if it needs to install dbatools (it will do that for you), then it needs to run as Admin. I listed dbatools as a requirement above, so with it already installed you don’t need to run this as an admin.
Here’s what the tool does:
Gets dbatools
Gets the latest versions of the subsetter and anonymize tools
Authorizes you, starting a trial if a license isn’t assigned to your Redgate ID
Drops the two databases (Northwind_FullRestore and Northwind_Subset by default)
Creates the two databases with schema (and data for the full restore)
Pauses with output
Runs the subsetter to move a portion of data to the Northwind_Subset database
Pauses
Runs the classification process against Northwind_Subset to classify columns
Pauses
Runs the masker against Northwind_Subset to mask data
Here’s the first set of output, showing the config and first part of the process.
Here is the first pause. You can see there are db create notes and then an explanation of what to see:
I like that this gives the subset command, which takes some getting used to. The TDM GUI hides this, but every customer has wanted to customize things, so this is a helpful way to do the PoC.
The subsetter does a lot, as you can see below, but basically it maps out the database and then starts to determine which data needs to move. In this case, lines 10 and 11 of the source scripts show that we are subsetting dbo.Orders with the filter OrderID < 10260.
When this is complete, we get another message that explains what happened. We get some telemetry as well with the time taken here.
We also see the next part of the process, which is classifying the data. Again, we see the command for this, and you see this runs quickly.
Lastly, the next pause tells us there is a classification file at a particular location. We get the path if we want to look or edit the file.
Then the masker runs, and we see that 5 tables are masked. We get telemetry, and below the results there is more info on what’s happened and what to look for in the databases.
Checking the Tool
Once the execution was complete, I looked at the two databases. In SSMS, I had a vertical tab group to compare things.
First, subsets. I’ll count orders, order details, products, and employees. You can see the original db on the left and the subset on the right. Less data.
Not super impressive, but imagine the database on the left were 1,000 times larger. That would be cool.
What about masking? Let’s check.
The Shippers, Suppliers, Employees, Customers, and Orders tables were masked. Let’s look at Shippers. We can see the phone number is masked.
Checking Employees, I see fewer rows since there is a subset here, but I do see the data masked.
You should see similar results, and what’s more, you can alter the various config files or filters to test how your changes work.
I’m a big believer in sandboxes for learning and experimenting. This gives you a nice sandbox. You can change the various files or script and then re-run the tool in a couple minutes to see your changes.
Here are the CLI docs you might use to change things:
I almost missed this month, so this is also a good #SQLNewBlogger post. I thought about it for a few minutes as I ate breakfast at my desk and then knocked this out.
This is the monthly T-SQL Tuesday blog party. I manage the site at tsqltuesday.com, trying to keep the party going. I have a lot of help from hosts each month running the topic, and I appreciate their efforts. Join in and write, and then host a month. Lots of people have done it.
Past Advice
I struggle with this, as I’ve had a great life. I wouldn’t change anything, given where I am today, 33 years after my first data job. However, knowing I’m not changing my life in some time-travel way, this is something I wish I’d known about in my early 20s.
Network and help others in the community.
I did this lightly in my first few jobs, but mostly within organizations. As I’ve grown and changed jobs, I’ve seen tremendous power in people getting together to talk, getting to know each other, sharing problems and solutions, and presenting their knowledge and learnings to others.
User groups were a core part of this, and out of them grew the PASS organization and Summit, the SQL Saturdays, and the amazing community we have. It’s far different from other communities, and many people in those communities see it as well. They wish their world was like the data platform world.
The power of networking is amazing. The rich world that comes from community is something special.
It’s still there today and it’s worth joining, no matter where you are in your career.
I had a lot of local branches for a repo (actually a few repos). I know these are old and not used anymore, so how do I delete them? This post shows how to do that on Windows.
Another post for me that is simple and hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers. You can see all posts on Git as well.
The Problem
As I’ve been making changes for various SQL Saturday events, I’ve built up quite a few local branches I no longer need.
I saw this SO post, which was a good starting point. I grabbed this code, which I’ll explain below.
I assume most of you installed Git and have Git Bash. The xargs and awk commands are Unix/Linux ones, so you need a bash shell to run them.
The solution for me was to open a bash shell in the repo with the right-click menu on Windows.
Then run the code:
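Reconstructed from the steps described in the How this Works section, the command is the familiar fetch/prune + branch + awk + xargs pipeline. The sketch below wraps it in a throwaway repo so it’s safe to try anywhere; the repo paths and branch names are invented for the demo, and the awk pattern assumes git’s English-locale “gone” marker:

```shell
set -e
# Build a throwaway remote and working clone to demo against
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "init"
main=$(git symbolic-ref --short HEAD)
git push -q -u origin "$main"

# Create a tracked branch, then delete it on the remote,
# as if it had been merged and cleaned up on GitHub
git checkout -q -b old-feature
git push -q -u origin old-feature
git checkout -q "$main"
git push -q origin --delete old-feature

# The cleanup: prune stale remote refs, find local branches whose
# upstream is marked "gone", and delete them
git fetch -p
git branch -vv | awk '/: gone]/{print $1}' | xargs git branch -d
```

In your own repo, only the last two lines are needed.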
Local branches removed. Well, almost; read the next section.
Additions
Note that on the first execution, I had two errors noting that there were some unmerged branches. When I looked at these, I saw they were old branches, ones that hadn’t been used in years. I’m guessing either I was fixing something for someone, or they fixed something in another branch.
So, I force deleted those by re-running the command with a capital D (-D instead of -d).
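A minimal, self-contained illustration of why -d refuses an unmerged branch while -D removes it anyway (the branch names here are made up):

```shell
set -e
# Throwaway repo with a branch that has a commit not on the main branch
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "init"
main=$(git symbolic-ref --short HEAD)
git checkout -q -b abandoned-fix
git commit -q --allow-empty -m "unmerged work"
git checkout -q "$main"

git branch -d abandoned-fix || echo "-d refused, as expected"
git branch -D abandoned-fix    # force delete succeeds
```

Use -D with care; anything unique to those branches is gone for good.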
How this Works
This code uses some Unix-based utilities that I haven’t used in a long time. The flow of this is similar to how PowerShell, or even VBScript, works, but on a single line. In this case, this code:
Gets a list of branches from the remote with git fetch, after pruning the references for local branches that don’t exist on the remote.
Runs the branch command with doubled verbose output (-vv; --verbose --verbose is equivalent), which includes each branch’s upstream tracking info.
Takes the output of the previous step and pipes it through awk. This command parses the text, looking for “gone” in a line and then printing the branch name.
That text is then taken by xargs and passed to the git branch command with the delete option.
Note this doesn’t force delete branches.
SQL New Blogger
This post took about 20 minutes to write. I spent about 5 minutes checking a few code examples online, and then tried one after I’d killed branches from GitHub. I don’t have a great solution there, but I don’t do this often and I can click a few buttons to manage this.
I then structured this post with a few screenshots and spent 15 minutes working on it. I’d actually sketched it in 5 minutes with the major sections and a sentence in each and realized this would be quick to write, so I just filled it in on a Sunday morning.
You could do this as well and give an interviewer something to ask you in the next interview. This might catch their eye. I’d also suggest (and I will) doing a few posts on awk and xargs. Those are good skills to have, and you might spend 20 minutes experimenting and having fun with them.
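As a starting point for that experimenting, you can exercise the same awk/xargs pattern on made-up input instead of real git output (the sample lines are invented; -r is the GNU xargs option to do nothing on empty input):

```shell
# Print the first field of lines matching "gone", then hand them to a command
printf 'alpha ok\nbeta gone\ngamma ok\n' |
  awk '/gone/ {print $1}' |
  xargs -r echo "would delete:"
# prints: would delete: beta
```

Swapping the echo for a destructive command only once the output looks right is a good habit with xargs.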