Classifying Sensitive Data

Our databases store all kinds of data, depending on their purpose. Most of us create tables and store data in response to some requirement or necessity in our organizations. Our decisions should be driven by good design principles, and I’d hope they are, but we do need to find a place to put all the data that our system will receive.

Depending on the sensitivity or personal nature of the data, we may have to alter the way we store it (perhaps encrypt it) or alter the security for parts of the system. This is almost an ad hoc, deal-with-it-at-the-time task. The exceptions have been when I’ve had to conform to a regulatory statute, such as SOX, PCI, or HIPAA. In those cases, I’ve often had to ensure the entire database is protected in some way that meets the requirements of the statute.

The time when I have had to think about individual columns of data is usually when building a development database, where potentially sensitive information can’t be transferred to development machines. In that case, because of the effort of changing data, I’ll try to build scripts that change out individual columns and ensure that sensitive data doesn’t get copied. However, data that’s deemed sensitive at one company isn’t always classified that way at another.

At least that’s been my experience. I shared some of this with the Redgate Foundry, which is running research into data classification. They’re looking for people to share opinions, and the project has me curious. I’ve always felt intuitively that we could classify data in tables, but perhaps that’s too simplistic a way of looking at the problem. I know that legal groups struggle with this same problem for email and file server documents. Classifying the content in different ways is a challenge.

Is it the same in databases? I’m not sure. I don’t know if complex rules are needed, or if this is a simple problem that we easily solve and rarely deal with. I’m curious to hear from those of you who deal with highly regulated industries. Is data classification something you work with often? How do you decide on the data classes, and does this impact your administration of the database? If you don’t classify the data, do you worry about the sensitivity of the bits in your database? Let us know today.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (4.1MB) podcast or subscribe to the feed at iTunes and Libsyn.


Adding Performance Counters back for SQL Server

I had a strange situation the other day, where a number of things went wrong with my instance. First, I lost permissions to detached databases. The SID was listed in the file permissions, but apparently it was no longer linked to an account.

Next, I went to add an alert, and I only had the XTP counters.

[Screenshot: only the XTP counters appear in SSMS]

The counters were also missing in Performance Monitor. What’s interesting is that I could still see the correct SQLAgent counters for each of my three instances.

[Screenshot: the Add Counters dialog in Performance Monitor]

A quick search turned up this blog post on MSDN, which recommends the following:

unlodctr mssqlserver

lodctr perf-mssqlserversqlctr.ini

I had a named instance, so for me I entered:

unlodctr mssql$sql2016

lodctr perf-MSSQL$SQL2016sqlctr.ini

from an elevated command prompt. Running the last command again shows the counters loaded.

[Screenshot: the lodctr commands in an elevated command prompt]
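A quick way to double-check from SQL Server itself is to query the performance counter DMV; if the counters aren’t registered, this returns nothing (a minimal sketch):

SELECT TOP 10 object_name, counter_name
FROM sys.dm_os_performance_counters;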

I also checked my registry, which appeared to be fine:

[Screenshot: the performance counter registry entries]

I next found another blog that noted I might need to resync WMI, so I ran winmgmt, using the PID from Task Manager (Details tab):

[Screenshot: winmgmt in an elevated command prompt]
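For reference, the resync switch itself looks like this (on current versions of Windows the switch alone is enough; older versions expected the PID of the WinMgmt process as an argument, which is why I grabbed it from Task Manager):

winmgmt /resyncperf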

I didn’t see the counters at first, so I restarted the instance. Once that was done …

[Screenshot: the New Alert dialog, with the counters back]

A nice fix, and one I probably won’t forget after this blog.


Improving Replication

I really like replication as a technology. I think the ability to move data around to other systems, at a gross level, is extremely handy in many systems. While SSIS and other ETL tools are very flexible and powerful, they also require quite a bit of work to maintain. Being able to send a table (or a vertical/horizontal partition) to another system is just valuable.

This is why I’m constantly disappointed that SQL Server hasn’t really bolstered its replication technologies to make them more robust. Don’t get me wrong, there have been improvements in various versions, and replication has come a long way since SQL Server 6.5, but as an overall subsystem in SQL Server, it has a long way to go. The tooling needs work, and so do the reliability and robustness. I find replication brittle, as do many others, and when code deployments are needed, it seems that administrators often just script out the system, tear it down, make changes, and rebuild it.

That’s not a great plan. In 2017, that shouldn’t be the plan. I don’t get why Microsoft hasn’t made things better; after all, more replication options could mean more SQL Server instances installed to support disparate workloads. However, rather than complain, I’d like to give Microsoft ideas.

Today I’m curious. What would you do to improve replication? What would you like to see? Perhaps you want better monitoring of the process. Simpler setup, such as that available in Azure. Better bidirectional replication? Maybe an easier way to deploy changes? Let us know your ideas for improving SQL Server replication.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.4MB) podcast or subscribe to the feed at iTunes and Libsyn.


Loading All CSV Files with PowerShell

I ran across Denis Gobo’s post about working with names from Social Security cards and wanted to play with the dataset. Rather than use xp_cmdshell, which I might have done, I decided to use PowerShell. I need to repeat this with Python, but for now, this worked.

There is a file you can download that contains a lot of .txt files, each one holding CSV data. You can read about this in more detail in Denis’ post. Essentially there are three fields in each file, with the year in the file name. I downloaded and unzipped the files into a folder.

Here was my process:

  1. Loop through the files
  2. Extract the year from the file name
  3. Use bcp to load the file into a staging table
  4. Call a proc that takes the year as a parameter and moves all the data

With that in mind, here’s how I set things up.

The SQL Server Side

I started by creating two tables: Names, the final destination that includes the year, and Names2, a staging table that matches the shape of the files.

CREATE TABLE Names
(   FirstName  VARCHAR(500),
     Gender     CHAR(1),
     NameCount  INT,
     YearInFile INT
);

CREATE TABLE Names2
(   FirstName VARCHAR(500),
     Gender    CHAR(1),
     NameCount INT
);

With these in place, I created a procedure. It takes the year as a parameter, moves the data into Names, and clears the staging table for the next file:

CREATE PROCEDURE MoveNames @year INT
AS
BEGIN
    INSERT Names SELECT FirstName, Gender, NameCount, @year FROM Names2;
    -- clear the staging table so the next file's rows aren't re-copied with the wrong year
    TRUNCATE TABLE Names2;
END;

I tested these and they worked fine for my process. I load into Names2 and then move the data to the other table, adding the year as I go. There are probably better ways, but this worked fine for a relatively small load (a few million rows).


The PoSh Side

With PoSh, I started with a simple loop. I tend to set variables first, since this makes it easier to turn the script into a function.

$sourcefolder = "E:\Downloads\names"

$sourcefiles = Get-ChildItem $sourcefolder -Filter "*.txt"

foreach ($file in $sourcefiles) {
   # file names embed the year (e.g., yob1880.txt), so characters 3-6 hold it
   $yearinfile = $file.Name.Substring(3, 4)
   Write-Host "Loading File:" $file.Name " year: " $yearinfile
}

Running this gets me a list of all the files along with the years. That’s what I want, since I’m going to process each file and load it in with bcp. Thanks to Mike Fal for the outline of how I’ll do this.

Next, I set more variables at the top:

$InstanceName = ".\SQL2016"
$DatabaseName = "Sandbox"
$StagingTableName = "Names2"
$StagingProc = "MoveNames"

I’ll use these in this code. I call Invoke-Expression to run bcp and then Invoke-Sqlcmd to run my proc.

$cmd = "bcp '$DatabaseName.dbo.[$StagingTableName]' in '$file' -S'$InstanceName' -T -c -t','"

Invoke-Expression $cmd
Invoke-Sqlcmd -ServerInstance $InstanceName -Database $DatabaseName -Query “$StagingProc $yearinfile”

From here, I put this in a file and ran it from my download location. The result:

[Screenshot: the load output in the PowerShell ISE]

and from SQL Server:

[Screenshot: the loaded rows in SSMS]

Now I can run some queries, and find out where I stand. #26, it turns out.
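Something like this gets you the ranking (a sketch, assuming you simply aggregate the counts across all years):

SELECT FirstName,
       SUM(NameCount) AS TotalCount,
       RANK() OVER (ORDER BY SUM(NameCount) DESC) AS NameRank
FROM Names
GROUP BY FirstName
ORDER BY NameRank;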

[Screenshot: the ranking query results in SSMS]


The Traveling Data Professional

It’s summertime in the Northern hemisphere, the time of year when much of the world takes holiday time away from work. Most kids are out of school for at least part of the June/July/August period. With many of us trying to spend time with our families, this is the part of the year when we schedule trips and events.

Taking time away to be with family is important, but just getting away from work is healthy for all of us. Whether it’s a planned trip to some new location, visiting relatives in another city, or even just taking time to enjoy a leisure activity, we all need a break. Some of us will try to do too much during our break, arriving back at work physically worn out. Even in these situations, it’s still nice to get your mind away from the environment you face most days. I think any break is good, and especially in the era of smartphones and nearly ubiquitous connectivity, time away is good for your health.

This week, I’m wondering if any of you have plans for an adventure that you’re excited about. Are you going somewhere new or engaging in some activity you look forward to? Perhaps you don’t have big plans this year, but something you’re working towards in the future? Make us jealous and let us know what holiday break might be coming your way.

I’ve been more of a winter person for most of my life, looking to take time off from work when there’s snow on the ground, but I’ve had my share of summer trips. Usually my family will camp and take a few trips. With my children getting older, I expect to take even more trips when it’s just my wife and me. Since I can work from anywhere, I can follow along on some of her horse adventures.

This year we had planned on a long trip to Glacier National Park, but with older kids, scheduling looks to be a bit of an issue, so we’ll likely stick to camping in Colorado. However, I am honored to have been chosen to speak at the Data Platform Summit in Bangalore, India, so we’ll all have a bit of a vacation in August as we travel to a new country.

Enjoy the weekend and let me know what your plans are this year.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (3.4MB) podcast or subscribe to the feed at iTunes and Libsyn.


Move a Stored Procedure to a New Schema–SQLNewBlogger

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

One of the things I needed to do recently was move an object. I was testing the WideWorldImporters database and created an object in the dbo schema. That’s the default for me, which is fine. However, in this case I wanted it in a different schema.

The way to do this is with the ALTER SCHEMA command. There is a TRANSFER option, which takes the original schema and object name.

In my case, I had the dbo.GetOpenPurchaseOrderCount procedure in my database.

[Screenshot: the procedure under the dbo schema in SSMS]

I used this command to move it (with Website, one of the WideWorldImporters schemas, as the destination):

ALTER SCHEMA Website TRANSFER dbo.GetOpenPurchaseOrderCount;

And then verified things moved.

[Screenshot: the procedure under its new schema in SSMS]
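If you’d rather verify with a query than by refreshing Object Explorer, the catalog views show the new schema:

SELECT SCHEMA_NAME(schema_id) AS SchemaName, name
FROM sys.procedures
WHERE name = 'GetOpenPurchaseOrderCount';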



This was one of those quick items where I checked the ALTER commands, thinking the capability was in there. I didn’t see a changeobjectschema procedure, and since this was a new skill, it was a five-minute blog.


Backup Plans for Data Loss

The British Airways computer failure has been on my mind for a few weeks. It seems like such an epic failure in 2017 for a highly public company that many, many people depend on. I still struggle to believe the power explanation, and really, I’m not sure there’s any explanation I would accept. At this point in our industry, there’s no good reason for any large, global company not to be able to restart services in a DR site inside of a couple of hours. In fact, it really shouldn’t even take that long.

However, for many of us, we will have a failure or disaster at some point. It might not even be a hardware failure or system crash. It’s much more likely that a user will cause an issue. As our systems grow larger, perhaps even more loaded with transactions, we might not always be able to easily separate out good data from bad, and I would expect we’ll experience a restore.

For many of us this will mean we will lose some data from the system. Even with frequent log backups, we might end up with a short period where we can’t recover data. Most of us should have conversations with business stakeholders on what the acceptable level of data loss is, and plan to meet those requirements. We should also have plans around how to rebuild data. I wouldn’t recommend a full test on a system, but it might be worth a few conversations with those that deal with transactional data and discuss how the latest data might be recreated.

No one wants to lose data, and in many cases, there are ways to rebuild or recover the data with manual efforts. Perhaps your company has paper records, or maybe there’s an audit trail that could be used to reconstruct actions. Maybe you rely on memory or even customers to provide information again. Today I’m wondering if you’ve thought about how you might recover data in a non-technical way and what methods you’d use.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (3.6MB) podcast or subscribe to the feed at iTunes and Libsyn.


Using OPENROWSET in SQL Server on Linux

I wanted to import the Million Song Dataset into SQL Server on Linux. There’s a GitHub repo with the SQL that lets you use this dataset with the graph database features. However, it’s built for Windows.

Linux is a slightly different beast. Once I started down this path, I had memories of working on SunOS in college, messing with permissions and moving files.

I run Ubuntu in VMware, so I first downloaded the files to my Documents folder. That’s pretty easy. However, once there, the mssql user can’t read them. Rather than mess with permissions for my home directory, I decided to move these to a location where the mssql user could read them.

First, I need to use mv to move the files. However, the default location for SQL Server (/var/opt/mssql) doesn’t let me drop files in there. Instead, I need to sudo the mv.

sudo mv unique_tracks.txt /var/opt/mssql/unique_tracks.txt

I repeated this for each file.

However, I still had permissions errors. Files have their own permissions in Linux, so I needed to alter those. I decided to use chown, since these are temp files that SQL Server will use, and once they’re imported, I’ll delete them. Changing a file’s owner also requires root, so this runs under sudo as well:

sudo chown mssql unique_tracks.txt

From here, I could easily run the OPENROWSET commands and get the data loaded.
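For reference, the calls look roughly like this (a sketch; the format file name here is hypothetical, and the real statements come from the repo):

SELECT t.*
FROM OPENROWSET(
    BULK '/var/opt/mssql/unique_tracks.txt',
    FORMATFILE = '/var/opt/mssql/unique_tracks.fmt') AS t;

Now to play around with a graph.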


Speaking at the Data Platform Summit 2017

I am honored to be speaking at the Data Platform Summit 2017. The event takes place August 17-19, 2017 in Bangalore, India. This will be my first trip to India, and I’m excited to be traveling halfway around the world (literally: a 12 hour, 30 minute time difference) to speak at this event.

I’ve got a pre-con on Aug 16, and then a few sessions during the main conference.  If you have the chance, register and come. I’ve heard good things about the event and I look forward to seeing some of you there.

See if you can pick me out of the Teaser Video:


Migrating Schema Between Platforms

PostgreSQL is a popular relational database, one that was designed to be an object-relational system, based on standards and allowing for extensibility. I’ve always thought it was a great system, exceeding SQL Server in some ways, though falling short in others. It’s popular, and certainly used enough that both Azure and Amazon Aurora include PostgreSQL compatibility products.

I ran across an interesting post about the AWS Schema Conversion Tool, which will help you migrate from Oracle to Aurora running PostgreSQL. I think this makes some sense, as there are capabilities in PostgreSQL that make moving Oracle code easy. AWS also has a Migration Service to move SQL Server to MySQL. I’m not sure how smoothly that goes, but SQL isn’t a terribly complex language, and perhaps the conversion isn’t too bad, though I’m sure it isn’t painless.

I wonder how many people will choose to move their database systems to a new platform. Whether you’re on a commercial database, such as SQL Server or Oracle, or on a free one (MySQL, PostgreSQL), changing the underlying platform is disruptive for many applications. I’m sure there are some software packages that could change a driver and connection string and their code might work, but I bet that’s relatively few systems.

For most of us, the cost of change is high. New skills need to be learned and new code written. Staff may leave, requiring an organization to hire new employees, which is always a bit of a gamble. The entire process is also a disruption that prevents other work from being completed in a timely manner. Often a large ROI is needed to justify the change, though that’s not always the case; sometimes management decides to change for other reasons, regardless of cost, ignoring any ROI. I have even seen some developers push to make the platform change, working many extra hours because of some desire to port to a new platform for dubious reasons.

I do see plenty of companies looking to move away from Oracle systems, which carry a very large cost for support and maintenance. While SQL Server isn’t always cheap, it is much less expensive, and Microsoft has programs to help port Oracle to SQL Server. Companies may also choose to move to MySQL or PostgreSQL from Oracle, accepting the disruption and costs with a long view towards reducing their yearly outlay for database services.

I have migrated platforms before, and it’s been an ordeal each time. I haven’t ported code to a new platform, and I’m not sure I would. Perhaps if we were abandoning a codebase and rewriting the application I’d consider it, but in most cases, the cost of staff and time to rework code makes this an expensive proposition. If you have different thoughts or experiences, let us know today.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (4.2MB) podcast or subscribe to the feed at iTunes and Libsyn.
