Brush Up on Your ETL Skills

Many of us who work with data will field requests to import or export data at some point. Plenty of us have regular processes that perform these actions, and we may regularly troubleshoot or enhance them. In fact, I know some people have a full-time, or nearly full-time, position just dealing with ETL operations.

Working with data in disparate formats, and the myriad inconsistencies that appear even when formats are known, is a challenge. Integration Services is a useful tool, but many of us find that we need to pre- or post-process data separately from a simple import or export. Some of us may prefer using T-SQL or other languages, such as R or Python, to process data rather than programming SSIS. I often find that every client wants a slightly different format or change to their data that a simple query export won’t handle.

These days, as we add in Machine Learning and other downstream processing activities, it seems that there is more and more of a need to process data beyond imports and exports. After all, the majority of the time in any ML project is spent preparing and transforming data. In addition, Article 15 of the GDPR notes that a data subject has the right to request a copy of the data relating to them when it is being processed by an organization. I don’t know how often someone will want to get data about themselves or their organization, but I’m sure it will happen more often than it does today.

I think this means I’ll need to brush up on ETL skills, perhaps to ensure I can easily extract a copy of an individual’s data. In fact, I probably should compile some scripts now to ensure I can let someone know what information we keep at SQLServerCentral that would fall under the GDPR. I think it’s just email addresses, but I could be wrong.
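To make that concrete, here is a minimal sketch of the kind of script I have in mind for a subject access request. The table and column names (dbo.Users, dbo.ForumPosts, Email) are entirely hypothetical; the point is simply to gather everything tied to one identifier in a single pass.

-- A rough sketch of a subject access request script.
-- Table and column names are hypothetical; substitute your own schema.
DECLARE @Email varchar(256) = 'person@example.com';

-- Profile data held about the data subject
SELECT UserID, Email, DisplayName, CreatedDate
FROM dbo.Users
WHERE Email = @Email;

-- Related content keyed to the same subject
SELECT p.PostID, p.PostDate, p.Title
FROM dbo.ForumPosts p
JOIN dbo.Users u ON u.UserID = p.UserID
WHERE u.Email = @Email;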

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.9 MB) podcast or subscribe to the feed at iTunes and Libsyn.


Classifying Data with SSMS 17.5

I upgraded SSMS to 17.5 recently and found an interesting addition. This version has incorporated the ability to classify data. With the GDPR coming for many of us, this is a welcome addition.

This is a quick look at this feature.

Classify a Database

If I select a database and right-click it in SSMS, I get a few new items in the Tasks menu (as shown).

[Screenshot: the Tasks menu in SSMS]

I’ll select “Classify Data”, and a new tab opens. I see there are some recommendations and also a list of data classifications.

[Screenshot: the Data Classification tab in SSMS]

There’s a getting started link, which takes me to the SQL Server Security Blog. I suspect that’s an incorrect link. I think it should go here: SQL Data Discovery and Classification.

Below this, I see a list of the recommendations. This has grabbed tables that appear to contain data that might be sensitive and require classification. One of the tenets of the GDPR is that you know your data. You aren’t allowed to figure this out later; rather, you must proactively know what data you are collecting and processing.

[Screenshot: the list of classification recommendations]

Here we can see a few drop-downs to the right. I’ll scroll and look at these. First is the Information Type. This is listed as a name, but I have other options I can set. The list covers the types of data that might be sensitive information about a data subject (a human or entity) that I need to classify.

[Screenshot: the Information Type drop-down]

Beside this is the sensitivity label. My choices here are shown below. These range from public information, which removes some of my responsibility, up to highly confidential, including labels specific to the GDPR.

[Screenshot: the sensitivity label drop-down]

If I’m happy with these recommendations, I can select them all (or a subset) on the left, then click the “Accept” button to add them to the classifications I have for this database.

[Screenshot: accepting the selected recommendations]

This doesn’t save them, but adds them to the list. At the top of the tab I can see a prompt to “Save” my changes.

[Screenshot: the Save prompt at the top of the Data Classification tab]

Once I’ve done this, I could add more, or view a report. The report shows me this:

[Screenshot: the Data Classification report]

Implementation

My guess was that these are implemented as extended properties, which makes sense; that’s a mechanism plenty of features could use to make SQL Server better, and I’m right. If I examine the extended properties for the firstname column in one table, I see this:

[Screenshot: Column Properties (extended properties) for firstname]

This was the column I changed to public information. The lastname column in the same table is marked as confidential.

[Screenshot: Column Properties (extended properties) for lastname]
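Since the classifications live in extended properties, you can also read them back with a query. This is a minimal sketch; the property names I filter on are what this SSMS build appeared to write on my instance, so verify them against yours before relying on this.

-- A rough sketch: list classification metadata stored as extended properties.
-- The property names below are what SSMS 17.5 appeared to write on my instance.
SELECT s.name AS schema_name,
       t.name AS table_name,
       c.name AS column_name,
       ep.name AS property_name,
       ep.value AS property_value
FROM sys.extended_properties ep
JOIN sys.columns c ON c.object_id = ep.major_id AND c.column_id = ep.minor_id
JOIN sys.tables t ON t.object_id = c.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE ep.class = 1   -- object or column level
  AND ep.name IN ('sys_information_type_name', 'sys_sensitivity_label_name')
ORDER BY t.name, c.name;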

Helpful?

Ultimately, is this useful? Yes. I can see other products taking advantage of this, such as the new Data Masker from Redgate, which could let you know which columns are sensitive and not masked. I’d also expect this metadata to be useful and important for ETL and other operations, so it can be carried along to new columns created when this data is transformed or moved.


Virtual Data

Yesterday I republished an editorial from 2014 for the holiday. The topic was production subsets of data, which has been something that many data professionals have struggled with for years. Many of us have built scripts to delete, change, obfuscate, or alter production restores as a way of providing useful, but manageable development database sets. Or maybe it’s just some of us. I’m sure more than a few of us have given up on this task and just restored production databases in entirety to test and development systems.

Over my career I’ve become a fan of additively building a known dataset rather than deleting extra data. I advocate adding rows from production (properly masked/obfuscated) and maintaining this set over time as requirements change. However, this isn’t without its own administrative headaches. I think it’s easier, but it does require commitment from everyone to keep going over time. It’s certainly better than each developer adding their own 10 rows of data to a table for testing.
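As an illustration only, here is a minimal sketch of what that additive, masked approach might look like. The database, table, and column names are made up, and the masking is deliberately crude; the point is that rows come from production but arrive already obfuscated.

-- A sketch only: copy a few production rows into a curated dev dataset,
-- masking sensitive columns on the way in. All names here are hypothetical.
INSERT INTO DevData.dbo.Customers (CustomerID, FirstName, LastName, Email)
SELECT TOP (10)
       c.CustomerID,
       'First' + CAST(c.CustomerID AS varchar(10)),
       'Last' + CAST(c.CustomerID AS varchar(10)),
       CONCAT('user', c.CustomerID, '@example.com')
FROM Prod.dbo.Customers c
WHERE c.CustomerID NOT IN (SELECT CustomerID FROM DevData.dbo.Customers)
ORDER BY c.CustomerID;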

A year ago, Redgate released SQL Clone, designed to solve some of these issues. Once an image is created, new databases for test and development can be provisioned in seconds. I found this to be an amazing product that really changes how I develop against databases, though it does require me to stop getting caught up trying to undo changes or manage a single database. Instead, I need to ensure I am saving code to version control and then build the habit of dropping and rebuilding a baseline database.

As we’ve worked on SQL Clone, I’ve found that there are lots of companies that offer similar ways of virtualizing your data, giving you access to large, production-scale systems in seconds. Data masking, obfuscation, and more are features, with some vendors requiring specific hardware. Others, like Redgate, have software add-ons (Data Masker). All of these products cost money, which can be an issue for many organizations, but I’m glad that this technology is growing and advancing. With the GDPR and other draft legislation, many of us need to take better care of our data and build more secure architectures.

Containers are another interesting way to virtualize data, though they don’t solve the scale issues. If you can work with a smaller data set, and maintain that, then containers might provide a fantastic way for you to learn to build, tear down, and rebuild databases in seconds.

The world of databases hasn’t changed a lot in some ways across my career, but in others, I’m amazed. Data virtualization is one of these areas, and if you haven’t trialed the technology, maybe you should give it a whirl this year.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (3.6 MB) podcast or subscribe to the feed at iTunes and Libsyn.


Use the Proper User for Azure MySQL DB

I wrote recently about migrating the MySQL database from a third-party provider to Azure MySQL DB. This was due to the provider raising the price substantially for me to host the site. However, after migrating the data, I ran into an issue.

I’d read a blog on moving WordPress data and followed the instructions, but after updating my connection string, I kept getting this at tsqltuesday.com:

Error Making Database Connection

It appeared big and bold in the browser, with no additional information. There was a time I dug into MySQL, but it’s been a long time. In any case, I wasn’t sure what the issue was, so I started messing around.

In MySQL Workbench, I could add a new user with their GUI and grant privileges. I did this for a user called “tsql2sday”, with a password. However, when I updated the connection string in my Azure App Service, this didn’t work.

I tried a number of items, including manually creating my own user and assigning privileges, making this a DBA (super) user, and more. Nothing seemed to work. Some of you might have even seen the error messages as I tried to get this to work.

Eventually I checked something. In the original App Service connection string, my old third-party database just had the user name. This was the same for my connection in MySQL Workbench. However, with my new Azure MySQL database, I needed user@host, where the host is the Azure server name. For me, this was tsql2sday@tsqldb in MySQL Workbench.

Once I added the @ and host to the connection string in the App Service, I could connect. Whew, as I wasn’t looking forward to another hosting bill.

I think all the data is there, but if you find issues with tsqltuesday.com, let me know.


dbatools and Orphaned Users

I really like the dbatools project. This is a series of PowerShell cmdlets that are built by the community and incredibly useful for migrations between SQL Servers, but also for various administrative actions. I have a short series on these items.

One of the common issues that I would run into when refreshing development and test environments is orphaned users. These are accounts that typically exist in production, but not in development. The logins and users are different in these environments, and often there isn’t a login on the development instance. This creates an orphaned user: one that exists in the database, but has no instance-level mapping.

Cleaning up these users isn’t that hard: often it’s removing the user, mapping it to a different login, or repairing it by adding the server login back. These aren’t difficult tasks, but the logic to write a script that quickly fixes this, especially in a DR situation, may not be handy.

dbatools can help here with a few cmdlets. I’m not claiming this is any better than the T-SQL, but if you keep dbatools on your instances, this is a quick way to fix things.

With Get-DbaOrphanUser, we can quickly get a list of those users that aren’t matched on the restored server. I can limit this to an instance or a database, but it gives me a list of users that I can then pipe into one of a few other cmdlets: Repair-DbaOrphanUser and Remove-DbaOrphanUser. These two cmdlets will do, as the British say, what it says on the tin. They’ll remove or remap the users, which can make it easy to quickly get your users working again.

Note that you will want to ensure this does what you expect; run with the -WhatIf parameter first to be sure that you aren’t altering users that you don’t want to change.
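For comparison with the T-SQL I mentioned above, here is a rough sketch of the manual check these cmdlets wrap: find database users whose SID has no matching server login, then remap one with ALTER USER. The filters and the example user name are my own assumptions, so adjust them for your environment.

-- A rough sketch: find database users with no matching server login.
-- The exclusions below are assumptions; adjust for your environment.
SELECT dp.name AS orphaned_user, dp.type_desc
FROM sys.database_principals dp
LEFT JOIN sys.server_principals sp ON sp.sid = dp.sid
WHERE sp.sid IS NULL
  AND dp.type IN ('S', 'U')   -- SQL and Windows users
  AND dp.name NOT IN ('dbo', 'guest', 'INFORMATION_SCHEMA', 'sys');

-- Repair one orphaned user by remapping it to an existing login.
-- 'SomeUser' is a made-up name for illustration.
-- ALTER USER SomeUser WITH LOGIN = SomeUser;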

This isn’t necessarily the way I’d always clean up users, but as part of a flow that might include automated restores, data masking, and other steps, being able to access orphaned user data and repair users from PoSh is something I’d certainly consider.


Template Configuration

One of my goals this year is to really spend some time learning more about Extended Events (XE). I’m somewhat embarrassed that almost a decade after their introduction in SQL Server 2008, I have a fairly rudimentary grasp of the system. I’ve watched talks from Jonathan Kehayias, Erin Stellato, and others. I recognize the value in a lightweight system, but for some reason I haven’t done much with XE.

As part of my learning, I’m going back to the basics with videos, articles, and documentation. In one section, I ran across the templates and how they can be used to greatly simplify the creation of new sessions. I’m a big fan of templates, and I love the ones I use for T-SQL in SQL Prompt. I know templates are incredibly useful in plenty of other areas.

XE is a more complex system. The verbiage, the complexity of filtering, and the variety of targets make this a very flexible system, but also one that can overwhelm people. When do you use the histogram target vs. the ring buffer? What events make the most sense to diagnose or solve an issue on your system?
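To make that concrete, here is a minimal sketch of the kind of session a template might script out for you. The event, filter, duration threshold, and target are just an illustration I put together, not a recommendation from any particular template.

-- An illustrative session only; the event, filter, and target choices are
-- assumptions, not a specific built-in template.
CREATE EVENT SESSION LongRunningQueries ON SERVER
ADD EVENT sqlserver.sql_statement_completed
(
    ACTION (sqlserver.sql_text, sqlserver.database_name)
    WHERE duration > 1000000   -- duration is in microseconds, so > 1 second
)
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE = OFF);
GO

-- Start the session when you want to capture data
ALTER EVENT SESSION LongRunningQueries ON SERVER STATE = START;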

Today I wonder if some of you out there can share some knowledge. Are there built-in templates that you find useful for certain situations? Have you created your own that speed up your analysis of an issue? If so, it would be great if you could share some code and explain why you find certain session settings useful. If you find some templates to be problematic, perhaps share that.

As I’m learning, I think that it’s likely I’ll have my own set of events and settings that I lean on heavily. Since I can save these on an instance, and not just a workstation, that’s a huge improvement over trace. These sessions can be shared with other DBAs in my organization, which is helpful and handy. Maybe one of you will give me a new template that I can add to my toolbox.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.7 MB) podcast or subscribe to the feed at iTunes and Libsyn.


Enabling Guest in a Database–#SQLNewBlogger

Another simple post from me that hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

The guest account exists in all your databases. It is created by default, and guest is used to map a login that doesn’t otherwise have access to a database.

Sound scary?

It should. It would be bad if any login could connect to any database, potentially reading data using the guest account. Fortunately, Microsoft has done two things. First, guest is disabled in all user databases. This is because it’s disabled in model, which is our template.

[Screenshot: query results showing guest disabled in a user database]

Second, guest is typically assigned no rights. It’s a member of the public role, which also has no rights by default.
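If you’d rather check this with a query than the GUI, here is a minimal sketch that looks for a CONNECT grant on guest in the current database; no row back for the permission means guest is effectively disabled there.

-- Check whether guest has been granted CONNECT in the current database.
SELECT dp.name,
       dp.type_desc,
       perm.permission_name,
       perm.state_desc
FROM sys.database_principals dp
LEFT JOIN sys.database_permissions perm
    ON perm.grantee_principal_id = dp.principal_id
   AND perm.permission_name = 'CONNECT'
WHERE dp.name = 'guest';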

Enabling Guest

If you want to allow access for unmapped logins through the guest account, it’s easy. Be wary of doing this, and be sure you understand what rights have been granted to public if you do. In general, I’d expect auditors and any compliance/security officers to be against this, but you should check.

The user exists already, and just needs the CONNECT permission to be enabled. You can do this with this code:

GRANT CONNECT TO guest

If you want to remove the permission, use:

REVOKE CONNECT FROM guest

That’s it. Remember, by default this user can’t access any objects. I would recommend you not grant rights to guest directly, but use roles instead: either one of the built-in ones or, better yet, your own role with limited permissions.
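As a sketch of that last suggestion, here is what granting guest a narrow slice of access through a custom role might look like. The role and table names are made up for illustration; again, think hard before giving guest anything at all.

-- A sketch only: scope any guest access through a custom role with narrow
-- permissions. Role and table names here are hypothetical.
CREATE ROLE GuestReaders;
GRANT SELECT ON dbo.PublicLookup TO GuestReaders;
ALTER ROLE GuestReaders ADD MEMBER guest;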

SQLNewBlogger

One of the ways you can showcase your knowledge, show you’re learning, and show you’re motivated to enhance your career is blogging. This post is an example of what you could write, in your own words, about something you’ve learned.

This one took me about 5 minutes after I’d spent a little time getting guest enabled for a test project.


Muddle Through

I used to work in the restaurant business. I got started after my first year at college, landing a job waiting tables in a new hotel. It was eye-opening for me to see just how stressful and difficult that job can be. However, I wanted to earn money and jumped at opportunities. When we needed room service waiters to pick up shifts and the elevators didn’t work, I carried trays up staircases, taking advantage of the opportunity. When a bartender didn’t show up, I volunteered to take the lunch shift. I had no idea where things were in the bar, I was 19 and didn’t know how to make drinks, and I’d make less money, but it was an opportunity that I knew would pay off at college.

It did, and at the next three jobs, I wound up getting hired, starting work, and having either no one to train me that day or a very busy shift where I was mostly on my own to survive. I had to muddle through and learn on the fly. The ability to do that, without panicking, being overwhelmed, or giving up, has served me in quite a few positions since then.

When I started working for various companies as a developer or DBA, I found myself in similar situations. Problems would arise, often the day or week I started, and I’d have to solve them. Usually with DBA positions, I was the only one there, so I couldn’t depend on anyone else. It was a good thing, as I often found that sysadmins or developers were not managing or configuring databases in an efficient way. As I gained experience, I could make more and more of a difference earlier on at each organization.

Kevin Feasel wrote a bit about his experiences with Lucerne (near the bottom), muddling through the need to write queries. I’ve seen similar stories from other friends working with SSIS, SSRS, Redis, Azure, and more. They don’t know a lot, but they dig in and learn, making mistakes, but getting tasks done for their employer.

The ability to work through adversity, have some confidence, learn quickly, and be effective is a valuable skill set. Those data professionals who can do so often find more opportunities and challenges, grow their skills, and earn more compensation. Those who find reasons to avoid learning, who lack confidence in their ability to find a way to solve a problem, or who are unwilling to tackle challenges often stagnate a bit. I’d like to think there are more of the former than the latter in this business, but I constantly seem to find people that just look to repeat the same work they’ve done over and over for a long time.

It can be mentally difficult to start a project using technology with which you have little familiarity. It can be disconcerting to have someone ask you a question that you can’t answer because you’ve barely begun to learn. That ability to muddle through, to keep learning, to accept that you don’t have answers but can find them, and to know that some of your answers will be wrong and you’ll need to backtrack: that ability will serve you well.

I’d urge you to develop it.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (4.0 MB) podcast or subscribe to the feed at iTunes and Libsyn.


Join me for Masking, Privacy, Protection, and Clones at SQL in the City Feb 28

Registration is open for SQL in the City 2018. Our first event is Feb 28 and I’ll be heading back to the UK for the event. This time Grant and Kathi will be there and we’ll be joined by the amazing Ike Ellis. This is an event to watch.

I’ll be covering some GDPR compliance topics, with good information that all DBAs and system administrators should know. This is a “prove you’re doing what you have decided to do” session.

In the afternoon I’ll also cover some enhancements to SQL Clone in our Privacy bundle that help you mask data from production and build your dev/test environments in a way that can protect your sensitive data.

Some Redgate products will be used to show how to accomplish the same tasks, but all of us will be talking about core concepts and ideas that you need to know. We’d like you to consider our software if it provides you with value for the cost, but either way, you will learn about things that we all think are important.

Register today and I’ll see you in a few weeks.


Valentine’s Day for Data Professionals

It’s Valentine’s Day today, and if you’ve forgotten, you still have time to do a little shopping or show your partner that you value them. While this isn’t a public holiday in any country that I know of, many people do want to celebrate with someone special to them. In fact, if you forget about this day, chances are the next few might not be so pleasant.

For many of us, we’ll spend nearly as much time at work each week as we might spend with our families. Between a few extra hours here and there at work, commuting, and sleep, it may often feel like we spend more time at work. Hopefully that’s not the case, and if it is, you should re-evaluate your current position.

Most of us do have a relationship with our career. We’ve chosen it for some reason and are often locked into doing certain work in the short term. With today being a relationship holiday, I wanted to ask you to think about whether you really love your career.

I’m not speaking about your current job and employer. You might enjoy being a developer, DBA, or other data professional while not being enamored with the position and place you work now. I’m asking about the long term: are you happy with your career choice, and are you looking to continue on this path?

If you are, good for you, and I hope you find a great position that suits your life. If not, it doesn’t matter what you’ve done in the past. You can start moving forward in another direction today.

I love my job, and I’ve enjoyed the last 25 years of working with SQL Server. I don’t know if I want to do another 25, but I certainly am still in love with my choice today.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.5 MB) podcast or subscribe to the feed at iTunes and Libsyn.
