Still time to enter the SQL Clone contest

You could win an Amazon Echo from Redgate. Our contest is still open for a few days, so post a comment on my blog and take your chances. I’ll be picking from the comments next week.


Becoming a Data Scientist

Data Science is hot. There are lots of companies excited about using machine learning and AI to enhance their applications. There are new jobs, some of them well-paying, and certainly not enough people to fill them. In many ways this reminds me of previous “hot” areas, such as Novell networking in the late 80s/early 90s. Companies wanted new CNEs and paid dearly for them. The same thing happened in the mid 90s with MCSEs for Microsoft networks. Many of the people hired weren’t remotely qualified, having just completed some multi-week boot camp.

You could go to school. If you have completed college, there is a list of data science graduate programs you could choose from to pursue a master’s degree. There’s even a blog where someone is documenting their master’s degree path to becoming a data scientist. This isn’t a quick or easy path, but it is one way to gain data science skills.

If you don’t want to spend the time or expense of a formal college program, Microsoft has a data science curriculum on the edX platform that you can complete. These are low-cost programs that award a certificate on completion. The value of that certificate is debatable, but the same could be said of any program. A few people who are working through this program have found it to be a good set of resources that teaches valuable skills.

There are other options as well: no shortage of books, blogs, and other resources on data science and data analysis techniques. It’s up to you, however, to learn what you need to know and become competent enough that an organization will pay you. I dislike seeing people choose to study a topic just to land a job, so I would say that if you wish to go down this path, do so because you enjoy the work and find it interesting. Build some skills, build a portfolio of data science projects, and best of luck.

Our industry has thrived for a long time on simple analysis, and I think there will be jobs in this area for some time to come. I do expect that better-looking reports and dashboards will come to be expected rather than simple tables, so I’d suggest everyone work on their visualization and report-polishing skills. I also think more complex data science techniques will be in demand, though I wouldn’t expect job growth here to overwhelm current jobs. Tackle data science if you like, but be aware this isn’t a simple or easy chore. There is a lot of math and statistics involved, and it looks like this is more science than just data.

Steve Jones


DevOps Basics–Cloning git Repos

Another simple post for me that hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

This continues my series on git, because I think it’s good to know what the command line is and how to use it.

Once you have git running, the thing I’ve often wanted to do is go get code from somewhere. Certainly creating a repo and new code is something you might do, but often you’ll be looking to get code from somewhere else, so let’s look at how to do this.

Find a Repo

Most of the time I find a repo somewhere in the company or on the Internet. For example, I have a simple database structure that I’ve used for demos on GitHub. This is my ASimpleTalkDB repo, which is at: https://github.com/way0utwest/ASimpleTalkDB. You can see it here:

[Screenshot: the way0utwest/ASimpleTalkDB repo page on GitHub]

Off to the right is a “clone” button, which is what we want. We want to perform a git clone. If you click the button, a URL appears in an edit box. The URL is https://github.com/way0utwest/ASimpleTalkDB, the same as the repo above.

Let’s clone this. First, pick a folder to store code in. I started with a Documents\GitHub folder on my various machines. I’ll use that, and as you can see, this is a place where I have a number of folders, each one a repo. I don’t have this repo set up yet.

[Screenshot: the Documents\GitHub folder in Explorer, one folder per repo]

I’ll clone this from the command line. The various GUI tools can do this for you, but doing it this way, you’ll understand how it works. First, open a command line in this folder and then type “git clone https://github.com/way0utwest/ASimpleTalkDB”. This will clone the code, as shown below:

[Screenshot: cloning the repo at the command line]
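For reference, the command and its output look roughly like this (the 264 objects come from this repo’s history, as noted below; your numbers will vary):

git clone https://github.com/way0utwest/ASimpleTalkDB
Cloning into 'ASimpleTalkDB'...
remote: Counting objects: 264, done.
Receiving objects: 100% (264/264), done.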

By default, the name of the repo becomes a folder name, with all the code below that. If I look in Explorer, I can see this.

2017-04-06 13_59_58-C__Users_way0u_Documents_GitHub_ASimpleTalkDB

However, I can control this. I’ll delete the folder and do this again. That’s the power of a VCS: I don’t need to worry about this code, because I can always get it again from the original repo.

Once I delete the folder, I’ll re-run the git clone command, but with the name of a folder added to the end.

[Screenshot: re-running git clone with a folder name added]
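The command is the same clone with one extra argument (DemoRepo here is just a folder name I made up for illustration):

git clone https://github.com/way0utwest/ASimpleTalkDB DemoRepo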

The 264 objects copy to my machine in a few seconds over hotel Wi-Fi. This is code, just text, and it’s quick. If I change to the folder and check the status, I see it is a real repo:

[Screenshot: git status output in the cloned repo]
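If you want to follow along, the check is just (assuming the default folder name):

cd ASimpleTalkDB
git status

For a fresh clone, you should see something like “On branch master … nothing to commit, working tree clean”.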

Cloning Folders

What if I have code inside the company, and not on GitHub, GitLab, Bitbucket, VSTS, etc.? Can I clone things? Yes, you can, but don’t use this to get a copy of your own code. That’s what branches are for. We’ll talk about branches later, but for now, we can assume you might have a repo in your company.

For example, let’s assume for a moment that my C:\users\%username%\source\repos folder is on the network. I can clone one of these repos like this:

[Screenshot: cloning a repo from a local folder at the command line]
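The command has the same shape as before, just with a path instead of a URL (SomeRepo is a hypothetical repo folder under that path):

git clone C:\users\%username%\source\repos\SomeRepo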

Again, this isn’t the way to get a copy of my own code to work on. This is for getting a repo that I want to work on myself, where I’ll then merge changes back to the original repo on another machine. On my own machine, I’d just use branches.


This should help you get some code, and I’d encourage you to clone something and see how it works. Go get some code from my repo and build a database if you have SQL Source Control (point it at your cloned repo), or grab something from Microsoft and play.

That’s it for this post. There are lots of places to go from here. In other posts, I’ll talk about how to push your code elsewhere once you’ve changed it, as well as how to branch and accomplish a few other things.

A few resources (more boring, but will help you learn if you want):

git clone (git)

Create a Repo (Channel 9)


Python in SQL Server

The Microsoft Data Amp event was this week, with a number of SQL Server announcements. We now know the next release is going to be called SQL Server 2017, which is no surprise. CTP 2.0 was also released, which includes a few improvements, but one in particular caught my eye. Python code can now run inside SQL Server, meaning we are not limited to the R language for data analysis. That’s exciting to me, and I think it’s a great change. It’s one that’s been rumored for some time. I like Python and think it’s a nice, elegant language. My son has been learning it in college, and I’ve spent some time playing with it after the SDTIG got me interested with their beginning Python and Data Science modules. Since then I’ve spent a little time learning R, but find it a cumbersome language to work in. Having code embedded in a string in a SQL Server stored procedure made things harder, and I kept hoping that Python would be added.

It’s there now and you can play with it. While you might debate whether this is a good use of SQL Server resources, I think for quick, short, specific analysis calls, it’s a good choice. Certainly complex calculations and data manipulations are not always efficient in T-SQL, and having another option is great. You’ll want to be careful when you run Python or R code, but these are good options to have. GPU processing support is also available to take some load off your main CPU.
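If you want to try it, here’s a minimal sketch of what calling Python from T-SQL looks like, assuming you’ve installed the Machine Learning Services feature and enabled external scripts:

-- enable external scripts once per instance (may require a service restart)
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;

-- echo a one-row result set through Python
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT 1 AS n'
WITH RESULT SETS ((n INT));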

CTP 2.0 is supposedly feature complete, and there are more changes than just Python. Tempdb has a better setup experience, and we get the first version of graph data and query support, resumable online index rebuilds, adaptive query processing, and more. You can read the announcement, which isn’t heavy on technical details but gives you an idea of what’s changing. I’m not much of an SSAS person, but there are some enhancements in that platform as well.

There were other improvements and announcements this week. Azure Analysis Services is generally available (no longer beta), as are Azure Data Lake Analytics and Azure Data Lake Store. DocumentDB is improved, some Cognitive Services are GA, and in general, Microsoft is all in on analytics and deep learning.

The Microsoft data world is growing and changing rapidly. We’ll have a new version this year, just a little over a year after SQL Server 2016. I expect we will continue to see new changes arrive at a rapid pace. None of us will completely keep up and become experts in all areas, but we have lots of options for learning and experimenting with advanced features. I’d encourage you to take a little time here and there to read about and play with the new features. If you have Docker, you can get a version of SQL Server 2017 up and running without installing anything on your machine, to see if there are any changes that might help your organization take better advantage of the data you work with every day.

Steve Jones



A Sore Throat from Traveling

I’ve been avoiding recording podcasts for most of the last two weeks, so there haven’t been any. My apologies to those of you who like the editorial as a podcast, but I’ve been ill and my voice is somewhat shot.

After SQLBits, I took ill on the trip home. I had a miserable day in airplanes and then a complete day in bed when I got home. I can’t remember the last time I lay in bed for a day and didn’t do anything productive, but I didn’t bother to even turn on a computer.

It’s been a week and a half, and I’m still a little down. My chest is a bit tight and my voice is weak. At least I’m no longer feverish, after a solid week of fever coming and going, as well as alternating chills and sweats throughout the days and nights.

With a few commitments coming up (SQL Saturday Silicon Valley and a Microsoft recording), I decided to rest my voice as best I can. Even as an assistant volleyball coach, I’ve been miming actions and talking softly to the kids rather than trying to push through.

Traveling is hard, and apparently my two weeks abroad didn’t end well. I’m not quite sure what I caught, but it appears to be some sort of flu bug that is lingering.

Hopefully I’ll be back to full strength before too long, but if I’m a little slow or quiet at events, you’ll know why.


SQL Server on Linux (Ubuntu)–Updating from an Early CTP

I’ve been testing SQL Server on Linux for a long time. After the recent Microsoft Data Amp event, I saw the announcement for CTP 2.0. So I did what I’ve been doing for some time on my Ubuntu machine:

sudo apt-get update

sudo apt-get install mssql-server

However, I got a message I didn’t expect.

[Screenshot: the unexpected apt-get message on the Ubuntu VM]

A few people on Twitter noted they had updated to CTP 2.0, but I was still on CTP 1.4.

Hmmm.

I looked back through the apt-get log and I saw this:

[Screenshot: the relevant entry in the apt-get log]

I’m cutting off part of the path, since I think it’s probably under NDA. No worries; apparently the old package location I was using hasn’t been updated with new packages, which makes sense.

I decided to check the MS docs to see how a new user would get SQL Server on Linux running. At the new docs.microsoft.com site, I found the Install SQL Server on Ubuntu doc.

Following the instructions, I updated the GPG keys and registered the repository with curl:

curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -

curl https://packages.microsoft.com/config/ubuntu/16.04/mssql-server.list | sudo tee /etc/apt/sources.list.d/mssql-server.list

Once those were done, I re-ran my apt-get update/install and it worked.

[Screenshot: SSMS connected to the upgraded instance at 192.168.1.210]
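If you’d rather verify the version from the Linux command line than from SSMS, something like this works (assuming the sqlcmd tools are installed; substitute your own sa password):

sqlcmd -S localhost -U sa -Q "SELECT @@VERSION"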

If you haven’t worked with Linux, it’s not really that hard. Some of you may get asked to install or manage a system, and while you should seek help for anything production related, in dev areas, play around. Read and learn a bit about Linux (I’d recommend watching the SDTIG book club meetings I, II, III, and IV on Linux), download Ubuntu (or any distribution), and play around. Get used to a command line and try things.

You’ll love SQL Server on Linux. It’s mostly just SQL Server, but it updates and installs this simply, at least in a test environment.


SQL Server Telemetry

One of the things that I’ve seen blogged about and noted in the news over the last year is the amount of data being collected by various software systems. In particular, I’ve seen numerous complaints and concerns over what data Microsoft collects with its platforms. Perhaps this is because Microsoft is the largest vendor, and plenty of other software collects usage information, primarily to determine if features are being used or working correctly. I think much of this is overblown, but I can understand having sensitivity about our computer usage, especially for home operating systems.

Microsoft conducted an extensive review and published data about what is being collected and why. Windows 10 and other products have undergone review and must comply with the published policies. There’s even a Microsoft Privacy site to address concerns and explain things. It’s an easy-to-read policy that explains what Microsoft is collecting if you’re connected to the Internet (if you’re not, don’t worry: no bloating files). That’s a huge step forward in an area that is evolving, and something I wouldn’t have expected to see in the past. I am glad Microsoft is making strides here, even if I may not agree with specific items in their policies. I do think that most of the companies collecting this data are doing so to improve their products, not spy on customers. I’m sure some do spy, but those are likely smaller organizations with some sort of criminal intent.

As data becomes more important, software telemetry is potentially a vector through which private, personal, or customer information might leak. Certainly as more speech and other customized services are used in businesses, I worry about what data could be accidentally disclosed. After all, it’s not that super-powerful smartphone that is actually converting audio to text in many cases; it’s a computer somewhere in the vendor’s cloud.

With databases, this has also been a concern for some people. I’ve seen the Customer Experience Improvement Program for years and usually opted in. I’m rarely doing something sensitive, and I hope that with more data, Microsoft improves the platform. That’s the stated goal, and I’ve seen them talk about this a few times. The SQL Server team has moved forward and published an explicit policy that spells out what data is collected and when. It was actually updated recently, and all new versions of the platform must provide this information (if anything is different) and adhere to what they disclose. There is a chance that user data could leak into a crash dump, though users have the opportunity to review data before it is sent to Microsoft. I’m not sure how many will, but they have the chance.

I would like to be sure that anything sent is secured, and perhaps have an easy way to audit the data sent in a session, but I know this entire process is evolving. One important item to note is that customers can opt out of data collection on any paid-for version of SQL Server. That’s entirely fair, but if you have regulatory concerns, you should be sure that you don’t restore production data to development machines. You shouldn’t anyway, but just an FYI.

Usage data is going to be part of the future of software, especially as more “services” integrate into what we think of as software. Those services aren’t always going to be under our control, and certainly part of the reason many of them are inexpensive is that additional data is captured about the people using the software. I hope all companies publish and adhere to some sort of privacy statement, and maybe that’s a good start. Rather than regulating the specific privacy that must exist, start by forcing companies to publish and stick to whatever policy they choose.

Steve Jones



Liable for Data Loss

When I first installed Windows 7, I was thrilled. Finally, Microsoft had slimmed down and improved the OS’s performance rather than continuing to bloat it. After Vista, Windows 7 was a welcome change. Windows 8 was different, but for me as a desktop user, it wasn’t much of a change. I moved to Windows 10 as a beta user and thought it performed well: a little slower than 7, but overall a good move. I was glad to get the upgrade notice on a second machine, but it was annoying. Trying to delay or avoid the change in the middle of some travel was hard. I could certainly sympathize with the users who complained they didn’t want the upgrade and couldn’t easily avoid it. I’m glad Microsoft changed this process a bit.

There were people who upgraded accidentally or felt forced to. Among those, some lost data and decided to sue Microsoft. Let’s leave aside the Windows upgrade process, Microsoft’s decision, and the merits of this particular case. Those are separate issues from the one I want to discuss: liability for data loss. At the core of the lawsuit, the time and information that people have lost is an issue that few of us have had to deal with in our careers. At least, most of us haven’t had to worry that we are liable for the issues our software might cause.

Are we moving to a place where a person, or more likely a company, is going to be held liable for data loss from upgrades or patches? Customers have the ability to initiate legal action, but strong EULAs and prior legal decisions seem to indicate that much of the liability resides with customers, and vendors aren’t at fault. Is that a good thing? I’m not sure, but I do think that as data becomes more important and is used to justify decisions or drive business actions, there will be a push to ensure that anyone whose software changes data during patches and upgrades is held liable for issues.

I’m surprised we haven’t seen more accountability from software firms to date, but I suspect many of the legal issues have been settled without much fanfare, under strong non-disclosure agreements. I’m not sure this is the best solution for anyone; to force some improvement and better quality in software, we need to take better care of our data. I don’t want us to move slower with software development or deployment, but I do want quality to improve.

Steve Jones


Agent Phases in VSTS Deployment

VSTS (Visual Studio Team Services) continues to grow and change over time. It seems every few months I see some new features and changes, which are good and bad. Good in that usually things improve. Bad in that I get confused and lost at times, and sometimes have to re-acquaint myself with the system.

One of the changes I noticed recently was the Deployment phases, which come as Server and Agent phases. I won’t cover Server phases here, but I wanted to make a quick note on Agent phases because I think they’re handy and a nice packaging concept.

Agents

There are agents that run work for VSTS. I’ve written about how to download an agent so tasks run inside your network (on a server or client). This post will look at the other side: how do I control an agent from the release process?

Once I create a new release definition, I’ll see something like this:

[Screenshot: a new, empty release definition in VSTS]

This is a blank deployment definition. To the left I see the environment I’m working with, in this case just the generic “Environment 1” name. For each environment, I can set up tasks. On the right I have the Add Tasks button, but also the “Run on agent” item. This is the container to which I will add tasks.

With VSTS 2015, I used to just add tasks here, expecting that everything would run on an agent. VSTS has hosted agents that contain a set list of software, and also downloadable agents that you run on your own servers or workstations. Things changed recently to include phases.

Phases

Phases are places of execution that allow a series of tasks to be grouped together. VSTS and TFS 2017 give you two choices: Agent phases or Server phases. The difference is that I can choose where items execute, on an agent or on the server (VSTS or TFS). In addition, I can set these to execute in parallel if need be.

If I click the down arrow next to the Add Tasks button, I’ll get this dialog. This allows me to set up a process that can run on the VSTS server, or one that can run on an agent.

[Screenshot: the dialog with Agent phase and Server phase choices]

I haven’t played with Server phases, but when I add an Agent phase, I get a section under which I add tasks. When I click the “Agent phase” item, a new blade appears to the right. Note that the pool for these agents is the default pool.

[Screenshot: the Agent phase blade showing the default agent pool]

This threw me early on because I have a separate pool of agents for mobile development. As I upgraded some deployment definitions, I couldn’t figure out why I didn’t have any agents available to execute my tasks. It turns out I needed to change the agent pool here.

There are options here, but I haven’t really played with parallelism or demands since I typically deploy to one system at a time.

Once I’ve selected an agent pool, I can click Add Tasks and then add a series of tasks to that phase. If you look below, I have added a couple of tasks to each of the two Agent phases. These will run one phase, then the other.

[Screenshot: tasks added to each of the two Agent phases]

Each of these phases is independent, and as of the time I wrote this, there’s no way to copy a phase and duplicate it.

I’ve used this to upgrade my release definitions by adding new tasks, and then copying settings between one phase and the other. This is a manual, copy/paste operation, but I can easily see if I’ve duplicated everything by going back and forth quickly.

[Screenshot: comparing two phases in the ST Pipeline_Mobile release definition]

Once I’m done, I can just click the “x” to the right of the agent phase and delete the entire phase. I have to confirm this action, and then I’ve removed a complete phase.

Right now I haven’t seen any benefit outside of the upgrade scenario, but that’s useful. I duplicate my tasks, disable all the ones in one phase, and then see if the other phase runs. If it doesn’t, I can always enable the old tasks and disable the new ones.

I expect that as I look to perform more complex deployments, I will use these to logically separate things, and perhaps also use server phases for some things when I don’t need an agent.


Making VSTS Deployment Changes to Databases without Breaking Your Application

One of the things that I do a lot is demo changes to databases with CI/CD in a DevOps fashion. However, I also want to make some application changes to my sample app without breaking things. As a result, I’ve developed a few techniques that work well for both situations. I recently found these ideas can also help when I need to actually upgrade or change my CI/CD pipeline.

Note: DevOps isn’t a thing; it’s a set of principles, ideas, and culture that produces results. I will endeavor to call out the specific principles I’m using to adhere to DevOps ideas. In this case: automation and CD.

Release On Your Schedule

One of the principles of DevOps is that we use feedback loops to ensure information moves from right (operations) to left (development). One way of doing this is to release more often, though this isn’t required. What we really mean is release on your schedule, when you want, and allow developers to get feedback on their work quickly.

If a developer takes a month to build a feature, they don’t need hourly or daily releases. They need a release after the month is over (assuming testing, review, etc. have taken place). If releases occur on the 5th of the month and the developer finishes on the 6th, they must wait a month before they get feedback. What I want to ensure is that we can release on the 5th, and also on the 6th if we need to.

Upgrading my VSTS Pipeline

I wrote in another post that I planned on upgrading my Redgate DLM Automation tasks in VSTS. I was doing this on a trip to a conference, and I didn’t want to start making the demo changes I’d normally make, since I’d have to undo them, and it’s possible I’d forget to clean something up. I hate making those silly mistakes, so I needed a way of testing my build and release without breaking demos.

I decided to use a technique I rely on in presentations when I’m talking process and not code. In those cases, what I deploy doesn’t matter, but if I change random objects, sometimes I break the application using the database. As you can see here, I needed to run a bunch of tests, and repeat some of them.

[Screenshot: a series of test releases in the ST Pipeline_Mobile project]


Deploy Often

When I talk with Redgate clients, and they are starting to get comfortable with the idea of deploying on their own schedule, they will sometimes make innocuous changes that trigger a deployment they can test without breaking things. For example, a user might take this stored procedure (partially shown):

ALTER PROCEDURE [dbo].[GetEmailErrorsByDay]
  @date DATE = NULL
/*
Description:

Changes:
Date       Who                    Notes
---------- --------------------   ---------------------------------
2/14/2017  WAY0UTWESTVAIO\way0u
*/
AS
BEGIN

IF @date IS NULL
  SELECT @date = DATEADD(DAY, -1, GETDATE())

  SELECT Errcount = COUNT(*)
   FROM dbo.EmailErrorLog
   WHERE EmailDate = @date


and make a small change. Perhaps they add SET NOCOUNT ON, or maybe they’ll add a comment. I’ve even seen someone add this code:

AS
BEGIN
SELECT 2 = 2

These aren’t big changes, and you should certainly be careful about which stored procedure you change (pick one that isn’t heavily used or critical). These are changes to test your process: gates, approvals, deployments, etc.

I decided to try something else. I tend to write this code when I want to test changes, since I can be additive with a procedure like this:

CREATE PROCEDURE Get7
AS
SELECT 7

Or I can modify things with this:

ALTER PROCEDURE Get7
  @plus INT
AS
SELECT 7 + @plus

In either case, I can see if my changes go through. Since I often deploy to multiple environments (QA, Staging, UAT, production, etc.), and I don’t always have deployments go all the way through (see my image above), I will usually end up creating Get7, Get8, Get9 as subsequent procedures. This way I can continue to commit new changes to the VCS, get new builds, and get new releases.

I can also do this with tables. My favorite is MyTable, MyTable2, etc. I usually just have an integer MyID column, but I can add other ones to test the ALTER process. I can even use this to test data movement, static (reference/lookup) data, or anything else to do with tables.
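As a minimal sketch, the first commit might be nothing more than this (MyID is the integer column I mentioned; SomeValue is just an illustrative name):

CREATE TABLE MyTable2 (MyID INT);

and a later commit can exercise the ALTER path:

ALTER TABLE MyTable2 ADD SomeValue VARCHAR(50) NULL;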

Eventually (hopefully) I get clean deployments all the way through to all environments.

Cleanup

I sometimes get collisions, where a test will return the error that “Get10 exists,” and I’ll move on to Get11. However, I don’t want to leave those objects in all environments. After all, I’ll likely find a way to improve things in the future, and I’ll want to repeat this testing.

This usually means one last deployment for me. I’ll delete these objects in dev and then deploy the changes all the way through to production. This lets me test any checks I have to prevent data loss, including whether approvers actually read the scripts. ;)
