GroupBy–Bringing DevOps to the Database

Today was my presentation in the April GroupBy conference lineup. I presented on DevOps and changing your database development to include more agile, DevOps, flexible development practices.

If you want to get the deck, the slides are here: BringingDevOpstotheDatabase.pptx.

The video is also up on the GroupBy Youtube channel

Posted in Blog | Tagged , , | 2 Comments

A Couple Weeks until VS Live–Austin

VS Live takes place in Austin on May 15-18, 2017. This is a great conference in a fun city. It’s a full stack event, with the change to learn about cloud computing, ALM/DevOps, SQL, and Visual Studio, all at the same event. You can move from room to room and learn about a variety of technologies.

If you’d like to go, register with AUSPK14 and save $400 off the full conference price.

AUSPK14

I’m speaking, covering Database CI and SQL Server 2016 Security features, and there are plenty of other great speakers on the schedule. Come join us in Austin next month, have a margarita, and learn some new ways to build better software.

Posted in Blog | Tagged , , | Leave a comment

SQL Server on Linux is Just SQL Server

I’ve been working with the SQL Server on Linux (SSoL) version for quite some time, almost a year. In all of that time, I’ve for the most part found that SQL Server is SQL Server. When I connect, run demos, check code, almost everything just works. If I didn’t bother to check @@version, this would appear to be just another SQL Server to me. That’s what a new video on Channel 9 shows as well. SQL Server is pretty much the same on both platforms. I’ve tested the Redgate tools and to all of them, SSoL is just SQL Server.

There are some differences, which is to be expected. Any operations that access the file system and require paths work a bit differently, and for those people that end up administering the product, there will be some changes to get used to. The advanced HA features are similar, but again, some work is required. However, this isn’t all bad. I’m impressed with the apt-get process (I’m testing on Ubuntu), which is way, way easier than any patching or updating process I’ve gone through on Windows. In fact, setting up an Ubuntu VM last year was easy, and installing SQL Server was about as easy as it could be.

The tooling on Linux isn’t as mature, and I don’t know when we will see a GUI client, but as I move more and more to PoSh or scripting to make changes in SQL Server, I expect more and more people to manage both Windows and Linux versions in the same way. Certainly using SSMS to write queries is a much nicer experience, and I would guess that many developers that might run SQL Server on OSX or Linux will want a Windows VM for SSMS. Of course, since Visual Studio is now on OSX, maybe we’ll see SSMS running natively on other platforms.

I don’t know how many enterprises will run SQL Server on Linux, but I’m sure there are some that will. I don’t think a lot of organizations will move from Windows to Linux, unless they have loved SQL Server enough to install a single Windows host for the database and want to get rid of it now. I do think lots of developers will run SQL Server on Linux/OSX, especially in containers, where it is really easy to get a container running on their platforms.

If you’ve experimented with SQL Server on Linux, or you are excited, let us know. If you think your organization might use this platform, let us know as well. I suspect a few of you will just because you can, which is as good a reason as any.

Steve Jones

 

Posted in Editorial | Tagged , | 1 Comment

Limiting the Max–#SQLNewBlogger

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

I was chatting with someone the other day about MAX() and how it can work within a set of rows, rather than across an entire result set. That’s a good query skill to have, so I mocked up a short demo. This will show the window functions in T-SQL, using the OVER() clause with MAX.

Let’s build a table and add some data.

CREATE TABLE counters
( countid INT IDENTITY(1,1)
, countername VARCHAR(20)
, counteryear INT
, mycounter INT
)
GO
INSERT dbo.counters
        ( countername
        , counteryear
        , mycounter
        )
    VALUES
        ( 'Test1', 2012, 1 ),
        ( 'Test1', 2013, 2 ),
        ( 'Test1', 2014, 3 ),
        ( 'Test1', 2015, 4 ),
        ( 'Test1', 2016, 5 ),
        ( 'Test1', 2017, 6 )
GO

If I now query, I will just use and ORDER BY clause in the OVER() clause. This will order all the rows from the result set by the year, and then apply a MAX to them.

SELECT
    countid,
    countername,
    counteryear,
    mycounter,
    sumofallvalues = SUM(mycounter) OVER (ORDER BY counteryear),
    maxcounter = MAX(mycounter) OVER (ORDER BY counteryear)
FROM dbo.counters
ORDER BY
    countername,
    counteryear;

The partial results are shown below. The important parts are the year and the sum/max values.
2017-04-18 06_47_36-SQLQuery3.sql - (local)_SQL2016.NBA (PLATO_Steve (71))_ - Microsoft SQL Server M

We see here that the counter increases by one until the last row. This is the data set, the year and counter increment. The max should be six, but it’s not six until the last row. Why?

The answer comes from the framing of the rows. The OVER() clause, by default use a range of unbounded preceeding and current row as it’s set of values. In this case, with an order by on the counteryear, this means that when the query engine processes the first row, the entire set of values in the window is just the row with 2012. The next row has two rows, with unbounded preceeding being 2012 and the current row of 2013. This means for each set of rows, the range consists of previous values. I’ll show here what the counteryear values are for each row and the result of the max:

  • 2012 uses the 2012 row only – max 1
  • 2013 uses the 2012 and 2013 rows – max 2
  • 2014 uses the 2012, 2013, and 2014 rows – max 3
  • 2015 uses the 2012, 2013, 2014, 2015 rows – max 4
  • 2016 uses the 2012, 2013, 2014, 2015, 2016 rows – max 5
  • 2017 uses the 2012, 2013, 2014, 2015, 2016 rows – max 6

This is a window that moves with the data, and is defined as being from the beginning of the result set to the current row.

Let’s change things. I’ll add a second set of rows, and we can see here how this might differ.

INSERT dbo.counters
 ( countername
 , counteryear
 , mycounter
 )
 VALUES
 ( 'Test2', 2012, 1 ),
 ( 'Test2', 2013, 4 ),
 ( 'Test2', 2014, 2 ),
 ( 'Test2', 2015, 8 ),
 ( 'Test2', 2016, 5 ),
 ( 'Test1', 2017, 11 )
 GO

Now, let’s query again, but we’ll change our window. First, we’ll order by name for the MAX(), since that’s what we want to show. I’ll leave SUM alone. Next, I’ll change the window to be only the previous row and the current row.

SELECT
 countid,
 countername,
 counteryear,
 mycounter,
 sumofallvalues = SUM(mycounter) OVER (ORDER BY counteryear),
 maxcounter = MAX(mycounter) OVER (PARTITION BY countername
 ORDER BY counteryear
 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
 )
 FROM dbo.counters
 ORDER BY
 countername,
 counteryear;

Now the results are a bit different. My final ORDER BY ensures I have the tests separated. The first one looks similar in the max columns. I’ll explain the SUM in a different post.

2017-04-18 06_55_24-SQLQuery3.sql - (local)_SQL2016.NBA (PLATO_Steve (71))_ - Microsoft SQL Server M

The MAX() for test2 is different. Let’s see what happens here. Since my window is a ROWS clause, and it’s set for 1 preceeding and the current row, I get these values.

Test2, 2012 row

  • Preceeding  – no values
  • Current – 1
  • Max – 1

Test2, 2013 row

  • Preceeding  – 1 (2012 value)
  • Current – 4
  • Max – 4

Test2, 2013 row

  • Preceeding  – 4 (2013 row)
  • Current – 2
  • Max – 4

Test2, 2014 row

  • Preceeding  – 2
  • Current – 8
  • Max – 8

Test2, 2015 row

  • Preceeding  – 8
  • Current – 5
  • Max – 8

This shows that MAX  is limited to the frame I’ve applied to the window. The window is the OVER() clause, which in this case is the set of rows with the same counteryear value (the partition).

Window functions get confusing and are strange, especially if you’ve spent most of your career without them and finding tricks to use COUNT(), SUM(), and other aggregates on subsets of rows. Things get easier in SQL Server 2012+, and you should spend time playing with small sets of data and understanding windows.

SQLNewBlogger

This was about a 15-20 exercise, but it was good since I needed to stop and think about how to use and show a window function and the ranges. Good skills practice.

I’d like to see others explain this with a data set that they find interesting.

References

OVER() – https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql

MAX() – https://docs.microsoft.com/en-us/sql/t-sql/functions/max-transact-sql

Posted in Blog | Tagged , , | Leave a comment

Still time to enter the SQL Clone contest

You could win an Amazon Echo from Redgate. Our contest is still open for a few days, so post a comment on my blog and take your chances. I’ll be picking from the comments next week.

Posted in Blog | Tagged , , | Leave a comment

Becoming a Data Scientist

Data Science is hot. There are lots of companies excited by using machine learning and AI to enhance their applications. There are new jobs, some of them well paying, and certainly not enough people to fill them. In many ways this reminds me of previous “hot” areas, such as Novell Networking in the late 80s/early 90s. Companies wanted new CNEs and paid dearly for them. The same thing happened in the mid 90s with MCSE’s for Microsoft networks. Many of the people hired weren’t remotely qualified, having just completed some multi-week boot camp.

You could go to school. If you have completed college, there are a list of data science graduate programs that you could choose from and pursue a masters degree. There’s even a blog where someone is documenting their masters degree path to becoming a data scientist. This isn’t a quick or east path, but it is one way to gain data science skills.

If you don’t want to spend the time or expense of a formal college program, Microsoft has a data science curriculum on the EdX platform that you can complete. These are low cost programs that you can complete to get a certificate. The value of that certificate is debatable, but the same could be said for any program. A few people that are working through this program have found it to be a good set of resources that is teaching them valuable skills.

There are other options, no shortage of books, blogs, and other resources on data science and data analysis techniques. It’s up to you, however, to learn what you need to know and become competent at a level that is useful for some organization to pay you. I dislike people choosing to study a topic for a job, so I would say that if you wish to go down this path, do so because you enjoy the work and find it interesting. Build some skills, build a portfolio of data science projects, and best of luck.

Our industry has thrived for a long time on simple analysis, and I think there will be jobs in this area for some time to come. I do expect that better looking reports and dashboards are going to be expected rather than simple tables, so I’d suggest everyone work on their visualization and report polishing skills. I also think that more complex data science techniques will be in demand, though I wouldn’t expect job growth here that overwhelms current jobs. Tackle data science if you like, but be aware this isn’t a simple or easy chore. There are lots of math and statistics involved and it looks like this is more science than just data.

Steve Jones

Posted in Editorial | Tagged , | 4 Comments

DevOps Basics– git Cloning Repos

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

This continues my series on git, because I think it’s good to know what the command line is and how to use it.

Once you have git running, the thing I’ve often wanted to do is go get code from somewhere. Certainly the creation of a repo and new code is something you might do, but often you’ll be looking to get code from somewhere else, so let’s look at how we can do this.

Find a Repo

Most of the time I find a repo somewhere in the company or on the Internet. Fro example, I have a simple database structure that I’ve used for demos at Github. This is my ASimpleTalkDB Repo, which is at: https://github.com/way0utwest/ASimpleTalkDB. You can see it here:

2017-04-06 13_54_32-way0utwest_ASimpleTalkDB_ Demo repo for Presentations

Off to the right is a “clone” button, which is what we want to do. We want to perform a git clone. If you click the button, a URL is in an edit box. The URL is: https://github.com/way0utwest/ASimpleTalkDB, the same as the repo above.

Let’s clone this. First, get a folder to store code in. I started with a Documents\Github folder on my various machines. I’ll use that, and as you can see, this is a place I have a number of folders, each one a repo. I don’t have this repo set up yet.

2017-04-06 13_58_06-C__Users_way0u_Documents_GitHub

I’ll clone this from the command line. The various tools do this, but now you’ll understand how this works. First, open a command line in this folder and then type “git clone https://github.com/way0utwest/ASimpleTalkDB”. This will clone the code, as shown below:

2017-04-06 13_59_46-way0utwest_ASimpleTalkDB_ Demo repo for Presentations

By default, the name of the repo becomes a folder name, with all the code below that. If I look in Explorer, I can see this.

2017-04-06 13_59_58-C__Users_way0u_Documents_GitHub_ASimpleTalkDB

However, I can control this. I’ll delete the folder and do this again. That’s the power of a VCS. I don’t need to worry about this code, because I’ll go get it from another repo.

Once I delete the folder, I’ll re-run the git clone command, but with the name of a folder added to the end.

2017-04-06 14_01_55-cmd

These objects, 264 of them, copy to my machine in a few seconds over hotel wi-fi. This is code, just text, and it’s quick. If I change to the folder and check the status, I see it is a real repo:

2017-04-06 14_03_01-cmd

Cloning Folders

What if I have code inside the company, and not in Github, Gitlab, BitBucket, VSTS, etc. Can I clone things? Yep, you can, but don’t use this to get a copy of your own code. That’s what branches are for. We’ll talk branches later, but for now, we can assume you might have a repo in your company.

For example, let’s assume for a moment that my C:\users\%username%\source\repos folder is on the network. I can clone one of these repos like this:

2017-04-06 14_08_12-cmd

Again, this isn’t the way to get a copy of my own code to work on. This is for getting a repo that I want to work on for myself, where I’ll then merge changes back to the original repo on another machine. On my own machine, I’d just use branches.

2017-04-06 14_08_12-cmd

This will help you get some code, and I’d encourage you to copy some code down and see how it works. Go get some code from my repo and build a db if you have SQL Source Control (point this to your cloned repo), or grab something from Microsoft and play.

That’s it for this post. There are lots of places to go. I’ll talk about how to now push your code elsewhere once you’ve changed it in another post as well as how to branch and accomplish a few other things.

A few resources (more boring, but will help you learn if you want):

git clone (git)

Create a Repo (Channel 9)

Posted in Blog | Tagged , , , | Leave a comment

Python in SQL Server

The Microsoft Data Amp event was this week, with a number of SQL Server announcements. We now know this is going to be called SQL Server 2017, which is no surprise. CTP 2.0 was also released, which includes a few improvements, but one in particular caught my eye. Python code can now run inside SQL Server, meaning we are not limited to the R language for data analysis. That’s exciting to me and I think that’s a great change. It’s one that’s been rumored for some time. I like Python, and think it’s a nice, elegant language. My son has been learning it in college and I’ve spent some time playing with it after the SDTIG got me interested with their beginning Python and Data Science modules. Since then I’ve spent a little time learning R, but find it to be a cumbersome language to work within. Having code embedded in a string in a SQL Server stored procedure made things harder, and I kept hoping that Python would be added.

It’s there now and you can play with it. While you might debate whether this is a good use of SQL Server resources, I think for quick, short, specific analysis calls, it’s a good choice. Certainly complex calculations and data manipulations are not always efficient in T-SQL and having another option is great. You’ll want to be careful when you run Python or R code, but these are good options to have. There is also GPU processing support is available to take some load off your main CPU.

CTP 2 is supposedly feature complete and there are more changes than just Python. Tempdb has a better setup experience, we get the first version of graph data and query support, resumable online index rebuilds, adaptive query processing, and more. You can read the announcement, which isn’t heavy on technical details, but gives you an idea of what’s changing. I’m not much of an SSAS person, but there are some enhancements in this platform as well.

There are other improvements and announcements this week. Azure Analysis Services is generally available (no longer beta), as as Azure Data Lake Analytics and Data Store. DocumentDB is improved and some Cognitive Services are GA, and in general, Microsoft is all in on the analytics and deep learning.

The world of data in the Microsoft world is growing and changing rapidly. We’ll have a new version this year, just a little over a year after SQL Server 2016. I expect that we will continue to see more, and new changes, occurring at a rapid pace in the future. None of us will completely keep up and become experts in all areas, but we have lots of options for new learning and experimenting with advanced features. I’d encourage you to take a little time here and there and read about and play with the new features. If you have Docker, you can get a version of SQL Server 2017 up and running without installing anything on your machine to see if there are any changes that might help your organization take better advantage of  the data you work with every day.

Steve Jones

 

Posted in Editorial | Tagged , | Leave a comment

A Sore Throat from Traveling

I’ve been avoiding recording podcasts for most of the last two weeks, so there haven’t been any. My apologies to those people that like the editorial as a podcast, but I’ve been ill and my voice is slightly shot.

After SQLBits, I took ill on the trip home. I had a miserable day in airplanes and then a complete day in bed when I got home. I can’t remember the last time I lay in bed for a day and didn’t do anything productive, but I didn’t bother to even turn on a computer.

It’s been a week and a half, and I’m still a little down. My chest is a bit tight and my voice is weak. At least I’m not feverish as I spent a solid week of fever coming and going, as well as alternating chills and sweats throughout the days and nights.

With a few commitments coming up (SQL Sat Silicon Valley and a Microsoft recording scheduled), I decided to rest things as best I can. Even my assistant coaching volleyball has had me miming actions and softly talking to kids rather than trying to push through.

Traveling is hard, and apparently my two weeks abroad didn’t end well. I’m not quite sure what I caught, but it appears to be some sort of flu bug that is lingering.

Hopefully I’ll be back to full strength before too long, but if I’m a little slow or quiet at events, you’ll know why.

Posted in Blog | Tagged | Leave a comment

SQL Server on Linux (Ubuntu)–Updating from Early CTP

I’ve been testing the SQL Server on Linux for a long time. After the recent Microsoft Data Amp event, I saw the announcement for CTP-2. So I did what I’ve been doing for some time on my Ubuntu machine:

sudo apt-get update

sudo apt-get install mssql-server

However, I got a message I didn’t expect.

2017-04-19 10_59_27-Ubuntu 64-bit SQL Server .210 - VMware Workstation

A few people on Twitter noted they had updated to CTP 2, but I was still on CTP1.4.

Hmmm.

I looked back through the apt-get log and I saw this:

2017-04-19 11_00_22-Ubuntu 64-bit SQL Server .210 - VMware Workstation

I’m cutting off part of the path, since I think it’s probably NDA. No worries, apparently the old location for me hasn’t been updated with new packages, which makes sense.

I decided to check the MS docs and see how a new user would get SSoL running? At the new docs.microsoft site, I found the Install SQL Server on Ubuntu doc.

Following the instructions, I updated the GPG keys and registered the repository with curl:

curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -

curl https://packages.microsoft.com/config/ubuntu/16.04/mssql-server.list | sudo tee /etc/apt/sources.list.d/mssql-server.list

Once those were done, I re-ran my apt-get update/install and it worked.

2017-04-19 10_58_46-SQLQuery9.sql - 192.168.1.210.master (sa (51))_ - Microsoft SQL Server Managemen

If you haven’t worked with Linux, it’s not really that hard. Some of you may get asked to install or manage a system, and while you should seek help for anything production related, in dev areas, play around. Read and learn a bit about Linux (I’d recommend watching the SDTIG book club meetings I, II, III, IV on Linux), download Ubuntu (or anything) and play around. Get used to a command line and try things.

You’ll love SQL Server on Linux. It’s mostly just SQL Server, but updates and installs as simple in a test world.

Posted in Blog | Tagged , , | Leave a comment