SQL Server on Linux is Just SQL Server

I’ve been working with the SQL Server on Linux (SSoL) version for quite some time, almost a year. In all of that time, I’ve for the most part found that SQL Server is SQL Server. When I connect, run demos, check code, almost everything just works. If I didn’t bother to check @@version, this would appear to be just another SQL Server to me. That’s what a new video on Channel 9 shows as well. SQL Server is pretty much the same on both platforms. I’ve tested the Redgate tools and to all of them, SSoL is just SQL Server.

There are some differences, which is to be expected. Any operations that access the file system and require paths work a bit differently, and for those people that end up administering the product, there will be some changes to get used to. The advanced HA features are similar, but again, some work is required. However, this isn’t all bad. I’m impressed with the apt-get process (I’m testing on Ubuntu), which is way, way easier than any patching or updating process I’ve gone through on Windows. In fact, setting up an Ubuntu VM last year was easy, and installing SQL Server was about as easy as it could be.

The tooling on Linux isn’t as mature, and I don’t know when we will see a GUI client, but as I move more and more to PoSh or scripting to make changes in SQL Server, I expect more and more people to manage both Windows and Linux versions in the same way. Certainly using SSMS to write queries is a much nicer experience, and I would guess that many developers that might run SQL Server on OSX or Linux will want a Windows VM for SSMS. Of course, since Visual Studio is now on OSX, maybe we’ll see SSMS running natively on other platforms.

I don’t know how many enterprises will run SQL Server on Linux, but I’m sure there are some that will. I don’t think a lot of organizations will move from Windows to Linux, unless they have loved SQL Server enough to install a single Windows host for the database and want to get rid of it now. I do think lots of developers will run SQL Server on Linux/OSX, especially in containers, where it is really easy to get a container running on their platforms.

If you’ve experimented with SQL Server on Linux, or you are excited, let us know. If you think your organization might use this platform, let us know as well. I suspect a few of you will just because you can, which is as good a reason as any.

Steve Jones

 

Posted in Editorial | Tagged , | 1 Comment

Limiting the Max–#SQLNewBlogger

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

I was chatting with someone the other day about MAX() and how it can work within a set of rows, rather than across an entire result set. That’s a good query skill to have, so I mocked up a short demo. This will show the window functions in T-SQL, using the OVER() clause with MAX.

Let’s build a table and add some data.

CREATE TABLE counters
( countid INT IDENTITY(1,1)
, countername VARCHAR(20)
, counteryear INT
, mycounter INT
)
GO
INSERT dbo.counters
        ( countername
        , counteryear
        , mycounter
        )
    VALUES
        ( 'Test1', 2012, 1 ),
        ( 'Test1', 2013, 2 ),
        ( 'Test1', 2014, 3 ),
        ( 'Test1', 2015, 4 ),
        ( 'Test1', 2016, 5 ),
        ( 'Test1', 2017, 6 )
GO

If I now query, I will just use and ORDER BY clause in the OVER() clause. This will order all the rows from the result set by the year, and then apply a MAX to them.

SELECT
    countid,
    countername,
    counteryear,
    mycounter,
    sumofallvalues = SUM(mycounter) OVER (ORDER BY counteryear),
    maxcounter = MAX(mycounter) OVER (ORDER BY counteryear)
FROM dbo.counters
ORDER BY
    countername,
    counteryear;

The partial results are shown below. The important parts are the year and the sum/max values.
2017-04-18 06_47_36-SQLQuery3.sql - (local)_SQL2016.NBA (PLATO_Steve (71))_ - Microsoft SQL Server M

We see here that the counter increases by one until the last row. This is the data set, the year and counter increment. The max should be six, but it’s not six until the last row. Why?

The answer comes from the framing of the rows. The OVER() clause, by default use a range of unbounded preceeding and current row as it’s set of values. In this case, with an order by on the counteryear, this means that when the query engine processes the first row, the entire set of values in the window is just the row with 2012. The next row has two rows, with unbounded preceeding being 2012 and the current row of 2013. This means for each set of rows, the range consists of previous values. I’ll show here what the counteryear values are for each row and the result of the max:

  • 2012 uses the 2012 row only – max 1
  • 2013 uses the 2012 and 2013 rows – max 2
  • 2014 uses the 2012, 2013, and 2014 rows – max 3
  • 2015 uses the 2012, 2013, 2014, 2015 rows – max 4
  • 2016 uses the 2012, 2013, 2014, 2015, 2016 rows – max 5
  • 2017 uses the 2012, 2013, 2014, 2015, 2016 rows – max 6

This is a window that moves with the data, and is defined as being from the beginning of the result set to the current row.

Let’s change things. I’ll add a second set of rows, and we can see here how this might differ.

INSERT dbo.counters
 ( countername
 , counteryear
 , mycounter
 )
 VALUES
 ( 'Test2', 2012, 1 ),
 ( 'Test2', 2013, 4 ),
 ( 'Test2', 2014, 2 ),
 ( 'Test2', 2015, 8 ),
 ( 'Test2', 2016, 5 ),
 ( 'Test1', 2017, 11 )
 GO

Now, let’s query again, but we’ll change our window. First, we’ll order by name for the MAX(), since that’s what we want to show. I’ll leave SUM alone. Next, I’ll change the window to be only the previous row and the current row.

SELECT
 countid,
 countername,
 counteryear,
 mycounter,
 sumofallvalues = SUM(mycounter) OVER (ORDER BY counteryear),
 maxcounter = MAX(mycounter) OVER (PARTITION BY countername
 ORDER BY counteryear
 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
 )
 FROM dbo.counters
 ORDER BY
 countername,
 counteryear;

Now the results are a bit different. My final ORDER BY ensures I have the tests separated. The first one looks similar in the max columns. I’ll explain the SUM in a different post.

2017-04-18 06_55_24-SQLQuery3.sql - (local)_SQL2016.NBA (PLATO_Steve (71))_ - Microsoft SQL Server M

The MAX() for test2 is different. Let’s see what happens here. Since my window is a ROWS clause, and it’s set for 1 preceeding and the current row, I get these values.

Test2, 2012 row

  • Preceeding  – no values
  • Current – 1
  • Max – 1

Test2, 2013 row

  • Preceeding  – 1 (2012 value)
  • Current – 4
  • Max – 4

Test2, 2013 row

  • Preceeding  – 4 (2013 row)
  • Current – 2
  • Max – 4

Test2, 2014 row

  • Preceeding  – 2
  • Current – 8
  • Max – 8

Test2, 2015 row

  • Preceeding  – 8
  • Current – 5
  • Max – 8

This shows that MAX  is limited to the frame I’ve applied to the window. The window is the OVER() clause, which in this case is the set of rows with the same counteryear value (the partition).

Window functions get confusing and are strange, especially if you’ve spent most of your career without them and finding tricks to use COUNT(), SUM(), and other aggregates on subsets of rows. Things get easier in SQL Server 2012+, and you should spend time playing with small sets of data and understanding windows.

SQLNewBlogger

This was about a 15-20 exercise, but it was good since I needed to stop and think about how to use and show a window function and the ranges. Good skills practice.

I’d like to see others explain this with a data set that they find interesting.

References

OVER() – https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql

MAX() – https://docs.microsoft.com/en-us/sql/t-sql/functions/max-transact-sql

Posted in Blog | Tagged , , | Leave a comment

Still time to enter the SQL Clone contest

You could win an Amazon Echo from Redgate. Our contest is still open for a few days, so post a comment on my blog and take your chances. I’ll be picking from the comments next week.

Posted in Blog | Tagged , , | Leave a comment

Becoming a Data Scientist

Data Science is hot. There are lots of companies excited by using machine learning and AI to enhance their applications. There are new jobs, some of them well paying, and certainly not enough people to fill them. In many ways this reminds me of previous “hot” areas, such as Novell Networking in the late 80s/early 90s. Companies wanted new CNEs and paid dearly for them. The same thing happened in the mid 90s with MCSE’s for Microsoft networks. Many of the people hired weren’t remotely qualified, having just completed some multi-week boot camp.

You could go to school. If you have completed college, there are a list of data science graduate programs that you could choose from and pursue a masters degree. There’s even a blog where someone is documenting their masters degree path to becoming a data scientist. This isn’t a quick or east path, but it is one way to gain data science skills.

If you don’t want to spend the time or expense of a formal college program, Microsoft has a data science curriculum on the EdX platform that you can complete. These are low cost programs that you can complete to get a certificate. The value of that certificate is debatable, but the same could be said for any program. A few people that are working through this program have found it to be a good set of resources that is teaching them valuable skills.

There are other options, no shortage of books, blogs, and other resources on data science and data analysis techniques. It’s up to you, however, to learn what you need to know and become competent at a level that is useful for some organization to pay you. I dislike people choosing to study a topic for a job, so I would say that if you wish to go down this path, do so because you enjoy the work and find it interesting. Build some skills, build a portfolio of data science projects, and best of luck.

Our industry has thrived for a long time on simple analysis, and I think there will be jobs in this area for some time to come. I do expect that better looking reports and dashboards are going to be expected rather than simple tables, so I’d suggest everyone work on their visualization and report polishing skills. I also think that more complex data science techniques will be in demand, though I wouldn’t expect job growth here that overwhelms current jobs. Tackle data science if you like, but be aware this isn’t a simple or easy chore. There are lots of math and statistics involved and it looks like this is more science than just data.

Steve Jones

Posted in Editorial | Tagged , | 3 Comments

DevOps Basics– git Cloning Repos

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

This continues my series on git, because I think it’s good to know what the command line is and how to use it.

Once you have git running, the thing I’ve often wanted to do is go get code from somewhere. Certainly the creation of a repo and new code is something you might do, but often you’ll be looking to get code from somewhere else, so let’s look at how we can do this.

Find a Repo

Most of the time I find a repo somewhere in the company or on the Internet. Fro example, I have a simple database structure that I’ve used for demos at Github. This is my ASimpleTalkDB Repo, which is at: https://github.com/way0utwest/ASimpleTalkDB. You can see it here:

2017-04-06 13_54_32-way0utwest_ASimpleTalkDB_ Demo repo for Presentations

Off to the right is a “clone” button, which is what we want to do. We want to perform a git clone. If you click the button, a URL is in an edit box. The URL is: https://github.com/way0utwest/ASimpleTalkDB, the same as the repo above.

Let’s clone this. First, get a folder to store code in. I started with a Documents\Github folder on my various machines. I’ll use that, and as you can see, this is a place I have a number of folders, each one a repo. I don’t have this repo set up yet.

2017-04-06 13_58_06-C__Users_way0u_Documents_GitHub

I’ll clone this from the command line. The various tools do this, but now you’ll understand how this works. First, open a command line in this folder and then type “git clone https://github.com/way0utwest/ASimpleTalkDB”. This will clone the code, as shown below:

2017-04-06 13_59_46-way0utwest_ASimpleTalkDB_ Demo repo for Presentations

By default, the name of the repo becomes a folder name, with all the code below that. If I look in Explorer, I can see this.

2017-04-06 13_59_58-C__Users_way0u_Documents_GitHub_ASimpleTalkDB

However, I can control this. I’ll delete the folder and do this again. That’s the power of a VCS. I don’t need to worry about this code, because I’ll go get it from another repo.

Once I delete the folder, I’ll re-run the git clone command, but with the name of a folder added to the end.

2017-04-06 14_01_55-cmd

These objects, 264 of them, copy to my machine in a few seconds over hotel wi-fi. This is code, just text, and it’s quick. If I change to the folder and check the status, I see it is a real repo:

2017-04-06 14_03_01-cmd

Cloning Folders

What if I have code inside the company, and not in Github, Gitlab, BitBucket, VSTS, etc. Can I clone things? Yep, you can, but don’t use this to get a copy of your own code. That’s what branches are for. We’ll talk branches later, but for now, we can assume you might have a repo in your company.

For example, let’s assume for a moment that my C:\users\%username%\source\repos folder is on the network. I can clone one of these repos like this:

2017-04-06 14_08_12-cmd

Again, this isn’t the way to get a copy of my own code to work on. This is for getting a repo that I want to work on for myself, where I’ll then merge changes back to the original repo on another machine. On my own machine, I’d just use branches.

2017-04-06 14_08_12-cmd

This will help you get some code, and I’d encourage you to copy some code down and see how it works. Go get some code from my repo and build a db if you have SQL Source Control (point this to your cloned repo), or grab something from Microsoft and play.

That’s it for this post. There are lots of places to go. I’ll talk about how to now push your code elsewhere once you’ve changed it in another post as well as how to branch and accomplish a few other things.

A few resources (more boring, but will help you learn if you want):

git clone (git)

Create a Repo (Channel 9)

Posted in Blog | Tagged , , , | Leave a comment

Python in SQL Server

The Microsoft Data Amp event was this week, with a number of SQL Server announcements. We now know this is going to be called SQL Server 2017, which is no surprise. CTP 2.0 was also released, which includes a few improvements, but one in particular caught my eye. Python code can now run inside SQL Server, meaning we are not limited to the R language for data analysis. That’s exciting to me and I think that’s a great change. It’s one that’s been rumored for some time. I like Python, and think it’s a nice, elegant language. My son has been learning it in college and I’ve spent some time playing with it after the SDTIG got me interested with their beginning Python and Data Science modules. Since then I’ve spent a little time learning R, but find it to be a cumbersome language to work within. Having code embedded in a string in a SQL Server stored procedure made things harder, and I kept hoping that Python would be added.

It’s there now and you can play with it. While you might debate whether this is a good use of SQL Server resources, I think for quick, short, specific analysis calls, it’s a good choice. Certainly complex calculations and data manipulations are not always efficient in T-SQL and having another option is great. You’ll want to be careful when you run Python or R code, but these are good options to have. There is also GPU processing support is available to take some load off your main CPU.

CTP 2 is supposedly feature complete and there are more changes than just Python. Tempdb has a better setup experience, we get the first version of graph data and query support, resumable online index rebuilds, adaptive query processing, and more. You can read the announcement, which isn’t heavy on technical details, but gives you an idea of what’s changing. I’m not much of an SSAS person, but there are some enhancements in this platform as well.

There are other improvements and announcements this week. Azure Analysis Services is generally available (no longer beta), as as Azure Data Lake Analytics and Data Store. DocumentDB is improved and some Cognitive Services are GA, and in general, Microsoft is all in on the analytics and deep learning.

The world of data in the Microsoft world is growing and changing rapidly. We’ll have a new version this year, just a little over a year after SQL Server 2016. I expect that we will continue to see more, and new changes, occurring at a rapid pace in the future. None of us will completely keep up and become experts in all areas, but we have lots of options for new learning and experimenting with advanced features. I’d encourage you to take a little time here and there and read about and play with the new features. If you have Docker, you can get a version of SQL Server 2017 up and running without installing anything on your machine to see if there are any changes that might help your organization take better advantage of  the data you work with every day.

Steve Jones

 

Posted in Editorial | Tagged , | Leave a comment

A Sore Throat from Traveling

I’ve been avoiding recording podcasts for most of the last two weeks, so there haven’t been any. My apologies to those people that like the editorial as a podcast, but I’ve been ill and my voice is slightly shot.

After SQLBits, I took ill on the trip home. I had a miserable day in airplanes and then a complete day in bed when I got home. I can’t remember the last time I lay in bed for a day and didn’t do anything productive, but I didn’t bother to even turn on a computer.

It’s been a week and a half, and I’m still a little down. My chest is a bit tight and my voice is weak. At least I’m not feverish as I spent a solid week of fever coming and going, as well as alternating chills and sweats throughout the days and nights.

With a few commitments coming up (SQL Sat Silicon Valley and a Microsoft recording scheduled), I decided to rest things as best I can. Even my assistant coaching volleyball has had me miming actions and softly talking to kids rather than trying to push through.

Traveling is hard, and apparently my two weeks abroad didn’t end well. I’m not quite sure what I caught, but it appears to be some sort of flu bug that is lingering.

Hopefully I’ll be back to full strength before too long, but if I’m a little slow or quiet at events, you’ll know why.

Posted in Blog | Tagged | Leave a comment

SQL Server on Linux (Ubuntu)–Updating from Early CTP

I’ve been testing the SQL Server on Linux for a long time. After the recent Microsoft Data Amp event, I saw the announcement for CTP-2. So I did what I’ve been doing for some time on my Ubuntu machine:

sudo apt-get update

sudo apt-get install mssql-server

However, I got a message I didn’t expect.

2017-04-19 10_59_27-Ubuntu 64-bit SQL Server .210 - VMware Workstation

A few people on Twitter noted they had updated to CTP 2, but I was still on CTP1.4.

Hmmm.

I looked back through the apt-get log and I saw this:

2017-04-19 11_00_22-Ubuntu 64-bit SQL Server .210 - VMware Workstation

I’m cutting off part of the path, since I think it’s probably NDA. No worries, apparently the old location for me hasn’t been updated with new packages, which makes sense.

I decided to check the MS docs and see how a new user would get SSoL running? At the new docs.microsoft site, I found the Install SQL Server on Ubuntu doc.

Following the instructions, I updated the GPG keys and registered the repository with curl:

curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -

curl https://packages.microsoft.com/config/ubuntu/16.04/mssql-server.list | sudo tee /etc/apt/sources.list.d/mssql-server.list

Once those were done, I re-ran my apt-get update/install and it worked.

2017-04-19 10_58_46-SQLQuery9.sql - 192.168.1.210.master (sa (51))_ - Microsoft SQL Server Managemen

If you haven’t worked with Linux, it’s not really that hard. Some of you may get asked to install or manage a system, and while you should seek help for anything production related, in dev areas, play around. Read and learn a bit about Linux (I’d recommend watching the SDTIG book club meetings I, II, III, IV on Linux), download Ubuntu (or anything) and play around. Get used to a command line and try things.

You’ll love SQL Server on Linux. It’s mostly just SQL Server, but updates and installs as simple in a test world.

Posted in Blog | Tagged , , | Leave a comment

SQL Server Telemetry

One of the things that I’ve seen blogged about and noted in the news over the last year is the amount of data being collected by various software systems. In particular, I’ve seen numerous complaints and concerns over what data Microsoft collects with its platforms. Perhaps this is because Microsoft is the largest vendor, and plenty of other software collects usage information, primarily to determine if features are being used or working correctly. I think much of this is overblown, but I can understand having sensitivity about our computer usage, especially for home operating systems.

Microsoft conducted an extensive review and published data about what is being collected and why. Windows 10 and other products have undergone review and must comply with the published policies. There’s even a Microsoft Privacy site to address concerns and explain things. It’s an easy to read policy that explains what  Microsoft is collectin, if you’re connected to the Internet (if you’re not, don’t worry, no bloating files). That’s a huge step forward in an area that is evolving and something I wouldn’t have expected to see in the past. I am glad Microsoft is making strides here, even if I may not agree with specific items in their policies. I do think that most of the companies collecting this data are doing so to improve the products, not spy on customers. I’m sure some do, but likely smaller organizations with some sort of criminal intent.

As data becomes more important, telemetry for software is potentially a data leakage vector where private, personal, or customer information might be leaked. Certainly as more speech and other customized services are used in businesses, I worry about what data could be accidentally disclosed. After all, it’s not that super powerful smart phone that is actually converting audio to text in many cases; it’s a computer somewhere in the vendor’s cloud.

With databases, this has also been a concern from some people. I’ve seen the Customer Experience Improvement Program for years and usually opted in. I’m rarely doing something sensitive and I hope that with more data, Microsoft improves the platform. That’s the stated goal, and I’d seen them talk about this a few times. The SQL Server has moved forward and published an explicit policy that spells out what and when data is collected. It was actually just updated recently and all new versions of the platform must provide this information (if anything is different) and adhere to what they disclose. There is a chance that user data could leak into a crash dump, though users have the opportunity to review data before it is sent to Microsoft. I’m not sure how many will, but they have the chance.

I would like to be sure that anything sent is secured, and perhaps have an easy way to audit the data sent in a session, but I know this entire process is evolving. One important item to note is that customers can opt-out of data collection for any paid for versions of SQL Server. That’s entirely fair, but if you have regulatory concerns, you should be sure that you don’t restore production data to development machines. You shouldn’t anyway, but just an FYI.

Usage data is going to be a part of the future of software, especially as more “services” integrate into what we think of as software. Those services aren’t always going to be under our control and certainly part of the reason many of these are inexpensive is that additional data is captured about the people using the software. I hope all companies publish and adhere to some sort of privacy statement, and maybe that’s a good start. Rather than any regulation on specific privacy that must exist, start forcing companies to publish and stick to whatever policy they choose.

Steve Jones

 

Posted in Editorial | Tagged | Leave a comment

Liable for Data Loss

When I first installed Windows 7, I was thrilled. Finally, Microsoft had slimmed down and improved the OS performance rather than continuing to bloat it larger. After Vista, Windows 7 was a welcome change. Windows 8 was different, but for me as a desktop user, it wasn’t much different. I moved to Windows 10 as a beta user and thought it performed well. A little slower than 7, but overall a good move. I was glad to get the upgrade notice on a second machine, but it was annoying. Trying to easily delay or avoid the change in the middle of some travel was hard. I certainly could sympathize with the users that complained they didn’t want the upgrade and couldn’t easily avoid it. I’m glad Microsoft changed this process a bit.

There were people that accidentally, or felt forced, to upgrade. Among those, some of them lost data and decided to sue Microsoft. Let’s leave aside the Windows upgrade process, Microsoft’s decision, and the merits of this particular case. Those are separate issues from the one I want to discuss, which is the liability for data loss. At the core of the lawsuit, the time and information that people have lost is an issue that few of us have had to deal with in our careers. At least, most of us haven’t had to worry we are liable for the issues our software might cause.

Are we moving to a place where a person, or more likely a company, is going to be held liable for the data loss from upgrades or patches? There is the ability of customers to initiate legal actions, but strong EULAs and prior legal decisions seem to indicate that much of the liability resides with customers and vendors aren’t at fault. Is that a good thing? I’m not sure, but I do think that as data becomes more important and is used to justify decisions or drive business actions, there will be a push to ensure that anyone performing data changes with their software during patches and upgrades is liable for issues.

I’m surprised we haven’t seen more of accountability from software firms to date, but I think much of the legal issues have been settled without much fanfare and strong non disclosure agreements. I’m not sure this is the best solution for anyone, as to force some improvement and better quality for software, we need to take better care of our data. I don’t want us to move slower with software development or deployment, but I do want quality to improve.

Steve Jones

Posted in Editorial | Tagged | Leave a comment