It’s Azure T-SQL Tuesday #103

I’m slightly surprised that there haven’t been more invitations about Azure, given that the main development branch of SQL Server lives in the Azure cloud, but I’m glad this one came along.

The invitation is from Björn Peters, and he asks us to write about Azure SQL Database or Managed Instances. We can write about anything that’s part of the platform, which gives us lots of choices.

I’ve got two quick items.

T-SQL Tuesday

Did you know tsqltuesday.com is hosted on Azure? It’s actually at http://tsqltuesday.azurewebsites.net. When Brent gave me the domain, and Adam wanted a quick site, I provisioned a WordPress site on Azure in a few minutes. It actually took longer to square away the initial DNS and domain name transfer than to get the site up and configured. Templates and known configuration settings help ensure that you can build a new site very quickly.

The cost was low as well: about $3.60/month for the site, plus $5/month initially for the hosted MySQL database.

This changed last year, when the MySQL hosting option was discontinued. I could have upgraded to $15/month, but decided to experiment. Azure has a MySQL-as-a-Service PaaS offering for the same $15/month, so I moved to that. I was somewhat amazed at how easy it was to provision this database on my own and get things running. While the site costs me a little money, it’s a donation to the community, and I’m happy to see so many people participating in the monthly event.

Quick Labs

One of the things that I needed to do last year was set up a new cluster and Availability Group for a proof of concept. I have typically done this by getting some VMs ready on my desktop, making connections, etc. In this case, however, I decided to use Azure instead. I was curious whether this would be faster or easier.

It was.

Across a few days, as I learned a bit about provisioning new machines and connecting them with networking, I was able to get my lab set up with three machines, a load balancer, storage, etc. I had a working set of Availability Groups that I could access from anywhere.

Then I tore it down and started again.

One of the great things about Azure is that you can use the Azure Resource Manager to provision new services fairly quickly. It’s not as instantaneous as you might like, though it’s not bad. I think the fact that you request actions and then wait for the portal to notify you of completion makes the process seem slower than it is.

In my case, the few days became < 1 hour the second time around.

Plus, I didn’t use resources on my local machine, where I might have had to balance the load with other tasks. I could also shut the Azure machines down to reduce costs. With ARM, I could even get rid of the lab entirely and rebuild it in an hour whenever I need it again.

To me, this is the place where the cloud makes lots of sense. If you have a load that is sporadic and can be up or down on demand, then the cloud is helpful. You can remove resources you don’t need, only paying as you go.

If I were using this every day, I’d probably stick with the local system, but for a periodic use item, it makes sense to fire up the lab when I need it, rather than keeping it alive all the time.

I look forward to doing more with Azure in the future and learning even more about how to build up the resources I need, when I need them.


Machine Learning in the Database

When SQL Server added the ability to execute R code, the decision seemed to split the customer base into two groups. One group was impressed and thought the idea of executing R code to analyze data in the database was a good one. They were excited and impressed by the loan classification demo. If you haven’t read about this or seen the demo, it’s very interesting, and it’s worth taking a few minutes to read or watch.
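
If you haven’t played with the feature itself, the entry point is sp_execute_external_script. Here’s a minimal sketch, assuming Machine Learning Services (In-Database) with R is installed; it just pushes a result set into R and returns it, which is enough to confirm the plumbing works:

-- Allow external scripts (may require a service restart on some builds)
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;

-- Round trip a result set through R
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet',
    @input_data_1 = N'SELECT 1 AS test_value'
WITH RESULT SETS ((test_value int));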

The other group of customers felt this was a poor use of CPU cycles on a very expensive SQL Server CPU license. Running complex analyses, training models, and other functions commonly associated with R scripts isn’t a good use of scarce resources. They would rather have R code execute on a separate server, much like a large messaging workload might be better served by a service such as AWS’s Simple Queue Service rather than Service Broker.

I tend to be in the first group, as is Dr. Low. He writes that there is a place where Machine Learning Services (MLS), with both R and Python, is a good use of resources. Not in all cases, and certainly not for all work. The difficult parts of training models and coming up with new ways to perform an analysis are definitely better left to workstations and data scientists. Those actions might not be worth the server resources they take.

Once the models are trained, however, the load of submitting parameters to a model and getting a prediction back is small. SQL Server allows us to load pre-trained models into the database and just call them as needed. Plus, the R models run in a multi-threaded fashion, unlike the single-threaded execution in clients such as RStudio.
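
For instance, if a trained model has been serialized and stored in a table, native scoring through the PREDICT function keeps the per-call cost low. This is just a sketch; the table, column, and model names are made up, and it assumes the model was saved in the native format PREDICT understands (e.g., serialized with rxSerializeModel):

-- Pull a pre-trained model from a table and score a batch of rows with it
DECLARE @model varbinary(max) =
    (SELECT model_object FROM dbo.LoanModels WHERE model_name = 'loan_classification');

SELECT d.loan_id, p.predicted_default
FROM PREDICT(MODEL = @model, DATA = dbo.NewLoanApplications AS d)
WITH (predicted_default float) AS p;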

As with any feature of SQL Server, it’s important to test and evaluate the real-world impact of new code on production-sized workloads. Not only will you want to measure the load of your model execution, but you should also measure any changes in your existing workload once the additional R or Python code is running. While I wouldn’t prevent the use of MLS in SQL Server, just like SQLCLR code, I would be careful about introducing it without extensive testing, including dark deployments and simulated loads.
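
There’s nothing MLS-specific about how you’d measure that second part; a simple starting point is capturing the top resource consumers from sys.dm_exec_query_stats before and after the new code goes in and comparing them:

-- Top CPU consumers currently in the plan cache
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.total_elapsed_time / 1000 AS total_elapsed_ms,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;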

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.2MB) podcast or subscribe to the feed at iTunes and Libsyn.


Be Prepared with Baselines

I visited a doctor recently, and he told me a measurement he’d made. I asked if it was good or bad, and he said he had no idea. The value varied too much from person to person, and without values from the past, he couldn’t really evaluate the significance of it. He will be able to in the future, now that he has a value, but there’s nothing that can be done now. At this point, he has a baseline (of sorts) and can now start to judge how things change over time.

After that visit, I started thinking about Page Life Expectancy (PLE). PLE is one of those counters that so many DBAs look at early in their career. Often they’ve read guidance that they should worry once this value is below 300, which isn’t true. There are calculations for this, but they are based on your system, and really, they’re a rough rule of thumb. You need to measure this for your system, so that you know what the value typically is, and then worry when it dips.
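
Getting the current value is the easy part; the counter is exposed through sys.dm_os_performance_counters, so you don’t even need Performance Monitor:

-- Current Page Life Expectancy, in seconds
SELECT RTRIM(object_name) AS object_name, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND object_name LIKE '%Buffer Manager%';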

To do that you need a baseline. You need to measure various metrics about your system over time so that you understand what a normal value is. Plenty of experts, like Erin Stellato and Tim Radney, have written about baselines, why they’re important, and what you might want to capture. In fact, we have quite a few articles on baselines at SQLServerCentral.

If that sounds like a lot of work to you, I agree. I’ve built systems in the past that captured metrics on my instances and stored the data. I wrote reports to view the data, alerts to let me know when something was breaking (or broken), and maintenance that kept data storage under control. I essentially had to be both the software developer and the operations staff for my systems. That works, but I’d try to avoid repeating that effort from now on. As Tim mentions in his piece, there are better ways to do this. There are products, such as SQL Monitor and SQL Sentry, that capture this data for you without the typos, mistakes, or holes in operation. Some will even show you the baseline visually so you can see whether things are within expected ranges.
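
For what it’s worth, the collection side of a home-grown system isn’t the hard part. A minimal sketch is just a table and an INSERT run from an Agent job on a schedule (the table design and counter list here are only illustrative); the reporting, alerting, and maintenance around it are where the real effort goes:

CREATE TABLE dbo.PerfBaseline
(
    capture_time  datetime2(0)   NOT NULL DEFAULT SYSDATETIME(),
    object_name   nvarchar(128)  NOT NULL,
    counter_name  nvarchar(128)  NOT NULL,
    cntr_value    bigint         NOT NULL
);

-- Schedule this every few minutes; note that the per-second counters are
-- cumulative, so compute deltas between captures when you report on them.
INSERT dbo.PerfBaseline (object_name, counter_name, cntr_value)
SELECT RTRIM(object_name), RTRIM(counter_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy', 'Batch Requests/sec', 'Buffer cache hit ratio');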

The monitoring software does lots for you, though at a price. It’s tested, and it does all the gathering, storage, basic analysis, and alerting in a way that allows you to spend time actually fixing issues, tuning queries, and providing value for your organization. I think it’s worth the cost, since I know that my time is better spent on solving problems, not writing monitoring software. You may feel the same way, or you may not. You may prefer to write your own system, or you may not have a budget and be forced to build your own. Whichever route you go, make sure you set up a baseline. You’ll appreciate having one the next time your phone rings with a call that the server is slow.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.6MB) podcast or subscribe to the feed at iTunes and Libsyn.


Two Types of Performance Counters

I had an issue where an instance of SQL Server was only showing the XTP (In-Memory) performance counters. None of the other SQL Server counters were available, so I followed the procedure I’d written about previously. Once that was completed, I restarted SQL Server and looked again.

No counters, still.

Hmmm. It was then that I scrolled further and realized that I had the SQL Server Agent counters, but not the database engine ones. I looked more closely in the Performance folder for my instance and noticed this:

[Screenshot: the two performance counter .ini files in the instance’s Binn folder]

There are two .ini files:

  • perf-<named instance>sqlctr.ini
  • perf-SQLAgent<instance>sqlagtctr.ini

These refer to the counters for the database engine and the Agent subsystem, respectively. I had unknowingly used the Agent file for the lodctr.exe call rather than the engine one.

Lesson learned. I had to re-run the procedure with the engine ini file and restart the instance again.

If you need to add counters, make sure you load both files and then restart; otherwise you might incur more downtime than you expect.


Microsoft and GitHub

I started using git with GitHub. I thought their hosted service was fantastic, and I have seen quite a few companies using private repos for their internal code, including my employer, Redgate. I see everything from FiveThirtyEight releasing their data, to the MuseScore sheet music software, to TensorFlow, to Programmer’s Proverbs. There is no shortage of useful, inspiring, and helpful repos you can browse. There are also plenty of silly ones.

I have a GitHub folder on all my machines, and that’s become the place where I just stick repos. Some of them go to VSTS, some to Bitbucket, but I still always think of GitHub first when writing code. The others work great, and I’m glad we have a choice, but I’ve always been fond of GitHub as a company that opened up a place to share code and data in a way that was somehow more attractive and easier than SourceForge or CodePlex or any other location. Maybe it’s the ease of git, but I really liked the site. GitHub for Windows, not so much, but I don’t expect everything a company does to be perfect.

Microsoft is buying the company, and GitHub seems fine with this. As one of the heaviest users of GitHub, Microsoft has been releasing their code on the platform for some time. There seem to be some synergies here, and this is a great way for Microsoft to continue to open up some of their code and support the development community on all platforms, languages, and environments. Microsoft has said that they’ll keep the open source model, though there is no shortage of concerns and complaints. There are also a fair number of jokes. The Linux Foundation isn’t upset, which should make some people rethink their concerns.

This is an interesting topic for me, and I brought up concerns over IP with some people inside my company. After all, we compete directly with Microsoft in some places, and we wouldn’t want them to copy (and rewrite) our code to add to their products. That could dramatically affect our business, but no one worries much about that. Decompilers, and our own Reflector, could already be used to understand how algorithms are implemented, and certainly Microsoft has the resources to do this if they want to. They don’t, and there are other business and legal protections to prevent it.

In some sense, the actual code really isn’t as important as many of us think. It certainly isn’t in SQL Server, where our code is readable by whoever controls the instance. That includes your encrypted code, which can easily be rendered readable. I’ve often thought that the value I provide with my code is that I wrote it, I support it, and I ensure you don’t burn far more time than necessary doing those things.

I think for the most part GitHub isn’t going to change for me. I’ll still post code out there for demos and presentations and even sharing some code with my kid as he learns to program. It will take time for Microsoft to assimilate parts of the company, but overall, I expect that most of the platform will remain the same for a long time, especially if Microsoft can get more enterprise customers to use it for their systems, especially their database code.

Steve Jones


Better Comments

I assume most of you comment your code.

Well, you probably comment code most of the time.

I’d bet your comments have quite a bit of detail.

And you do this completely inconsistently.

That’s what I’d think, or maybe it’s just what I want. Even the best developers I know don’t consistently comment code. You can drift through any project on GitHub and see this. Those projects on GitHub might even be better documented because people know they are public. In most corporate environments I have worked in, I’ve found that when people get busy, or distracted, or even when they’re experimenting to find a solution, they don’t write detailed comments. Usually it’s only when someone fixes a bug, with a solution found quickly, that I get a really useful comment.

There are all sorts of ways that people think about commenting their code. I ran across a post from Annette Allen about adding comments. She noted that she has headers in her stored procedures as well as other comments. However, after a few months, she wasn’t sure that the comments actually helped her. I’ve had that same feeling at times when looking back at comments, both mine and others’.

Do you have a method for choosing the words you use to comment code? Jeff Atwood says to comment on why you did something, not what. I’ve seen that before, especially with version control commit comments. We often can look at code to determine what it does, and if you use a VCS extensively, then you can always see the changes that were made between versions. The comments typically give me some rationale for the work I did. Over time, I find that if I think about how I’d explain my reasoning to myself in the future, I come up with a good comment.
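
A contrived T-SQL example of the difference (the table, status codes, and business rule here are all made up):

-- What (not much help): set order_status to 3 for old pending orders
-- Why (useful): orders stuck in "payment pending" for more than 30 days are
-- written off under the finance team's policy, so the nightly job closes them.
UPDATE dbo.Orders
SET    order_status = 3   -- 3 = closed / written off
WHERE  order_status = 1   -- 1 = payment pending
  AND  order_date < DATEADD(DAY, -30, GETDATE());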

You might feel differently, and if so, please let me know. Perhaps you have some good ideas about choosing comments, and perhaps you have examples. Some might include a bug or Jira number, which I like but others may hate. These could be comments inline in code or those you use when committing to a VCS.

Share some thoughts today, and if you have any entries for the best code comment, drop them here.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.1MB) podcast or subscribe to the feed at iTunes and Libsyn.


Project Euler

I like having computer science problems to practice solving. It can be a fun hobby, and it exercises your mind. I have enjoyed the Advent of Code, though some of those problems are complex and hard to focus on in my busy life.

My middle son asked about Project Euler as he finished his first year of college and suggested we try some problems together. He’s been busy, though I have called him down as I’ve solved one or two. He’s asked that I send him my code, which I’m writing in Python as practice. He’s using C++, which is something he’s studying, but he knows some Python.

These are all math-based problems, and most of the ones in the archives are short. I’ve been trying to take 5-10 minutes and solve one a day. I’ll ping my kid when I’m working on one, and we might pair-program a bit to get a solution.

Not terribly challenging, but it is a different type of thinking from most data work, and it can be fun. If you want to take a break from other work, or maybe experiment with a new language, give Project Euler a try.


No @@ServerName on Linux

I set up a new instance of SQL Server on Linux some time ago. At the time, the Linux machine didn’t have Samba running, and no real “name” on the network. As a result, after installing SQL Server, I got a NULL when running SELECT @@SERVERNAME.
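
A quick way to see what’s going on is to compare the registered name with the name the instance detects for the machine:

-- SERVERPROPERTY('ServerName') reflects the current machine/instance name;
-- @@SERVERNAME reflects the name registered in sys.servers (server_id = 0)
SELECT @@SERVERNAME AS registered_name,
       SERVERPROPERTY('ServerName') AS detected_name;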

The fix is easy. It’s what you’d do if you had the wrong name. You run:

EXEC sp_addserver 'Ubuntu', 'local';

In this case, Ubuntu was the name I gave my machine under the Samba server, since that’s the OS. I don’t plan on adding more for now, so this is fine.

After the change, I couldn’t restart the instance from SSMS, so on the host itself I ran:

sudo systemctl stop mssql-server
sudo systemctl start mssql-server

That completed, and @@SERVERNAME is now working.


Get What You Pay For?

Many items that we choose to buy have some relation between the price and the quality of the item. We often find that a more expensive appliance or automobile has better quality than a cheaper one. There’s certainly a point at which spending more doesn’t necessarily get you better quality or even value. This seems especially true today with many “over the top” products that have added features or design that don’t seem to be worth the additional cost. At the other end, there are times many of us pay less for less quality, consciously making that tradeoff. I also think that the base level of quality has risen for so many products that it might not be worth spending more than some amount for an item. They’re all close in value and performance.

I ran across a piece that looks at the value of consumer and enterprise SSDs. This isn’t a scientific analysis, but rather a short discussion of why enterprise SSDs might be more than twice as expensive as consumer ones. There are some details, such as the architecture and spare capacity, but ultimately the author notes that people purchasing drives also worry about the perception that they might have saved money with a cheaper drive, but lost data because the warranty and support are lower. Even if the same hardware were being sold by the vendor, which I’m sure happens, having enterprise support is a crutch that can provide the employee some protection if there are issues.

There certainly are hardware differences for most devices, though the buyer should beware and verify that they are actually buying enterprise-level gear. The big difference for me is that the large over-provisioning of enterprise devices offers more predictable performance sustained over longer periods of time. This is important, especially for databases that might see quite a bit of transaction log or tempdb activity. In those cases, we need sustained high rates of I/O.

There is no shortage of white papers, articles, reviews, and vendor claims about particular hardware. This is a very competitive and also highly profitable space. If you’ve had to purchase a new SAN device at any point, you know these are expensive and complex. Making the decision is stressful, as it can take lots of time to evaluate and lots of resources to get things installed, and it’s not easy to go back on the purchase. In addition, many of us that have to make a recommendation are usually asking management to spend six or seven figures on a device. If it doesn’t perform, we will take the blame and may find our choice to be career-limiting.

I don’t envy anyone that needs to make hardware decisions for their database or organization. There are so many companies in the storage space now that it’s a large project just to evaluate the various products and try to make a decision. It’s complex enough these days that I can see why moving to the cloud is attractive for some. While we can’t control everything and we may have large monthly bills, we typically don’t ever have to face that extremely large capital purchase decision.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 4.5MB) podcast or subscribe to the feed at iTunes and Libsyn.


The June 2018 SQL in the City is coming

We’ve got the schedule up and you can register today. Once again I’m heading to the UK, along with Grant, to broadcast a series of sessions that we hope will help you learn a bit more about building better database software and protecting data. Kathi’s recording her session, as she’ll be in Hawaii that week (she declined to trade with me).

The theme this time is Compliant Database DevOps. Lots of people are struggling with the GDPR changes and coming to terms with them. I’d argue, as would others, that most of the world really ought to have better data handling, so we’ll give you some thoughts and ideas on how to do this. We’ll use some Redgate products, but we’re hoping you take away the overall concepts and ideas as well.

Join me on June 20 and get a one-day conference from the comfort of your own office.
