Extreme SQL

I have made a career of working with SQL and databases. Usually I’ve looked for interesting companies and people, but I’ve avoided extreme situations. For me, those are often very large or very real-time environments. I once declined a job working with a 13TB database on SQL Server 6.5. I suspected that job would have taken me away from my wife and young children far too often.

Facebook has a lot of users and runs a lot of queries. With over 1 billion daily users and hundreds of TBs of daily uploads, they really need strong databases. While they have multiple databases, including SQL ones, they have struggled with analytic queries in the past. They adopted Presto, an open source query engine for running analytic queries against data in different storage locations, such as RDBMSes or Hive/HDFS. This sounds like what PolyBase does for SQL Server.

The problem with any engine at Facebook’s scale is the load. While they like Presto, they needed to make it work better. They initially built a caching layer that required users to build ETL jobs to load data into SSDs attached to the Presto cluster. However, they outgrew this and ended up turning to a distributed file system called Alluxio.

The article linked above talks a bit about how this works and how it allows users to query petabytes of data. Most of us have users who don’t fully qualify their queries, so we expect that some queries that should only need to scan 100GB end up reading much more until the users tune them.

The thing I found interesting here is that some queries were taking up to 10s, which users found unacceptable. The move to Alluxio gave them a 30-50% boost, which doesn’t sound like a lot. Going from 10s to 5-7s isn’t a great savings to me. The reduction in reads is impressive, which is good, but I wonder to what extent some management and tuning is needed to ensure the cache works well.
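To see why tuning matters, here’s a toy model. This is a minimal Python sketch of a generic least-recently-used (LRU) read-through cache, assuming a fixed number of cached blocks sitting in front of slower remote storage. The names (`ReadThroughCache`, `remote_reads_for`, and the `b1`-`b4` blocks) are hypothetical, and this is not how Alluxio or Presto actually implement caching.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Generic LRU read-through cache in front of a slower remote store.

    Hypothetical illustration only; not Alluxio's or Presto's implementation.
    """

    def __init__(self, remote_read, capacity):
        self.remote_read = remote_read  # function: block key -> data (e.g. a remote/HDFS read)
        self.capacity = capacity        # number of blocks the local cache can hold
        self.blocks = OrderedDict()     # ordered from least to most recently used

    def read(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)  # cache hit: mark as most recently used
            return self.blocks[key]
        data = self.remote_read(key)      # cache miss: go to remote storage
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


def remote_reads_for(capacity, workload):
    """Count how many reads reach remote storage for a given cache size."""
    remote_calls = []

    def remote_read(key):
        remote_calls.append(key)
        return f"data-{key}"

    cache = ReadThroughCache(remote_read, capacity)
    for key in workload:
        cache.read(key)
    return len(remote_calls)


# The same scan over four blocks, repeated three times.
workload = ["b1", "b2", "b3", "b4"] * 3
print(remote_reads_for(4, workload))  # 4: only the first pass goes remote
print(remote_reads_for(3, workload))  # 12: every read goes remote
```

With room for all four blocks, only the first pass goes to remote storage. Shrink the cache by one block and the repeated scan evicts each block just before it’s needed again, so every read goes remote. Sizing and eviction policy can matter as much as having a cache at all.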

I have no desire to work on these extreme systems, but I am glad someone does. The lessons and tricks learned here often trickle down to improve the daily performance many of us see in our smaller systems. I think the Hyperscale work Microsoft is doing, and Big Data Clusters, are fascinating ways of organizing SQL Server-based systems, and some of that tech will likely trickle down and help improve our smaller systems’ performance over time.

Steve Jones

Note: Podcasts are suspended for a week as I deal with the PASS Summit.

Posted in Editorial

Daily Coping 12 Nov 2020

I started to add a daily coping tip to the SQLServerCentral newsletter and to the Community Circle, which is helping me deal with the issues in the world. I’m adding my responses for each day here.

Today’s tip is to give yourself a boost. Try a new way of being physically active.

My weekly activities are usually yoga, weight lifting, and swimming. I may use the rowing machine or bike, but those are less frequent. I ride horses rarely, but this time of year I usually start thinking about skiing as a once- or twice-weekly activity.

In trying something new, I decided to add in some walking. I stopped running a few years ago, but after being sick this fall, I need something that works me but isn’t too taxing. It should also be something I can include the dogs in, so walking it is.

I’m adding one walk a week, or hoping to, to get a different type of movement, especially since I hope to travel somewhere in 2021 and do more walking and hiking.

Posted in Blog

The Pace of Data Platform Change

I recently watched Vicky Harp’s keynote at the October 2020 GroupBy conference, where she talked about the challenges of working with the SQL Server platform as it’s grown. It’s a good keynote, with Vicky noting that she started with SQL Server 2000 and the journey to today is incredible. There’s a great view of her world about 10 minutes in.

She shared this quote, which I found really thoughtful. I think it’s true, and it’s something many of us don’t like.

“Usually growth comes at the expense of the previous comfort of safety.” – Josh Waitzkin, The Art of Learning.

What’s your response? Run, fight? Embrace change? Resist doing anything different? I think many of us want to believe that we easily embrace change, but think about the last time someone wanted to change something at work: a reorg, a new protocol, etc. Did you resist and think it was silly, or go along and give it a chance? For many people, it’s the former.

To be fair, it is for me as well, and I think many people who have some success in their career want to stick with the things they are experienced in. That’s not necessarily a problem, but it is worth investigating and embracing some new things, just to see if they might be better.

The data platform is certainly one of those technologies in my life that has changed dramatically, and the pace is sometimes overwhelming. At this point, it’s hard to keep up, and sometimes hard to tell whether new tech is better or worse. I’ve started to assume there is some good reason why Microsoft makes a change, and then experiment with, test, and evaluate the tech.

I do this not just once, when something is released or when I first encounter it, but also by watching what others do and learning where they’ve had success or failure. Azure Data Factory (ADF) is one example. I initially dismissed it as a poor port of SSIS, but I’ve come to appreciate some of the ways it improves many flows, especially hybrid workflows.

The data platform is an exciting place to work these days, and I hope you embrace some of the changes and see where a new technology might improve your environment. Of course, lots of traditional features out there work very well, and it’s not worth changing just to change. Make sure there is value and an improvement in some way, beyond you just enjoying working with something shiny and new.

Steve Jones


Posted in Editorial

Daily Coping 11 Nov 2020


Today’s tip is to change your normal routine today and notice how you feel.

It’s PASS Virtual Summit week, and I have some commitments for the talks I’m doing. However, my talk today is in the afternoon, so I have some time this morning. I’m going to change things up, go for a morning walk and get some exercise, then come in and work later into the evening.

We’ll see if I like the change. I typically prefer to work early and stop, but I’ll try something new today.

Posted in Blog