2024 PASS Data Community Summit Prep

Next week is the 2024 PASS Data Community Summit in Seattle. I’ll be traveling Monday to the event to see a few thousand fellow data professionals, developers, managers, and more. I expect quite a few people to be traveling this weekend and enjoying SQL Saturday Oregon and SW Washington or just getting to Seattle for Pre-cons on Monday, but I’m slightly worn out so shortening my trip a bit.

First Timers

If you’ve never been to the Summit before (this is the 26th), then there is an FAQ available. I would highly recommend checking out Edwin Sarmiento’s First Timer Guide. It’s a great resource.

Also, make sure you aren’t this guy (or girl).

Summit Sessions

There are a lot of sessions available. I got asked for recommendations, and it was almost overwhelming, even when I skipped pre-cons. All the pre-cons are by great speakers and experts, so no recommendations there. I’m also not recommending lightning talks as those are just for fun.

Here are some I like, though I am biased towards DevOps and forward-thinking topics.

There are lots more sessions, but those are ones I’d want to see (or watch later).

After Hours

Don’t go out alone. I know some of you need a break or need to re-charge. Heck, I do at times, but minimize this at the Summit. Take advantage of the time to meet others and spend time with them. Build bonds, network, learn more about our industry or just about other people.

There are community events during and at night. Go on a walk with others, pick a place for dinner on Mon/Tue/Thur/Fri, or ask people to do something in Seattle. I’m not a big nightlife person, so don’t ask me, but I do recommend the Underground Tour if you’ve never done it. There are great museums around if you have an off day, especially the Museum of Flight.

I’ll be at the SQL Server Central / Datavail Casino Party Tuesday night and the Redgate event Thursday. Wednesday I’ll be at the expo hall reception and then likely go to bed. Dinner with a friend planned for Monday already.

Enjoy the Summit if you’re there. If you’re not, take my advice above at any event and don’t be an introvert for a few days. You’ll be tired, but you will grow from the experience.

Posted in Blog | Tagged , , | Comments Off on 2024 PASS Data Community Summit Prep

A New Word: Bye-over

bye-over – n.  the sheepish casual vibe between two people who’ve shred an emotional farewell but then unexpectedly have a little extra time together, wordlessly agreeing to pretend that they’ve already moved on.

I used to feel a bye-over was very awkward. I’d said goodbye, they had, and we’re delayed. I think the advent of Uber and similar services, where we wait more often on trips, has made it easier to say goodbye and then walk away a few steps to wait.

However, I would also say I’ve just learned to keep chatting with someone if there’s some delay, or even just come back and chat with others.

From the Dictionary of Obscure Sorrows

Posted in Blog | Tagged , | Comments Off on A New Word: Bye-over

Counting Groups with Window Functions: #SQLNewBlogger

I looked at row_number() in a previous post. Now I want to build on that and do some counting of rows with COUNT() and the OVER clause. I’ll show how this differs a bit from a normal aggregate.

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers. This is also part of a series on Window Functions.

The Raw Data

Let’s use AdventureWorks as a sample database. In the Sales.SalesOrderHeader table there are lots of customer orders, each with a Customer ID. I’ll limit my results to these customers, which has a spread of orders:

WHERE soh.CustomerID IN (11011, 11015, 11019, 11012)

We’ll use this WHERE clause to help us decide

Counting Orders

If I wanted to count the number of orders each customer hasplaced, I can use this code:

SELECT
   soh.CustomerID
, COUNT (*) AS OrderCount
FROM Sales.SalesOrderHeader AS soh
WHERE
   soh.CustomerID IN ( 11011, 11015, 11019, 11012 )
GROUP BY soh.CustomerID;

I see these results:

2024-10_0175

We can see that each of these customers has different numbers of orders.  If I want a simple summary, this is the best code. However, what if I want to add other columns?

Supposed I want to add not only the customer ID, but also the shipdate. I’m wondering when orders were shipped. If I use the code above, I’d need to add this in the column list, and the group by. If I do this, I get this code:

SELECT
   soh.CustomerID
, soh.ShipDate
, COUNT (*) AS OrderCount
FROM Sales.SalesOrderHeader AS soh
WHERE
   soh.CustomerID IN ( 11011, 11015, 11019, 11012 )
GROUP BY
   soh.CustomerID
, soh.ShipDate;

And this result:

2024-10_0176

The results are cut off, but they repeat with a 1 for each row. The front end can sum these, but it’s easy to make a mistake here. Especially if there are filters.

With an OVER() clause, my aggregate changes a bit. I would use this code, where I am paritioning, or essentially grouping, the data by the customer ID. I don’t have a group by, so I don’t need to add other fields in two places. Here is the code:

SELECT
   soh.CustomerID
, soh.ShipDate
, COUNT (*) OVER (PARTITION BY soh.CustomerID) AS OrderCount
FROM Sales.SalesOrderHeader AS soh
WHERE
   soh.CustomerID IN ( 11011, 11015, 11019, 11012 );

And the results.

2024-10_0177

I have the count repeated, but if I am showing this in the front end, I can hide that column in a table, but I can also access the total count from any row.

Differences with Group By

Let’s say I decide to add something else, like the account number and the SalesOrderNumber. If I do this in a GROUP BY, I need to add this to the column list and the group by, as shown here:

SELECT
   soh.CustomerID
, soh.AccountNumber
, soh.SalesOrderNumber
, soh.ShipDate
, COUNT (*) AS OrderCount
FROM Sales.SalesOrderHeader AS soh
WHERE
   soh.CustomerID IN ( 11011, 11015, 11019, 11012 )
GROUP BY
   soh.CustomerID
, soh.ShipDate
, soh.AccountNumber
, soh.SalesOrderNumber;
GO

The change the Window function is as shown:

SELECT
   soh.CustomerID
, soh.AccountNumber
, soh.SalesOrderNumber
, soh.ShipDate
, COUNT (*) OVER (PARTITION BY soh.CustomerID) AS OrderCount
FROM Sales.SalesOrderHeader AS soh
WHERE
   soh.CustomerID IN ( 11011, 11015, 11019, 11012 );

Maintenance is easy, and what’s more, I don’t have to worry about ordering my group by in different ways. If I do care about grouping, then I can alter the partitioning, but often I just want to add other columns without making the code more complex in a second place.

The results for the group by have the fields, but all 1s in the count again.

2024-10_0179

The window function has the counts for each customer.

2024-10_0178

Small things, but those points of maintenance can be annoying and they can cause problems. For complex data, the other fields might not group well, or the order of grouping might change what is shown.

There are more things to be concerned about here, but one of the big things might be a running total, where I count the orders over time. I’d like results like this:

2024-10_0180

My window function code is:

SELECT
   soh.CustomerID
, soh.AccountNumber
, soh.SalesOrderNumber
, soh.ShipDate
, COUNT (*) OVER (PARTITION BY soh.CustomerID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS OrderCount
FROM Sales.SalesOrderHeader AS soh
WHERE
   soh.CustomerID IN ( 11011, 11015, 11019, 11012 );

The regular code with group by is this:

select soh1.customerid, soh1.AccountNumber, soh1.SalesOrderNumber, soh1.ShipDate
, count(*) 'running_total'
from sales.salesorderheader soh1
inner join sales.salesorderheader soh2 on soh2.salesorderid <= soh1.salesorderid
                                    and soh2.customerid = soh1.customerid
group by soh1.customerid, soh1.AccountNumber, soh1.SalesOrderNumber, soh1.ShipDate
order by soh1.customerid, soh1.ShipDate

Not bad, but the big difference is in resources used. The statistics IO below shows the difference. Can you guess which is which?

2024-10_0183

Hint, the one using less reads is the OVER() code.

SQL New Blogger

Setting up a good scenario is a little tricky, and that took time. I messed with a few data sets that helped explain this to me, and you, in a way that made sense. I have a few other scenarios, which I will write about because an OVER() isn’t magic and it might not be the right choice.

Those will be other blogs.

This is a good way to showcase that you understand part of how this works. There’s a whole series to be written on different aspects of how the OVER() clause works. Spending 15-30 minutes each time you experiment helps showcase your knowledge and learning to employers.

That’s what employers need: experimenting and learning.

Posted in Blog | Tagged , , , | 1 Comment

The Cloud Security Problem

Your management gets a great demo from a cloud vendor and decides that the organization needs to implement the new service/application/etc. quickly. Your team tries to comply, furiously learning and experimenting with integrations, software changes, infrastructure configuration, and more. Things get deployed are working. Clients and management are happy with the new capabilities and you breathe a sigh of relief.

After a bit of time there’s a security issue and all of a sudden there’s blame pouring down on everyone. The vendor takes a hit because it’s a public security problem, but the reality might be that your organization didn’t completely understand how to configure strong security. The public doesn’t blame your organization, but internally your team don’t know how to make changes to ensure future security.

That’s a bit of what happened with the Snowflake customer hacks, and a good description of some of the issues is in this piece from Joey D’Antoni. Snowflake didn’t necessarily have bad security, but they allowed customers to have bad security and also limited the options for customers to implement stronger security.

I think about this all the time as I look at the challenges of security that many organizations face. In my decades of working in different situations, the one thing I know is that pressure to move fast often creates security shortcuts. Much of the early security problems with SQL Server could easily be traced to a) allowing installs with no password, b) developers not understanding the security model and granting sysadmin (or using sa) in their applications, and c) management agreeing that this was OK because a deadline needed to be met.

In all these situations, workers had the best of intentions to go fix things later, but there was rarely the time or energy to do so. As someone who forced apps to change away from sa in a large org, and required separate accounts and passwords for servers, I can tell you no one liked me and I got a lot of pressure to leave things alone. That is until I could explain the risks to managers with security people present. Even then there was no shortage of people who wanted me to assume the risk of apps using sa and let things be.

Cloud security is pretty good from the major vendors. Most of the smaller co-location facilities I’ve worked with in the past also had good security. It was the clients that caused problems, often because of a lack of knowledge or the desire to hurry.

Joey says Entra is a great system. I agree, and I love the SSO capabilities I’ve seen implemented. I also know that trying to set things up and configure them has been difficult for me, and I think I can be pretty sharp about lots of technology. Joey has patiently answered many of my ignorant questions because I didn’t understand how some part of Entra (AAD) works. I think lots of tech people don’t understand this and don’ t necessarily have a Joey to call on.

Most of us working in technology need more security education, better habits, and the patience to implement strong security. That includes management. At the same time, I wish that the solutions out there were easier to understand, or maybe better explained for those of us who need to use some portion of the authorization and authentication systems in our software.

Unfortunately, security software is still software and those vendors push out changes and updates quickly as well, leaving education and documentation to the user. This is a bad situation that continually arises with regular issues in many organizations.

Steve Jones

Listen to the podcast at Libsyn, Spotify, or iTunes.

Note, podcasts are only available for a limited time online.

Posted in Editorial | Tagged | 2 Comments