I did some work a 3-4 years ago, learning about the Windowing functions and enjoying them so much I built a few presentations on them. In learning about them, and trying to understand them, I found some challenges, and it took some experimentation to actually understand how the functions work in small data sets.
I noticed last week that SQLServerCentral had re-run a great piece from Kathi Kellenberger on LAST_VALUE, which is worth the read. There’s a lot in there to understand, so I thought I’d break things down a bit.
The important thing to understand with window functions is that there is a frame at any point in time when the data is being scanned or processed. I’m not sure what the best term to use is.
Let’s look at the same data set Kathi used. For simplicity, I’ll use a few images of her dataset, but I’ll examine the SalesOrderID. I think that can be easier than looking at the amounts.
Here’s the base dataset for two customers, separated by CustomerID and ordered by the OrderDate. I’ve included amount, but it’s really not important.
Now, if I do something like query for LAST_VALUE with a partition of CustomerID and ordered by OrderDate, I get this set. The partition divides the set up into the two customer sets. Without an ORDER BY, these sets would exist as the red set and blue set, but in no particular order. The ORDER BY functions as it does in any query, guaranteeing the same order every time.
Now, let’s look at the framing of the partition. I have a few choices, but at any point, I have the current row. So my processing looks like this, with the arrow representing the current row.
The next row is this one:
Then this one (the last one for this customer)
Then we move to the next customer.
When I look at any row, if I use “current row” in my framing, then I’m looking at, and including, the current row. The rest of my frame depends on what else I have. I could have UNBOUNDED PRECEEDING and UNBOUNDED FOLLOWING in there.
If I used UNBOUNDED PRECEEDING and CURRENT ROW, I’d have this frame, in green, for the first row. It’s slightly offset to show the difference.
However, if I had CURRENT ROW and UNBOUNDED FOLLOWING, I’d have this frame (in green).
In this last case, the frame is the entire partition.
What’s the last value? In the first case, the last part of that frame is the current SalesOrderID (43793). That’s the only row in the frame. In the second frame, the last one is 57418, the last row in the frame, and partition.
What if we move to the next row? Let’s look at both frames. First, UNBOUNDED PRECEEDING and CURRENT ROW.
Now the frame is the first two rows. In this case, the last value is again the current row (51522). Below, we switch to CURRENT ROW and UNBOUNDED FOLLOWING.
Now the frame is just the last two rows of the partition and the last value is the same (57418).
There’s a lot more to the window functions, and I certainly would recommend either Kathi’s book (Expert T-SQL Window Functions in SQL Server) or Itzik’s book (Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions). Either one will help. We’ve also got some good articles at SQLServerCentral on windowing functions.