I’ve been playing with the window functions in T-SQL a bit, and I find them very interesting. They certainly solve some problems very well, in a way that’s much easier than has been available in T-SQL.
However there are some things you need to understand. One of these things is the framing of the window, and the data processed. A quick example will show some of this.
Imagine that I create a table and insert some data:
CREATE TABLE HomeRuns ( hrid INT IDENTITY(1,1) , player VARCHAR(200) , team VARCHAR(200) , hrdate DATE , HRcount TINYINT CONSTRAINT hr_IDX PRIMARY KEY (hrid) ); GO
We add a few row, which I’ll keep short.
INSERT HomeRuns (player, team, hrdate, HRcount) VALUES ('Troy', 'COL', '4/7/2013', 1) , ('Troy', 'COL', '4/18/2013', 1) , ('Troy', 'COL', '4/22/2013', 1) , ('Derek', 'NYY', '5/7/2013', 1) , ('Derek', 'NYY', '6/24/2013', 1) , ('Nelson', 'BAL', '3/31/2013', 1) , ('Nelson', 'BAL', '4/2/2013', 1) , ('Nelson', 'BAL', '4/20/2013', 1) , ('Lonnie', 'CLE', '5/9/2013', 3) ; GO
This is a small set of data. Let’s imagine that I want to count the total home runs by team in each month. If I try to do this with windowing, I’ll get something like this:
select team , datename( mm, hrdate) , HR.hrdate , sum(HR.HRcount) over (partition by month(HR.hrdate), team) from dbo.HomeRuns HR
When I run that, I see all the rows returned, which isn’t what we expect from aggregates. However it’s also not any kind or running total or examination of the data in a row by row processing that many window functions perform.
That’s not completely accurate, but it’s a feeling many people get when starting with these structures. However the results highlight something. Let’s look at them:
We can see that for April, for Baltimore, we see 2 rows, with a total of 2 for each row. There was one home run hit on each day, and the total is 2, but both rows are processed as one window.
Why?
That’s because the framing, the section of the partition that’s examined by the window aggregate, is the “"RANGE” of the partition by default. The entire partition, so all rows in front of, and behind, the current row, are used for the results.
This is a simple example, but it does show that you need to be aware of the default, which I don’t love. I wish the default were ROWS, and I’ll talk about that another time.
Steve,
This is the old way
select Team,
datename(mm,hrdate) as Mname,
sum(hrcount) as hrcount
from #HomeRuns
Group by team,datename(mm,hrdate)
order by team
This is the new way
select [Team],
[hrdate],
[hrcount]
from (
select [Team] = src2.[Team],
[hrdate] = src2.[hrdate],
[hrcount] = sum(src2.HRcount) over (partition by datename( mm, hrdate),team),
RowNumber = row_number() over (partition by datename( mm, hrdate),team order by team )
from #HomeRuns as src2 with (nolock)
) src
where src.RowNumber = 1
Order by Team
Thanks
Oded Dror
LikeLike
I have a blog post about this as well:
http://blogs.lessthandot.com/index.php/datamgmt/dbprogramming/mssqlserver/beware-the-defaults-in-windowing-functions/
I’m rereading the book of Itzik Ben-gan about windowing functions. There he mentions that ROWS can make use of a specialized in-memory table for intermediary results in the execution plan, while RANGE can not; it has to use the tradiational table in tempdb.
LikeLike