I’ve been playing with the window functions in T-SQL a bit, and I find them very interesting. They certainly solve some problems very well, in a way that’s much easier than has been available in T-SQL.
However there are some things you need to understand. One of these things is the framing of the window, and the data processed. A quick example will show some of this.
Imagine that I create a table and insert some data:
CREATE TABLE HomeRuns ( hrid INT IDENTITY(1,1) , player VARCHAR(200) , team VARCHAR(200) , hrdate DATE , HRcount TINYINT CONSTRAINT hr_IDX PRIMARY KEY (hrid) ); GO
We add a few row, which I’ll keep short.
INSERT HomeRuns (player, team, hrdate, HRcount) VALUES ('Troy', 'COL', '4/7/2013', 1) , ('Troy', 'COL', '4/18/2013', 1) , ('Troy', 'COL', '4/22/2013', 1) , ('Derek', 'NYY', '5/7/2013', 1) , ('Derek', 'NYY', '6/24/2013', 1) , ('Nelson', 'BAL', '3/31/2013', 1) , ('Nelson', 'BAL', '4/2/2013', 1) , ('Nelson', 'BAL', '4/20/2013', 1) , ('Lonnie', 'CLE', '5/9/2013', 3) ; GO
This is a small set of data. Let’s imagine that I want to count the total home runs by team in each month. If I try to do this with windowing, I’ll get something like this:
select team , datename( mm, hrdate) , HR.hrdate , sum(HR.HRcount) over (partition by month(HR.hrdate), team) from dbo.HomeRuns HR
When I run that, I see all the rows returned, which isn’t what we expect from aggregates. However it’s also not any kind or running total or examination of the data in a row by row processing that many window functions perform.
That’s not completely accurate, but it’s a feeling many people get when starting with these structures. However the results highlight something. Let’s look at them:
We can see that for April, for Baltimore, we see 2 rows, with a total of 2 for each row. There was one home run hit on each day, and the total is 2, but both rows are processed as one window.
That’s because the framing, the section of the partition that’s examined by the window aggregate, is the “"RANGE” of the partition by default. The entire partition, so all rows in front of, and behind, the current row, are used for the results.
This is a simple example, but it does show that you need to be aware of the default, which I don’t love. I wish the default were ROWS, and I’ll talk about that another time.