Default Framing–Window Functions

I’ve been playing with the window functions in T-SQL a bit, and I find them very interesting. They certainly solve some problems very well, in a way that’s much easier than has been available in T-SQL.

However there are some things you need to understand. One of these things is the framing of the window, and the data processed. A quick example will show some of this.

Imagine that I create a table and insert some data:

CREATE TABLE HomeRuns
( hrid INT IDENTITY(1,1)
, player VARCHAR(200)
, team VARCHAR(200)
, hrdate DATE
, HRcount TINYINT
CONSTRAINT hr_IDX PRIMARY KEY (hrid)
);
GO

We add a few row, which I’ll keep short.

INSERT HomeRuns (player, team, hrdate, HRcount)
 VALUES ('Troy', 'COL', '4/7/2013', 1)
      , ('Troy', 'COL', '4/18/2013', 1)
      , ('Troy', 'COL', '4/22/2013', 1)
      , ('Derek', 'NYY', '5/7/2013', 1)
      , ('Derek', 'NYY', '6/24/2013', 1)
      , ('Nelson', 'BAL', '3/31/2013', 1)
      , ('Nelson', 'BAL', '4/2/2013', 1)
      , ('Nelson', 'BAL', '4/20/2013', 1)
      , ('Lonnie', 'CLE', '5/9/2013', 3)
;
GO

This is a small set of data. Let’s imagine that I want to count the total home runs by team in each month. If I try to do this with windowing, I’ll get something like this:

select 
  team
  , datename( mm, hrdate)
  , HR.hrdate
  , sum(HR.HRcount) over (partition by month(HR.hrdate), team)
 from dbo.HomeRuns HR

When I run that, I see all the rows returned, which isn’t what we expect from aggregates. However it’s also not any kind or running total or examination of the data in a row by row processing that many window functions perform.

That’s not completely accurate, but it’s a feeling many people get when starting with these structures. However the results highlight something. Let’s look at them:

We can see that for April, for Baltimore, we see 2 rows, with a total of 2 for each row. There was one home run hit on each day, and the total is 2, but both rows are processed as one window.

Why?

That’s because the framing, the section of the partition that’s examined by the window aggregate, is the “”RANGE” of the partition by default. The entire partition, so all rows in front of, and behind, the current row, are used for the results.

This is a simple example, but it does show that you need to be aware of the default, which I don’t love. I wish the default were ROWS, and I’ll talk about that another time.

2 Responses to Default Framing–Window Functions

Oded Dror says:

November 16, 2014 at 8:49 am

Steve,

This is the old way

select Team,
datename(mm,hrdate) as Mname,
sum(hrcount) as hrcount
from #HomeRuns
Group by team,datename(mm,hrdate)
order by team

This is the new way
select [Team],
[hrdate],
[hrcount]

from (
select [Team] = src2.[Team],
[hrdate] = src2.[hrdate],
[hrcount] = sum(src2.HRcount) over (partition by datename( mm, hrdate),team),
RowNumber = row_number() over (partition by datename( mm, hrdate),team order by team )
from #HomeRuns as src2 with (nolock)
) src
where src.RowNumber = 1
Order by Team

Thanks
Oded Dror

LikeLike
Koen Verbeeck (@Ko_Ver) says:

November 19, 2014 at 3:37 am

I have a blog post about this as well:
http://blogs.lessthandot.com/index.php/datamgmt/dbprogramming/mssqlserver/beware-the-defaults-in-windowing-functions/

I’m rereading the book of Itzik Ben-gan about windowing functions. There he mentions that ROWS can make use of a specialized in-memory table for intermediary results in the execution plan, while RANGE can not; it has to use the tradiational table in tempdb.

LikeLike

Comments are closed.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31