Default Framing–Window Functions

I’ve been playing with the window functions in T-SQL a bit, and I find them very interesting. They certainly solve some problems very well, in a way that’s much easier than has been available in T-SQL.

However there are some things you need to understand. One of these things is the framing of the window, and the data processed. A quick example will show some of this.

Imagine that I create a table and insert some data:

CREATE TABLE HomeRuns
( hrid INT IDENTITY(1,1)
, player VARCHAR(200)
, team VARCHAR(200)
, hrdate DATE
, HRcount TINYINT
CONSTRAINT hr_IDX PRIMARY KEY (hrid)
);
GO

We add a few row, which I’ll keep short.

INSERT HomeRuns (player, team, hrdate, HRcount)
 VALUES ('Troy', 'COL', '4/7/2013', 1)
      , ('Troy', 'COL', '4/18/2013', 1)
      , ('Troy', 'COL', '4/22/2013', 1)
      , ('Derek', 'NYY', '5/7/2013', 1)
      , ('Derek', 'NYY', '6/24/2013', 1)
      , ('Nelson', 'BAL', '3/31/2013', 1)
      , ('Nelson', 'BAL', '4/2/2013', 1)
      , ('Nelson', 'BAL', '4/20/2013', 1)
      , ('Lonnie', 'CLE', '5/9/2013', 3)
;
GO

This is a small set of data. Let’s imagine that I want to count the total home runs by team in each month. If I try to do this with windowing, I’ll get something like this:

select 
  team
  , datename( mm, hrdate)
  , HR.hrdate
  , sum(HR.HRcount) over (partition by month(HR.hrdate), team)
 from dbo.HomeRuns HR

When I run that, I see all the rows returned, which isn’t what we expect from aggregates. However it’s also not any kind or running total or examination of the data in a row by row processing that many window functions perform.

That’s not completely accurate, but it’s a feeling many people get when starting with these structures. However the results highlight something. Let’s look at them:

window_1

We can see that for April, for Baltimore, we see 2 rows, with a total of 2 for each row. There was one home run hit on each day, and the total is 2, but both rows are processed as one window.

Why?

That’s because the framing, the section of the partition that’s examined by the window aggregate, is the “"RANGE” of the partition by default. The entire partition, so all rows in front of, and behind, the current row, are used for the results.

This is a simple example, but it does show that you need to be aware of the default, which I don’t love. I wish the default were ROWS, and I’ll talk about that another time.

About way0utwest

Editor, SQLServerCentral
This entry was posted in Blog and tagged , . Bookmark the permalink.

2 Responses to Default Framing–Window Functions

  1. Oded Dror says:

    Steve,

    This is the old way

    select Team,
    datename(mm,hrdate) as Mname,
    sum(hrcount) as hrcount
    from #HomeRuns
    Group by team,datename(mm,hrdate)
    order by team

    This is the new way
    select [Team],
    [hrdate],
    [hrcount]

    from (
    select [Team] = src2.[Team],
    [hrdate] = src2.[hrdate],
    [hrcount] = sum(src2.HRcount) over (partition by datename( mm, hrdate),team),
    RowNumber = row_number() over (partition by datename( mm, hrdate),team order by team )
    from #HomeRuns as src2 with (nolock)
    ) src
    where src.RowNumber = 1
    Order by Team

    Thanks
    Oded Dror

    Like

  2. I have a blog post about this as well:
    http://blogs.lessthandot.com/index.php/datamgmt/dbprogramming/mssqlserver/beware-the-defaults-in-windowing-functions/

    I’m rereading the book of Itzik Ben-gan about windowing functions. There he mentions that ROWS can make use of a specialized in-memory table for intermediary results in the execution plan, while RANGE can not; it has to use the tradiational table in tempdb.

    Like

Comments are closed.