Running Sum w Window Function: is ROWS BETWEEN... clause required?

  • When creating a simple query to compute a running sum/total for each row using a window function [SUM() OVER(...)], I noticed that I forgot to include a clause to limit the frame from the first row to the current row.

    Nonetheless, the query returned the correctly calculated running total for each row, and I need to figure out:

    Was this just a fluke?  Or is this always true?  Is this behavior/outcome dependable?  (I'll revise production code regardless, unless it's emphatically clear this is intended behavior; I would like to know, though.)

    Here's a complete script to run.  It returns 2 computed columns based on a windowed SUM() -- 1 column with and 1 without the "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" clause.

    DROP TABLE IF EXISTS dbo.running_total;
    CREATE TABLE running_total
    (
    MeasurementDateTime DATETIME,
    MeasurementPoint VARCHAR(12),
    MeasuredValue DECIMAL(3,1)
    );

    INSERT INTO running_total
    (MeasurementDateTime, MeasurementPoint, MeasuredValue)
    VALUES
    ('2020-12-01', 'A1', 2.0),
    ('2020-12-02', 'A1', 3.0),
    ('2020-12-03', 'A1', 4.3),
    ('2020-11-01', 'A1', 2.0),
    ('2020-11-02', 'A1', 3.0),
    ('2020-11-03', 'A1', 4.3),
    ('2020-12-01', 'B1', 12.0),
    ('2020-12-02', 'B1', 13.0),
    ('2020-12-03', 'B1', 14.3)
    ;

    SELECT
    MeasurementPoint, MeasurementDateTime, MeasuredValue,
    -- Running sum, WITHOUT using the ROWS BETWEEN UNBOUNDED PRECEDING clause
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime) As RunSum_Without_Unbounded,
    -- ... and running sum, WITH that clause
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) As RunSum_WITH_Unbounded
    FROM
    running_total
    ORDER BY
    MeasurementPoint, MeasurementDateTime;

    When I run this (SQL Server 2019 Developer Edition), I get back identical results in the two computed columns.

    Thanks!

    Rich

  • From BOL:

    ROWS/RANGE that limits the rows within the partition by specifying start and end points within the partition. It requires ORDER BY argument and the default value is from the start of partition to the current element if the ORDER BY argument is specified.

    You can review the documentation here: https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15

    This is the default behavior since you specified an ORDER BY.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey thank you and a sheepish apology for my not checking BOL first!

    🙂

    That wording is unusually clear and definitive, isn't it?

    Happy New Year!

    Rich

  • This was removed by the editor as SPAM

  • There may be a difference. If you do not specify ROWS / RANGE BETWEEN ... the default is RANGE BETWEEN. With your definition, it is possible to have duplicates for the columns MeasurementDateTime, MeasurementPoint and it shows the difference between ROWS and RANGE. Rows are the series currently being processed. RANGE is all rows with the same value in the columns MeasurementDateTime, MeasurementPoint, as the row currently being processed. The example shows the difference. If you only want one row in the result for each value of MeasurementDateTime, MeasurementPoint, the last statement must be used. RANGE is an unfortunate default - but it is chosen by MS!

    DROP TABLE IF EXISTS dbo.running_total;
    CREATE TABLE dbo.running_total
    (
    MeasurementDateTime DATETIME,
    MeasurementPoint VARCHAR(12),
    MeasuredValue DECIMAL(3,1)
    );

    INSERT INTO dbo.running_total
    (MeasurementDateTime, MeasurementPoint, MeasuredValue) VALUES
    ('2020-12-01', 'A1', 2.0),
    ('2020-12-02', 'A1', 3.0),
    ('2020-12-03', 'A1', 4.3),
    ('2020-11-01', 'A1', 2.0),
    ('2020-11-02', 'A1', 3.0),
    ('2020-11-03', 'A1', 4.3),
    ('2020-12-01', 'B1', 12.0),
    ('2020-12-02', 'B1', 13.0),
    ('2020-12-03', 'B1', 14.3);

    SELECT
    MeasurementPoint, MeasurementDateTime, MeasuredValue,
    -- Running sum, WITHOUT using the ROWS BETWEEN UNBOUNDED PRECEDING clause
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime) AS RunSum_Without_Unbounded,
    -- ... and running sum, WITH ROWS BETWEEN
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RowsUnbounded,
    -- ... and running sum, WITH RANGE BETWEEN
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RangeUnbounded
    FROM
    dbo.running_total
    ORDER BY
    MeasurementPoint, MeasurementDateTime;
    GO
    -- allowed in relation to definitions
    INSERT INTO dbo.running_total
    (MeasurementDateTime, MeasurementPoint, MeasuredValue) VALUES
    ('2020-11-01', 'A1', 9.0),
    ('2020-11-01', 'A1', 19.0),
    ('2020-11-01', 'A1', 29.0);
    GO
    SELECT
    MeasurementPoint, MeasurementDateTime, MeasuredValue,
    -- Running sum, WITHOUT using the ROWS BETWEEN UNBOUNDED PRECEDING clause
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime) AS RunSum_Without_Unbounded,
    -- ... and running sum, WITH ROWS BETWEEN
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RowsUnbounded,
    -- ... and running sum, WITH RANGE BETWEEN
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RangeUnbounded
    FROM
    dbo.running_total
    ORDER BY
    MeasurementPoint, MeasurementDateTime;
    GO
    SELECT
    MeasurementPoint, MeasurementDateTime, MeasuredValue,
    -- Running sum, WITHOUT using the ROWS BETWEEN UNBOUNDED PRECEDING clause
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime) AS RunSum_Without_Unbounded,
    -- ... and running sum, WITH ROWS BETWEEN
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RowsUnbounded,
    -- ... and running sum, WITH RANGE BETWEEN
    SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RangeUnbounded
    FROM
    (SELECT MeasurementDateTime, MeasurementPoint, SUM(MeasuredValue) AS MeasuredValue
    FROM dbo.running_total
    GROUP BY MeasurementDateTime, MeasurementPoint) AS running_total
    ORDER BY
    MeasurementPoint, MeasurementDateTime;
  • That's a great explanation and amplification of the query, thank you.  I made a "mistake" in my simplified example, as my original project table does in fact have a PK on MeasurementDateTime and MeasurementPoint -- no multiple rows per point/datetime are allowed.  I didn't include a PK here.

    But I'm glad I didn't include the PK in my code posted above, b/c your answer underscores a difference between ROWS and RANGE that I hadn't fully considered or appreciated.  It won't matter for my project (b/c of the PK), but your examples might prevent me from making an error in the future that I might otherwise have missed.

    Thank you,

    Rich

  • This was removed by the editor as SPAM

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply