Query Help - Sum by Month

  • I have a table of values by date. I have a requirement to group and sum them by month (to use to plot on a chart). I'm not really sure how to go about grouping by a range. That is if I have:

    01/03/2012 $300

    01/12/2012 $250

    02/05/2012 $200

    02/07/2012 $300

    02/15/2012 $400

    I need it grouped as two rows

    01/2012 $550

    02/2012 $900

    Thanks for any help.

    Sean

  • DDL and sample data would have made this a lot easier. 🙂

    Basically this is a normal group by situation except that you need to do some date math to group the months together.

    This should do it.

    create table #Something

    (

    MyDate datetime,

    SomeValue int

    )

    insert #Something

    select '01/03/2012', 300 union all

    select '01/12/2012', 250 union all

    select '02/05/2012', 200 union all

    select '02/07/2012', 300 union all

    select '02/15/2012', 400

    select dateadd(mm, datediff(mm, 0, MyDate), 0), sum(SomeValue)

    from #Something

    group by dateadd(mm, datediff(mm, 0, MyDate), 0)

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • ET DATEFORMAT MDY

    SELECT

    DateRange = RIGHT('0'+CAST(MONTH([Date]) AS VARCHAR(2)),2)+'/'+

    CAST(YEAR([Date]) AS VARCHAR(4)),

    SUMAmount = SUM(Amount)

    FROM (

    SELECT '01/03/2012', $300 UNION ALL

    SELECT '01/12/2012', $250 UNION ALL

    SELECT '02/05/2012', $200 UNION ALL

    SELECT '02/07/2012', $300 UNION ALL

    SELECT '02/15/2012', $400

    ) d ([Date], Amount)

    GROUP BY YEAR([Date]), MONTH([Date])

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?

    select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)

    From Reporting.dbo.PrgSum

    Group by YEAR(ContractDate), MONTH(ContractDate)

  • Sean Grebey (8/6/2013)


    Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?

    select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)

    From Reporting.dbo.PrgSum

    Group by YEAR(ContractDate), MONTH(ContractDate)

    Only that the MONTH() function returns an integer, so if you want to return the period as a string like '01/2012', you'll need to use the code from ChrisM@Work's SELECT statement.

    Jason Wolfkill

  • wolfkillj (8/7/2013)


    Sean Grebey (8/6/2013)


    Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?

    select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)

    From Reporting.dbo.PrgSum

    Group by YEAR(ContractDate), MONTH(ContractDate)

    Only that the MONTH() function returns an integer, so if you want to return the period as a string like '01/2012', you'll need to use the code from ChrisM@Work's SELECT statement.

    Wouldn't a simple convert be better than multiple functions for that?

    SELECT

    DateRange = RIGHT(CONVERT(char(10), [Date], 103), 7),

    SUMAmount = SUM(Amount)

    FROM (

    SELECT CONVERT(date, '20120103'), $300 UNION ALL

    SELECT '20120112', $250 UNION ALL

    SELECT '20120205', $200 UNION ALL

    SELECT '20120207', $300 UNION ALL

    SELECT '20120215', $400

    ) d ([Date], Amount)

    GROUP BY RIGHT(CONVERT(char(10), [Date], 103), 7)

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (8/7/2013)


    wolfkillj (8/7/2013)


    Sean Grebey (8/6/2013)


    Thanks for the replies. Out of curiosity, this is the approach I took. Is there an issue with it?

    select ROUND(sum(ContractValue),2) ContractValue, MONTH(ContractDate), YEAR(ContractDate)

    From Reporting.dbo.PrgSum

    Group by YEAR(ContractDate), MONTH(ContractDate)

    Only that the MONTH() function returns an integer, so if you want to return the period as a string like '01/2012', you'll need to use the code from ChrisM@Work's SELECT statement.

    Wouldn't a simple convert be better than multiple functions for that?

    SELECT

    DateRange = RIGHT(CONVERT(char(10), [Date], 103), 7),

    SUMAmount = SUM(Amount)

    FROM (

    SELECT CONVERT(date, '20120103'), $300 UNION ALL

    SELECT '20120112', $250 UNION ALL

    SELECT '20120205', $200 UNION ALL

    SELECT '20120207', $300 UNION ALL

    SELECT '20120215', $400

    ) d ([Date], Amount)

    GROUP BY RIGHT(CONVERT(char(10), [Date], 103), 7)

    The plans for both show a sort (100% of cost) to support the streaming aggregate. Isn't it your turn to scale up a million-row test, Luis? 😉

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • After several tests, It became clear that Chris query is 1.5 times faster, at least on my development server. Although, both run in about one second for a million rows. Here's the test code I used. If someone can clean the buffer before each query would be better, but I can't do it in here.

    IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL

    DROP TABLE #SomeTestTable

    ;

    --===== Declare some obviously named variables

    DECLARE @NumberOfRows INT,

    @StartDate DATETIME,

    @EndDate DATETIME,

    @Days INT, --This is still the "range"

    @StartValue INT,

    @EndValue INT,

    @Range INT

    ;

    --===== Preset the variables to known values

    SELECT @NumberOfRows = 1000000,

    @StartDate = '2010', --Inclusive

    @EndDate = '2020', --Exclusive

    @Days = DATEDIFF(dd,@StartDate,@EndDate),

    @StartValue = 100,

    @EndValue = 900,

    @Range = @EndValue - @StartValue + 1

    ;

    --===== Create "random constrained" integers within

    -- the parameters identified in the variables above.

    SELECT TOP (@NumberOfRows)

    dDate = ABS(CHECKSUM(NEWID())) % @Days + @StartDate,

    Amount = CAST( ROUND( RAND(CHECKSUM(NEWID())) * @Range, 2) + @StartValue AS decimal( 7,2))

    INTO #SomeTestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    SET STATISTICS TIME ON

    SELECT

    DateRange = RIGHT('0'+CAST(MONTH([dDate]) AS VARCHAR(2)),2)+'/'+

    CAST(YEAR([dDate]) AS VARCHAR(4)),

    SUMAmount = SUM(Amount)

    INTO #Dummy

    FROM #SomeTestTable

    GROUP BY YEAR([dDate]), MONTH([dDate])

    SELECT

    DateRange = RIGHT(CONVERT(char(10), [dDate], 103), 7),

    SUMAmount = SUM(Amount)

    INTO #Dummy2

    FROM #SomeTestTable

    GROUP BY RIGHT(CONVERT(char(10), [dDate], 103), 7)

    SET STATISTICS TIME OFF

    DROP TABLE #Dummy

    DROP TABLE #Dummy2

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • A little challenge:

    Assume there is a clustered index on the date column.

    Can you create a query that exploits the clustered index to achieve even better performance?

  • Stefan_G (8/7/2013)


    A little challenge:

    Assume there is a clustered index on the date column.

    Can you create a query that exploits the clustered index to achieve even better performance?

    Back to you, Stefan. Get a streaming aggregate *without* a sort. Should be straightforward if your table is clustered on date 😉


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Stefan_G (8/7/2013)


    A little challenge:

    Assume there is a clustered index on the date column.

    Can you create a query that exploits the clustered index to achieve even better performance?

    Borrowing from Luis' test script;

    Create a million-row test table

    IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL

    DROP TABLE #SomeTestTable;

    --===== Declare some obviously named variables

    DECLARE @NumberOfRows INT,

    @StartDate DATEtime,

    @EndDate DATEtime,

    @Days DECIMAL(10,2), --This is still the "range"

    @StartValue INT,

    @EndValue INT,

    @Range INT;

    --===== Preset the variables to known values

    SELECT @NumberOfRows = 1000000,

    @StartDate = '2010', --Inclusive

    @EndDate = '2020', --Exclusive

    @Days = DATEDIFF(dd,@StartDate,@EndDate),

    @StartValue = 100,

    @EndValue = 900,

    @Range = @EndValue - @StartValue + 1;

    --===== Create "random constrained" integers within

    -- the parameters identified in the variables above.

    SELECT TOP (@NumberOfRows)

    dDate = DATEADD(DAY,ABS(CHECKSUM(NEWID())) % @Days,@StartDate),

    --dDate = DATEADD(minute,ABS(CHECKSUM(NEWID())) % 1440,DATEADD(day,ABS(CHECKSUM(NEWID())) % @Days,@StartDate)),

    Amount = CAST(ROUND(RAND(CHECKSUM(NEWID())) * @Range, 2) + @StartValue AS decimal( 7,2))

    INTO #SomeTestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2;

    CREATE CLUSTERED INDEX ucx_dDate ON #SomeTestTable (dDate)

    Set up a test script

    --

    -- Q1

    dbcc freeproccache

    dbcc dropcleanbuffers

    SELECT

    DateRange = RIGHT('0'+CAST(MONTH([dDate]) AS VARCHAR(2)),2)+'/'+

    CAST(YEAR([dDate]) AS VARCHAR(4)),

    SUMAmount = SUM(Amount)

    FROM #SomeTestTable

    GROUP BY YEAR([dDate]), MONTH([dDate])

    go 10

    --

    -- Q2

    dbcc freeproccache

    dbcc dropcleanbuffers

    SELECT

    DateRange = RIGHT(CONVERT(char(10), [dDate], 103), 7),

    SUMAmount = SUM(Amount)

    FROM #SomeTestTable

    GROUP BY RIGHT(CONVERT(char(10), [dDate], 103), 7)

    go 10

    Open profiler, restrict to current spid & EventClass = 'SQL:BatchCompleted'. Run the test script. Here are the results:

    Query CPU Reads Duration (ms)

    Q1 668 3729 408

    Q2 1217 3729 689

    Notice that the plans for the two queries are very similar and have an equal cost relative to the batch

    Both show a hash aggregate. You might expect a streaming aggregate since the table is clustered on date and we’re grouping by date elements, but SQL Server cannot (yet) perform a streaming aggregate based on elements of date. The hash table for the aggregate requires memory and each query has a memory grant of 3392KB.

    The sample data set consists of 10 years’ worth of dates each duplicated about 275 times. A high ratio of source rows to result rows suggests a preaggregate step might offer some improvement:

    SELECT

    dDate = dDate,

    SUMAmount = SUM(Amount)

    FROM #SomeTestTable

    GROUP BY dDate

    This returns 3652 rows with a CPU of 340 and duration of about 380ms. A quick look at the plan shows a streaming aggregate as you would expect, and no memory grant. Substituting the table name with this query in my original query gives this jobbie;

    SELECT

    DateRange = RIGHT('0'+

    CAST(MONTH([dDate]) AS VARCHAR(2)),2)+'/'+

    CAST(YEAR([dDate]) AS VARCHAR(4)),

    SUMAmount = SUM(SUMAmount)

    FROM (

    SELECT

    dDate = dDate,

    SUMAmount = SUM(Amount)

    FROM #SomeTestTable

    GROUP BY dDate

    ) s

    GROUP BY YEAR([dDate]), MONTH([dDate]) with a CPU and duration almost identical to the subquery alone. The 3652 rows from the inner select are processed by a second streaming aggregate requiring a sort and consequent memory grant – this time a more modest 1040KB.

    Job's a good'un.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Nice!

    My solution looks like this:

    ;with cte1 as (

    -- get all interesting months

    select dateadd(month, N, '20100101') FirstDay

    from tsqlc_Tally

    where N < 120

    )

    select RIGHT(CONVERT(char(10), FirstDay, 103), 7), SumAmount

    from cte1

    cross apply (

    select sum(Amount) as SumAmount from #SomeTestTable where dDate >= FirstDay and dDate < dateadd(month, 1, FirstDay)

    ) t

    tsqlc_Tally is a tally table

    For the random sample data both solutions have almost identical execution times on my machine.

    My solution would be better if the date field for example was an actual timestamp with millisecond resolution. This would mean that the your preaggregation would not be very useful.

  • That's well sneaky Stefan! This mod should make it quicker, too:

    ;with cte1 as (

    -- get all interesting months

    select

    FirstDay = dateadd(month, N, '20100101'),

    qq = dateadd(month, 1, dateadd(month, N, '20100101'))

    from Tally0

    where N < 120

    )

    select RIGHT(CONVERT(char(10), FirstDay, 103), 7), SumAmount

    from cte1

    cross apply (

    select sum(Amount) as SumAmount

    from #SomeTestTable

    where dDate >= FirstDay

    and dDate < qq

    ) t

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply