Sliding aggregations

  • I need to retrieve a bunch of payments from a table and show how much has been paid since the inception of a record. I need to go 15 years out. For example, consider the tables below. Table1 holds all the unique records that could be paid for. Table2 holds all the payments for these records.

    Table1 ( key1, key2, startdate )

    Table2 ( key1, key2, amount , paiddate)

    Nums ( n )

    Lets say Table1 and Table2 look like

    Table1

    '1', 'a', '10/31/2003'

    '1', 'b', '4/1/2005'

    Table2

    '1', 'a', 3412.86, 2003-11-13

    '1', 'a', 500.00, 2008-06-04

    '1', 'a', -254.86, 2008-06-04

    '1', 'a', -100.00, 2006-05-22

    '1', 'a', -125.00, 2006-07-05

    '1', 'a', -100.00, 2006-09-26

    '1', 'a', -50.00, 2006-12-20

    '1', 'a', -100.00, 2006-08-17

    '1', 'a', -100.00, 2004-08-16

    '1', 'a', -100.00, 2004-10-15

    '1', 'a', -100.00, 2004-11-18

    '1', 'a', -100.00, 2005-01-18

    '1', 'a', -100.00, 2005-05-13

    '1', 'a', -100.00, 2005-06-15

    '1', 'a', -100.00, 2005-07-18

    '1', 'a', -100.00, 2005-08-05

    '1', 'a', -13.00, 2005-09-15

    '1', 'a', -20.00, 2007-08-13

    '1', 'a', -500.00, 2007-02-26

    '1', 'a', -100.00, 2008-03-25

    '1', 'a', -100.00, 2004-02-17

    '1', 'a', -100.00, 2004-05-17

    '1', 'a', -100.00, 2004-03-17

    '1', 'a', -100.00, 2004-04-15

    '1', 'a', -100.00, 2004-06-15

    '1', 'a', -100.00, 2004-07-15

    '1', 'a', -100.00, 2004-09-15

    '1', 'a', -100.00, 2004-12-17

    '1', 'a', -100.00, 2005-02-17

    '1', 'a', -100.00, 2005-03-14

    '1', 'a', -100.00, 2005-04-19

    '1', 'a', -50.00, 2006-11-21

    '1', 'a', -100.00, 2004-01-12

    After grouping up the data I have the following results

    key1, key2, yearlyAmount, year

    '1', 'a', 3412.86, 2003

    '1', 'a', -1200.00, 2004

    '1', 'a', -813.00, 2005

    '1', 'a', -525.00, 2006

    '1', 'a', -520.00, 2007

    '1', 'a', 145.14, 2008

    I need to get a running total yearly out 15 years. Even if I don't have data for the 15 years I need to produce this data. I do this because I am going to pivot the data later. After doing this I get.

    key1, key2, runningTotal, year

    '1', 'a', 3412.86, 2003

    '1', 'a', 2212.86, 2004

    '1', 'a', 1399.86, 2005

    '1', 'a', 874.86, 2006

    '1', 'a', 354.86, 2007

    '1', 'a', 500.00, 2008

    '1', 'a', 500.00, 2009

    '1', 'a', 500.00, 2010

    '1', 'a', 500.00, 2011

    '1', 'a', 500.00, 2012

    '1', 'a', 500.00, 2013

    '1', 'a', 500.00, 2014

    '1', 'a', 500.00, 2015

    '1', 'a', 500.00, 2016

    '1', 'a', 500.00, 2017

    The methods I have actually work, but takes quite a while to do. I am getting 174,287 rows of data from Table1. There are over a million rows in Table2.

    Below is my logic.

    1. Retrieve all records from Table1 that need to be processed. Cross join this with the Nums table to get 15 year per row. Put these in a temp table

    2. Join onto Table2, get a sum by year for each record and update this in our temp table.

    3. Join the temp table back onto itself to update the running total.

    4. Pivot the temp table into the deisred result.

    It seems like step 2-3 take the most time with step 3 being the worst.

    Can anyone else suggest a better way to do this? It seems like it is taking way too long. I would assume I can somehow do this without doing updates because this is what is killing my speed.

  • I'd do it this way:

    1) Group the data by year into a staging table (key1, key2, yearlyAmount, year).

    2) Add a column [RunningTotal]

    3) Add a clustered index on key1, key2, year (assuming key1 and key2 need to be used to differentiate the data).

    4) Use the "quirky update" method to get the running total.

    5) To pivot the data, either use CrossTab or DynamicCrossTab as described in my signature (depending on whether the years to report will be fixed or variable). The even more efficient way would be to pivot the data at the app layer...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Here is an excellent article[/url] for utilizing the "quirky" update to perform running totals. Lutz has already pointed you to the best articles for pivoting your data. (You might notice that all of these referenced articles were written by the same guy...)

    You're a first-time poster here, so let me give you a few tips for how to get better, faster answers.

    You know, the people that help out here are all un-paid volunteers. Providing the DDL scripts (CREATE TABLE, CREATE INDEX, etc.) for the tables affected, and INSERT statements to put some test data into those tables that shows your problem will go a long way in getting people to look at your issue and help you out. Please include code for what you have already tried. Don't forget to include what your expected results should be, based on the sample data provided. As a bonus to you, you will get tested code back. For more details on how to get all of this into your post, please look at the first link in my signature.

    You did a pretty good job of giving us information, but it's not in a format where we can just cut-and-paste it into SSMS and start working with it - there is a lot of stuff to do to get it to that point. Most people will just skip right over your post if they have to do all that work.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Wayne, thanks for the tip. I have included some scripts to create 1000 rows of data and for the 2 approaches I have below. The first approach is definately slower, but it is hard to tell with just 1000 rows. I will try to get a random script generated and attach it.

    Approach 1 ( Uses a temp table and updates )

    ------------------------------------------------------------------------------

    1. Get the raw data cross join this with our Nums table and put it into a temp table. This creates 15 rows for each piece of raw data from Table1.

    2. Get yearly amounts for each piece of raw data and update this in our temp table.

    3. Use our temp table to get a running total by year and update this column in our temp table.

    4. Pivot the data

    Approach 2 ( Uses CTEs to accomplish what we need )

    ------------------------------------------------------------------------------

    1. Get our payment data grouped by year and put it into a CTE

    2. Use the CTE from #1 and get a running total

    3. Use the CTE from #2 and UNIONS to insert our missing years before our first payment and after our last payment. This should now give us 15 years of data for each row in Table1.

    4. Partition our CTE from #3 by our unique id in Table1 to generate our column names for the pivot

    5. Pivot the data.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply