Recursive CTE for Item supersession question

  • Greetings all.

    I have used CTEs so I am familiar with their usage however, I need to create a recursive CTE.

    Before I dive in and pull whats left of my hair out, I want to make sure this is the correct situation to use this in.

    Currently I am creating data for my company's forecasting software. The data on this works fine and is as expected by the end users however, we have run into a snag regarding super-session (new items superseding old items).

    I will try to create some fake data here soon so we can test this out but I like I said earlier I want to make sure this is the right situation for a recursive CTE, or perhaps I need a different method.

    So lets say our current Item is abc10. Last years Item is abc9, and the year prior is Abc8. Next years will be abc11. For our forecasting software, we want all history (invoice data) 'rolled up' into Abc10. Additionally, once it is all rolled up into Abc10, we will create an additional data set that will roll up Abc10 data into Abc11 for future forecasting. So we have 2 data sets, all invoices summarized at Abc10 level, then that summary at Abc11 level.

    The way the data is listed now is there is one item master record for each item, and there is a column stating which item it supersedes. So for instance Abc10's record would have Abc9's item code in the supersedes column.

    So now this is the part that I think requires the recursive cte...

    Would this be possible via a recursive cte? If so how would I go about building that?

    Thanks in advance for the help!

    I will try to make some fake data here to work with shortly.

    Link to my blog http://notyelf.com/

  • Can you set up sample tables and sample data for us to work on your query?

    Also, are u asking how to design the table structure or how to retreive the values u alreay have in your system?

  • You might want to take a good look at the following articles before you use a recursive CTE.

    http://qa.sqlservercentral.com/articles/Data+Generation/87901/

    http://qa.sqlservercentral.com/articles/T-SQL/74118/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I will put along some sample data here today. Sorry this is not the only critical thing I have to do today! πŸ˜€

    I can put together some sample data and the expected results along here shortly.

    In regards to the structure vs how to retrieve...all the tables are in place and structured, I just need to return the data in an expected state πŸ™‚

    Link to my blog http://notyelf.com/

  • Good, with the sample data, sample structure, and the desired result, im sure we will be able to help you out!

    And don forget to take a peek at the article Jeff points to. Those are awesome creations πŸ™‚

  • Here is the code that creates the sample data

    IF (SELECT OBJECT_ID('TEMPDB..#FakeItems')) is not null

    DROP TABLE #FakeItems

    CREATE TABLE #FakeItems (Item_Code nvarchar(20), Superseeds nvarchar(20))

    GO

    INSERT #FakeItems

    SELECT 'Abc8','' UNION ALL

    SELECT 'Abc9','Abc8' UNION ALL

    SELECT 'Abc10','Abc9' UNION ALL

    SELECT 'Abc11','Abc10'

    GO

    IF (SELECT OBJECT_ID('TEMPDB..#FakeInvoice')) is not null

    DROP TABLE #FakeInvoice

    CREATE Table #FakeInvoice (Item_Code nvarchar(20), Period int, Sales int, qty int)

    GO

    INSERT #FakeInvoice

    SELECT 'Abc8',201201,904,5 UNION ALL

    SELECT 'Abc8',201202,430,3 UNION ALL

    SELECT 'Abc8',201203,812,10 UNION ALL

    SELECT 'Abc9',201201,993,5 UNION ALL

    SELECT 'Abc9',201202,736,7 UNION ALL

    SELECT 'Abc9',201203,285,9 UNION ALL

    SELECT 'Abc10',201201,330,2 UNION ALL

    SELECT 'Abc10',201202,560,5 UNION ALL

    SELECT 'Abc10',201203,802,4

    GO

    -- just returning the above data

    SELECT * FROM #FakeItems

    SELECT * FROM #FakeInvoice

    --------------------------------------

    The expected results are as follows

    1st data set returns

    Item Code Period Sales Qty

    abc10 201201 2227 12

    abc10 201202 1726 15

    abc10 201203 1899 23

    2nd data set returns

    abc11 201201 2227 12

    abc11 201202 1726 15

    abc11 201203 1899 23

    It should be noted that Abc11 is 'known' to be a future item as it has no data in the invoice table.

    Hopefully that is clear!

    I have read the other 2 articles that Jeff posted before but I read them again just in case :). If there is a better method for this than a recursive CTE, then I am all for it!

    Link to my blog http://notyelf.com/

  • One last question, how to determice abc10 and abc11 are base for the data sets 1 and 2 respectively ?

  • CELKO (4/9/2012)


    This is more of a temporal problem than a recursive problem.

    A useful idiom is a report period calendar. It gives a name to a range of dates.

    CREATE TABLE Report_Periods

    (report_name VARCHAR(30) NOT NULL PRIMARY KEY,

    report_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,

    report_end_date DATE NOT NULL,

    CONSTRAINT date_ordering

    CHECK (report_start_date <= report_end_date),

    etc);

    These report periods can overlap; a fiscal quarter will be contained in the range of its fiscal year. There can be gaps between them; we have to wait a year between each β€œAnnual Going out Of Business Sale!” and there might be long stretches of time without any special sales. But we want to know their ranges so that the table is fairly constant once it is created.

    I like the MySQL convention of using double zeroes for months and years, That is 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole year. The advantage is that it will sort with the ISO-8601 data format required by Standard SQL.

    Can you get it from here or do you need more help?

    Unfortunately, Mr. Celko, this isn't MySQL. This is MS SQL Server and it doesn't have the same capabilities as MySQL or Oracle.

  • My apologies Mr. Celko but I disagree with you, not on the grounds that you're applying MySQL to a SQL Server problem, but rather to agree with the OP that this is a classic recursive CTE problem as it applies to the hierarchy of superceded items.

    My solution illustrates this.

    DECLARE @FakeItems TABLE (Item_Code nvarchar(20), Superseeds nvarchar(20))

    INSERT @FakeItems (Item_Code, Superseeds)

    SELECT 'Abc8',NULL UNION ALL

    SELECT 'Abc9','Abc8' UNION ALL

    SELECT 'Abc10','Abc9' UNION ALL

    SELECT 'Abc11','Abc10'

    DECLARE @FakeInvoice TABLE (Item_Code nvarchar(20), Period int, Sales int, qty int)

    INSERT @FakeInvoice

    SELECT 'Abc8',201201,904,5 UNION ALL

    SELECT 'Abc8',201202,430,3 UNION ALL

    SELECT 'Abc8',201203,812,10 UNION ALL

    SELECT 'Abc9',201201,993,5 UNION ALL

    SELECT 'Abc9',201202,736,7 UNION ALL

    SELECT 'Abc9',201203,285,9 UNION ALL

    SELECT 'Abc10',201201,330,2 UNION ALL

    SELECT 'Abc10',201202,560,5 UNION ALL

    SELECT 'Abc10',201203,802,4

    ;WITH ItemHier AS (

    SELECT Item_Code, Superseeds

    FROM @FakeItems

    UNION ALL

    SELECT h.Item_Code, i.Superseeds

    FROM @FakeItems i

    INNER JOIN ItemHier h ON h.Superseeds = i.Item_Code

    )

    SELECT h.Item_Code, Period, SUM(Sales) AS Sales, SUM(Qty) AS Qty

    FROM @FakeInvoice i

    INNER JOIN ItemHier h ON i.Item_Code = h.Superseeds OR

    (h.item_code = i.item_code and h.Superseeds IS NULL)

    WHERE h.Item_Code = 'abc10'

    GROUP BY h.Item_Code, Period

    ORDER BY h.Item_Code, Period

    To get the second results set, just substitute h.item_code = 'abc10' with h.item_code = 'abc11' or if you want the both, make it h.item_code IN ('abc10', 'abc11')

    Edit: Note: Being overly picky here but superseeds should be correctly spelled supercedes.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (4/9/2012)


    My solution illustrates this.

    DECLARE @FakeItems TABLE (Item_Code nvarchar(20), Superseeds nvarchar(20))

    INSERT @FakeItems (Item_Code, Superseeds)

    SELECT 'Abc8',NULL UNION ALL

    SELECT 'Abc9','Abc8' UNION ALL

    SELECT 'Abc10','Abc9' UNION ALL

    SELECT 'Abc11','Abc10'

    DECLARE @FakeInvoice TABLE (Item_Code nvarchar(20), Period int, Sales int, qty int)

    INSERT @FakeInvoice

    SELECT 'Abc8',201201,904,5 UNION ALL

    SELECT 'Abc8',201202,430,3 UNION ALL

    SELECT 'Abc8',201203,812,10 UNION ALL

    SELECT 'Abc9',201201,993,5 UNION ALL

    SELECT 'Abc9',201202,736,7 UNION ALL

    SELECT 'Abc9',201203,285,9 UNION ALL

    SELECT 'Abc10',201201,330,2 UNION ALL

    SELECT 'Abc10',201202,560,5 UNION ALL

    SELECT 'Abc10',201203,802,4

    ;WITH ItemHier AS (

    SELECT Item_Code, Superseeds

    FROM @FakeItems

    UNION ALL

    SELECT h.Item_Code, i.Superseeds

    FROM @FakeItems i

    INNER JOIN ItemHier h ON h.Superseeds = i.Item_Code

    )

    SELECT h.Item_Code, Period, SUM(Sales) AS Sales, SUM(Qty) AS Qty

    FROM @FakeInvoice i

    INNER JOIN ItemHier h ON i.Item_Code = h.Superseeds OR

    (h.item_code = i.item_code and h.Superseeds IS NULL)

    WHERE h.Item_Code = 'abc10'

    GROUP BY h.Item_Code, Period

    ORDER BY h.Item_Code, Period

    To get the second results set, just substitute h.item_code = 'abc10' with h.item_code = 'abc11'

    My solution was very vry similar to this one! So im refraining from posting it πŸ™‚

  • Cold Coffee: ITem 10 is the current most item that has data in the invoice table

    I will create seperate scripts one for summing everything up that is 'current', and the second one will take that and do the same thing to 'future' planned items.

    Celko: no it is not a temporal problem. For the sake of this sample set I added things in this simple matter but these calculations will be over time. I can safely say this is not a temporal problem. Also it is not a MsSql problem like Lynn had mentioned πŸ™‚

    Dwain: that works as expected thank you! One caveat however, is the filter is going to be an issue since this will span across some 50k items. I just used this as a sample set :). I DO however have a sample set of all the 'current' items so I believe this might put me on the right path and I am pretty sure I can work my way from here! I will let you know if I can't though!

    Link to my blog http://notyelf.com/

  • One caveat however, is the filter is going to be an issue since this will span across some 50k items. I just used this as a sample set :). I DO however have a sample set of all the 'current' items so I believe this might put me on the right path and I am pretty sure I can work my way from here! I will let you know if I can't though!

    I don't want to come across as an expert on recursive CTEs here (believe me, I am not) but I think that the issue that will impact performance is not so much the number of items, but rather the number of items that are superceded.

    As you said, you'll need to run some tests to see what happens.

    I did happen to take a look at the Execution Plan and I noticed that the solution I proposed does 2 table scans on the Fake_Items table and one on the Fake_Invoices table. Make sure you look into proper indexing if the query runs slow to see if those table scans can be converted to index seeks.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • shannonjk (4/9/2012)


    I will put along some sample data here today. Sorry this is not the only critical thing I have to do today! πŸ˜€

    Not to worry. Wasn't critical for me either. πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I've not sussed the problem yet but the recursive method is going to get pretty darned slow as Dwain indicated. The recursive CTE essentially causes a cross join (Cartesian Product) between the number of unique items in the Item_Code column and the number of rows in the fake invoice table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Using the provided data except as Temp Tables instead of Table Variables, here's a recursive CTE solution without the Cartesian Product problem. Notice the "HRoot" column which allowed me to greatly simplify the outer query, as well. Just add the correct indexes and this should really fly for you. This will work for any value of Item_Code from the "FakeItems" table. The sums will go up to and including that Item_Code.

    CREATE TABLE #FakeItems(Item_Code nvarchar(20), Superseeds nvarchar(20))

    INSERT #FakeItems (Item_Code, Superseeds)

    SELECT 'Abc8',NULL UNION ALL

    SELECT 'Abc9','Abc8' UNION ALL

    SELECT 'Abc10','Abc9' UNION ALL

    SELECT 'Abc11','Abc10'

    CREATE TABLE #FakeInvoice (Item_Code nvarchar(20), Period int, Sales int, qty int)

    INSERT #FakeInvoice

    SELECT 'Abc8',201201,904,5 UNION ALL

    SELECT 'Abc8',201202,430,3 UNION ALL

    SELECT 'Abc8',201203,812,10 UNION ALL

    SELECT 'Abc9',201201,993,5 UNION ALL

    SELECT 'Abc9',201202,736,7 UNION ALL

    SELECT 'Abc9',201203,285,9 UNION ALL

    SELECT 'Abc10',201201,330,2 UNION ALL

    SELECT 'Abc10',201202,560,5 UNION ALL

    SELECT 'Abc10',201203,802,4

    ;

    WITH ItemHier AS

    (

    SELECT Item_Code, Superseeds, HRoot = Item_Code

    FROM #FakeItems

    WHERE Item_Code = 'Abc10'

    UNION ALL

    SELECT i.Item_Code, i.Superseeds, h.HRoot

    FROM #FakeItems i

    INNER JOIN ItemHier h ON h.Superseeds = i.Item_Code

    )

    SELECT h.HRoot, i.Period, Sales = SUM(i.Sales), Qty = SUM(i.Qty)

    FROM #FakeInvoice i

    INNER JOIN ItemHier h ON i.Item_Code = h.Item_Code

    GROUP BY h.HRoot, i.Period

    ORDER BY h.HRoot, i.Period

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 15 (of 35 total)

You must be logged in to reply to this topic. Login to reply