Starting with "WITH"

  • SQLRNNR (11/6/2011)


    Jeff Moden (11/5/2011)


    SQLRNNR (11/3/2011)


    Yes recursive CTEs can be slow - you need to be sure you have defined your recursion well enough.

    I have seen them perform marvelously as well. For instance we were able to take a tree recursion down from 30 minutes using a loop method to under 30 seconds for hierarchies of 700+ levels and 1.5 million people.

    There is also potential problems for introducing circular references with a recursive cte that are easier to avoid in a while loop or cursor so you have to watch for that.

    Hi Jason,

    Heh... Tom and I are at the beginning of what promises to be a very interesting conversation (I love working with Tom... I learn something new everytime) about recursion and he's cited your post as an example. Before I can respond to Tom's fine and highly appropriate observation, I need to know just a bit more about your adventure in the quote above. If you have the time, I sure would appreciate it.

    1. Did you traverse all 1.5 million "people nodes" in your code?

    2. What was the purpose of the traversal? In other words, what did the code do with the hierarchy?

    3. How was the loop code accomplishing the same thing? We know it was a loop but did it do something like Celko's "push stack" method or was it simply "read-a-row, write-a-row" or ???

    4. I'm definitely not familiar with what you did and my meager questions are certainly not all encompassing, so any amplifying information above and beyond the questions above will be greatly appreciated.

    Thanks for your time, Jason.

    1. Yes

    2. To write out the hierarchical tree to a table so other queries could use that rather than rebuild the tree each time a report needed to be run. Also, we used the same code in building out compensation/bonus earnings (think MLM).

    3. Read a row/write a row.

    4. No problems. I continue to learn with the recursive CTEs as I go. I am merely average when it comes to programming them and learn something new with each new recursion it seems. 😉

    Thanks, Jason.

    By the way, do you still work for an MLM company? I ask because I have a bit of code you might be interested in... especially if it's for a "Uni-Level" model for compensation/bonus (although it'll handle just about any model that can be represented with a "DAG" type of Adjacency List).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • SQL Kiwi (11/6/2011)


    Recursive CTEs can be made to perform extremely well;...{snip}...And there's also a 'super-fast' DISTINCT (by me!) that Jeff may recall:

    http://qa.sqlservercentral.com/Forums/FindPost1013407.aspx

    That is a remarkable bit of code, indeed. The method in the code is what's remarkable because it very quickly isolates the very low number of unique values. But it's only faster than SELECT DISTINCT if there are, in fact, a lot of dupes. It's much slower than DISTINCT in the presence of a lot of unique values and it's because recursive CTEs are a bit of RBAR on steroids.

    Test table setup code (relatively few duplicates created):

    --==============================================================================

    -- Paul's original test table modified to contain a lot of unique values

    -- rather than a lot of duplicates.

    --==============================================================================

    USE tempdb;

    GO

    DROP TABLE dbo.Test;

    GO

    CREATE TABLE

    dbo.Test

    (

    data INTEGER NOT NULL,

    );

    GO

    CREATE CLUSTERED INDEX c ON dbo.Test (data);

    GO

    --===== Very few duplicated values

    INSERT dbo.Test WITH (TABLOCK)

    (data)

    SELECT TOP (500000)

    ROW_NUMBER() OVER (ORDER BY (SELECT 0)) % 400000

    FROM master.sys.columns C1,

    master.sys.columns C2,

    master.sys.columns C3;

    GO

    Test code to compare the rCTE to a normal DISTINCT.

    --===== Declare a variable that will be used to take the

    -- time to display out of the picture.

    DECLARE @Bitbucket INT

    ;

    PRINT '========== rCTE Replacement for DISTINCT ======================================='

    SET STATISTICS TIME,IO ON;

    WITH RecursiveCTE

    AS (

    SELECT data = MIN(T.data)

    FROM dbo.Test T

    UNION ALL

    SELECT R.data

    FROM (

    -- A cunning way to use TOP in the recursive part of a CTE :)

    SELECT T.data,

    rn = ROW_NUMBER() OVER (ORDER BY T.data)

    FROM dbo.Test T

    JOIN RecursiveCTE R

    ON R.data < T.data

    ) R

    WHERE R.rn = 1

    )

    SELECT @Bitbucket = Data

    FROM RecursiveCTE

    OPTION (MAXRECURSION 0);

    SET STATISTICS TIME,IO OFF;

    PRINT '========== Normal DISTINCT ====================================================='

    SET STATISTICS TIME,IO ON;

    SELECT DISTINCT

    @Bitbucket = Data

    FROM dbo.Test

    ;

    SET STATISTICS TIME OFF;

    Here are the results... all rCTE's are similar in that they produce a large number of reads for what they do for the very reasons that Craig Freedman cited in his good article.

    ========== rCTE Replacement for DISTINCT =======================================

    Table 'Worktable'. Scan count 2, logical reads 2400001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Test'. Scan count 400001, logical reads 1200485, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 24188 ms, elapsed time = 31349 ms.

    ========== Normal DISTINCT =====================================================

    Table 'Test'. Scan count 1, logical reads 908, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 515 ms, elapsed time = 507 ms.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (11/6/2011)


    But it's only faster than SELECT DISTINCT if there are, in fact, a lot of dupes. It's much slower than DISTINCT in the presence of a lot of unique values and it's because recursive CTEs are a bit of RBAR on steroids.

    Yes - this code was my attempt to write a 'skip scan', which is a very targeted optimization, with poor worst-case performance.

    I don't agree that recursive CTEs are RBAR by the way: they're simply set-iterative, a bit like Hugo's 'lasagne' running total code. A well-written rCTE can be very good; a badly written one can be awful - but this state of affairs is hardly unique to recursive CTEs 🙂

    Chris, now would be a good time...;-)

  • SQL Kiwi (11/6/2011)


    I don't agree that recursive CTEs are RBAR by the way: they're simply set-iterative, a bit like Hugo's 'lasagne' running total code.

    I have to agree to disagree then... rCTEs are nothing like Hugo's 'lasagne' code. Craig Freeman's explanation of how rCTEs operate is spot on. It reads a row from the secondary spool, deletes it, and then operates on that value. If there's anything left on the "stack" (secondary spool), it reads a row from the secondary spool, deletes it, and then operates on that value ad infinitum until the "stack" is clear. Then it loads the "stack" with a fresh batch (next level, in his hierarchical code) and repeats... one row at a time and with the reads to go with it. A good WHILE loop will do the same thing in about the same amount of time but with less reads.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (11/6/2011)


    SQLRNNR (11/6/2011)


    Jeff Moden (11/5/2011)


    SQLRNNR (11/3/2011)


    Yes recursive CTEs can be slow - you need to be sure you have defined your recursion well enough.

    I have seen them perform marvelously as well. For instance we were able to take a tree recursion down from 30 minutes using a loop method to under 30 seconds for hierarchies of 700+ levels and 1.5 million people.

    There is also potential problems for introducing circular references with a recursive cte that are easier to avoid in a while loop or cursor so you have to watch for that.

    Hi Jason,

    Heh... Tom and I are at the beginning of what promises to be a very interesting conversation (I love working with Tom... I learn something new everytime) about recursion and he's cited your post as an example. Before I can respond to Tom's fine and highly appropriate observation, I need to know just a bit more about your adventure in the quote above. If you have the time, I sure would appreciate it.

    1. Did you traverse all 1.5 million "people nodes" in your code?

    2. What was the purpose of the traversal? In other words, what did the code do with the hierarchy?

    3. How was the loop code accomplishing the same thing? We know it was a loop but did it do something like Celko's "push stack" method or was it simply "read-a-row, write-a-row" or ???

    4. I'm definitely not familiar with what you did and my meager questions are certainly not all encompassing, so any amplifying information above and beyond the questions above will be greatly appreciated.

    Thanks for your time, Jason.

    1. Yes

    2. To write out the hierarchical tree to a table so other queries could use that rather than rebuild the tree each time a report needed to be run. Also, we used the same code in building out compensation/bonus earnings (think MLM).

    3. Read a row/write a row.

    4. No problems. I continue to learn with the recursive CTEs as I go. I am merely average when it comes to programming them and learn something new with each new recursion it seems. 😉

    Thanks, Jason.

    By the way, do you still work for an MLM company? I ask because I have a bit of code you might be interested in... especially if it's for a "Uni-Level" model for compensation/bonus (although it'll handle just about any model that can be represented with a "DAG" type of Adjacency List).

    I'd love to see it.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Jeff Moden (11/6/2011)


    I have to agree to disagree then... rCTEs are nothing like Hugo's 'lasagne' code. Craig Freeman's explanation of how rCTEs operate is spot on. It reads a row from the secondary spool, deletes it, and then operates on that value. If there's anything left on the "stack" (secondary spool), it reads a row from the secondary spool, deletes it, and then operates on that value ad infinitum until the "stack" is clear. Then it loads the "stack" with a fresh batch (next level, in his hierarchical code) and repeats... one row at a time and with the reads to go with it. A good WHILE loop will do the same thing in about the same amount of time but with less reads.

    Couple of things there.

    First, as you mentioned just recently, not all logical reads are created equal. Logical reads against a spool (which will generally be in memory) frequently count rows rather than the usual 8KB pages. This was a design decision because it was felt that reporting logical pages would not be very useful for an object the user has no access to.

    The second thing is that rCTEs are 'a bit' like Hugo's code because one of the ways to make them fly is to ensure the lowest level of the joins seeks directly to a row rather than joining to a base table per iteration. Some of Chris' examples use a pre-computed RANK to achieve this - and that's the similarity with Hugo's method I was referring to. So long as the number of iterations at the lowest level is relatively small, and you do something vaguely set-based at that level, you get the 'lasagne' effect.

    A 'good WHILE loop' can indeed approach the performance of a well-written rCTE in some cases (Hugo's method uses WHILE, remember) but that doesn't mean we should automatically assign the poorest kinds of cursor-type behaviour and performance to either implementation.

  • Perhaps rCTE's provide some incentive for tightening up the definition of RBAR. I've always read RBAR as meaning row by row coding. rCTE's are not - they're row by row processing performed by the engine. The current definition of RBAR would class 40,000 clustered index seeks (get one from here, now get one from there...) as RBAR on crystal meth.

    I'm still raking through former posts to find some good examples to back up Paul's generous statement, but in the meantime, consider this: you wouldn't use a rCTE for generating rows/numbers for splitting strings, or for performing a simple running total. Why not? Because Jeff and others have published performance figures of rCTE's compared with other methods for performing the same tasks, and the rCTE's "really suck", running in 3x the time of the fastest code. We know the tool to use for the job because we've read the articles. That's not the point though. The point is this -anyone who cares enough about their programming to spend quality time reading performance articles and threads is already thinking performance before writing the first line of code, and is likely to try alternative query schemes if performance isn't up to a perceived standard. It's all the other folks you need to concern yourself with - the youngsters and the care-nots. They'll use an rCTE for counting or performing a running total because it's easy to write.

    It's that simple. rCTE's are a powerful and expensive tool, and most folks don't know when best to deploy them or how to tinker with the code to squeeze out every ounce of performance. rCTE's = RBAR? Pah, dogma.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I agree with Chris.

    Because I inadvertantly started (this very entertaining and informative discussion) by making a "suckish" comment, I feel compelled to admit that I have in fact used recursive CTEs in inline table valued functions to drive "loop" logic that feeds a small list of parameters to a query that results in tens of thousands of rows. This divide and conquer strategy worked well, but the RBAR was limited to less than a dozen rows, each of which was fed to a query which produced thousands of rows.

    Used correctly recursive CTEs are powerful tools and have their place in the toolbox.

    Used incorrectly, they can and will suck. Novice users need to be aware of this, that's all.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (11/7/2011)


    I agree with Chris.

    Because I inadvertantly started (this very entertaining and informative discussion) by making a "suckish" comment, I feel compelled to admit that I have in fact used recursive CTEs in inline table valued functions to drive "loop" logic that feeds a small list of parameters to a query that results in tens of thousands of rows. This divide and conquer strategy worked well, but the RBAR was limited to less than a dozen rows, each of which was fed to a query which produced thousands of rows.

    Used correctly recursive CTEs are powerful tools and have their place in the toolbox.

    Used incorrectly, they can and will suck. Novice users need to be aware of this, that's all.

    You and Chris have both hit the nail on the head. I don't care if people think rCTEs are RBAR or not (although I will continue to claim they are compared to a normal SELECT). I don't really care if the number of reads goes to the moon or not. rCTEs are like anything else... they can be used incorrectly just as cursors are. They have their uses. Just like Cursors and WHILE loops, I just want to make sure that people understand that the use of rCTEs should probably be avoided if there's a decent set based method available. The exception is rCTEs that count... my recommendation is that they should be avoided at all costs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I would happily refer to rCTEs (and nested loops plans in general) as RBR rather than RBAR; it's the 'A' that I have issues with 🙂

    Mind you, everything's RBR when you get right down to the nuts and bolts of it...

  • SQL Kiwi (11/7/2011)


    I would happily refer to rCTEs (and nested loops plans in general) as RBR rather than RBAR; it's the 'A' that I have issues with 🙂

    Mind you, everything's RBR when you get right down to the nuts and bolts of it...

    Yeah - when you start looking at it that way. 😀

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Jeff Moden (11/7/2011)


    ...like Cursors and WHILE loops, I just want to make sure that people understand that the use of rCTEs should probably be avoided if there's a decent set based method available. The exception is rCTEs that count... my recommendation is that they should be avoided at all costs.

    Perfick, couldn't agree more.

    For your entertainment, here are a few of the rCTE threads which Paul was referring to. Notice that he modestly sidelines his own observation that a unique clustered index is key to performance with "this row / last row" rCTE's.

    That’s not a misprint uncle Jeff

    http://qa.sqlservercentral.com/Forums/FindPost873955.aspx

    Handshaking

    http://qa.sqlservercentral.com/Forums/FindPost900646.aspx

    Upside down QU

    http://qa.sqlservercentral.com/Forums/FindPost925065.aspx

    Jeff’s gonna hate me for this

    http://qa.sqlservercentral.com/Forums/FindPost1009302.aspx

    N rows at a time

    http://qa.sqlservercentral.com/Forums/FindPost1170015.aspx

    Most, if not all, of the rCTE's posted in these threads could be replaced by a more performant QU, but the last one (N rows at a time) would be quite difficult. Two others from this year remain lost in the archives. The OP in one of them was unable to create anything in the db. Game over for QU. The other worked with exactly 4 rows in each iteration.

    To balance things up a little and show I'm not biased against the QU, here's the most challenging QU code block I've written. I didn't get around to writing the rCTE version.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (11/8/2011)


    Jeff Moden (11/7/2011)


    ...like Cursors and WHILE loops, I just want to make sure that people understand that the use of rCTEs should probably be avoided if there's a decent set based method available. The exception is rCTEs that count... my recommendation is that they should be avoided at all costs.

    Perfick, couldn't agree more.

    For your entertainment, here are a few of the rCTE threads which Paul was referring to. Notice that he modestly sidelines his own observation that a unique clustered index is key to performance with "this row / last row" rCTE's.

    That’s not a misprint uncle Jeff

    http://qa.sqlservercentral.com/Forums/FindPost873955.aspx

    Handshaking

    http://qa.sqlservercentral.com/Forums/FindPost900646.aspx

    Upside down QU

    http://qa.sqlservercentral.com/Forums/FindPost925065.aspx

    Jeff’s gonna hate me for this

    http://qa.sqlservercentral.com/Forums/FindPost1009302.aspx

    N rows at a time

    http://qa.sqlservercentral.com/Forums/FindPost1170015.aspx

    Most, if not all, of the rCTE's posted in these threads could be replaced by a more performant QU, but the last one (N rows at a time) would be quite difficult. Two others from this year remain lost in the archives. The OP in one of them was unable to create anything in the db. Game over for QU. The other worked with exactly 4 rows in each iteration.

    To balance things up a little and show I'm not biased against the QU, here's the most challenging QU code block I've written. I didn't get around to writing the rCTE version.

    Outstanding feedback, Chris. That's quite a collection and some of the text is, in fact, quite surprising, indeed. I know I previously responded to some of those posts but I'm going to bookmark those threads and do a reread.

    And thanks for crossing out "probably". 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Everything may be RBR, but I always thought the purpose was to keep it from being RBAR, Paul 😉

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (11/7/2011)


    I agree with Chris.

    Because I inadvertantly started (this very entertaining and informative discussion) by making a "suckish" comment, I feel compelled to admit that I have in fact used recursive CTEs in inline table valued functions to drive "loop" logic that feeds a small list of parameters to a query that results in tens of thousands of rows. This divide and conquer strategy worked well, but the RBAR was limited to less than a dozen rows, each of which was fed to a query which produced thousands of rows.

    Used correctly recursive CTEs are powerful tools and have their place in the toolbox.

    Used incorrectly, they can and will suck. Novice users need to be aware of this, that's all.

    You should have been a polititian, Bob. "I only tried it once, as a freshman. It wasn't lit and I didn't inhale":-D


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 15 posts - 16 through 30 (of 55 total)

You must be logged in to reply to this topic. Login to reply