Hidden RBAR: Triangular Joins

  • TheSQLGuru (1/16/2009)


    Jeff, it strikes me that pretty much all of your examples here and in other threads have a clustered index on the key of concern. What happens when it is a non-clustered index? Or worse, no index? Often times people don't have the luxury of putting indexes on a column, especially clustered one's. Maybe it is time for some benchmarking on 'real world' tables. I think other mechanisms (including cursors and while loops and non-breaking set logic) may come to the fore in such cases as being more efficient.

    Actually, I pretty much covered that in the article in the running total article... copy it to another table if you can't make any changes to the original. In fact, the creation of a Temp Table using SELECT/INTO and doing all the processing on the temp table still takes a lot less than any form of While Loop including firehose cursors. I think you find that no other "mechanisms will come to the fore in such cases as being more efficient". And, since I don't actually write any "C", any one who want's to write a CLR for a given test set and then run tests between the "quirky" update and the CLR are more than welcome. I'll take all comers in that area.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Tempdb constraints come to mind here. 🙂

    I sure do wish someone with CLR knowhow (and some free time) would take the opportunity to do some CLR code you could integrate into your testing on your box to give apples-apples benchmarks. Damn I wish I had the time!!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/16/2009)


    Tempdb constraints come to mind here. 🙂

    I sure do wish someone with CLR knowhow (and some free time) would take the opportunity to do some CLR code you could integrate into your testing on your box to give apples-apples benchmarks. Damn I wish I had the time!!

    Heh... what... you don't think there's any TempDB constraints when using a cursor or While Loop? Any of the conditions that you mention that would require a temp table for the quirky update would also require as much or more of tempdb to use a cursor or While loop.

    In the "real world" as you call it, a large table requiring grouped running totals would either be setup with a trigger or some such to keep the running total up to date as new rows were added (or {gasp} modified) or they'd produce the running totals by customer on demand if they needed "on demand" or on nightly runs. If it's nightly runs and, for some reason, they don't allow for enough TempDB, the run could be split by client or the could play in a sandbox somewhere.

    About the CLR... you don't have the time to write a simple CLR??? 😛 I thought they were supposed to be so easy to write and impliment that no-one would even want to write T-SQL anymore. 😀

    Sorry... that wasn't a swipe at you, Kevin... that's a swipe at anyone that thinks CLR's are better than a bloody cursor for anything except for RegEx and some file handling. :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (1/16/2009)


    TheSQLGuru (1/16/2009)


    Tempdb constraints come to mind here. 🙂

    I sure do wish someone with CLR knowhow (and some free time) would take the opportunity to do some CLR code you could integrate into your testing on your box to give apples-apples benchmarks. Damn I wish I had the time!!

    Heh... what... you don't think there's any TempDB constraints when using a cursor or While Loop? Any of the conditions that you mention that would require a temp table for the quirky update would also require as much or more of tempdb to use a cursor or While loop.

    In the "real world" as you call it, a large table requiring grouped running totals would either be setup with a trigger or some such to keep the running total up to date as new rows were added (or {gasp} modified) or they'd produce the running totals by customer on demand if they needed "on demand" or on nightly runs. If it's nightly runs and, for some reason, they don't allow for enough TempDB, the run could be split by client or the could play in a sandbox somewhere.

    About the CLR... you don't have the time to write a simple CLR??? 😛 I thought they were supposed to be so easy to write and impliment that no-one would even want to write T-SQL anymore. 😀

    Sorry... that wasn't a swipe at you, Kevin... that's a swipe at anyone that thinks CLR's are better than a bloody cursor for anything except for RegEx and some file handling. :hehe:

    1) there is less tempdb storage overhead to a cursor (some of them anyway) than putting the entire set of data in tempdb.

    2) your definition of "real world" is actually better described as "fantasy real world".

    3) CLR IS easy to write and implement if you do it even semi-regularly, especially what we are looking for here (see the example from below).

    4) CLR 'cursors' have been PROVEN to be significantly faster than TSQL cursors for iterative read operations on data. See here and the associated initial link at the top: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/17/2009)


    1) there is less tempdb storage overhead to a cursor (some of them anyway) than putting the entire set of data in tempdb.

    Heh, ok, you said it, now prove it.

    2) your definition of "real world" is actually better described as "fantasy real world".

    Fine... why? What's your great definition of real world for such things? I know, I know... you're going to say you can't use TempDB because the table is too large, then say you can't use a Sandbox because they won't allow it, then say you can't modify the original table even by putting an index on it, and then challenge me to use the quirky update. Yeah, those are real world restrictions for a company that deserves a cursor. Write the bloody damned cursor and get away from them as fast as you can.

    3) CLR IS easy to write and implement if you do it even semi-regularly, especially what we are looking for here (see the example from below).

    Fine... let's see some code. You've got time for rhetoric like this... write some code.

    4) CLR 'cursors' have been PROVEN to be significantly faster than TSQL cursors for iterative read operations on data. See here and the associated initial link at the top: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx%5B/quote%5D

    Yes... PROVEN to be significantly faster than [font="Arial Black"]TSQL cursors for iterative[/font]. THAT's the problem... people can't think of set based code and they resort to some bloody declared cursor or While Loop in T-SQL and then point out that CLRs are "significantly faster". Yeah... they're significantly faster against the worst form of code you could ever write in SQL and then they claim it's the right thing to do. CLR's fix the wrong problem... write set based code instead.

    Besides... I need to see the crap code someone raced a CLR against... people that can't think of set based solutions are usually pretty crappy cursor writers, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (1/17/2009)


    TheSQLGuru (1/17/2009)


    1) there is less tempdb storage overhead to a cursor (some of them anyway) than putting the entire set of data in tempdb.

    Heh, ok, you said it, now prove it.

    2) your definition of "real world" is actually better described as "fantasy real world".

    Fine... why? What's your great definition of real world for such things? I know, I know... you're going to say you can't use TempDB because the table is too large, then say you can't use a Sandbox because they won't allow it, then say you can't modify the original table even by putting an index on it, and then challenge me to use the quirky update. Yeah, those are real world restrictions for a company that deserves a cursor. Write the bloody damned cursor and get away from them as fast as you can.

    3) CLR IS easy to write and implement if you do it even semi-regularly, especially what we are looking for here (see the example from below).

    Fine... let's see some code. You've got time for rhetoric like this... write some code.

    4) CLR 'cursors' have been PROVEN to be significantly faster than TSQL cursors for iterative read operations on data. See here and the associated initial link at the top: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx%5B/quote%5D

    Yes... PROVEN to be significantly faster than [font="Arial Black"]TSQL cursors for iterative[/font]. THAT's the problem... people can't think of set based code and they resort to some bloody declared cursor or While Loop in T-SQL and then point out that CLRs are "significantly faster". Yeah... they're significantly faster against the worst form of code you could ever write in SQL and then they claim it's the right thing to do. CLR's fix the wrong problem... write set based code instead.

    Besides... I need to see the crap code someone raced a CLR against... people that can't think of set based solutions are usually pretty crappy cursor writers, as well.

    1) IIRC for some of your running total solutions and other set-based examples here (especially the ones using update @var = @var ... trick) you put the entire set of data in temp table then update it. A cursor/while loop solution could only have to put key (and amount to be fastest), which seems to be less data. I could well be misremembering though, but I do believe it will be less storage required.

    2) Your original statement of real world specified triggers to maintain totals, batch processes perhaps. That isn't real world to me, it is 'best case'. Many facilities do not have such and calculate what they need when they need it. Thus my calling your real world fantasy world. Nothing to do with tempdb here. Just reality as I have experienced it in 10+ years of consulting.

    3) the sample code (which is indeed very simple) was in the post I referenced.

    4) My point about CLR cursors being faster than tsql cursors responded precisely to your statement "that's a swipe at anyone that thinks CLR's are better than a bloody cursor for anything except for RegEx and some file handling.". They are faster - just a statement of fact. I said nothing nor implied nothing about them being optimal or best-case or appropriate for the solution required. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Besides... I need to see the crap code someone raced a CLR against... people that can't think of set based solutions are usually pretty crappy cursor writers, as well.

    Sorry, posted by accident before I finished responding.

    Did you actually read the link I sent - fully, including the link at the top to the original article (which I explicitly called out in my post)?? Obviously not. If you had you would have seen both the various and sundry cursor code runs with different settings as well as the set-based solution which was shown in the benchmarking done to be much faster than all other solutions. The entire purpose of the 2 posts was for cursor benchmarking, but the writer explicitly stated that a set based solution was available and much preferred. And I assure you that Hugo is neither a crappy set-based coder nor a crappy cursor writer.

    There are processes out there that for whatever reasons cannot be refactored as set-based solutions and thus it is helpful to know the most optimal way to code those processes using some row-by-row logic. I believe that knowledge is a useful addition to the public domain.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/17/2009)


    1) IIRC for some of your running total solutions and other set-based examples here (especially the ones using update @var = @var ... trick) you put the entire set of data in temp table then update it. A cursor/while loop solution could only have to put key (and amount to be fastest), which seems to be less data. I could well be misremembering though, but I do believe it will be less storage required.

    I don't put anymore into a TempTable than what folks put into a cursor. But, I get your point. I'll make sure to do a memory and TempDB analysis of all methods I show in the article.

    2) Your original statement of real world specified triggers to maintain totals, batch processes perhaps. That isn't real world to me, it is 'best case'. Many facilities do not have such and calculate what they need when they need it. Thus my calling your real world fantasy world. Nothing to do with tempdb here. Just reality as I have experienced it in 10+ years of consulting.

    Heh... understood. But I've also covered the "real world" and I still get folks that insist it isn't their real world. I even covered calculating "what they need when they need it" and still get lambasted by those who don't live in that world. The real key here is that people have to figure out what their real world is and either learn how to work with it or how to change it. And, if you're gonna knock rings, I've been doing this for 13 years. Does it make me right. Nope. It's just the same as you... I've seen a lot of things and what people perceive to be their real world is frequently on their impression. To think outside the box, you must first realize... you're in a box.

    3) the sample code (which is indeed very simple) was in the post I referenced.

    Sorry... my bad. I'll go back and check. But, I've gotta ask again, if it's that simple and you already have an example, and you have the time for all these posts, how come you don't have time to do some testing? 😉

    4) My point about CLR cursors being faster than tsql cursors responded precisely to your statement "that's a swipe at anyone that thinks CLR's are better than a bloody cursor for anything except for RegEx and some file handling.". They are faster - just a statement of fact. I said nothing nor implied nothing about them being optimal or best-case or appropriate for the solution required. 🙂

    Ah... ok and understood. What I don't want people to think is that CLR's are ever going to be a replacement for doing things correctly in T-SQL... and cursors are almost never the correct way.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • TheSQLGuru (1/17/2009)


    Did you actually read the link I sent - fully, including the link at the top to the original article (which I explicitly called out in my post)?? Obviously not.

    Correct... my bad. I'll got back and look at someone else's article. I have to admit, I was hoping that you'd come up with something on your own.

    And I assure you that Hugo is neither a crappy set-based coder nor a crappy cursor writer.

    If he's writing cursors instead of setbased code... well, we'll just let that be for a minute.

    There are processes out there that for whatever reasons cannot be refactored as set-based solutions and thus it is helpful to know the most optimal way to code those processes using some row-by-row logic.

    Yep... already admitted that. CLR's are better at most RegEx and file handling. With those two exceptions (so far), and with only rare exception, people can usually at least tie a CLR solution with T-SQL. Some people just give up too easily because the don't really know T-SQL and fall back on what they do know. Nothing wrong with that, but, with the exceptions already noted, it doesn't make CLR's superior in any way, shape, or fashion. In fact, they are frequently the worst solution performance wise.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I've checked in your recent posts on this thread and I'm not seeing a link that you posted. I'm probably just missing it... would you do me the favor of posting on this current thread again so I can go take a look at it? Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Yep... already admitted that. CLR's are better at most RegEx and file handling. With those two exceptions (so far), and with only rare exception, people can usually at least tie a CLR solution with T-SQL. Some people just give up too easily because the don't really know T-SQL and fall back on what they do know. Nothing wrong with that, but, with the exceptions already noted, it doesn't make CLR's superior in any way, shape, or fashion. In fact, they are frequently the worst solution performance wise.

    Dammit you are SOOOO missing my point here!! :blink: Just drop the frickin' set-based mantra for 30 seconds, will you??? Some processes cannot be made set-based for whatever reason (and there are numerous reasons). Many coders (especially .NET coders doing database stuff) don't know how to take looping logic and make it set-based. Sure, it would be nice if neither of those statements were true, but they both most certainly ARE. Sure it would be nice if the coders would learn how to convert 150 lines of cursor code to a set based statement that is 10 or 100X more efficient, but not all will or even CAN. They may not have time to figure it out, no inclination or care, not be smart enough, etc.

    Given that, knowledge of what type of cursor/looping logic is most efficient (ACCEPTING THAT IT IS STILL LOOPING LOGIC AND FOR WHAT EVER REASON IS NOT AND WILL NOT BE MADE SET BASED) is a good thing for the public to know. Knowing that a dynamic cursor is less efficient than fast_forward is less efficient than CLR is important knowledge. Either you accept that or not, but I can't make my point any clearer.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I understand what you're trying to say, my dear ol' and trusted friend... but there is no time in the past nor anytime in the future where I have or will intentionally teach people how to do things wrong... and with very rare exceptions, teaching someone to use a cursor, While Loop, or recursive CTE is teaching them how to do something dreadfully and totally wrong. The reason why some people simply cannot grasp the set based paradigm is because they keep practicing cursors and CLR's and other forms of RBAR.

    You're in "the business"... how many times have you made a performance improvement by changing set based code to a cursor, While Loop, recursive CTE, or CLR? I believe we both know the answer to that. In the area of things like running totals, you may have made an improvement by changing some code, that didn't have any of those three things, to some form of RBAR, but that's only because what looked like set based code had triangular joins and other forms of hidden RBAR in it.

    Go back and look at my articles... I carefully explain the RBAR methods already... then I compare them with the set based methods. My set based mantra, as you called it, has brought a lot of people out of a lot of performance problems and I'm going to stick with it. I'll never teach someone how to use any form of RBAR when a set based method is available except so they can see how bad it is, themselves.

    Sure it would be nice if the coders would learn how to convert 150 lines of cursor code to a set based statement that is 10 or 100X more efficient

    I have to try... always...

    ...but not all will or even CAN. They may not have time to figure it out, no inclination or care, not be smart enough, etc.

    Then, perhaps they should take up selling shoes. Not everyone is cut out for writing T-SQL (or even working with computers) and I'm not going to help those that claim to have no time or no inclination to learn, to stay in a business they're not fit for. For those not smart enough, I'm trying to educate them and I'm not going to waste time showing them the wrong way.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Joe Celko (1/17/2009)


    If SQL Server was up to ANSI/ISO Standards, we would have POSIX style regular expression in the SIMILAR TO predicate, like other SQLs. This is one reason that SQL Sever is called a "Lesser SQL" in the pure RDBMS circles.

    Tell me this... which database engine is completely "up to ANSI/ISO Standards"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • 1) http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx. Don't forget to examine the original post mentioned at the top.

    2) to Joe: there are several implementations for regex that developers (both XP-based and CLR-based) can download and install from the web that work just fine on sql server. Please do us all a favor and take your pontificating about ANSI this and Standards that back to the NNTP forums, or best away from SQL Server specific forums altogether.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/17/2009)


    2) to Joe: there are several implementations for regex that developers (both XP-based and CLR-based) can download and install from the web that work just fine on sql server. Please do us all a favor and take your pontificating about ANSI this and Standards that back to the NNTP forums, or best away from SQL Server specific forums altogether.

    Heh... You just said what I was thinking... except you said it about a 1,000 times nicer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 151 through 165 (of 255 total)

You must be logged in to reply to this topic. Login to reply