Hidden RBAR: Triangular Joins

  • TheSQLGuru (1/17/2009)


    1) http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx. Don't forget to examine the original post mentioned at the top.

    Thanks Kevin, and will do.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (1/17/2009)


    TheSQLGuru (1/17/2009)


    1) http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx. Don't forget to examine the original post mentioned at the top.

    Thanks Kevin, and will do.

    Oh... those... I've read those before... and, I don't get your point at all. In the article whose URL is...

    http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/21/curious-cursor-optimization-options.aspx

    He very clearly and correctly starts out right with...

    But there is still a small category of problems where a cursor will outperform a set-based solution. The introduction of ranking functions in SQL Server 2005 has taken a large chunk out of that category – but some remain. For those problems, it makes sense to investigate the performance effects of the various cursor options.

    ... but then he never uses an example where the code DOESN'T have a clear and highly performant set based solution. In fact, he lists those very set based solutions. What was the point? Just to demonstrate what an optimal cursor was and how a set based solution just absolutely blows the doors off even the fastest equivalent cursor solution?

    I'm not trying to be a smart a$$, Kevin but, you're correct in one of your previous statements... I am totally missing the point here.

    Same goes with the article on the poor-mans cursor except that he also get's into CLR's... he keeps saying that Adam Machanic was right, right up to the end were he proves that he's not...

    After compiling and deploying the code above, I once more ran 5 tests. The average execution time for this version was 12,215 milliseconds, almost 150% more than the cursor version. My guess is that this huge increase in time is not a result of the update command itself, but a result of the requirement to pre-load the data in a DataSet and then iterate over that. I did not test it, but I expect to see a similar problem if a cursor requires reading some additional data, based on the data read in the cursor – this, too, would require the CLR version to employ a DataSet instead of simply looping over a SqlDataReader.

    Conclusion

    Adam’s suggestion to use CLR makes sense, but only for cases where no additional data access, either reading or modifying, is required when processing the rows in the cursor. As soon as the latter becomes a requirement, the CLR version has to switch from using a SqlDataReader to using SqlDataAdapter.Fill, and performance suffers horribly.

    He also proves that, like many of my suggestions in T-SQL where you cannot modify the real world table, he copies the data set for the CLR to use and updates that instead...

    So that leaves me with only one other option – use SqlDataAdapter.Fill method to copy the entire results of the query to a DataSet, then loop over it and process its rows one by one. This results in the CLR version doing a lot more work and using a significant amount of memory. The fact that we no longer update the row we have just read, but rather read them all and only then update them all means that there is also an increased chance that the row is no longer in the data cache and hence has to be read from disk a second time for the update, effectively doubling the amount of physical I/O (though this did not happen in my case).

    Again... I just don't understand what you're trying to tell me and why I should "drop the set based mantra for 30 seconds". The cursor is slower than the quirky update. The poor-mans cursor is slower than the quirky update. The CLR is slower than the quiry update. Why the hell should I teach people how to improve on them when the RBAR code that Hugo optimized is still slower than the quirky update?

    Sorry if I'm being a bit thick... I just don't get what you're trying to tell me.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Again... I just don't understand what you're trying to tell me and why I should "drop the set based mantra for 30 seconds". The cursor is slower than the quirky update. The poor-mans cursor is slower than the quirky update. The CLR is slower than the quiry update. Why the hell should I teach people how to improve on them when the RBAR code that Hugo optimized is still slower than the quirky update?

    Sorry if I'm being a bit thick... I just don't get what you're trying to tell me.

    And you apparently never will. I am done beating this dead horse.

    As for the Dataset being bloated and slow, yep, sure is. But Adam Machanic states in the comments section that he built a light-weight wrapper that performs just a bit slower than datareader that can be used as a replacement for the update issue.

    Anyway, signing off on this one.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • You give up too easy.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I've been playing with SSIS recently and have a love/hate relationship with it.

    I've a feeling that some of the facilities do RBAR type operations rather than set based operations.

    I know the old DTS used RBAR when a lookup transform was used and I guess SSIS still does but has anyone noticed any of the other stuff getting big performance drop outs?

    I'm also somewhat wary of CLR stuff. Its fine for simple stuff and scalar functions but for more complex stuff such as the new geographic types in SQL2008 it works find up to a point and then boom, performance goes down like a sack of spuds.

  • Joe Celko (1/17/2009)


    If SQL Server was up to ANSI/ISO Standards, we would have POSIX style regular expression in the SIMILAR TO predicate, like other SQLs. This is one reason that SQL Sever is called a "Lesser SQL" in the pure RDBMS circles.

    This one goes into my "joke of the day" basket 😉

    But indeed, if it had been implemented, it might have been better.

    Certainly not all dbms are 100% ansi/iso and even if they were....

    not all DBA nor devs would 100% use it according to the standards.

    (lack of knowledge, lack of "when to use, where to use", "not my cup of tea, I just use the db", ... )

    Any dbms is filled with "guidelines", "never say never", "always say maybe",...

    All we can strive for is to keep the line as straight as possible to get from A to B, aligned to the actual infrastructure (dbms) we have.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Joe Celko (1/19/2009)


    >>I'm also somewhat wary of CLR stuff. Its fine for simple stuff and scalar functions but for more complex stuff such as the new geographic types in SQL2008 it works find up to a point and then boom, performance goes down like a sack of spuds. <<

    Have seen Ben-Gan's tests on even short CLR functions? Not good because RBAR. What we don't have yet is the stats on maintaining 42 different CLR languages in one schema. Even the basic functions like MOD() or data types like Boolean vary from procedural language to procedural language. I don't want to be the DBA that has to do that kind of crazy quilt.

    See here http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx (and the associated cursor benchmarking post referenced at the top) for proof that CLR 'cursor' is significantly faster than best TSQL cursor.

    Who gives a crap about umpteen languages being available? The developer or DBA who writes the CLR need only be concerned about the language that they wrote the CLR object in. And if it is a third-party item such as a regex library they won't need to be cocerned about the developement language at all.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I was planning on staying out of this part of the argument, discussion, tirade, whatever, but then you have to remember that it's me.

    I'm a developer. To me SQL is just another language that I can use, right? That's a flame magnet. I know about "set based" and try to use that. As much as I can use SQL to cut down on the amount of rows my application code has to deal with is a good thing. I have put into practice many things the Joe and Jeff have taught here.

    We do most of our application code in VB Dot Net. It was pretty early on that we found out that Dot Net code runs just about the same without regard to the front end language. Yes there used to be execution efficiency differences between C++ and VB. Given competent coders on both sides C# vs VB races based on execution speeds mostly end in dead heats.

    Yes, maintaining code in multiple front end languages is a nightmare and should be avoided. It's not speed as the consideration though. To make my point please look up what the initials CLR stand for and what that means.

    Now here is what I propose: Use set based queries every chance that you can and use as efficient processing for the remainder until you figure out a better way.

    ATBCharles Kincaid

  • Charles Kincaid (1/19/2009)


    Given competent coders on both sides C# vs VB races based on execution speeds mostly end in dead heats.

    Hi Charles... thanks for joining in. Just curious about your statement above... do you mean the tests were done between CLR's and Cursors/While Loops for those races? Also, do you have any links for those races?

    Now here is what I propose: Use set based queries every chance that you can and use as efficient processing for the remainder until you figure out a better way.

    My only problem with that is... no one ever goes back to fix something that's "working" until it becomes a performance problem... and then, it's a nightmare that first rears its ugly head on high demand days like "month end". The difference between writing "working" code and "working, high performance, scalable" code is pretty small and I've found that most folks simply give up on SQL in general way too early.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Sorry, Jeff. I was not clear enough. My contention is that a C# CLR proc and a VB CLR proc would run the same as each other.

    You are right in the other regard. The temptation is to never revisit code that is working. We have, case in point, an application that runs real quick if there are 100 products in the product table and slow as anything with 15000 items in the product table. This is a handheld app where we did not have the luxury of saying "Don't sell this to a customer with that many products." We had to find a better way.

    Your point is to design it for 6 MILLION products right from the start. And I agree.

    ATBCharles Kincaid

  • The temptation is to never revisit code that is working.

    This is my big concern with agile development. "We will deal with it in the next itteration". The problem is that there is always something more pressing to do in the next itteration.

    It is only when there is a problem that someone realises is caused by building a huge edifice on sand that there is a move to fix it. By that time there are so many things that depend on it that changing it is all but impossible.

    Before anyone says anything, yes I know that you should separate out the layers, design stuff so that can be hidden behind interfaces etc, but the "leave it until the next itteration crowd" also left the design a decent interface to the next itteration.

  • Charles Kincaid (1/19/2009)


    Sorry, Jeff. I was not clear enough. My contention is that a C# CLR proc and a VB CLR proc would run the same as each other.

    You are right in the other regard. The temptation is to never revisit code that is working. We have, case in point, an application that runs real quick if there are 100 products in the product table and slow as anything with 15000 items in the product table. This is a handheld app where we did not have the luxury of saying "Don't sell this to a customer with that many products." We had to find a better way.

    Your point is to design it for 6 MILLION products right from the start. And I agree.

    Outstanding. Thanks, Charles.

    Just to continue, one of things being debated on this thread is the idea that CLR cursors are faster than T-SQL cursors... Ok, maybe that goes without saying... maybe not.

    My question would be, if you're interested in performance, why would you use either? The tired old saw about people not being able to think of set based solutions either because they can't, won't, or simply don't have enough time because of some insane schedule just doesn't hold any ground with me. If people are in the business of writing code, then they need to learn to use the tools of the trade, like T-SQL. Someone taking the time to write a CLR cursor to replace a T-SQL cursor just seems incredibly counter productive. Even if it's possible to "tune" either type of cursor for a 6 million row task, wouldn't it be better to find the [font="Arial Black"]correct [/font]set based solution that would blow the doors off of both? If you (not you, Charles... anyone) actually know your trade, it really doesn't take long at all and will frequently, dare I say "usually", take less time than writing a CLR.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • You are welcome. One has to have more tools in the belt in order to avoid "if all you have is a hammer, everything looks like a nail".

    I was watching an old John Wayne movie and saw a guy open a pack of smokes with a radial arm saw. Point? Any tool can be misused. Then there is the push for the better tool. The phone company, when there was just one, used wire wrap to mate cables to the frame. Hand breaking work spinning that tool 12 turns for every wire in a 50 pair cable. One of the dudes on one of my dad's crews took a broken wrap tool and chucked it in a "brace and bit". Then the phone company came up with a better frame connector using the punch down tool that connects and trims at the same time. The better tool wins!

    I'm an old card punch COBOL programmer. I used to TEACH that decrementing loops are faster than incrementing loops but the fastest way is to avoid the loop in the first place. I used to use string arrays to buffer lines from a text file when I had to check the checksum in the footer before processing all the lines. Now I use a Queue object as it does exactly what I want with less work.

    It's hard for some of us to give up the old ways. Just like you said, why change it if it works? Well then ask people if they stuck with "Bubble" after "Shell-Metzner" or "Q"?

    ATBCharles Kincaid

  • 1) some one said this "Your point is to design it for 6 MILLION products right from the start. And I agree. ". Meanwhile, while your group is flopping and twitching trying to do that (which may well be way beyond their skillset and abilities) another competitor has put out a product that is functional with 10K and have taken over the market place. "academic pursuits" do NOT make good business sense in MANY situations, especially in the software application market!

    2)

    Jeff Moden (1/19/2009)

    My question would be, if you're interested in performance, why would you use either? The tired old saw about people not being able to think of set based solutions either because they can't, won't, or simply don't have enough time because of some insane schedule just doesn't hold any ground with me. If people are in the business of writing code, then they need to learn to use the tools of the trade, like T-SQL. Someone taking the time to write a CLR cursor to replace a T-SQL cursor just seems incredibly counter productive. Even if it's possible to "tune" either type of cursor for a 6 million row task, wouldn't it be better to find the [font="Arial Black"]correct [/font]set based solution that would blow the doors off of both? If you (not you, Charles... anyone) actually know your trade, it really doesn't take long at all and will frequently, dare I say "usually", take less time than writing a CLR.

    Sorry Jeff, but you are completely disconnected from reality on this topic. And since I have tried umpteen times to get you to realize that I won't bother with it again. But you are just absolutely wrong here. People, even smart ones, cannot shift from procedural logic to set-based logic simply, easily nor quickly. Period. And because they still have real work to do and code to write and not enough time to do either they must proceed with what they know. Optimal, no - realistic, completely.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • >>Which regular expression system? grep(), sgrep(), xgrep(), POSIX, et al? Now, if I had Standard SQL there would be the SIMILAR TO predicate which is based on the POSIX specs and it would port to other SQLs and run inside the engine.

    But (ONCE AGAIN JOE) this is a SQL SERVER FORUM! I guarantee you that the developers for SQL Server are way more familiar with the .NET implementation of RegEx than any other 'flavor'. And they almost uniformly couldn't give a care in the world for either your standards nor portability. Speaking of flavors of regex, why is it that you think POSIX is "the one flavor to rule all other flavors"? Just because it is part of some syntax in some standard that isn't applicable to this forum's database server?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 166 through 180 (of 255 total)

You must be logged in to reply to this topic. Login to reply