Hidden RBAR: Triangular Joins

  • TheSQLGuru (1/19/2009)


    Sorry Jeff, but you are completely disconnected from reality on this topic. And since I have tried umpteen times to get you to realize that, I won't bother with it again. But you are just absolutely wrong here. People, even smart ones, cannot shift from procedural logic to set-based logic simply, easily, or quickly. Period. And because they still have real work to do and code to write, and not enough time to do either, they must proceed with what they know. Optimal, no - realistic, completely.

    No... I get you, Kevin... it's you that doesn't get it... you don't let someone weld unless they know how to weld. People who cannot make the shift from procedural logic to set-based logic shouldn't be in the business. Period.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Joe Celko (1/19/2009)


    >> But (ONCE AGAIN JOE) this is a SQL SERVER FORUM! I guarantee you that the developers for SQL Server are way more familiar with the .NET implementation of RegEx than any other 'flavor'. <<

    For most of the CHECK() constraints you will write in the real SQL world, the subset is core grep() without any of the other goodies -- repeated digits, alphas from a limited set of combinations, and a few punctuation marks are enough to validate almost any standard industry code. The only bad news is that you have to replace | with a CASE expression for the more complex stuff in T-SQL.
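
    For illustration, a minimal sketch of that kind of constraint in T-SQL; the Parts table and the two-branch part_code format here are both made up:

        CREATE TABLE Parts
        (
            part_code CHAR(9) NOT NULL,
            -- Two allowed shapes, e.g. 'AB-123456' or 'X12345678'; with no | in
            -- T-SQL patterns, each alternative becomes its own LIKE test.
            CONSTRAINT chk_part_code CHECK
            (
                part_code LIKE '[A-Z][A-Z]-[0-9][0-9][0-9][0-9][0-9][0-9]'
                OR part_code LIKE '[A-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
            )
        );

        -- For more complex branching, the OR list becomes a CASE expression:
        -- CHECK (CASE WHEN part_code LIKE '...' THEN 1
        --             WHEN part_code LIKE '...' THEN 1
        --             ELSE 0 END = 1)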

    >> And they almost uniformly couldn't give a care in the world for either your standards or portability. <<

    Like I said: code monkeys, who believe that they will fall off the edge of the Earth if they leave their village. It is a terrible shock to find that Oracle and DB2 dominate the DB world and that both are more Standards-aware than the T-SQL dialect.

    The DBAs, data architects, professional SQL programmers, etc. have to watch out for the enterprise as a whole. They move data from other sources that are not Microsoft -- gasp! They talk to the web, which is built on LAMP platforms (no MS at all!). There is still more data in COBOL than in any other programming language (see that issue of Dr. Dobb's last year?).

    >> Speaking of flavors of regex, why is it that you think POSIX is "the one flavor to rule all other flavors"? Just because it is part of some syntax in some standard that isn't applicable to this forum's database server? <<

    Yeah, just because it is in those stinkin' ANSI/ISO Standards that keep getting in the way of cowboy coding. Damn! Next you'll have to document your code, and it is downhill from there 🙂

    On a more serious note, one solution I saw for some really complex regular expressions was a CLR call to a POSIX implementation, with the comment that, when T-SQL is up to Standards, the call can be replaced with a native SIMILAR TO predicate. Based on experience with DB2 (SQL/PSM function versus native predicate), the author estimated a 1-2 order-of-magnitude improvement in performance.
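
    As a sketch of that pattern, the T-SQL side might look like the following; the Orders table, the order_code format, and dbo.RegexMatch (assumed to be a CLR scalar function deployed from an assembly that wraps a regex engine) are all hypothetical:

        ALTER TABLE Orders ADD CONSTRAINT chk_order_code
            CHECK (dbo.RegexMatch(order_code, '^[A-Z]{2}[0-9]{6}$') = 1);

        -- If T-SQL ever gains the Standard predicate, the same rule becomes:
        -- CHECK (order_code SIMILAR TO '[A-Z]{2}[0-9]{6}')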

    I really have to laugh when I see people talking about the "real world". The big reason: it is different everywhere you go. The big thing I was taught while in school was to write structured code, yes, even in COBOL. I got into the "real world", and the COBOL application I was helping to support was truly some of the most horrid spaghetti code I have seen. There was no chance to rewrite it; we had to support it and extend the application. New subprocedures, yes, those were written using structured methods that made them easier to maintain and enhance. For the older code, we ended up writing to the lowest common denominator.

    As for standards, they are great and everyone has one. I work with Microsoft SQL Server, and guess what, I'll write code that best uses the database engine. If that code isn't 100% Standard SQL (i.e., 100% portable to any other RDBMS), oh well. That isn't what I am being paid to write or support. If we were to change to Oracle, or DB2, or Informix, then I'd have to rewrite my code. I don't see us changing any time in the near or foreseeable future, and I really don't think we should approach what we do with the thought that "oh, this may change if we change systems, so I had better not use this or that feature of this product".

    You go ahead and write your 100% standard code, but until every product is 100% standard, I'm sure most professionals will also use system-dependent features where it makes sense for the application they are writing/supporting.

    And by the way, I don't see that happening in the near future either.

  • Right there with you Lynn.

    To me, ANSI is the guideline; "dialect" will get used whenever ANSI doesn't perform as it should.

    IMO that's where any DBMS will try to make a difference. As we all know, the big brands are pretty much capable of doing the same things; it is just that top x% that makes a difference if you need performance, uptime, ... in the upper segment.

    If every db-schema were developed with the same accuracy (decent DA, good design statistics to get to a decent physical translation to the local RDBMS), and applications were written with the same attitude, testing and QA cycles, ... overall performance and stability would be excellent, no matter the political stuff.

    It is hard to change an old habit; switching from VSAM and IMS to DB2, Oracle, and SQL Server (a long time ago) has indeed been a long learning process,

    But it was worth every second and I'm still enjoying the fruits ....

    Johan

    Learn to play, play to learn !

    Don't drive faster than your guardian angel can fly ...
    but keeping both feet on the ground won't get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Jeff Moden (1/17/2009)


    ...CLR's are better at most RegEx and file handling. With those two exceptions (so far)....

    The other place I've found CLRs to be "better" is using CLR as opposed to OLE automation when a proc needs to access DLLs outside of SQL.

    The sp_OACreate, sp_OAMethod, et al. method of calling outside code is pretty inefficient, generally unstable, doesn't clean up after itself well, and is a pain to implement compared to CLR.
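
    For context, a rough sketch of the sp_OA* pattern being compared here (the ProgID and method below are just illustrative, and the feature has to be enabled via sp_configure 'Ole Automation Procedures'); every handle must be created, checked, and destroyed by hand, which is part of why it is fragile:

        DECLARE @obj INT, @hr INT, @exists INT;

        EXEC @hr = sp_OACreate 'Scripting.FileSystemObject', @obj OUT;
        IF @hr = 0
            EXEC @hr = sp_OAMethod @obj, 'FolderExists', @exists OUT, 'C:\Temp';
        IF @hr <> 0
            EXEC sp_OAGetErrorInfo @obj;   -- error details must be pulled manually
        EXEC sp_OADestroy @obj;            -- and the handle released manually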

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • TheSQLGuru (1/19/2009)


    1) Someone said this: "Your point is to design it for 6 MILLION products right from the start. And I agree." Meanwhile, while your group is flopping and twitching trying to do that (which may well be way beyond their skillset and abilities), another competitor has put out a product that is functional with 10K and has taken over the marketplace. "Academic pursuits" do NOT make good business sense in MANY situations, especially in the software application market!

    2)

    Jeff Moden (1/19/2009)

    My question would be, if you're interested in performance, why would you use either? The tired old saw about people not being able to think of set based solutions either because they can't, won't, or simply don't have enough time because of some insane schedule just doesn't hold any ground with me. If people are in the business of writing code, then they need to learn to use the tools of the trade, like T-SQL. Someone taking the time to write a CLR cursor to replace a T-SQL cursor just seems incredibly counter productive. Even if it's possible to "tune" either type of cursor for a 6 million row task, wouldn't it be better to find the correct set based solution that would blow the doors off of both? If you (not you, Charles... anyone) actually know your trade, it really doesn't take long at all and will frequently, dare I say "usually", take less time than writing a CLR.

    Sorry Jeff, but you are completely disconnected from reality on this topic. And since I have tried umpteen times to get you to realize that, I won't bother with it again. But you are just absolutely wrong here. People, even smart ones, cannot shift from procedural logic to set-based logic simply, easily, or quickly. Period. And because they still have real work to do and code to write, and not enough time to do either, they must proceed with what they know. Optimal, no - realistic, completely.

    This is, simply put, just an ad hominem attack.

    I've had to put in more late evenings and weekends and skip more lunches fixing code that was written based on your philosophy than I care to count.

    My "favorite" example was a hierarchy crawl that used nested cursors in a recursive multi-table UDF. It was tested on 5 rows of data and then taken into production. Less than a week after the site it was used for went live, it's AVERAGE run-time was up to over 8 minutes!

    I missed lunch and replaced it with a single recursive CTE, and took it down to a max run time of 11 seconds, average run time of 2 seconds. Then, over the following month, while working on other projects, I replaced it with a hybrid version of Joe Celko's nested sets hierarchy, and took max run time down to under 100 milliseconds and average run time down to 14 milliseconds.
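
    For anyone following along, a minimal sketch of the recursive-CTE shape described above; the table and column names are invented, since the real schema isn't shown:

        WITH Hierarchy AS
        (
            -- anchor: the root row(s)
            SELECT NodeID, ParentID, NodeName, 0 AS Lvl
            FROM dbo.OrgNodes
            WHERE ParentID IS NULL

            UNION ALL

            -- recursive step: children of the previous level
            SELECT c.NodeID, c.ParentID, c.NodeName, h.Lvl + 1
            FROM dbo.OrgNodes AS c
            INNER JOIN Hierarchy AS h
                ON c.ParentID = h.NodeID
        )
        SELECT NodeID, ParentID, NodeName, Lvl
        FROM Hierarchy
        OPTION (MAXRECURSION 100);   -- safety net against cycles in the data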

    If it had been built correctly in the first place, it would have prevented emergencies that had the potential of chasing customers away! And building it correctly in the first place would have taken less than a day, but the refactor-after-the-fact took a month to complete.

    So, please avoid the ad hominem. If you have an example, from your "real world", where customers are perfectly happy with web pages that don't work, where poorly researched, badly written, unscalable code was better than doing it right the first time, please provide it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I agree, in parts, with everybody here. Yes, I have fixed badly designed code. Some of it was my own. ("What was I thinking when I did this?") I have learned a lot, but I'm not done yet.

    I don't like switching tools. (As an aside, I was just forced to switch to Firefox from IE 7 because IE crashes on certain pages. It's just my IE, and our geeks will fix it someday.) But when a better one becomes available, it's good to use it if it saves time or performs better.

    I've been pushing to get expected data sizes into the project specifications. Watch out for the sneaky pitch: "We sold you an application that is spec'd for 100 products and 5 customers per route driver. Your business has grown. Congratulations! You now need to handle 6000 products, 100 customers per driver, and 45 new routes. Each route driver needs a handheld computer and each truck needs a hardened PC wired into the vehicle bus. Our hardware folks will be calling you, but let's talk about your software upgrade." :w00t:

    I'd much rather say, "Congratulations! Our hardware folks will be calling you, but your software is ready to handle the additional load right now. There are per-user licenses, so if you can get me the PO for those today I can have them installed overnight." Not only do I win right then, but the next time they need some business problem solved they think of us first.

    Look, handheld LASER barcode scanners are great. Go play with a Lorax sometime. I can scan the soda can in my hand and then scan the barcode over the loading dock door 30 feet (10 meters) away. 😎 But then there are newer CCD imagers that can do the same thing, and read my 2D code on my UPS, and take a picture of the damaged box that came off the truck (and in color), and they have no LASER, so no worries about eye safety.

    It's the same in SQL. 2005 is better than 2000 (and I LOVE 2000). 2008 may have better reporting tools than 2005. Set-based design is better than any row-by-row approach.

    Then there are realities. I have a big customer that right now can't upgrade from 2000. So new data types, CTEs, and better reporting tools just don't matter. I still have to solve their problems. I just have to make sure that I'm not the roadblock.

    ATBCharles Kincaid

  • So, please avoid the ad hominem. If you have an example, from your "real world", where customers are perfectly happy with web pages that don't work, where poorly researched, badly written, unscalable code was better than doing it right the first time, please provide it.

    Who said anything about "happy customers", optimal processes/code/design, not having lots of extra work to do on maintenance, not having to deal with performance issues, etc., etc.? I sure as heck didn't (or didn't mean to), and IIRC I did mention stuff being NOT optimal. I never even said that devs "should" write bad code. Never said that it wouldn't cause issues, including loss of customers, late nights, etc.

    My point, and it goes directly to what Jeff said about programmers who can't or won't learn set-based techniques not deserving to be working, or some such, is that real-world coders do and always will write suboptimal code, suboptimal designs, etc. That truly is reality in every company I have consulted at and every dev shop I have ever heard about in any detail. Helping them improve their really bad stuff into less bad stuff, while keeping it something they can comprehend, design, and write in a timely fashion so they can accomplish their objectives, is an important goal.

    I think it is foolhardy, unrealistic, inappropriate, misguided, etc., to state that people who can't do things 'right' or 'best' shouldn't be working. And that (like anything else I have said, such as helping them improve their capabilities without forcing them to shift to the 'optimal' mechanism) is NOT an "ad hominem" attack. It is completely logical and rational and directly to the point of the argument.

    Clearly "what we have here is a failure to communicate". 🙂 You, Jeff and others just are not understanding that I am NOT espousing that it wouldn't be best for all concerned if all developers could design optimal stuff and write optimal code. Nor am I saying that it wouldn't be better if they LEARNED to do that if they don't already know. I am simply stating that the "real world" has innumeral people who, again for many different reasons, don't and will never do those things. But if they are GOING to write a cursor because that is all they know then knowing that fast_forward is faster than dynamic or that CLR (if they work in that) is faster than TSQL in some cases can make an improvement in their abilities that is EASILY ACHIEVABLE.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Ah! Much clearer. Yet the contention of some is that you are teaching them to ride a better horse rather than introducing them to a car. OK, the analogy sucks because of the pollution aspect so let's not go there. I just can't, at the moment, come up with a better one.

    I'll give you credit for making better cursors and exploring the process of wringing the last ounce of performance out of concept X (whatever that might be).

    ATBCharles Kincaid

  • Charles Kincaid (1/20/2009)


    Ah! Much clearer. Yet the contention of some is that you are teaching them to ride a better horse rather than introducing them to a car. OK, the analogy sucks because of the pollution aspect so let's not go there. I just can't, at the moment, come up with a better one.

    I'll give you credit for making better cursors and exploring the process of wringing the last ounce of performance out of concept X (whatever that might be).

    Actually, your analogy works just about perfectly. Horses produce more pollution per person-mile than cars do. (The only advantage horses have over cars in this regard is that horse factories produce less pollution than car factories + foundries + mines + transporting car parts + lead/acid batteries + etc. The horse makes more pollution than the car, but making a new horse is very eco-friendly!)

    Really, though, I think a better analogy is that he's teaching people to use a better pogo stick. You're bouncing higher and further, but you still get there slowly, you have a horrible tendency to break your own neck, and heaven help you if you have to go up any stairs!

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I understand the argument about not developing for 6 million products because getting a product to market, even with diminished functionality, will allow you to grab the high ground.

    I disagree with it, but I understand it. My job is now to clean up the results of such an approach. I thought I had seen every possible example of bad database design/code, but I've come across a few new ones, and that is after 20 years in the business. Recursive procedure calls where the procedure calls itself within a cursor? :w00t: The SQL Injection proc from hell, object names with mathematical symbols in them, rampant denormalisation -- you name it, it is there to be cleaned up.

    Microsoft wasn't the first to develop... well, just about anything, really.

    First spreadsheet? Hell, even Lotus didn't do that.

    First word processor? Wasn't that WordStar or WordPerfect, or maybe an ancestor?

    What Microsoft focussed on was how people use their products. Under the hood their products may not have been as technically good as competitors' products, but I can remember thinking that Microsoft Multiplan and Microsoft Chart were far easier to use than Lotus 1-2-3 and CA SuperCalc.

    If you are going to chuck the 10K-product version out of the door, you had better make sure the 6-million-product version is going to be ready very shortly.

  • Here's the problem with the "rush to get the thing out the door any old way" philosophy on software: How would you feel if you found out you were flying on a plane where Boeing held to that policy? How about a car built that way? How about if you bought a house where the contractor said, "well, lots of people have trouble understanding how to plumb walls, so we just eyeball them so we can build more houses faster", with the understanding that, if/when your house collapses with you in it, they'll come along and rebuild the necessary walls, and version 3.0 of your house will have insulation on the electrical wiring!

    But, you might reply, those are things that people rely on for their very lives! My piece of commercial software is so much less important than that! If that's the excuse, then how would you feel if the bank your paycheck gets direct deposited to suddenly informed you that, due to software that was built by semi-skilled people who were in a rush, your paychecks for the past year all accidentally included automatic donations to the National Luddite and Misanthropy Association?

    The software isn't even important enough to produce results that have something to do with money? Anyone's money at all? Then, by definition, the software doesn't matter, probably shouldn't have been built by a business, and, accordingly, can afford excuses like that.

    Yes, faulty software is a reality. So is plague. Efforts to make plague less painful are only valuable if you don't have a cure. We have a cure for faulty software, which is: Educate the people who are building it. Accepting lower standards, even tacitly, is a definite statement that you don't want that job.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • It is a shame that anyone would cut such corners in project planning and development. Yes, you may need to get the software out the door, but I watched as the company I work for spent over a million dollars on consulting because of this type of short-sighted coding with a 10,000-row result set, and then everything either failed in production or took days to run.

    Seems like there is never time in the project to do things right, but there is always time to do it over...

  • Another point that occurs to me on this: Who's going to do better in the market, the guy who gets buggy software out the door first, and takes years to fix it, or the guy who gets scalable software out the door second by a few weeks?

    Depending on the level of bugs in the rushed-to-production software, you could very well be the best marketing your competitor has. "He did it first, we did it right. 10% discount for anyone switching to our software this month."

    Of course, the counter-example of Windows vs. Mac could be brought up. Microsoft did it second, and did it with more bugs, but had better marketing. Unless you can count on that, I'd go with better over more rushed most times.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Joe Celko (1/20/2009)


    Bad example. Ever read about what New York City was like before automobiles? How many TONS of horse shit were carted off the streets every day to farms in New Jersey? The disease rate? Medical and emergency response time and materials?

    Actually, it is a good example. If you read the following post, you'd also note that the horse made MORE pollution than the car, but that MAKING a horse made less than making a car.

  • GSquared, I think you are off-base on many fronts.

    1) Again, just because someone can't or won't design/write optimal software does not mean he/she should not be working in development. You and Jeff agree on that; I very, very strongly disagree, and that will just have to be that, I suppose. At a minimum I believe that far less software would be developed if your point were reality.

    2) Many mentions in the last few posts have spoken about BUGGY software being put out quickly and then losing in the long run. I NEVER said to put out software that did the WRONG thing - just to do the right thing SUBOPTIMALLY or less efficiently than is optimal. There is a HUGE difference between the two. For my point about suboptimal code you only need to look at SAP, Lawson, PeopleSoft, etc. Even the relatively new BizTalk is a dog from a database perspective. Actually, SharePoint is probably the best example of all. That system is a frickin' nightmare on the database and processing-logic side, up one side and down the other (as are the ERP packages from what I have seen/heard of them - RBAR, missing indexes, inefficient data access and data types, yada yada yada). Yet SharePoint has garnered, and continues to garner, significant market share in a very short span of time as far as enterprise servers go (IMHO).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
