Performance Tuning: Concatenation Functions and Some Tuning Myths

  • Martin Bastable (1/2/2008)


    ...

    We contacted the suppliers and talked to a developer.

    ....Yes we use count(*) so our system can check if a table exists with a particular copy of our software or not

    Im sure you can think of your own series of polite questions you might ask a supplier regarding the above, suffice to say at this point we had some words with them, and await their next update with great interest 🙂

    ...

    martin

    Not to mention the locking issues and the fact that if the table doesn't exist they're unwinding the stack on client machines dozens of times per minute to handle exceptions. Sad part is how easy that would be to fix with some simple catalog view queries or even the EXISTS clause. They're lucky they had you on the other end of the phone - I don't think I could have maintained the politeness level 🙂

  • It's always fun to watch what 3rd party software does in SQL Profiler 🙂 I particularly like some software that clearly uses string concatenation to build SQL statements in their client-side code when date columns such as Last_Updated (stored as varchars in the DB) do not contain the last updated date, but instead sometimes contain the text 'Last_Updated'. Makes it really fun to try to read values from their tables on an incremental basis!!

    This was a great article to read - well written, not preachy and didn't contain factual errors. I suspect like many others on here the reason to read it was because it was written by Jeff - keep the articles coming! 😀

    I agree with the earlier poster about distinct being more obvious in its meaning, but if you have some computed columns that you know will be distinct you really need to help the optimiser by using the group by expression instead to let it know that you know best. I'm aware that you can create functions that are non-deterministic (although you shouldn't) and even create a function that does indeed have a side effect (using extended procs to do nasty things - there's an article someone on here about it) but, if UDFs are supposed to be deterministic given their inputs then you'd think the optimiser would take this into account. Anyone have ideas why this isn't the case? I've been aware of the issue and have used group by myself, but I haven't seen a reason why SQL Server doesn't use the deterministic nature in these cases.... Maybe it is tricky to implement for MS? Anyone able to try Jeff's query on SQL 2008?

  • sushila (1/2/2008)


    Jeff,

    What an absolutely well-written and informative article - I totally agree with the earlier post that remarked on this being a great way to start the new year. I seldom have the time or opportunity to visit this site anymore but specially made time to read this because I've always associated the author's name with exceptional quality and it turned out to be time incredibly well spent.

    I've always said that having something to say is as important as how well you say it - your article has both content and style - obviously your communication skills match your performance-tuning skills!:)

    Keep them coming Jeff!

    Gosh, Sushila, your words are incredibly kind. :blush: Thank you so much for your thoughtful and encouraging comments.

    We miss you around here, a LOT!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Michael Skinner (1/2/2008)


    Great article.

    What drives me insane about issues like this one (and I have had to create comma delimited files far too often through SQL) is that you never know what is going to be the best way of really doing it anyway. What I mean by that is when I pull this out of a function and then wrap it up into a cursor (the dreaded cursor) it runs virtually instantly when doing the 10,000/25 rows where as when running using the function, it takes over a minute as you point out. This held true on both SQL 2000 and SQL 2005. And doing this same thing with the 1,000,000/2,500 rows it takes about 10 seconds if you simply add an index on SomeID (my machine is 3.0 Ghz, 2GB RAM). It is far less readable and clearly not as nice as wrapping it up in a function, but it is faster and this makes very little sense given the fact that the code is essentially the same.

    Michael, thanks for the cursor example and the compliment... I'll give it a try on my poor ol' 1.8Ghz 1GB Ram box (same one I did the testing for the article on).

    Heh... you might want to throw a "FAST FORWARD" on that cursor... 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • JJ B (1/2/2008)


    Nice, easy-read article. It even introduced me to a new function (CheckSum), which I looked up when reading your article. I particularly appreciated the code comments. I'm a big believer in documentation. Happy New Year. - JJ

    Thanks for the Kudo, JJ. I believe it was Matt Miller who introduced me to using CheckSum to convert the NewID instead of converting it to VarBinary and then to Int as a seed for Rand. Seems to run quite a bit faster for these random number generators. Of course, as you found out in your reading, CheckSum was designed for something totally different.

    I really appreciate the comments on the embedded documentation in the code... I've actually included those types of comments in the "non-optional" section of the SQL Guidelines I wrote for work... took a couple of years to get around to a lot of the legacy code but, since then, the research time for modifying "large" stored procedures has dropped from 2 days to less than an hour in most cases. Time to research smaller procs and functions is practically non-existant anymore. Once something like this is done, the Developers really make a turn around because they're able to get ahead of the game because of the documentation. They also see that it really doesn't take much time to document the code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • AlexT (1/2/2008)


    Nice article.

    Just another way to do this in sql 2005

    with h(SomeID) as (SELECT DISTINCT SomeID from dbo.TestData )

    select h.SomeID, dbo.fnConcatTest(h.SomeID) from h

    Cool... another bullet for the gunbelt. Thanks for the code and the compliment, Alex!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Eric Stimpson (1/2/2008)


    The actual performance tuning principle that this article touches on is the difference between distinct and group by constructs....

    My message would be to challenge any use of distinct in SQL the same way you might challenge a goto in procedural code. And secondly, teach the use of group by to as many SQL programmers as possible!

    Exactly and wonderfully stated. I'll even go one step further... teach as many SQL programmers not only how to write good set based code, but anticipate what's going to go on "behind the scenes". Not knowing simple things, such as the basic differences you pointed out between Distinct and Group By, can really cause some big time performance issues on code that still "looks" set based.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Eric Stimpson (1/2/2008)


    I think that when learning SQL, it is really important to map each SQL expression to an English (or your native language) equivalent so that you learn to read SQL code like natural language. Then it is usually easy to identify the proper solution. For example

    SELECT DISTINCT SomeID, dbo.fnConcatTest(SomeID) AS CSVString

    FROM dbo.TestData

    is like saying, "Using data from the table TestData, for every row get the SomeID and evaluate the function fnConcatTest on SomeID, then return the distinct results". Where as

    SELECT SomeID, dbo.fnConcatTest(SomeID) AS CSVString

    FROM dbo.TestData

    GROUP BY SomeID

    is like saying "Using the data from the table TestData, summarize by SomeID, then return the summary of SomeID with the value of the function fnConcatTest for each summary row."

    Outstanding explanation... I call it the "Fire Fox" syndrome... "To fly this plane, Mr. Gant, you must think in Russian."

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Ian Yates (1/2/2008)


    This was a great article to read - well written, not preachy and didn't contain factual errors. I suspect like many others on here the reason to read it was because it was written by Jeff - keep the articles coming! 😀

    Dang, Ian... what an awesome compliment! :blush: Thank you! I'll do my best!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi there Wayne!

    Unfortunatly I didnt get to speak with their developer directly, so I indirectly passed on suggestions such as `checking for table in sysobjects` and the ever helpful `why don't you just check for its existance once, and e.g. set a flag if necessary`.

    I can picture them their end ... `but we would have to rewrite everything` 😉

    hee hee

    Then charge us loads for the next upgrade to the product with its new `speed optimisations` I guess 🙂

    Jeff:

    I loved your comments on embedded documentation. I do that all the time, I find it helps me loads! (And other developers here when needs be). Even though I get some ear ache from developer friends who strongly belive in not commenting, for various reasons 🙂

    Martin

  • Martin Bastable (1/3/2008)


    Jeff:

    I loved your comments on embedded documentation. I do that all the time, I find it helps me loads! (And other developers here when needs be). Even though I get some ear ache from developer friends who strongly belive in not commenting, for various reasons 🙂

    Martin

    Thanks for the feedback, Martin... yeah... I just wish more folks felt like we do... life would be a lot easier in the code world. And, yeah, I've got "friends" like that, too. Someday they'll learn... 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Nice article, and a nice informal tone.

    I am going to argue one point though. You say

    Even the estimated and actual execution plans lie! Both say that we're only processing (touching) 10,000 rows.

    Actually, the exec plan isn't lying. The 10 000 rows count is for the initial index scan on the TestData table to get the 10 000 SomeIDs from TestData. Not for the total number of rows touched in total for the query.

    There are only 10 000 rows flowing from one operator to another in the plan. The problem with the exec plan is that the compute scalar operator doesn't mention that it is also doing reads, and is reading around 400 rows for each row of the 10 000 that it processes.

    I'm curious to know if there's some mention of that in the xml plan on 2005. I've noticed before that there's lots of info in the xml plan that isn't displayed in the graphical rendition. Will test tomorrow cause management studio is foobar on my home machine.

    What does SET Statistics IO on show? My guess is somewhere around 10 001 scans of the table. That should be a red flag for anyone trying to performance tune a bad concatenation query. As should the indicator showing that 10 000 rows are going into the compute scalar operator.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail,

    I guess my point was that the execution plans aren't always going to show the number of rows touched and that you cannot always determine which of two or more code snippets will prove to be the most effective just by comparing execution plans.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (1/3/2008)


    Hi Gail,

    I guess my point was that the execution plans aren't always going to show the number of rows touched

    No, they won't. At best an exec plan shows the number of rows that each operator accepted, processed and returned, but that's the number of rows in the result set at that point in the query processing, not the number touched by the operator to get that result set.

    and that you cannot always determine which of two or more code snippets will prove to be the most effective just by comparing execution plans.

    That's a fact. Only way to determine which piece of code is more efficient is to run it. Exec plan, and stats IO help, nothing more.

    Edit: Clarifying a bit.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • No, they won't. At best an exec plan shows the number of rows that each operator accepted, processed and returned, but that's the number of rows in the result set, not the number touched to get that result set.

    No... my turn to disagree... execution plan showed 10,000 rows... there were only 25 in the result set...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 31 through 45 (of 82 total)

You must be logged in to reply to this topic. Login to reply