Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • First off, this was a great article followed by an awesome thread… great suggestions, dialog, debates and explanations. Even though Jeff was kind enough to update the article on 5/12/2011, it seems like there has been some good suggestions since then. Unfortunately, in the 53 pages of posts, I've lost track of the "latest version". Could someone point me to the best post? Or, is the 5/12/2011 update the best.

    Steve Pirazzi
    ONEWARE, Inc.
    http://www.ONEWARE.com

  • ErikEckhardt (2/22/2013)


    mister.magoo (2/22/2013)

    WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN

    COLLATE only needs to be on one side of the expression. Either side will do. You don't have to put it on both sides.

    Yes, you are right. I added it one time too many, but as it was there when I ran the tests, I thought it best to leave it. I am sure that Jeff would remove the extra one...

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • L' Eomot Inversé (2/22/2013)


    mister.magoo (2/22/2013)


    What bothered me was that feeling that "someone should have thought of this ages ago, shouldn't they"?

    Always makes me nervous when that is the case...

    That feeling was bothering me too - I can't understand why I didn't spot the opportunity the first time I looked at the code, let alone why Jeff or Paul or another of the top experts who contribute so much to SQLServerCentral didn't do it long ago.

    Looks as if you're the only one who's awake around here, Mr M. 😎

    IIRC, Jeff Moden already knows it. But I guess the conversation we had somewhere else was something like that he likes to use the default collation. (I may be wrong. So I sincerely apologize in advance in case I said anything/everything wrong).

    But in my case, the binary collated splitter version is already in production for comma delimeted strings 😉

  • L' Eomot Inversé (2/22/2013)


    mister.magoo (2/22/2013)


    What bothered me was that feeling that "someone should have thought of this ages ago, shouldn't they"?

    Always makes me nervous when that is the case...

    That feeling was bothering me too - I can't understand why I didn't spot the opportunity the first time I looked at the code, let alone why Jeff or Paul or another of the top experts who contribute so much to SQLServerCentral didn't do it long ago.

    Looks as if you're the only one who's awake around here, Mr M. 😎

    Actually, the answer is embarrasingly simple. I flat out forgot. :blush:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Usman Butt (2/22/2013)


    L' Eomot Inversé (2/22/2013)


    mister.magoo (2/22/2013)


    What bothered me was that feeling that "someone should have thought of this ages ago, shouldn't they"?

    Always makes me nervous when that is the case...

    That feeling was bothering me too - I can't understand why I didn't spot the opportunity the first time I looked at the code, let alone why Jeff or Paul or another of the top experts who contribute so much to SQLServerCentral didn't do it long ago.

    Looks as if you're the only one who's awake around here, Mr M. 😎

    IIRC, Jeff Moden already knows it. But I guess the conversation we had somewhere else was something like that he likes to use the default collation. (I may be wrong. So I sincerely apologize in advance in case I said anything/everything wrong).

    But in my case, the binary collated splitter version is already in production for comma delimeted strings 😉

    See, I knew someone must have done it already

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (2/22/2013)


    Usman Butt (2/22/2013)


    L' Eomot Inversé (2/22/2013)


    mister.magoo (2/22/2013)


    What bothered me was that feeling that "someone should have thought of this ages ago, shouldn't they"?

    Always makes me nervous when that is the case...

    That feeling was bothering me too - I can't understand why I didn't spot the opportunity the first time I looked at the code, let alone why Jeff or Paul or another of the top experts who contribute so much to SQLServerCentral didn't do it long ago.

    Looks as if you're the only one who's awake around here, Mr M. 😎

    IIRC, Jeff Moden already knows it. But I guess the conversation we had somewhere else was something like that he likes to use the default collation. (I may be wrong. So I sincerely apologize in advance in case I said anything/everything wrong).

    But in my case, the binary collated splitter version is already in production for comma delimeted strings 😉

    And to supplement my earlier post, please read posts from here onwards. I guess both of our top experts Jeff Moden and Paul White already knows it very well? :hehe:

  • Usman Butt (2/22/2013)


    And to supplement my earlier post, please read posts from here onwards. I guess both of our top experts Jeff Moden and Paul White already knows it very well? :hehe:

    Ouch. Less than 6 months ago and I already forgot!

    Tom

  • Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

  • peter-757102 (2/26/2013)


    Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

    The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (2/26/2013)


    peter-757102 (2/26/2013)


    Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

    The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.

    Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?

    I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (2/26/2013)


    ChrisM@Work (2/26/2013)


    peter-757102 (2/26/2013)


    Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

    The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.

    Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?

    I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:

    It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:

    Latin1_General_BIN_:s

    Latin1_General_BIN_:]

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (2/26/2013)


    dwain.c (2/26/2013)


    ChrisM@Work (2/26/2013)


    peter-757102 (2/26/2013)


    Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

    The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.

    Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?

    I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:

    It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:

    Latin1_General_BIN_:s

    Latin1_General_BIN_:]

    Surely you mean that Latin1_General_BIN is case sensitive (CS) and accent sensitive (AS) ?

    Binary collation will definitely find 'a' and 'A' to be different and 'á' to be different to 'a'.

    That is part of the reason it is quicker for CHARINDEX and equality testing - it doesn't have to compare 'a' to 'A','á','Á' etc... it is either 'a' or NOT 'a'.

    I do agree that it is smirk sensitive though :o) != :O)

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (2/26/2013)


    ChrisM@Work (2/26/2013)


    dwain.c (2/26/2013)


    ChrisM@Work (2/26/2013)


    peter-757102 (2/26/2013)


    Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

    The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.

    Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?

    I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:

    It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:

    Latin1_General_BIN_:s

    Latin1_General_BIN_:]

    Surely you mean that Latin1_General_BIN is case sensitive (CS) and accent sensitive (AS) ?

    Binary collation will definitely find 'a' and 'A' to be different and 'á' to be different to 'a'.

    That is part of the reason it is quicker for CHARINDEX and equality testing - it doesn't have to compare 'a' to 'A','á','Á' etc... it is either 'a' or NOT 'a'.

    I do agree that it is smirk sensitive though :o) != :O)

    Oops! Smirk in me eyes 😉 yes of course. "binary" is a bit of a giveaway.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (2/26/2013)


    mister.magoo (2/26/2013)


    ChrisM@Work (2/26/2013)


    dwain.c (2/26/2013)


    ChrisM@Work (2/26/2013)


    peter-757102 (2/26/2013)


    Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

    The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.

    Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?

    I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:

    It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:

    Latin1_General_BIN_:s

    Latin1_General_BIN_:]

    Surely you mean that Latin1_General_BIN is case sensitive (CS) and accent sensitive (AS) ?

    Binary collation will definitely find 'a' and 'A' to be different and 'á' to be different to 'a'.

    That is part of the reason it is quicker for CHARINDEX and equality testing - it doesn't have to compare 'a' to 'A','á','Á' etc... it is either 'a' or NOT 'a'.

    I do agree that it is smirk sensitive though :o) != :O)

    Oops! Smirk in me eyes 😉 yes of course. "binary" is a bit of a giveaway.

    Doh! (Homer Simpson moment) 😛


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (2/26/2013)


    ChrisM@Work (2/26/2013)


    mister.magoo (2/26/2013)


    ChrisM@Work (2/26/2013)


    dwain.c (2/26/2013)


    ChrisM@Work (2/26/2013)


    peter-757102 (2/26/2013)


    Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

    The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.

    Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?

    I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:

    It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:

    Latin1_General_BIN_:s

    Latin1_General_BIN_:]

    Surely you mean that Latin1_General_BIN is case sensitive (CS) and accent sensitive (AS) ?

    Binary collation will definitely find 'a' and 'A' to be different and 'á' to be different to 'a'.

    That is part of the reason it is quicker for CHARINDEX and equality testing - it doesn't have to compare 'a' to 'A','á','Á' etc... it is either 'a' or NOT 'a'.

    I do agree that it is smirk sensitive though :o) != :O)

    Oops! Smirk in me eyes 😉 yes of course. "binary" is a bit of a giveaway.

    Doh! (Homer Simpson moment) 😛

    About once a week, mate. Now I'm an old git I can blame it on declining memory. When I was a sprightly youngster it was more embarrassing.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Viewing 15 posts - 526 through 540 (of 981 total)

    You must be logged in to reply to this topic. Login to reply