Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Jason-368451 (8/6/2014)


    I have a CSV file that I bulk loaded into a single column staging table.

    I need a way to parse proper csv files that are comma delimited with optional quote text qualifiers

    as the data may have embedded commas.

    i.e. Jason,"900 N. May ST., #5"

    using the Jeff's splitter it would get split into 3 columns i.e. Jason as column 1 900 N. May ST. as column 2 and #5 as column 3

    I need it to keep the address together when there are quotes text qualifiers .

    I was able to get the function from this article to work on my dev machine "Reaping the benefits of the Window functions in T-SQL"

    http://qa.sqlservercentral.com/articles/SQL+Server+2012/106903

    But I need to deploy this to a server running 2008 and that function uses syntax that is only valid in 2012.

    I would appreciate any help.

    Have you looked at Phil's article, The TSQL of CSV: Comma-Delimited of Errors[/url]? He explores some alternatives there.

    I haven't created a 2K8 version of the CSV parsing code in the article as it uses a running total for marking delimiters active/inactive, quite inefficient without the window functions.

    😎

  • Jason-368451 (8/6/2014)


    I have a CSV file that I bulk loaded into a single column staging table.

    I need a way to parse proper csv files that are comma delimited with optional quote text qualifiers

    as the data may have embedded commas.

    i.e. Jason,"900 N. May ST., #5"

    using the Jeff's splitter it would get split into 3 columns i.e. Jason as column 1 900 N. May ST. as column 2 and #5 as column 3

    I need it to keep the address together when there are quotes text qualifiers .

    I was able to get the function from this article to work on my dev machine "Reaping the benefits of the Window functions in T-SQL"

    http://qa.sqlservercentral.com/articles/SQL+Server+2012/106903

    But I need to deploy this to a server running 2008 and that function uses syntax that is only valid in 2012.

    I would appreciate any help.

    The "easy" way I handle this at work is to run it through Import-CSV and pipe it to Export-CSV to another file and use BCP with a format file to do the imports based on the compound delimiters of "," and the leading and trailing quote. It's nasty fast.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jason-368451 (8/6/2014)


    I have a CSV file that I bulk loaded into a single column staging table.

    I need a way to parse proper csv files that are comma delimited with optional quote text qualifiers

    as the data may have embedded commas.

    i.e. Jason,"900 N. May ST., #5"

    using the Jeff's splitter it would get split into 3 columns i.e. Jason as column 1 900 N. May ST. as column 2 and #5 as column 3

    I need it to keep the address together when there are quotes text qualifiers .

    I was able to get the function from this article to work on my dev machine "Reaping the benefits of the Window functions in T-SQL"

    http://qa.sqlservercentral.com/articles/SQL+Server+2012/106903

    But I need to deploy this to a server running 2008 and that function uses syntax that is only valid in 2012.

    I would appreciate any help.

    Hey there. Another option is to use SQL CLR. I wrote a library called SQL#[/url] that contains a Stored Proc named SQL#.String_SplitResultIntoFields. This proc takes a single field from a result set and splits it. The split expression can accept Regular Expressions :-). Meaning, you can follow the example here and split your imported data in a way that accounts for both embedded commas and embedded text-qualifiers. The only thing that RegEx expression doesn't do is remove the text qualifiers from the field, but it might be possible to update the RegEx to do that. To be fair, this proc is in the Full version, not the Free version, but it does handle your situation quite cleanly. There is another proc called File_SplitIntoFields which reads directly from the import file so you can bypass BCP and its format files, but that option does not handle embedded newlines (but that might not be a problem anyway given those would probably break your existing import).

    Take care,

    Solomon..

    EDIT:

    The condensed RegEx from the link noted above is:

    (?x)\s*,\s*(?=(?:[^"]*"[^"]*")*[^"]*$)

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:

    select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space

    yields:

    ItemNumber Item

    1

    2

    So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?

    Oh, fudge! I get it. It's two null strings separated by a blank.

    My bad!!

  • gbritton1 (9/3/2014)


    I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:

    select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space

    yields:

    ItemNumber Item

    1

    2

    So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?

    It's two 0-length values separated by a blank. You will always get the number of delimiters+1.

    DECLARE @txt1 varchar(100) = ',,',

    @txt2 varchar(10) = ' ',

    @txt3 varchar(10) = ' ';

    select ItemNumber, Item, item_length = LEN(item)

    from dbo.[DelimitedSplit8K](@txt1, ',') split; -- this will get you three blanks

    select ItemNumber, Item, item_length = LEN(item)

    from dbo.[DelimitedSplit8K](@txt2, ' ') split; -- this will get you 2 blanks

    select ItemNumber, Item, item_length = LEN(item)

    from dbo.[DelimitedSplit8K](@txt3, ' ') split; -- this will get you 2 3 blanks

    Edit: noticed your edit after posting my reply. :w00t: Also fixed a type in my code comments.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • gbritton1 (9/3/2014)


    I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:

    select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space

    yields:

    ItemNumber Item

    1

    2

    So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?

    Oh, fudge! I get it. It's two null strings separated by a blank.

    My bad!!

    Z'actly!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (9/3/2014)


    gbritton1 (9/3/2014)


    I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:

    select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space

    yields:

    ItemNumber Item

    1

    2

    So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?

    Oh, fudge! I get it. It's two null strings separated by a blank.

    My bad!!

    Z'actly!

    Great Moden's Beard! What an epiphany!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • gbritton1 (9/3/2014)


    I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:

    select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space

    yields:

    ItemNumber Item

    1

    2

    So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?

    Oh, fudge! I get it. It's two null strings separated by a blank.

    My bad!!

    Not entirely your bad as the function doesn't return the implied NULL values, one can argue that the result set should be

    ItemNumber Item

    1 NULL

    2 NULL

    😎

  • I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (9/3/2014)


    I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.

    I assume you meant something like this?

    SELECT ItemNumber, Item, NULLIF(Item, '')

    FROM dbo.DelimitedSplit8K(' ', ' ');


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Jeff Moden (9/3/2014)


    I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.

    I guess I'd have to agree with Jeff's disagreement, but probably disagree with any need to respect disagreement as it shouldn't be a controversial subject ;-).

    First, if [space] is the delimiter and it just happens to be the only character in the string to be split, that is no different than splitting a string of ',' with a delimiter of ','; it is a set of two empty strings. A single empty string would not have any delimiter present in the string to split. And a NULL string to split would not return any rows at all (i.e. an empty result set).

    Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky (9/4/2014)


    Jeff Moden (9/3/2014)


    I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.

    I guess I'd have to agree with Jeff's disagreement, but probably disagree with any need to respect disagreement as it shouldn't be a controversial subject ;-).

    First, if [space] is the delimiter and it just happens to be the only character in the string to be split, that is no different than splitting a string of ',' with a delimiter of ','; it is a set of two empty strings. A single empty string would not have any delimiter present in the string to split. And a NULL string to split would return a NULL.

    Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.

    Take care,

    Solomon...

    Unless you happen to also work with Oracle. At this time Oracle treats an empty string as null. Messed me up for awhile until I read that in the documentation. It does say that this is subject to change in future versions of the product. This was Oracle 10 and 11.

  • Solomon Rutzky (9/4/2014)


    Jeff Moden (9/3/2014)


    I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.

    I guess I'd have to agree with Jeff's disagreement, but probably disagree with any need to respect disagreement as it shouldn't be a controversial subject ;-).

    First, if [space] is the delimiter and it just happens to be the only character in the string to be split, that is no different than splitting a string of ',' with a delimiter of ','; it is a set of two empty strings. A single empty string would not have any delimiter present in the string to split. And a NULL string to split would return a NULL.

    Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.

    Take care,

    Solomon...

    Lets step back here, the function is used for presenting a delimited string data in a SQL Server tabular format, and the whole of the SQL Server unknown/missing logic (although somewhat thin as it does not differ between unknown, missing, no respond, empty, not applicable etc.) depends on NULLs. Now just think how much is going to fail down the line if suddenly NULLs become empty strings. It is no longer an unknown value but an empty value. This has nothing to do with the general NULL agreement/disagreement, it has everything to do with the logic we normally use. Bottom line is that if the value is not there it is truly unknown and in T-SQL or any SQL for that matter that is a NULL, only NULL and nothing (unintended pun here) but NULL.

    😎

  • Lynn Pettis (9/4/2014)


    Solomon Rutzky (9/4/2014)


    Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.

    Unless you happen to also work with Oracle. At this time Oracle treats an empty string as null. Messed me up for awhile until I read that in the documentation. It does say that this is subject to change in future versions of the product. This was Oracle 10 and 11.

    Thanks Lynn. That is interesting. It has been so long since I have worked with Oracle that I either forgot that or I forgot that I never knew it to begin with ;-). I did find this info which sheds a little light on that seemingly odd behavior.

    Eirikur Eiriksson (9/4/2014)


    Solomon Rutzky (9/4/2014)


    Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.

    Lets step back here, the function is used for presenting a delimited string data in a SQL Server tabular format, and the whole of the SQL Server unknown/missing logic (although somewhat thin as it does not differ between unknown, missing, no respond, empty, not applicable etc.) depends on NULLs. Now just think how much is going to fail down the line if suddenly NULLs become empty strings. It is no longer an unknown value but an empty value. This has nothing to do with the general NULL agreement/disagreement, it has everything to do with the logic we normally use. Bottom line is that if the value is not there it is truly unknown and in T-SQL or any SQL for that matter that is a NULL, only NULL and nothing (unintended pun here) but NULL.

    😎

    Eirikur, I completely agree with the statements of "just think how much is going to fail down the line if suddenly NULLs become empty strings" and "if the value is not there it is truly unknown and ... that is a NULL, only NULL and nothing ... but NULL". This is why I agree with Jeff in disagreeing with the idea that the split function could, under some other interpretation, return NULL for any element. A NULL, as you stated, means "not there", hence could never have been part of the set that was joined together into a single, delimited string in the first place. If a set contains:

    Bob

    NULL

    Sally

    then the result of that set joined (as in String.Join(",")) together would be:

    Bob,Sally

    On the other hand, the following set:

    Bob

    {empty string}

    Sally

    would result in:

    Bob,,Sally

    Splitting either of those results on the comma would give you back the original set, with the exception, of course, of the missing element in the first set: the known unknown. If the reduced number of elements (due to NULLs not being represented in the CSV list) presents a problem for the operation, then either:

  • send separately the original number of elements (the diff between that and the result from the Split() would be the number of NULL elements)
  • don't allow the source string fields to be NULL
  • do an ISNULL / COALESCE such that NULLs are converted to an empty string or some other reserved designater that the down-stream code knows how to interpret
  • Basically, in terms of SQL Server (and not Oracle ;-)), splitting a string can only return a NULL if the input itself is NULL.

    Take care,

    Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky (9/4/2014)


    Lynn Pettis (9/4/2014)


    Solomon Rutzky (9/4/2014)


    Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.

    Unless you happen to also work with Oracle. At this time Oracle treats an empty string as null. Messed me up for awhile until I read that in the documentation. It does say that this is subject to change in future versions of the product. This was Oracle 10 and 11.

    Thanks Lynn. That is interesting. It has been so long since I have worked with Oracle that I either forgot that or I forgot that I never knew it to begin with ;-). I did find this info which sheds a little light on that seemingly odd behavior.

    Eirikur Eiriksson (9/4/2014)


    Solomon Rutzky (9/4/2014)


    Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.

    Lets step back here, the function is used for presenting a delimited string data in a SQL Server tabular format, and the whole of the SQL Server unknown/missing logic (although somewhat thin as it does not differ between unknown, missing, no respond, empty, not applicable etc.) depends on NULLs. Now just think how much is going to fail down the line if suddenly NULLs become empty strings. It is no longer an unknown value but an empty value. This has nothing to do with the general NULL agreement/disagreement, it has everything to do with the logic we normally use. Bottom line is that if the value is not there it is truly unknown and in T-SQL or any SQL for that matter that is a NULL, only NULL and nothing (unintended pun here) but NULL.

    😎

    Eirikur, I completely agree with the statements of "just think how much is going to fail down the line if suddenly NULLs become empty strings" and "if the value is not there it is truly unknown and ... that is a NULL, only NULL and nothing ... but NULL". This is why I agree with Jeff in disagreeing with the idea that the split function could, under some other interpretation, return NULL for any element. A NULL, as you stated, means "not there", hence could never have been part of the set that was joined together into a single, delimited string in the first place. If a set contains:

    Bob

    NULL

    Sally

    then the result of that set joined (as in String.Join(",")) together would be:

    Bob,Sally

    On the other hand, the following set:

    Bob

    {empty string}

    Sally

    would result in:

    Bob,,Sally

    Splitting either of those results on the comma would give you back the original set, with the exception, of course, of the missing element in the first set: the known unknown. If the reduced number of elements (due to NULLs not being represented in the CSV list) presents a problem for the operation, then either:

  • send separately the original number of elements (the diff between that and the result from the Split() would be the number of NULL elements)
  • don't allow the source string fields to be NULL
  • do an ISNULL / COALESCE such that NULLs are converted to an empty string or some other reserved designater that the down-stream code knows how to interpret
  • Basically, in terms of SQL Server (and not Oracle ;-)), splitting a string can only return a NULL if the input itself is NULL.

    Take care,

    Solomon..

    According to your logic what would a splitter return if the input string had a trailing comma?

    Bob,Sally,

    By no means am I saying I disagree with your sentiments just wondering. Is this a 2 element list or 3? Is the NULL there or is it not? Is the third element an empty string, a NULL or not returned?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Viewing 15 posts - 646 through 660 (of 981 total)

    You must be logged in to reply to this topic. Login to reply