Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Jeff Moden (5/24/2012)


    dwain.c (5/24/2012)


    Jeff - I've been a fan of this splitter for a long time even though I've been a slow adopter, 🙂 recommending it highly around my office at any opportunity.

    I just went to use it today and realized that it only supports 1 character delimiter.

    I'm sure I could modify it to handle a longer delimiter but I'm afraid if I bumble around and make those modifications I'm going to do something nasty to its performance.

    Any suggestions on how to do this and not impact the performance to any significant degree?

    Why does it need to handle more than a single character delimiter? Replace multiple character delimiters with 1 before you pass it to the splitter. Then, beat the tar out of the person that designed the data with a multi-character delimiter. 😉

    Rats! Now why didn't I think of that?

    Just goes to show, the right approach was to ask your advice first.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Naomi N (5/3/2011)


    Jeff,

    That was one of the best articles I read recently. Terrific job!

    BTW, is there a way to contact you privately somehow?

    Thanks again.

    My apologies. I missed this response.

    Thank you very much for the feedback. You could always initiate "private" message through this site's PM messaging system.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (5/24/2012)


    dwain.c (5/24/2012)


    Jeff - I've been a fan of this splitter for a long time even though I've been a slow adopter, 🙂 recommending it highly around my office at any opportunity.

    I just went to use it today and realized that it only supports 1 character delimiter.

    I'm sure I could modify it to handle a longer delimiter but I'm afraid if I bumble around and make those modifications I'm going to do something nasty to its performance.

    Any suggestions on how to do this and not impact the performance to any significant degree?

    Why does it need to handle more than a single character delimiter? Replace multiple character delimiters with 1 before you pass it to the splitter. Then, beat the tar out of the person that designed the data with a multi-character delimiter. 😉

    I too have struggled with this occasionally. I use this splitter in a lot of very unconventional ways. For example say I want to find all anchor tags in some html. I can't split on either < or a but being able to split on <a would be highly helpful. I have used your idea of replacing with some other character but this in itself can be somewhat challenging to find a suitable character because you have to scour the data first to see if the character to use as a replacement is nowhere else. hmmm thanks Dwain I think now I may have to make another version of this splitter that allows for varchar as the delimiter. Seems like performance may take a hit but it would be a great addition to the toolbox.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (6/20/2012)


    I too have struggled with this occasionally. I use this splitter in a lot of very unconventional ways. For example say I want to find all anchor tags in some html. I can't split on either < or a but being able to split on <a would be highly helpful. I have used your idea of replacing with some other character but this in itself can be somewhat challenging to find a suitable character because you have to scour the data first to see if the character to use as a replacement is nowhere else. hmmm thanks Dwain I think now I may have to make another version of this splitter that allows for varchar as the delimiter. Seems like performance may take a hit but it would be a great addition to the toolbox.

    For finding a suitable single character value, look outside the printable values. In particular, the ASCII unit and record separators make excellent choices these days. They're almost never used, and they're explicitly defined for the purpose.

    Unit separator:

    SELECT * FROM YourSplitterName(CHAR(31) + 'One' + CHAR(31) + 'Two' + CHAR(31) + 'Three', CHAR(31))

    Record separator:

    SELECT * FROM YourSplitterName(CHAR(30) + 'One' + CHAR(30) + 'Two' + CHAR(30) + 'Three', CHAR(30))

  • Nadrek (6/20/2012)


    Sean Lange (6/20/2012)


    I too have struggled with this occasionally. I use this splitter in a lot of very unconventional ways. For example say I want to find all anchor tags in some html. I can't split on either < or a but being able to split on <a would be highly helpful. I have used your idea of replacing with some other character but this in itself can be somewhat challenging to find a suitable character because you have to scour the data first to see if the character to use as a replacement is nowhere else. hmmm thanks Dwain I think now I may have to make another version of this splitter that allows for varchar as the delimiter. Seems like performance may take a hit but it would be a great addition to the toolbox.

    For finding a suitable single character value, look outside the printable values. In particular, the ASCII unit and record separators make excellent choices these days. They're almost never used, and they're explicitly defined for the purpose.

    Unit separator:

    SELECT * FROM YourSplitterName(CHAR(31) + 'One' + CHAR(31) + 'Two' + CHAR(31) + 'Three', CHAR(31))

    Record separator:

    SELECT * FROM YourSplitterName(CHAR(30) + 'One' + CHAR(30) + 'Two' + CHAR(30) + 'Three', CHAR(30))

    /facepalm

    That is a great idea. Sometimes the easiest solution is right in front you and you just can't see it. Thanks!!

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Nadrek (6/20/2012)


    Sean Lange (6/20/2012)


    I too have struggled with this occasionally. I use this splitter in a lot of very unconventional ways. For example say I want to find all anchor tags in some html. I can't split on either < or a but being able to split on <a would be highly helpful. I have used your idea of replacing with some other character but this in itself can be somewhat challenging to find a suitable character because you have to scour the data first to see if the character to use as a replacement is nowhere else. hmmm thanks Dwain I think now I may have to make another version of this splitter that allows for varchar as the delimiter. Seems like performance may take a hit but it would be a great addition to the toolbox.

    For finding a suitable single character value, look outside the printable values. In particular, the ASCII unit and record separators make excellent choices these days. They're almost never used, and they're explicitly defined for the purpose.

    Unit separator:

    SELECT * FROM YourSplitterName(CHAR(31) + 'One' + CHAR(31) + 'Two' + CHAR(31) + 'Three', CHAR(31))

    Record separator:

    SELECT * FROM YourSplitterName(CHAR(30) + 'One' + CHAR(30) + 'Two' + CHAR(30) + 'Three', CHAR(30))

    +1 on this. We do the same thing with needing to parse CSV params with possible quoted fields.

    /* Anything is possible but is it worth it? */

  • "Back in the old days"... the ASCII set of values has some pretty neat stuff in it. We didn't use to try to make things readable by human on things like paper tape and we sure didn't try to make them readable on mag tape. Instead of tabs, commas, CrLf combinations, and all sorts of "tag" markers, we'd use ASCII characters 28 through 31 and they worked great especially for transmitting things that looked like tables.

    28 = File Separator (think "table" here)

    29 = Group Separator (we used this to group hierarchical information)

    30 = Record Separator (think table "row" here)

    31 = Unit Separator (think table "column" here)

    What that meant is that you could also pass "documents" that had embedded tabs, commas, and CrLfs very easily and without having to worry about using double quotes within CSV data because it wasn't CSV data. It was SSV (Separator Separated Values) or "true ASCII transmission file".

    It would still work great except that humans are annoyed by little square boxes they don't actually know what to do with when they're trying to read data that was meant for loading into a computer. People laugh at this but would you ever actually try to read the old style of Excel data by eye? Heck no!

    Throw in the Esc character (27) and you have the ability to easily replace things like XML (really bloated, IMHO), JSON, and many other formats that you need special code to import data with. With just a little imagination, you might even be able to make it handle Unicode.

    Since the "Bell" character (7) isn't used anymore (it actually caused a physical bell to ring on Teletypes and some monitors and printers), I'll sometimes use that as a "special" delimiter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff,

    Ctrl+G all the way! I used to do that when character 7 did make the computer beep. 🙂

    Erik

  • dwain.c (5/24/2012)


    I just went to use it today and realized that it only supports 1 character delimiter.

    For anyone interested, here's a multi-character SQLCLR splitter based on Adam Machanic's code:

    CREATE ASSEMBLY [MultiSplit]

    FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C010300E3DEE24F0000000000000000E00002210B010800000E00000006000000000000FE2C0000002000000040000000004000002000000002000004000000000000000400000000000000008000000002000000000000030040850000100000100000000010000010000000000000100000000000000000000000A42C000057000000004000009003000000000000000000000000000000000000006000000C00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000040D000000200000000E000000020000000000000000000000000000200000602E7273726300000090030000004000000004000000100000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000001400000000000000000000000000004000004200000000000000000000000000000000E02C000000000000480000000200050054220000500A00000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000D2026F1000000A2D1A036F1000000A2D12026F1100000A036F1100000A73050000062A168D15000001168D1500000173050000062A000000133002002A000000010000110274030000020A03067B01000004281200000A810400000104067B02000004281300000A81050000012A1E02281400000A2A5602281400000A02037D0100000402047D020000042A03300300530000000000000002281400000A02037D0600000402038E697D0800000402047D0700000402048E69D27D0900000402027B0900000417FE017D0A00000402167D0500000402167D0300000402027B09000004155A7D040000042AAA027B05000004027B06000004027B03000004027B04000004027B0300000459731500000A73040000062A000013300300C90000000200001102257B0500000417587D05000004027B04000004027B080000043202162A02027B04000004027B09000004587D03000004027B030000040A2B65170B027B0A0000042C16027B060000040693027B0700000416932E39160B2B35160C2B28060858027B080000042F14027B0600000406085893027B0700000408932E04160B2B0E081758D20C08027B0900000432CF072C0902067D04000004172A0617580A06027B08000004329202027B04000004027B09000004587D0300000402027B080000047D04000004172A5A02167D0300000402027B09000004155A7D040000042A42534A4201000100000000000C00000076322E302E35303732370000000005006C0000002C030000237E000098030000A803000023537472696E6773000000004007000008000000235553004807000010000000234755494400000058070000F802000023426C6F6200000000000000020000015717A2010902000000FA2533001600000100000016000000040000000A000000080000000900000001000000150000000D0000000200000001000000010000000100000001000000020000000200000000000A0001000000000006005C0055000600760063000A00A3008E000A00BE008E000A00C7008E000A00840169010600C301A4010600FA01E80106001102E80106002E02E80106004D02E80106006602E80106007F02E80106009A02E8010600B502E8010600CE02A4010600E202E80106001B03FB0206003B03FB020A006403690106008E03550006009F03550000000000010000000000010001000100100019000000050001000100030110002E000000050001000400020110003B0000000500030005002300E50022002300EE00250001000E0122000100160122000100E500220021001E013E00210028013E0021003201220021003C0142002100490145005020000000009600AC000A0001008820000000009600D10013000300BE20000000008618DF001E000600C620000000008618DF0028000600DC20000000008618DF002E0008003B2100000000E609F30036000A00682100000000E601FF003A000A003D2200000000E60108011E000A0000000100630100000200960100000100A00102000200E50002000300EE0000000100D00100000200D90100000100DE01000002009601040009003100DF001E003900DF001E004100DF00B0004900DF00B0005100DF00B0005900DF00B0006100DF00B0006900DF00B0007100DF00B0007900DF00B0008100DF00B5008900DF00B0009100DF00BA009900DF001E00A100DF001E00190079033A00190084036302210093036802290093036E020900DF001E00B100DF00790220007B00BF0024000B004C002E0033009D022E001B0087022E00230097022E002B0097022E004B0097022E007300D6022E004300AC022E003B0087022E005B0097022E006B00CD0244000B007E00740281020400010000005B0148000200060003000480000001000000CB11D9980000000000005903000002000000000000000000000001004C0000000000020000000000000000000000010082000000000003000200040002000000003C4D6F64756C653E004D756C746953706C69742E646C6C0055736572446566696E656446756E6374696F6E73004F75747075745265636F72640053706C6974537472696E674D756C7469006D73636F726C69620053797374656D004F626A6563740053797374656D2E436F6C6C656374696F6E730049456E756D657261746F720053797374656D2E446174610053797374656D2E446174612E53716C54797065730053716C43686172730053706C6974537472696E675F4D756C74690053716C496E7433320053716C537472696E670046696C6C526F775F4D756C7469002E63746F720073657175656E6365006974656D006765745F43757272656E74004D6F76654E657874005265736574006C617374506F73006E657874506F7300746865537472696E670064656C696D6974657200737472696E674C656E0064656C696D697465724C656E00697353696E676C654368617244656C696D0043757272656E7400496E707574004D6963726F736F66742E53716C5365727665722E5365727665720053716C46616365744174747269627574650044656C696D69746572006F626A0053797374656D2E52756E74696D652E496E7465726F705365727669636573004F75744174747269627574650053657175656E6365004974656D00546865537472696E670053797374656D2E5265666C656374696F6E00417373656D626C795469746C6541747472696275746500417373656D626C794465736372697074696F6E41747472696275746500417373656D626C79436F6E66696775726174696F6E41747472696275746500417373656D626C79436F6D70616E7941747472696275746500417373656D626C7950726F6475637441747472696275746500417373656D626C79436F7079726967687441747472696275746500417373656D626C7954726164656D61726B41747472696275746500417373656D626C7943756C7475726541747472696275746500436F6D56697369626C6541747472696275746500417373656D626C7956657273696F6E4174747269627574650053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300436F6D70696C6174696F6E52656C61786174696F6E734174747269627574650052756E74696D65436F6D7061746962696C697479417474726962757465004D756C746953706C69740053716C46756E6374696F6E417474726962757465006765745F49734E756C6C006765745F56616C75650043686172006F705F496D706C6963697400537472696E670000000003200000000000A3AEF14706753B429B464E1C5D2862700008B77A5C561934E0890800021209120D120D0A0003011C1011111011150320000102060802060E05200201080E072002011D031D030320001C0320000203061D030206050206020328001C31010003005408074D617853697A65FFFFFFFF54020D497346697865644C656E6774680054020A49734E756C6C61626C650031010003005408074D617853697A65FF00000054020D497346697865644C656E6774680054020A49734E756C6C61626C6500042001010E0420010102042001010881A201000600540E1146696C6C526F774D6574686F644E616D650D46696C6C526F775F4D756C74695455794D6963726F736F66742E53716C5365727665722E5365727665722E446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D322E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038390A446174614163636573730000000054557F4D6963726F736F66742E53716C5365727665722E5365727665722E53797374656D446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D322E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038391053797374656D446174614163636573730000000054020F497344657465726D696E69737469630154020949735072656369736501540E0F5461626C65446566696E6974696F6E274974656D4E756D62657220696E74656765722C204974656D206E766172636861722834303030290420001D0305000111110805000111150E040701120C072003011D0308080507030802050F01000A4D756C746953706C697400000501000000000E0100094D6963726F736F667400002001001B436F7079726967687420C2A9204D6963726F736F6674203230313200000801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F777301000000CC2C00000000000000000000EE2C0000002000000000000000000000000000000000000000000000E02C00000000000000000000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF250020400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100100000001800008000000000000000000000000000000100010000003000008000000000000000000000000000000100000000004800000058400000380300000000000000000000380334000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE0000010000000100D998CB1100000100D998CB113F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B00498020000010053007400720069006E006700460069006C00650049006E0066006F00000074020000010030003000300030003000340062003000000034000A00010043006F006D00700061006E0079004E0061006D006500000000004D006900630072006F0073006F0066007400000040000B000100460069006C0065004400650073006300720069007000740069006F006E00000000004D0075006C0074006900530070006C00690074000000000040000F000100460069006C006500560065007200730069006F006E000000000031002E0030002E0034003500350035002E00330039003100320039000000000040000F00010049006E007400650072006E0061006C004E0061006D00650000004D0075006C0074006900530070006C00690074002E0064006C006C00000000005C001B0001004C006500670061006C0043006F007000790072006900670068007400000043006F0070007900720069006700680074002000A90020004D006900630072006F0073006F0066007400200032003000310032000000000048000F0001004F0072006900670069006E0061006C00460069006C0065006E0061006D00650000004D0075006C0074006900530070006C00690074002E0064006C006C000000000038000B000100500072006F0064007500630074004E0061006D006500000000004D0075006C0074006900530070006C00690074000000000044000F000100500072006F006400750063007400560065007200730069006F006E00000031002E0030002E0034003500350035002E00330039003100320039000000000048000F00010041007300730065006D0062006C0079002000560065007200730069006F006E00000031002E0030002E0034003500350035002E00330039003100320039000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000003D00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

    WITH PERMISSION_SET = SAFE;

    GO

    CREATE FUNCTION dbo.SplitString_Multi

    (

    @Input nvarchar(max),

    @Delimiter nvarchar(255)

    )

    RETURNS TABLE

    (

    ItemNumber integer NULL,

    Item nvarchar(4000) NULL

    )

    -- ORDER (ItemNumber) -- SQL Server 2008 onward

    AS EXTERNAL NAME MultiSplit.UserDefinedFunctions.SplitString_Multi;

    Example usage:

    SELECT

    ssm.ItemNumber,

    ssm.Item

    FROM dbo.SplitString_Multi (N'First-*Second-*Third', N'-*') AS ssm;

    Source code:

    using System.Collections;

    using System.Data.SqlTypes;

    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions

    {

    [SqlFunction

    (

    FillRowMethodName = "FillRow_Multi",

    DataAccess = DataAccessKind.None,

    SystemDataAccess = SystemDataAccessKind.None,

    IsDeterministic = true,

    IsPrecise = true,

    TableDefinition = "ItemNumber integer, Item nvarchar(4000)"

    )

    ]

    public static IEnumerator SplitString_Multi

    (

    [SqlFacet(MaxSize = -1, IsFixedLength = false, IsNullable = false)]

    SqlChars Input,

    [SqlFacet(MaxSize = 255, IsFixedLength = false, IsNullable = false)]

    SqlChars Delimiter

    )

    {

    return

    (

    (Input.IsNull || Delimiter.IsNull) ?

    new SplitStringMulti(new char[0], new char[0]) :

    new SplitStringMulti(Input.Value, Delimiter.Value));

    }

    private sealed class OutputRecord

    {

    internal readonly int sequence;

    internal readonly string item;

    public OutputRecord(int Sequence, string Item)

    {

    this.sequence = Sequence;

    this.item = Item;

    }

    }

    public static void FillRow_Multi(object obj, out SqlInt32 sequence, out SqlString item)

    {

    OutputRecord r = (OutputRecord)obj;

    sequence = r.sequence;

    item = r.item;

    }

    public sealed class SplitStringMulti : IEnumerator

    {

    public SplitStringMulti(char[] TheString, char[] Delimiter)

    {

    theString = TheString;

    stringLen = TheString.Length;

    delimiter = Delimiter;

    delimiterLen = (byte)(Delimiter.Length);

    isSingleCharDelim = (delimiterLen == 1);

    sequence = 0;

    lastPos = 0;

    nextPos = delimiterLen * -1;

    }

    #region IEnumerator Members

    public object Current

    {

    get

    {

    return new OutputRecord(sequence, new string(theString, lastPos, nextPos - lastPos));

    }

    }

    public bool MoveNext()

    {

    sequence++;

    if (nextPos >= stringLen)

    return false;

    else

    {

    lastPos = nextPos + delimiterLen;

    for (int i = lastPos; i < stringLen; i++)

    {

    bool matches = true;

    if (isSingleCharDelim)

    {

    if (theString != delimiter[0])

    matches = false;

    }

    else

    {

    for (byte j = 0; j < delimiterLen; j++)

    {

    if (((i + j) >= stringLen) || (theString != delimiter[j]))

    {

    matches = false;

    break;

    }

    }

    }

    if (matches)

    {

    nextPos = i;

    return true;

    }

    }

    lastPos = nextPos + delimiterLen;

    nextPos = stringLen;

    return true;

    }

    }

    public void Reset()

    {

    lastPos = 0;

    nextPos = delimiterLen * -1;

    }

    #endregion

    private int lastPos;

    private int nextPos;

    private int sequence;

    private readonly char[] theString;

    private readonly char[] delimiter;

    private readonly int stringLen;

    private readonly byte delimiterLen;

    private readonly bool isSingleCharDelim;

    }

    };

  • Messing around with some code yesterday got me thinking. DS8K throws away all the tally table rows where substring(string,n,1) isn't a delimiter. The discarded rows contain information - the distance to the next delimiter. If you could count them, you could do away with the CHARINDEX used to find the length of an item. Counting the rows in between delimiters using the DS8K code block wouldn't work, it would introduce too much cost - it would be an islands'n'gaps analysis.

    But what about CHARINDEX?

    DECLARE @OneRowOfData VARCHAR(8000), @pDelimiter VARCHAR(1)

    -- delimiters at 4, 13, 20, 41, 50, 54, end + 1 = 91

    SET @OneRowOfData = '255.a55xxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.255.255000000000000000000000000000000000'

    SET @pDelimiter = '.'

    ;WITH Tally AS (

    SELECT TOP (1+DATALENGTH(@OneRowOfData))

    n = ROW_NUMBER() OVER(ORDER BY @@SPID)

    FROM sys.columns a, sys.columns b

    )

    SELECT

    n,

    NextDelimiter = CHARINDEX('.',@OneRowOfData+'.',n)

    FROM Tally

    This generates output like the following:

    n NextDelimiter

    1 4

    2 4

    3 4

    4 4

    5 13

    6 13

    ..

    13 13

    14 20

    15 20

    Grouping the output over [NextDelimiter] yields the next delimiter position, and the length of the preceeding segment:

    ;WITH Tally AS (

    SELECT TOP (1+DATALENGTH(@OneRowOfData))

    n = ROW_NUMBER() OVER(ORDER BY @@SPID)

    FROM sys.columns a, sys.columns b

    )

    SELECT [WordLength] = COUNT(*)-1, NextDelimiter

    FROM (

    SELECT

    n,

    NextDelimiter = CHARINDEX('.',@OneRowOfData+'.',n)

    FROM Tally

    ) d

    GROUP BY NextDelimiter

    Output:

    WordLength NextDelimiter

    3 4

    8 13

    6 20

    20 41

    8 50

    3 54

    36 91

    - which is all the information you need to put together a splitter. Note that to record a value other than 0 for the last segment of the input string it has to be terminated with a delimiter, and the number of rows collected from the tally has to be adjusted to account for this.

    Putting it all together with the code to provide a list sequence number yields the following:

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (

    SELECT TOP (1+DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    )

    SELECT

    ItemNumber = ROW_NUMBER() OVER(ORDER BY p),

    Item = SUBSTRING(@pString ,p-(COUNT(*)-1), (COUNT(*)-1))

    FROM cteTally

    OUTER APPLY (SELECT CHARINDEX(@pDelimiter,@pString+@pDelimiter,n)) x (p)

    GROUP BY p

    - where I've used the inline tally cte from DS8K because folks are familiar with it.

    So how does it perform? Plugged into the test harness used for this article, it works well for small numbers but horribly when the number of segments rises beyond five or six. Out of curiosity, I tested it against a mimic of some real data (10 rows multiplied out 10,000 times) and the results were surprising:

    APPLY splitter ======================================================

    Table 'Worktable'. Scan count 10, logical reads 200121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Temp_______________________________________________________________________________________________________________000000000044'. Scan count 1, logical reads 1360, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 998 ms, elapsed time = 6939 ms.

    DelimitedSplit8K ====================================================

    Table 'Worktable'. Scan count 10, logical reads 200121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Temp_______________________________________________________________________________________________________________000000000044'. Scan count 1, logical reads 1360, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 3963 ms, elapsed time = 6889 ms.

    Heh, interesting. So what happens if you eliminate some of the io (stream the results into a @BlackHole)?

    APPLY splitter ======================================================

    Table '#Temp_______________________________________________________________________________________________________________000000000044'. Scan count 3, logical reads 1360, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 10, logical reads 200122, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1420 ms, elapsed time = 1343 ms.

    DelimitedSplit8K ====================================================

    Table 'Worktable'. Scan count 10, logical reads 200121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Temp_______________________________________________________________________________________________________________000000000044'. Scan count 1, logical reads 1360, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 3900 ms, elapsed time = 4099 ms.

    It's been fun to play with. Make of this what you will - but if you choose to use it, be sure to test it rigorously first.

    Here's the test harness:

    SET NOCOUNT ON

    DECLARE @OneRowOfData VARCHAR(8000), @pDelimiter VARCHAR(1)

    SET @pDelimiter = '.'

    /*

    -- 100,000 rows of quick and dirty data: 10 rows repeated 10000 times.

    DROP TABLE #Temp

    ;WITH Tally AS (

    SELECT n = n1+n2+n3+n4+1

    FROM (VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t2(n2)

    CROSS APPLY (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t3(n3)

    CROSS APPLY (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n1)

    CROSS APPLY (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4(n4)

    )

    SELECT OneRowOfData

    INTO #Temp

    FROM Tally

    CROSS JOIN (

    SELECT OneRowOfData = '255.a55xxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.251.2550000000000000000A00000000000000000' UNION ALL

    SELECT '2DDD55.a55xxDxxA.bxxB.cxxxxxxxxxxxxxC.dxxxxD.252.25500000000000B000000000000' UNION ALL

    SELECT '2FFF55.a55xxDDDDDDDDDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.253.255000000000000000000C000000000000000' UNION ALL

    SELECT '25SS5.a55xxDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.254.255000000000000000D000000000000000000' UNION ALL

    SELECT '25x.a55xxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.255.25500000000000000000E0000000000000000' UNION ALL

    SELECT '2755.a55xxDDDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.256.25500000000000000000000F0000000000000' UNION ALL

    SELECT '2BBBBBBBBB55.a55xxDDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.257.25500000000000000G0000000000000000000' UNION ALL

    SELECT '25EEE5.a55xDDDDDDDxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.258.25500000000000000000H0000000000000000' UNION ALL

    SELECT '2WW55.a55xxxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.259.25500000000000000000I0000000000000000' UNION ALL

    SELECT '25LLLLLLLLLLLLLL5.a55xxDDDDDDDDDDxxA.bxxxxB.cxxxxxxxxxxxxxxxxxxC.dxxxxxxD.260.25500000000000000000J0000000000000000'

    ) d

    */

    DECLARE @BlackHole VARCHAR(10)

    PRINT 'APPLY splitter ======================================================'

    SET STATISTICS IO,TIME ON

    SELECT @BlackHole = Item

    FROM (

    SELECT x.*

    FROM #Temp t

    CROSS APPLY [dbo].[DelimitedSplitCA] (t.OneRowOfData, @pDelimiter) x

    ) t

    SET STATISTICS IO,TIME OFF

    PRINT 'DelimitedSplit8K ===================================================='

    SET STATISTICS IO,TIME ON

    SELECT @BlackHole = Item

    FROM (

    SELECT x.*

    FROM #Temp t

    CROSS APPLY [dbo].[DelimitedSplit8K] (t.OneRowOfData, @pDelimiter) x

    ) t

    SET STATISTICS IO,TIME OFF

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris,

    Which version of DS8K did you use? The one from the article or the updated one in the attachments?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Jeff

    This one, which I'm pretty sure is the most recent:

    ALTER FUNCTION [dbo].[DelimitedSplit8K]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT 0 UNION ALL

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT t.N+1

    FROM cteTally t

    WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),

    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))

    FROM cteStart s

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (7/4/2012)


    Chris,

    Which version of DS8K did you use? The one from the article or the updated one in the attachments?

    It was an old version :blush: the updated one is significantly more efficient.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (7/4/2012)


    Chris,

    Which version of DS8K did you use? The one from the article or the updated one in the attachments?

    Update the dratted article already! 😎 This happened to me too. :crazy:

  • +1 to Chris for trying!

    I guess now I'll need to check whether I'm using the right version of DelimitedSplit8K!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 331 through 345 (of 981 total)

You must be logged in to reply to this topic. Login to reply