Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • aochss (1/16/2013)


    I would not expect a record back if the string being sent in is NULL, a single space or all spaces. In my case, I have records where the string being split is often blank and I would want no records coming back from the CROSS APPLY. I could filter them out using in the WHERE clause. I fixed this by adding LTRIM and RTRIM to the source string in the function.

    Also, some of my "not so good" data also had multiple spaces in between the values causing numerous records returned that were blank.

    For example:

    dbo.DelimitedSplit8k('123 467 789', ' ') -- There are two spaces between 467 and 789

    ItemNumber Item

    1 123

    2 467

    3 <-- Empty String

    4 789

    I modified the function to give an option for removing extra spaces after the delimiter (like '123, 345, 567) and then as you did to your version to check for null/blank input as well as null/blank rows in the output. To do that I had to change the function into a multi-statement tvf.

    Due to the additional checking for nulls/blanks and removing blank rows from the returned table I figured efficiency would be seriously compromised. So I tested this revision against the original and it seriously under performs. In the results below the first run of DelimitedSplit8k_NULLS sets the parameter "@bRemoveSpacesAfterDelimiter" to 0 and the 2nd run set it to 1.

    I can see where nulls/blanks could be an issue, but I think it would be better to filter out such rows AFTER applying any CROSS APPLYs using the original function as opposed to trying to do the filtering within the function (though I didn't specifically test that). Maybe someone else can come up with a better way of getting rid of nulls/blanks without compromising the efficiency of DelimitedSplit8K?

    Test Results:

    RowNum SplitterName NumberOfRows NumberOfElements MinElementLength MaxElementLength Duration MinLength AvgLength MaxLength

    1 DelimitedSplit8K 1000 1 1 10 0.01300 1 5 10

    2 DelimitedSplit8K_NULLS 1000 1 1 10 0.29300 1 5 10

    3 DelimitedSplit8K_NULLS 1000 1 1 10 0.30000 1 5 10

    4 DelimitedSplit8K 1000 2 1 10 0.01600 3 11 21

    5 DelimitedSplit8K_NULLS 1000 2 1 10 0.36000 3 11 21

    6 DelimitedSplit8K_NULLS 1000 2 1 10 0.24000 3 11 21

    7 DelimitedSplit8K 1000 4 1 10 0.03300 9 25 40

    8 DelimitedSplit8K_NULLS 1000 4 1 10 0.40300 9 25 40

    9 DelimitedSplit8K_NULLS 1000 4 1 10 0.31000 9 25 40

    10 DelimitedSplit8K 1000 8 1 10 0.05600 29 50 74

    11 DelimitedSplit8K_NULLS 1000 8 1 10 0.51300 29 50 74

    12 DelimitedSplit8K_NULLS 1000 8 1 10 0.58600 29 50 74

    13 DelimitedSplit8K 1000 16 1 10 0.11000 72 102 137

    14 DelimitedSplit8K_NULLS 1000 16 1 10 0.84300 72 102 137

    15 DelimitedSplit8K_NULLS 1000 16 1 10 0.69600 72 102 137

    16 DelimitedSplit8K 1000 32 1 10 0.24300 148 207 251

    17 DelimitedSplit8K_NULLS 1000 32 1 10 1.37600 148 207 251

    18 DelimitedSplit8K_NULLS 1000 32 1 10 1.23600 148 207 251

    19 DelimitedSplit8K 1000 64 1 10 0.40300 328 415 482

    20 DelimitedSplit8K_NULLS 1000 64 1 10 2.48600 328 415 482

    21 DelimitedSplit8K_NULLS 1000 64 1 10 2.29000 328 415 482

    22 DelimitedSplit8K 1000 128 1 10 0.82600 734 832 942

    23 DelimitedSplit8K_NULLS 1000 128 1 10 4.69600 734 832 942

    24 DelimitedSplit8K_NULLS 1000 128 1 10 4.46000 734 832 942

    25 DelimitedSplit8K 1000 256 1 10 1.62000 1528 1661 1815

    26 DelimitedSplit8K_NULLS 1000 256 1 10 8.88000 1528 1661 1815

    27 DelimitedSplit8K_NULLS 1000 256 1 10 8.87300 1528 1661 1815

    28 DelimitedSplit8K 1000 512 1 10 3.21000 3124 3322 3493

    29 DelimitedSplit8K_NULLS 1000 512 1 10 18.02600 3124 3322 3493

    30 DelimitedSplit8K_NULLS 1000 512 1 10 17.05000 3124 3322 3493

    31 DelimitedSplit8K 1000 1150 1 10 7.44600 7158 7478 7798

    32 DelimitedSplit8K_NULLS 1000 1150 1 10 40.46000 7158 7478 7798

    33 DelimitedSplit8K_NULLS 1000 1150 1 10 37.25000 7158 7478 7798

    34 DelimitedSplit8K 1000 1 10 20 0.01600 10 14 20

    35 DelimitedSplit8K_NULLS 1000 1 10 20 0.20600 10 14 20

    36 DelimitedSplit8K_NULLS 1000 1 10 20 0.20600 10 14 20

    37 DelimitedSplit8K 1000 2 10 20 0.02000 21 30 41

    38 DelimitedSplit8K_NULLS 1000 2 10 20 0.30000 21 30 41

    39 DelimitedSplit8K_NULLS 1000 2 10 20 0.30600 21 30 41

    40 DelimitedSplit8K 1000 4 10 20 0.04600 46 62 81

    41 DelimitedSplit8K_NULLS 1000 4 10 20 0.46000 46 62 81

    42 DelimitedSplit8K_NULLS 1000 4 10 20 0.46000 46 62 81

    43 DelimitedSplit8K 1000 8 10 20 0.09600 102 126 155

    44 DelimitedSplit8K_NULLS 1000 8 10 20 0.74000 102 126 155

    45 DelimitedSplit8K_NULLS 1000 8 10 20 0.75000 102 126 155

    46 DelimitedSplit8K 1000 16 10 20 0.18300 216 254 292

    47 DelimitedSplit8K_NULLS 1000 16 10 20 1.32000 216 254 292

    48 DelimitedSplit8K_NULLS 1000 16 10 20 1.32300 216 254 292

    49 DelimitedSplit8K 1000 32 10 20 0.36000 458 510 586

    50 DelimitedSplit8K_NULLS 1000 32 10 20 2.52300 458 510 586

    51 DelimitedSplit8K_NULLS 1000 32 10 20 2.49600 458 510 586

    52 DelimitedSplit8K 1000 64 10 20 0.74000 945 1022 1097

    53 DelimitedSplit8K_NULLS 1000 64 10 20 4.75600 945 1022 1097

    54 DelimitedSplit8K_NULLS 1000 64 10 20 4.79000 945 1022 1097

    55 DelimitedSplit8K 1000 128 10 20 1.44000 1944 2047 2156

    56 DelimitedSplit8K_NULLS 1000 128 10 20 9.38300 1944 2047 2156

    57 DelimitedSplit8K_NULLS 1000 128 10 20 9.44600 1944 2047 2156

    58 DelimitedSplit8K 1000 256 10 20 2.88300 3903 4096 4254

    59 DelimitedSplit8K_NULLS 1000 256 10 20 18.67000 3903 4096 4254

    60 DelimitedSplit8K_NULLS 1000 256 10 20 18.86600 3903 4096 4254

    61 DelimitedSplit8K 1000 480 10 20 5.45600 7432 7680 7881

    62 DelimitedSplit8K_NULLS 1000 480 10 20 34.67600 7432 7680 7881

    63 DelimitedSplit8K_NULLS 1000 480 10 20 35.24600 7432 7680 7881

    64 DelimitedSplit8K 1000 1 20 30 0.01300 20 24 30

    65 DelimitedSplit8K_NULLS 1000 1 20 30 0.24000 20 24 30

    66 DelimitedSplit8K_NULLS 1000 1 20 30 0.24300 20 24 30

    67 DelimitedSplit8K 1000 2 20 30 0.02300 41 51 61

    68 DelimitedSplit8K_NULLS 1000 2 20 30 0.41300 41 51 61

    69 DelimitedSplit8K_NULLS 1000 2 20 30 0.40300 41 51 61

    70 DelimitedSplit8K 1000 4 20 30 0.07000 86 103 121

    71 DelimitedSplit8K_NULLS 1000 4 20 30 0.66000 86 103 121

    72 DelimitedSplit8K_NULLS 1000 4 20 30 0.66000 86 103 121

    73 DelimitedSplit8K 1000 8 20 30 0.13600 180 206 232

    74 DelimitedSplit8K_NULLS 1000 8 20 30 1.15600 180 206 232

    75 DelimitedSplit8K_NULLS 1000 8 20 30 1.19300 180 206 232

    76 DelimitedSplit8K 1000 16 20 30 0.27000 379 415 456

    77 DelimitedSplit8K_NULLS 1000 16 20 30 2.17300 379 415 456

    78 DelimitedSplit8K_NULLS 1000 16 20 30 2.21600 379 415 456

    79 DelimitedSplit8K 1000 32 20 30 0.52600 774 830 892

    80 DelimitedSplit8K_NULLS 1000 32 20 30 4.18600 774 830 892

    81 DelimitedSplit8K_NULLS 1000 32 20 30 4.33000 774 830 892

    82 DelimitedSplit8K 1000 64 20 30 1.05600 1582 1663 1747

    83 DelimitedSplit8K_NULLS 1000 64 20 30 8.30000 1582 1663 1747

    84 DelimitedSplit8K_NULLS 1000 64 20 30 8.37300 1582 1663 1747

    85 DelimitedSplit8K 1000 128 20 30 2.19300 3208 3325 3442

    86 DelimitedSplit8K_NULLS 1000 128 20 30 16.45300 3208 3325 3442

    87 DelimitedSplit8K_NULLS 1000 128 20 30 16.68600 3208 3325 3442

    88 DelimitedSplit8K 1000 256 20 30 4.22000 6484 6657 6834

    89 DelimitedSplit8K_NULLS 1000 256 20 30 32.53000 6484 6657 6834

    90 DelimitedSplit8K_NULLS 1000 256 20 30 32.85000 6484 6657 6834

    91 DelimitedSplit8K 1000 290 20 30 4.76000 7369 7539 7712

    92 DelimitedSplit8K_NULLS 1000 290 20 30 37.29000 7369 7539 7712

    93 DelimitedSplit8K_NULLS 1000 290 20 30 38.92000 7369 7539 7712

    94 DelimitedSplit8K 1000 1 30 40 0.01300 30 34 40

    95 DelimitedSplit8K_NULLS 1000 1 30 40 0.27300 30 34 40

    96 DelimitedSplit8K_NULLS 1000 1 30 40 0.28000 30 34 40

    97 DelimitedSplit8K 1000 2 30 40 0.02600 61 70 81

    98 DelimitedSplit8K_NULLS 1000 2 30 40 0.50300 61 70 81

    99 DelimitedSplit8K_NULLS 1000 2 30 40 0.51000 61 70 81

    100 DelimitedSplit8K 1000 4 30 40 0.09300 124 142 158

    101 DelimitedSplit8K_NULLS 1000 4 30 40 0.89300 124 142 158

    102 DelimitedSplit8K_NULLS 1000 4 30 40 0.93600 124 142 158

    103 DelimitedSplit8K 1000 8 30 40 0.18600 262 286 314

    104 DelimitedSplit8K_NULLS 1000 8 30 40 1.66300 262 286 314

    105 DelimitedSplit8K_NULLS 1000 8 30 40 1.67600 262 286 314

    106 DelimitedSplit8K 1000 16 30 40 0.35000 529 574 613

    107 DelimitedSplit8K_NULLS 1000 16 30 40 3.19300 529 574 613

    108 DelimitedSplit8K_NULLS 1000 16 30 40 3.25000 529 574 613

    109 DelimitedSplit8K 1000 32 30 40 0.71300 1097 1150 1222

    110 DelimitedSplit8K_NULLS 1000 32 30 40 7.12600 1097 1150 1222

    111 DelimitedSplit8K_NULLS 1000 32 30 40 6.80300 1097 1150 1222

    112 DelimitedSplit8K 1000 64 30 40 1.51600 2225 2302 2376

    113 DelimitedSplit8K_NULLS 1000 64 30 40 12.73300 2225 2302 2376

    114 DelimitedSplit8K_NULLS 1000 64 30 40 12.53000 2225 2302 2376

    115 DelimitedSplit8K 1000 128 30 40 2.85600 4487 4604 4706

    116 DelimitedSplit8K_NULLS 1000 128 30 40 24.68000 4487 4604 4706

    117 DelimitedSplit8K_NULLS 1000 128 30 40 24.94300 4487 4604 4706

    118 DelimitedSplit8K 1000 210 30 40 4.99600 7378 7558 7705

    119 DelimitedSplit8K_NULLS 1000 210 30 40 41.28300 7378 7558 7705

    120 DelimitedSplit8K_NULLS 1000 210 30 40 41.99300 7378 7558 7705

    121 DelimitedSplit8K 1000 1 40 50 0.01300 40 44 50

    122 DelimitedSplit8K_NULLS 1000 1 40 50 0.33300 40 44 50

    123 DelimitedSplit8K_NULLS 1000 1 40 50 0.34000 40 44 50

    124 DelimitedSplit8K 1000 2 40 50 0.03000 81 91 101

    125 DelimitedSplit8K_NULLS 1000 2 40 50 0.65600 81 91 101

    126 DelimitedSplit8K_NULLS 1000 2 40 50 0.68000 81 91 101

    127 DelimitedSplit8K 1000 4 40 50 0.11600 166 183 199

    128 DelimitedSplit8K_NULLS 1000 4 40 50 1.19600 166 183 199

    129 DelimitedSplit8K_NULLS 1000 4 40 50 1.22600 166 183 199

    130 DelimitedSplit8K 1000 8 40 50 0.22300 341 367 392

    131 DelimitedSplit8K_NULLS 1000 8 40 50 2.44000 341 367 392

    132 DelimitedSplit8K_NULLS 1000 8 40 50 2.31300 341 367 392

    133 DelimitedSplit8K 1000 16 40 50 0.47600 700 735 777

    134 DelimitedSplit8K_NULLS 1000 16 40 50 4.64300 700 735 777

    135 DelimitedSplit8K_NULLS 1000 16 40 50 4.69600 700 735 777

    136 DelimitedSplit8K 1000 32 40 50 0.91000 1401 1471 1531

    137 DelimitedSplit8K_NULLS 1000 32 40 50 9.08000 1401 1471 1531

    138 DelimitedSplit8K_NULLS 1000 32 40 50 9.06600 1401 1471 1531

    139 DelimitedSplit8K 1000 64 40 50 1.77600 2860 2945 3026

    140 DelimitedSplit8K_NULLS 1000 64 40 50 17.82300 2860 2945 3026

    141 DelimitedSplit8K_NULLS 1000 64 40 50 17.84600 2860 2945 3026

    142 DelimitedSplit8K 1000 128 40 50 3.54300 5785 5887 5997

    143 DelimitedSplit8K_NULLS 1000 128 40 50 35.12000 5785 5887 5997

    144 DelimitedSplit8K_NULLS 1000 128 40 50 36.19600 5785 5887 5997

    145 DelimitedSplit8K 1000 165 40 50 4.55600 7473 7589 7743

    146 DelimitedSplit8K_NULLS 1000 165 40 50 45.69600 7473 7589 7743

    147 DelimitedSplit8K_NULLS 1000 165 40 50 45.45300 7473 7589 7743

    Here's the revised function I tested. I won't claim that it is the best variation--just a quick-and-dirty trial run for comparison purposes.

    /*

    --Some test strings for manual testing

    SELECT * FROM dbo.DelimitedSplit8K_NULLS(' ',' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('',' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS(NULL,' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS(' ',',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('',',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS(NULL,',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',1)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',1)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123, 467, 789',',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123, 467, 789',',',1)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123,467,789',',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123,467,789',',',1)

    */

    CREATE FUNCTION [dbo].[DelimitedSplit8K_NULLS]

    (

    @pString VARCHAR(8000)

    ,@pDelimiter CHAR(1)

    ,@bRemoveSpaceAfterDelimiter BIT

    )

    RETURNS @Split TABLE

    (

    [ItemNumber] INT

    ,[Item] VARCHAR(50)

    )

    WITH SCHEMABINDING

    AS

    BEGIN

    SET @pString = NULLIF(RTRIM(LTRIM(@pString)),'')

    IF @bRemoveSpaceAfterDelimiter = 1

    SET @pString = REPLACE(@pString,@pDelimiter+' ',@pDelimiter)

    IF @pString IS NULL

    RETURN

    ELSE

    WITH E1(N)

    AS (

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    ), --10E+1 or 10 rows

    E2(N)

    AS (

    SELECT

    1

    FROM

    E1 a

    ,E1 b

    ), --10E+2 or 100 rows

    E4(N)

    AS (

    SELECT

    1

    FROM

    E2 a

    ,E2 b

    ), --10E+4 or 10,000 rows max

    cteTally(N)

    AS (

    SELECT TOP (ISNULL(DATALENGTH(@pString),0))

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM E4

    ),

    cteStart(N1)

    AS (

    SELECT

    1

    UNION ALL

    SELECT

    t.N + 1

    FROM

    cteTally t

    WHERE

    SUBSTRING(@pString,t.N,1) = @pDelimiter

    ),

    cteLen(N1,L1)

    AS (

    SELECT

    s.N1

    ,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0) - s.N1,8000)

    FROM

    cteStart s

    )

    INSERT INTO @Split

    SELECT

    ItemNumber = ROW_NUMBER() OVER (ORDER BY l.N1)

    ,Item = SUBSTRING(@pString,l.N1,l.L1)

    FROM

    cteLen l

    WHERE

    NULLIF(RTRIM(LTRIM(SUBSTRING(@pString,l.N1,l.L1))),'') IS NOT NULL

    RETURN

    END

  • Due to the additional checking for nulls/blanks and removing blank rows from the returned table I figured efficiency would be seriously compromised. So I tested this revision against the original and it seriously under performs. In the results below the first run of DelimitedSplit8k_NULLS sets the parameter "@bRemoveSpacesAfterDelimiter" to 0 and the 2nd run set it to 1.

    Steve,

    I came up with the same results after doing almost exactly the same thing.

    In fact, I found that even after taking out the cleaning of the duplicate characters and check for null, the function with the return table declaration is a lot slower than the original single statement function. I wonder if SQL Server is using a different memory management scheme.

    I guess having the WHERE clause in the CROSS APPLY do the work gives the next person looking at my code an idea of what I am trying to do.

    Either way it has been a fun afternoon of testing the assumptions and changes. This is a perfect example of why I still love programming after over 20 years of doing this stuff. Going through Jeff's code line by line was quite a revelation in how to use CTE, NULLIF and "Tally" tables in ways I never thought of.

    Anton

  • Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune. You basically get a cursor under the covers because the mTVF cannot be truly in-lined with the query/queryplan. The optimization process is shot to hell too (although that won't come into play here). Voids parallelism also IIRC.

    Try doing a statement-level completion profiler trace while running each type of code. CAUTION: beware statement completed profiling on production systems!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/16/2013)


    Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune. You basically get a cursor under the covers

    What was startling for me was not that functions acted like an implicit cursor (I'd always known that) but rather that you could create inline functions that don't!

  • aochss (1/16/2013)


    The performance of this is amazing, as is the thought behind the code. These kind of posts really make you think and learn a whole new set of features in SQL

    Thanks Jeff.

    A couple of questions (sorry if it has been addressed - I didn't see it mentioned in the thread):

    I would not expect a record back if the string being sent in is NULL, a single space or all spaces. In my case, I have records where the string being split is often blank and I would want no records coming back from the CROSS APPLY. I could filter them out using in the WHERE clause.

    The best indeed would be to filter them out in the WHERE clause with something like WHERE LEN(StringColumn) > 0. Hence, the function calls would be for only those rows which are neither NULL nor containing only space/s.

    But this can also be done in the function itself as I have tried to do it.


    Also, some of my "not so good" data also had multiple spaces in between the values causing numerous records returned that were blank.

    For example:

    dbo.DelimitedSplit8k('123 467 789', ' ') -- There are two spaces between 467 and 789

    ItemNumber Item

    1 123

    2 467

    3 <-- Empty String

    4 789

    Thanks Again,

    Anton

    Here is my attempt while keeping the function iTVF (I did not test it thoroughly). Only slight changes are done

    CREATE FUNCTION [dbo].[DelimitedSplit8K_With_NULL_ZeroLengthString_And_ConsecutiveDelimiter_Filter]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter VARCHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    /*========= USE LEN() FUNCTION TO CHECK WHETHER STRING IS NULL OR CONTAIN ONLY SPACES =======*/

    WHERE LEN(@pString) > 0 --FILTER OUT NON_REQUIRED

    /*===========================================================================================*/

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1

    /*========= USE LEN() FUNCTION TO CHECK WHETHER STRING IS NULL OR CONTAIN ONLY SPACES =======*/

    WHERE LEN(@pString) > 0 --FILTER OUT NON_REQUIRED

    /*===========================================================================================*/

    UNION ALL

    SELECT t.N+1 FROM cteTally t

    WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

    /*====== --FILTER OUT TWO CONSECUTIVE OCCURENCES OF DELIMITERS ============*/

    AND SUBSTRING(@pString,t.N+1,1) <> @pDelimiter

    /*=========================================================================*/

    ),

    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

    SELECT s.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)

    FROM cteStart s

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

    Item = RTRIM(LTRIM(SUBSTRING(@pString, l.N1, l.L1)))--USE LTRIM AND RTRIM FUNCTION TO OMIT TRAILING SPACES

    FROM cteLen l

    ;

  • TheSQLGuru (1/16/2013)


    Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune.

    Not always true 🙂 It really depends upon several things. For instance, if we have to deal with VARCHAR(max) strings and multiple rows, the mTVF version of the tally table splitter can beat the iTVF version quite comprehensively. Sometimes, mTVF versions can show their worth too 😉

    I am not trying to be infamous but sometimes RBAR approach could be handy as well :-D. I hope my Anti-RBAR membership is not cancelled for issuing this statement :hehe:

  • Usman Butt (1/17/2013)


    TheSQLGuru (1/16/2013)


    Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune.

    Not always true 🙂 It really depends upon several things. For instance, if we have to deal with VARCHAR(max) strings and multiple rows, the mTVF version of the tally table splitter can beat the iTVF version quite comprehensively. Sometimes, mTVF versions can show their worth too 😉

    I am not trying to be infamous but sometimes RBAR approach could be handy as well :-D. I hope my Anti-RBAR membership is not cancelled for issuing this statement :hehe:

    Multi-line TVF work well when they are called/invoked sparsly or only once as say the logical starting table in your from clause, before other joins take place. They can also work quite well as part of an additional condition in a join. If it is not the only condition and applied using OR logic, it only needs to be invoked when other simpler conditions do not match.

    mTVF also allow some control over the indexing of the result table (trough PK and unique constraints). This can speed up certain operations in complex processing as the required indexing on the datamodel might not be there and/or too costly to implement for all records.

    What I generaly do is write first as an iTVF and if for some reason it turns out unexpectedly slow or impractical, make a mTVF from it as well with proper indexing to speed up the propblem operation.

  • I have no problem whatsoever in using the right tool for the job!! In fact, I make a fair bit of my consulting dollars cleaning up messes from clients who do NOT do that! 😎 Sometimes being "dead meat" IS the better alternative. :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hello,

    Greatt article. I used the function and works great. I have a situation where I need to search for more than 1 character, then split the data. How could I use this function to search for 2 characters. For example, I want to search for semicolon, or, comma (;,), then, split the column if either is found.

  • How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

  • ahpitre (2/11/2013)


    How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

    Easiest way I can think of would be to simply use replace.

    select *

    from YourTable

    cross apply dbo.DelimitedSplit8k(replace(YourColumn, ';', ','), ',')

    This way you are still splitting on commas but it will work for either character. 😉

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • ahpitre (2/11/2013)


    How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

    Just throw in a replace to change the second character into the desired delimiter. I can't guess what effect this would have on performance of the function.

    If you are talking about a two-dimensional array like 'A,B;X,Y;D,E' and you want to split that into two columns, then that's a different problem.

    DECLARE @strExample VARCHAR(8000)

    SET @strExample = 'A,B;C;D,E,F,G,H'

    SELECT

    ItemNumber

    ,Item

    FROM

    dbo.DelimitedSplit8K(REPLACE(@strExample,';',','),',') AS dsk

  • Sorry Sean...I didn't see that you had already answered with exactly the same idea. 😛

     

  • Steven Willis (2/11/2013)


    Sorry Sean...I didn't see that you had already answered with exactly the same idea. 😛

     

    No worries. Great minds...

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • If you have two delimiters you can change the code slightly:

    DECLARE @pString varchar(8000) = '55555;4444,333,22,1';

    DECLARE @pDelimiter1 char(1) = ',';

    DECLARE @pDelimiter2 char(2) = ';';

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) in (@pDelimiter1, @pDelimiter2)

    ),

    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

    SELECT s.N1,

    Case when ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)-s.N1,8000)

    < ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)-s.N1,8000)

    then ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)-s.N1,8000)

    else ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)-s.N1,8000)

    end

    FROM cteStart s

    )

    select * from cteLen;

    Performance of the second charindex might be similar to using replace on the input string. An alternative would be do join cteStart to itself to derive cteLen. For a single delimiter it is slower than charindex but for multiples it might be better. Certainly for more than 2 delimiters the self join would be simpler code to read.

Viewing 15 posts - 481 through 495 (of 981 total)

You must be logged in to reply to this topic. Login to reply