Tally OH! An Improved SQL 8K “CSV Splitter” Function

Question

Post reply

Tally OH! An Improved SQL 8K “CSV Splitter” Function

Viewing 15 posts - 481 through 495 (of 981 total)

You must be logged in to reply to this topic. Login to reply

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 1

aochss (1/16/2013)

I would not expect a record back if the string being sent in is NULL, a single space or all spaces. In my case, I have records where the string being split is often blank and I would want no records coming back from the CROSS APPLY. I could filter them out using in the WHERE clause. I fixed this by adding LTRIM and RTRIM to the source string in the function.

Also, some of my "not so good" data also had multiple spaces in between the values causing numerous records returned that were blank.

For example:

dbo.DelimitedSplit8k('123 467 789', ' ') -- There are two spaces between 467 and 789

ItemNumber Item

1 123

2 467

3 <-- Empty String

4 789

I modified the function to give an option for removing extra spaces after the delimiter (like '123, 345, 567) and then as you did to your version to check for null/blank input as well as null/blank rows in the output. To do that I had to change the function into a multi-statement tvf.

Due to the additional checking for nulls/blanks and removing blank rows from the returned table I figured efficiency would be seriously compromised. So I tested this revision against the original and it seriously under performs. In the results below the first run of DelimitedSplit8k_NULLS sets the parameter "@bRemoveSpacesAfterDelimiter" to 0 and the 2nd run set it to 1.

I can see where nulls/blanks could be an issue, but I think it would be better to filter out such rows AFTER applying any CROSS APPLYs using the original function as opposed to trying to do the filtering within the function (though I didn't specifically test that). Maybe someone else can come up with a better way of getting rid of nulls/blanks without compromising the efficiency of DelimitedSplit8K?

Test Results:



RowNum	SplitterName	NumberOfRows	NumberOfElements	MinElementLength	MaxElementLength	Duration	MinLength	AvgLength	MaxLength

1	DelimitedSplit8K	1000	1	1	10	0.01300	1	5	10

2	DelimitedSplit8K_NULLS	1000	1	1	10	0.29300	1	5	10

3	DelimitedSplit8K_NULLS	1000	1	1	10	0.30000	1	5	10

4	DelimitedSplit8K	1000	2	1	10	0.01600	3	11	21

5	DelimitedSplit8K_NULLS	1000	2	1	10	0.36000	3	11	21

6	DelimitedSplit8K_NULLS	1000	2	1	10	0.24000	3	11	21

7	DelimitedSplit8K	1000	4	1	10	0.03300	9	25	40

8	DelimitedSplit8K_NULLS	1000	4	1	10	0.40300	9	25	40

9	DelimitedSplit8K_NULLS	1000	4	1	10	0.31000	9	25	40

10	DelimitedSplit8K	1000	8	1	10	0.05600	29	50	74

11	DelimitedSplit8K_NULLS	1000	8	1	10	0.51300	29	50	74

12	DelimitedSplit8K_NULLS	1000	8	1	10	0.58600	29	50	74

13	DelimitedSplit8K	1000	16	1	10	0.11000	72	102	137

14	DelimitedSplit8K_NULLS	1000	16	1	10	0.84300	72	102	137

15	DelimitedSplit8K_NULLS	1000	16	1	10	0.69600	72	102	137

16	DelimitedSplit8K	1000	32	1	10	0.24300	148	207	251

17	DelimitedSplit8K_NULLS	1000	32	1	10	1.37600	148	207	251

18	DelimitedSplit8K_NULLS	1000	32	1	10	1.23600	148	207	251

19	DelimitedSplit8K	1000	64	1	10	0.40300	328	415	482

20	DelimitedSplit8K_NULLS	1000	64	1	10	2.48600	328	415	482

21	DelimitedSplit8K_NULLS	1000	64	1	10	2.29000	328	415	482

22	DelimitedSplit8K	1000	128	1	10	0.82600	734	832	942

23	DelimitedSplit8K_NULLS	1000	128	1	10	4.69600	734	832	942

24	DelimitedSplit8K_NULLS	1000	128	1	10	4.46000	734	832	942

25	DelimitedSplit8K	1000	256	1	10	1.62000	1528	1661	1815

26	DelimitedSplit8K_NULLS	1000	256	1	10	8.88000	1528	1661	1815

27	DelimitedSplit8K_NULLS	1000	256	1	10	8.87300	1528	1661	1815

28	DelimitedSplit8K	1000	512	1	10	3.21000	3124	3322	3493

29	DelimitedSplit8K_NULLS	1000	512	1	10	18.02600	3124	3322	3493

30	DelimitedSplit8K_NULLS	1000	512	1	10	17.05000	3124	3322	3493

31	DelimitedSplit8K	1000	1150	1	10	7.44600	7158	7478	7798

32	DelimitedSplit8K_NULLS	1000	1150	1	10	40.46000	7158	7478	7798

33	DelimitedSplit8K_NULLS	1000	1150	1	10	37.25000	7158	7478	7798

34	DelimitedSplit8K	1000	1	10	20	0.01600	10	14	20

35	DelimitedSplit8K_NULLS	1000	1	10	20	0.20600	10	14	20

36	DelimitedSplit8K_NULLS	1000	1	10	20	0.20600	10	14	20

37	DelimitedSplit8K	1000	2	10	20	0.02000	21	30	41

38	DelimitedSplit8K_NULLS	1000	2	10	20	0.30000	21	30	41

39	DelimitedSplit8K_NULLS	1000	2	10	20	0.30600	21	30	41

40	DelimitedSplit8K	1000	4	10	20	0.04600	46	62	81

41	DelimitedSplit8K_NULLS	1000	4	10	20	0.46000	46	62	81

42	DelimitedSplit8K_NULLS	1000	4	10	20	0.46000	46	62	81

43	DelimitedSplit8K	1000	8	10	20	0.09600	102	126	155

44	DelimitedSplit8K_NULLS	1000	8	10	20	0.74000	102	126	155

45	DelimitedSplit8K_NULLS	1000	8	10	20	0.75000	102	126	155

46	DelimitedSplit8K	1000	16	10	20	0.18300	216	254	292

47	DelimitedSplit8K_NULLS	1000	16	10	20	1.32000	216	254	292

48	DelimitedSplit8K_NULLS	1000	16	10	20	1.32300	216	254	292

49	DelimitedSplit8K	1000	32	10	20	0.36000	458	510	586

50	DelimitedSplit8K_NULLS	1000	32	10	20	2.52300	458	510	586

51	DelimitedSplit8K_NULLS	1000	32	10	20	2.49600	458	510	586

52	DelimitedSplit8K	1000	64	10	20	0.74000	945	1022	1097

53	DelimitedSplit8K_NULLS	1000	64	10	20	4.75600	945	1022	1097

54	DelimitedSplit8K_NULLS	1000	64	10	20	4.79000	945	1022	1097

55	DelimitedSplit8K	1000	128	10	20	1.44000	1944	2047	2156

56	DelimitedSplit8K_NULLS	1000	128	10	20	9.38300	1944	2047	2156

57	DelimitedSplit8K_NULLS	1000	128	10	20	9.44600	1944	2047	2156

58	DelimitedSplit8K	1000	256	10	20	2.88300	3903	4096	4254

59	DelimitedSplit8K_NULLS	1000	256	10	20	18.67000	3903	4096	4254

60	DelimitedSplit8K_NULLS	1000	256	10	20	18.86600	3903	4096	4254

61	DelimitedSplit8K	1000	480	10	20	5.45600	7432	7680	7881

62	DelimitedSplit8K_NULLS	1000	480	10	20	34.67600	7432	7680	7881

63	DelimitedSplit8K_NULLS	1000	480	10	20	35.24600	7432	7680	7881

64	DelimitedSplit8K	1000	1	20	30	0.01300	20	24	30

65	DelimitedSplit8K_NULLS	1000	1	20	30	0.24000	20	24	30

66	DelimitedSplit8K_NULLS	1000	1	20	30	0.24300	20	24	30

67	DelimitedSplit8K	1000	2	20	30	0.02300	41	51	61

68	DelimitedSplit8K_NULLS	1000	2	20	30	0.41300	41	51	61

69	DelimitedSplit8K_NULLS	1000	2	20	30	0.40300	41	51	61

70	DelimitedSplit8K	1000	4	20	30	0.07000	86	103	121

71	DelimitedSplit8K_NULLS	1000	4	20	30	0.66000	86	103	121

72	DelimitedSplit8K_NULLS	1000	4	20	30	0.66000	86	103	121

73	DelimitedSplit8K	1000	8	20	30	0.13600	180	206	232

74	DelimitedSplit8K_NULLS	1000	8	20	30	1.15600	180	206	232

75	DelimitedSplit8K_NULLS	1000	8	20	30	1.19300	180	206	232

76	DelimitedSplit8K	1000	16	20	30	0.27000	379	415	456

77	DelimitedSplit8K_NULLS	1000	16	20	30	2.17300	379	415	456

78	DelimitedSplit8K_NULLS	1000	16	20	30	2.21600	379	415	456

79	DelimitedSplit8K	1000	32	20	30	0.52600	774	830	892

80	DelimitedSplit8K_NULLS	1000	32	20	30	4.18600	774	830	892

81	DelimitedSplit8K_NULLS	1000	32	20	30	4.33000	774	830	892

82	DelimitedSplit8K	1000	64	20	30	1.05600	1582	1663	1747

83	DelimitedSplit8K_NULLS	1000	64	20	30	8.30000	1582	1663	1747

84	DelimitedSplit8K_NULLS	1000	64	20	30	8.37300	1582	1663	1747

85	DelimitedSplit8K	1000	128	20	30	2.19300	3208	3325	3442

86	DelimitedSplit8K_NULLS	1000	128	20	30	16.45300	3208	3325	3442

87	DelimitedSplit8K_NULLS	1000	128	20	30	16.68600	3208	3325	3442

88	DelimitedSplit8K	1000	256	20	30	4.22000	6484	6657	6834

89	DelimitedSplit8K_NULLS	1000	256	20	30	32.53000	6484	6657	6834

90	DelimitedSplit8K_NULLS	1000	256	20	30	32.85000	6484	6657	6834

91	DelimitedSplit8K	1000	290	20	30	4.76000	7369	7539	7712

92	DelimitedSplit8K_NULLS	1000	290	20	30	37.29000	7369	7539	7712

93	DelimitedSplit8K_NULLS	1000	290	20	30	38.92000	7369	7539	7712

94	DelimitedSplit8K	1000	1	30	40	0.01300	30	34	40

95	DelimitedSplit8K_NULLS	1000	1	30	40	0.27300	30	34	40

96	DelimitedSplit8K_NULLS	1000	1	30	40	0.28000	30	34	40

97	DelimitedSplit8K	1000	2	30	40	0.02600	61	70	81

98	DelimitedSplit8K_NULLS	1000	2	30	40	0.50300	61	70	81

99	DelimitedSplit8K_NULLS	1000	2	30	40	0.51000	61	70	81

100	DelimitedSplit8K	1000	4	30	40	0.09300	124	142	158

101	DelimitedSplit8K_NULLS	1000	4	30	40	0.89300	124	142	158

102	DelimitedSplit8K_NULLS	1000	4	30	40	0.93600	124	142	158

103	DelimitedSplit8K	1000	8	30	40	0.18600	262	286	314

104	DelimitedSplit8K_NULLS	1000	8	30	40	1.66300	262	286	314

105	DelimitedSplit8K_NULLS	1000	8	30	40	1.67600	262	286	314

106	DelimitedSplit8K	1000	16	30	40	0.35000	529	574	613

107	DelimitedSplit8K_NULLS	1000	16	30	40	3.19300	529	574	613

108	DelimitedSplit8K_NULLS	1000	16	30	40	3.25000	529	574	613

109	DelimitedSplit8K	1000	32	30	40	0.71300	1097	1150	1222

110	DelimitedSplit8K_NULLS	1000	32	30	40	7.12600	1097	1150	1222

111	DelimitedSplit8K_NULLS	1000	32	30	40	6.80300	1097	1150	1222

112	DelimitedSplit8K	1000	64	30	40	1.51600	2225	2302	2376

113	DelimitedSplit8K_NULLS	1000	64	30	40	12.73300	2225	2302	2376

114	DelimitedSplit8K_NULLS	1000	64	30	40	12.53000	2225	2302	2376

115	DelimitedSplit8K	1000	128	30	40	2.85600	4487	4604	4706

116	DelimitedSplit8K_NULLS	1000	128	30	40	24.68000	4487	4604	4706

117	DelimitedSplit8K_NULLS	1000	128	30	40	24.94300	4487	4604	4706

118	DelimitedSplit8K	1000	210	30	40	4.99600	7378	7558	7705

119	DelimitedSplit8K_NULLS	1000	210	30	40	41.28300	7378	7558	7705

120	DelimitedSplit8K_NULLS	1000	210	30	40	41.99300	7378	7558	7705

121	DelimitedSplit8K	1000	1	40	50	0.01300	40	44	50

122	DelimitedSplit8K_NULLS	1000	1	40	50	0.33300	40	44	50

123	DelimitedSplit8K_NULLS	1000	1	40	50	0.34000	40	44	50

124	DelimitedSplit8K	1000	2	40	50	0.03000	81	91	101

125	DelimitedSplit8K_NULLS	1000	2	40	50	0.65600	81	91	101

126	DelimitedSplit8K_NULLS	1000	2	40	50	0.68000	81	91	101

127	DelimitedSplit8K	1000	4	40	50	0.11600	166	183	199

128	DelimitedSplit8K_NULLS	1000	4	40	50	1.19600	166	183	199

129	DelimitedSplit8K_NULLS	1000	4	40	50	1.22600	166	183	199

130	DelimitedSplit8K	1000	8	40	50	0.22300	341	367	392

131	DelimitedSplit8K_NULLS	1000	8	40	50	2.44000	341	367	392

132	DelimitedSplit8K_NULLS	1000	8	40	50	2.31300	341	367	392

133	DelimitedSplit8K	1000	16	40	50	0.47600	700	735	777

134	DelimitedSplit8K_NULLS	1000	16	40	50	4.64300	700	735	777

135	DelimitedSplit8K_NULLS	1000	16	40	50	4.69600	700	735	777

136	DelimitedSplit8K	1000	32	40	50	0.91000	1401	1471	1531

137	DelimitedSplit8K_NULLS	1000	32	40	50	9.08000	1401	1471	1531

138	DelimitedSplit8K_NULLS	1000	32	40	50	9.06600	1401	1471	1531

139	DelimitedSplit8K	1000	64	40	50	1.77600	2860	2945	3026

140	DelimitedSplit8K_NULLS	1000	64	40	50	17.82300	2860	2945	3026

141	DelimitedSplit8K_NULLS	1000	64	40	50	17.84600	2860	2945	3026

142	DelimitedSplit8K	1000	128	40	50	3.54300	5785	5887	5997

143	DelimitedSplit8K_NULLS	1000	128	40	50	35.12000	5785	5887	5997

144	DelimitedSplit8K_NULLS	1000	128	40	50	36.19600	5785	5887	5997

145	DelimitedSplit8K	1000	165	40	50	4.55600	7473	7589	7743

146	DelimitedSplit8K_NULLS	1000	165	40	50	45.69600	7473	7589	7743

147	DelimitedSplit8K_NULLS	1000	165	40	50	45.45300	7473	7589	7743

Here's the revised function I tested. I won't claim that it is the best variation--just a quick-and-dirty trial run for comparison purposes.



/*

--Some test strings for manual testing

SELECT * FROM dbo.DelimitedSplit8K_NULLS(' ',' ',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('',' ',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS(NULL,' ',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS(' ',',',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('',',',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS(NULL,',',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467  789',' ',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467  789',' ',1)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',1)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('123, 467, 789',',',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('123, 467, 789',',',1)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('123,467,789',',',0)

SELECT * FROM dbo.DelimitedSplit8K_NULLS('123,467,789',',',1) 

*/

CREATE FUNCTION [dbo].[DelimitedSplit8K_NULLS]

    (

     @pString VARCHAR(8000)

    ,@pDelimiter CHAR(1)

    ,@bRemoveSpaceAfterDelimiter BIT

    )

RETURNS @Split TABLE

    (

     [ItemNumber] INT

    ,[Item] VARCHAR(50)

    )

WITH SCHEMABINDING

AS

BEGIN

    SET @pString = NULLIF(RTRIM(LTRIM(@pString)),'')

    IF @bRemoveSpaceAfterDelimiter = 1

        SET @pString = REPLACE(@pString,@pDelimiter+' ',@pDelimiter)

    IF @pString IS NULL

        RETURN

    ELSE

        WITH  E1(N)

          AS (

              SELECT

                1

              UNION ALL

              SELECT

                1

              UNION ALL

              SELECT

                1

              UNION ALL

              SELECT

                1

              UNION ALL

              SELECT

                1

              UNION ALL

              SELECT

                1

              UNION ALL

              SELECT

                1

              UNION ALL

              SELECT

                1

              UNION ALL

              SELECT

                1

              UNION ALL

              SELECT

                1

             ),                --10E+1 or 10 rows

        E2(N)

          AS (

              SELECT

                1

              FROM

                E1 a

               ,E1 b

             ),                --10E+2 or 100 rows

        E4(N)

          AS (

              SELECT

                1

              FROM

                E2 a

               ,E2 b

             ),                --10E+4 or 10,000 rows max

        cteTally(N)

          AS (

             SELECT TOP (ISNULL(DATALENGTH(@pString),0))

                ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

             FROM E4

             ),

        cteStart(N1)

          AS (

              SELECT

                    1

              UNION ALL

              SELECT

                t.N + 1

              FROM

                cteTally t

              WHERE

                SUBSTRING(@pString,t.N,1) = @pDelimiter

             ),

        cteLen(N1,L1)

          AS (

             SELECT

                s.N1

               ,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0) - s.N1,8000)

             FROM

                cteStart s

             )

    INSERT INTO @Split

    SELECT

        ItemNumber = ROW_NUMBER() OVER (ORDER BY l.N1)

       ,Item = SUBSTRING(@pString,l.N1,l.L1)

     FROM

        cteLen l

     WHERE

        NULLIF(RTRIM(LTRIM(SUBSTRING(@pString,l.N1,l.L1))),'') IS NOT NULL

    RETURN

END

aochss SSCommitted Points: 1677 More actions · Answer 2

Due to the additional checking for nulls/blanks and removing blank rows from the returned table I figured efficiency would be seriously compromised. So I tested this revision against the original and it seriously under performs. In the results below the first run of DelimitedSplit8k_NULLS sets the parameter "@bRemoveSpacesAfterDelimiter" to 0 and the 2nd run set it to 1.

Steve,

I came up with the same results after doing almost exactly the same thing.

In fact, I found that even after taking out the cleaning of the duplicate characters and check for null, the function with the return table declaration is a lot slower than the original single statement function. I wonder if SQL Server is using a different memory management scheme.

I guess having the WHERE clause in the CROSS APPLY do the work gives the next person looking at my code an idea of what I am trying to do.

Either way it has been a fun afternoon of testing the assumptions and changes. This is a perfect example of why I still love programming after over 20 years of doing this stuff. Going through Jeff's code line by line was quite a revelation in how to use CTE, NULLIF and "Tally" tables in ways I never thought of.

Anton

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune. You basically get a cursor under the covers because the mTVF cannot be truly in-lined with the query/queryplan. The optimization process is shot to hell too (although that won't come into play here). Voids parallelism also IIRC.

Try doing a statement-level completion profiler trace while running each type of code. CAUTION: beware statement completed profiling on production systems!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Bruce W Cassidy SSChampion Points: 12668 More actions · Answer 4

TheSQLGuru (1/16/2013)

Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune. You basically get a cursor under the covers

What was startling for me was not that functions acted like an implicit cursor (I'd always known that) but rather that you could create inline functions that don't!

Usman Butt Hall of Fame Points: 3166 More actions · Answer 5

aochss (1/16/2013)

The performance of this is amazing, as is the thought behind the code. These kind of posts really make you think and learn a whole new set of features in SQL

Thanks Jeff.

A couple of questions (sorry if it has been addressed - I didn't see it mentioned in the thread):

I would not expect a record back if the string being sent in is NULL, a single space or all spaces. In my case, I have records where the string being split is often blank and I would want no records coming back from the CROSS APPLY. I could filter them out using in the WHERE clause.

The best indeed would be to filter them out in the WHERE clause with something like WHERE LEN(StringColumn) > 0. Hence, the function calls would be for only those rows which are neither NULL nor containing only space/s.

But this can also be done in the function itself as I have tried to do it.

Also, some of my "not so good" data also had multiple spaces in between the values causing numerous records returned that were blank.

For example:

dbo.DelimitedSplit8k('123 467 789', ' ') -- There are two spaces between 467 and 789

ItemNumber Item

1 123

2 467

3 <-- Empty String

4 789

Thanks Again,

Anton

Here is my attempt while keeping the function iTVF (I did not test it thoroughly). Only slight changes are done

CREATE FUNCTION [dbo].[DelimitedSplit8K_With_NULL_ZeroLengthString_And_ConsecutiveDelimiter_Filter]



--===== Define I/O parameters

        (@pString VARCHAR(8000), @pDelimiter VARCHAR(1))

--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!

RETURNS TABLE WITH SCHEMABINDING AS

 RETURN

--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

     -- enough to cover VARCHAR(8000)

  WITH E1(N) AS (

                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

                ),                          --10E+1 or 10 rows

       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

                     -- for both a performance gain and prevention of accidental "overruns"

                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

                 /*========= USE LEN() FUNCTION TO CHECK WHETHER STRING IS NULL OR CONTAIN ONLY SPACES =======*/

                 WHERE LEN(@pString) > 0 --FILTER OUT NON_REQUIRED

                 /*===========================================================================================*/

                ),

cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

                 SELECT 1

                 /*========= USE LEN() FUNCTION TO CHECK WHETHER STRING IS NULL OR CONTAIN ONLY SPACES =======*/

                 WHERE LEN(@pString) > 0 --FILTER OUT NON_REQUIRED

                 /*===========================================================================================*/

                 UNION ALL

                 SELECT t.N+1 FROM cteTally t

                 WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

                 /*====== --FILTER OUT TWO CONSECUTIVE OCCURENCES OF DELIMITERS ============*/

                 AND SUBSTRING(@pString,t.N+1,1) <> @pDelimiter

                 /*=========================================================================*/

                ),

cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

                 SELECT s.N1,

                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)

                   FROM cteStart s

                )

--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

        Item       = RTRIM(LTRIM(SUBSTRING(@pString, l.N1, l.L1)))--USE LTRIM AND RTRIM FUNCTION TO OMIT TRAILING SPACES

   FROM cteLen l

;

Usman Butt Hall of Fame Points: 3166 More actions · Answer 6

TheSQLGuru (1/16/2013)

Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune.

Not always true 🙂 It really depends upon several things. For instance, if we have to deal with VARCHAR(max) strings and multiple rows, the mTVF version of the tally table splitter can beat the iTVF version quite comprehensively. Sometimes, mTVF versions can show their worth too 😉

I am not trying to be infamous but sometimes RBAR approach could be handy as well :-D. I hope my Anti-RBAR membership is not cancelled for issuing this statement :hehe:

peter-757102 SSCertifiable Points: 6877 More actions · Answer 7

Usman Butt (1/17/2013)

TheSQLGuru (1/16/2013)

Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune.

Not always true 🙂 It really depends upon several things. For instance, if we have to deal with VARCHAR(max) strings and multiple rows, the mTVF version of the tally table splitter can beat the iTVF version quite comprehensively. Sometimes, mTVF versions can show their worth too 😉

I am not trying to be infamous but sometimes RBAR approach could be handy as well :-D. I hope my Anti-RBAR membership is not cancelled for issuing this statement :hehe:

Multi-line TVF work well when they are called/invoked sparsly or only once as say the logical starting table in your from clause, before other joins take place. They can also work quite well as part of an additional condition in a join. If it is not the only condition and applied using OR logic, it only needs to be invoked when other simpler conditions do not match.

mTVF also allow some control over the indexing of the result table (trough PK and unique constraints). This can speed up certain operations in complex processing as the required indexing on the datamodel might not be there and/or too costly to implement for all records.

What I generaly do is write first as an iTVF and if for some reason it turns out unexpectedly slow or impractical, make a mTVF from it as well with proper indexing to speed up the propblem operation.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 8

I have no problem whatsoever in using the right tool for the job!! In fact, I make a fair bit of my consulting dollars cleaning up messes from clients who do NOT do that! 😎 Sometimes being "dead meat" IS the better alternative. :w00t:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

ahpitre Grasshopper Points: 23 More actions · Answer 9

ahpitre

Grasshopper

Points: 23

February 11, 2013 at 12:55 pm

#1586723

Hello,

Greatt article. I used the function and works great. I have a situation where I need to search for more than 1 character, then split the data. How could I use this function to search for 2 characters. For example, I want to search for semicolon, or, comma (;,), then, split the column if either is found.

ahpitre Grasshopper Points: 23 More actions · Answer 10

ahpitre

Grasshopper

Points: 23

February 11, 2013 at 12:59 pm

#1586724

How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

Sean Lange SSC Guru Points: 286562 More actions · Answer 11

ahpitre (2/11/2013)

How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

Easiest way I can think of would be to simply use replace.



select *

from YourTable

cross apply dbo.DelimitedSplit8k(replace(YourColumn, ';', ','), ',')

This way you are still splitting on commas but it will work for either character. 😉

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 12

ahpitre (2/11/2013)

How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

Just throw in a replace to change the second character into the desired delimiter. I can't guess what effect this would have on performance of the function.

If you are talking about a two-dimensional array like 'A,B;X,Y;D,E' and you want to split that into two columns, then that's a different problem.



DECLARE @strExample VARCHAR(8000)

SET @strExample = 'A,B;C;D,E,F,G,H'

SELECT

    ItemNumber

   ,Item

FROM

    dbo.DelimitedSplit8K(REPLACE(@strExample,';',','),',') AS dsk

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 13

Steven Willis

SSCrazy Eights

Points: 9893

February 11, 2013 at 3:28 pm

#1586761

Sorry Sean...I didn't see that you had already answered with exactly the same idea. 😛

Sean Lange SSC Guru Points: 286562 More actions · Answer 14

Steven Willis (2/11/2013)

Sorry Sean...I didn't see that you had already answered with exactly the same idea. 😛

No worries. Great minds...

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

m.t.cleary SSC Veteran Points: 200 More actions · Answer 15

If you have two delimiters you can change the code slightly:

DECLARE @pString varchar(8000) = '55555;4444,333,22,1';



DECLARE @pDelimiter1 char(1) = ',';

DECLARE @pDelimiter2 char(2) = ';';

  WITH E1(N) AS (

                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

                ),                          --10E+1 or 10 rows

       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

                     -- for both a performance gain and prevention of accidental "overruns"

                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

                ),

cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

                 SELECT 1 UNION ALL

                 SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) in (@pDelimiter1, @pDelimiter2)

                ),

cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

                 SELECT s.N1,

                        Case when ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)-s.N1,8000)

									 < ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)-s.N1,8000)

							 then ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)-s.N1,8000)

							 else ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)-s.N1,8000)

							 end

                   FROM cteStart s

                )

select * from cteLen;

Performance of the second charindex might be similar to using replace on the input string. An alternative would be do join cteStart to itself to derive cteLen. For a single delimiter it is slower than charindex but for multiples it might be better. Certainly for more than 2 delimiters the self join would be simpler code to read.