December 30, 2012 at 2:28 pm
Lynn Pettis (12/30/2012)
John Hardin (12/30/2012)
L' Eomot Inversé (12/30/2012)
Jeff Moden (12/28/2012)
If you look at the execution plan, it doesn't actually use the "same list". CTEs have some wonderful advantages but they also have a hidden fault. Instead of using the same internal result set from a CTE when things like self joins are used, the entire CTE is re-executed.Now, if only one could force the optimiser to spool the CTE and reuse it, CTEs would be far more useful.
Don't become so blinded by CTEs that you forget about temporary tables or table-type variables; another problem with CTEs is they are not indexed.
On page 27 I posted a suggested version of this that avoided CHARINDEX() and used an indexed table-type variable and it showed performance gains over the pure-CTE version, but I don't think anyone has seen fit to comment on that suggested version... 🙁
Page 27 passed on how many posts per page? I ask as I display 50 posts per page. It would help if you posted the url of your post so we could go directly to that post.
D'oh! I didn't consider non-default posts-per-page settings. Sorry. Here's the direct link: http://qa.sqlservercentral.com/Forums/FindPost1129283.aspx
The second code block is what I am referring to.
December 30, 2012 at 5:58 pm
L' Eomot Inversé (12/30/2012)
Why can't the optimiser look and see if spooling would enhance performance (only in the cases where it make no semantic difference, of course) and use it when it does? Or if that's too difficult a task for the optimiser (it shouldn't be, but they haven't done it so I can imagine them claiming it is) why can't we have a query hint that tells them to do it? I hope Paul or someone will jump in and explain all this - maybe tell me I've got it all wrong, but that's OK too, I like learning.
You're right that the optimizer doesn't consider CTE materialization today (they are strictly in-line view definitions, expanded once per reference) and there is no hint for materialization either.
There is a 2006 active Connect suggestion by Adam Machanic for a hint. Though some duplicates of that item have been closed as Won't Fix (= perhaps for the time being) my overall impression from public Microsoft comments is that something along these lines is being actively considered for a future release. We will have to wait and see.
Related: more generalized subtree matching and reuse has been trialled in the product, see this Microsoft Research paper (PDF).
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
December 30, 2012 at 10:23 pm
Tony.l (12/29/2012)
Hiunbelievable - so good. I have been doing these puter things for 30 years now and i am still amazed how little i know or probably at just how clever other people are. Really well explained and totally useful.
keep up the good work.
APL. My initials not the language.
Thanks for the great feedback, Tony. I feel the same way. I've been working with computers in one form or another since 1968 and I'm still learning something new about them everyday.
--Jeff Moden
December 30, 2012 at 10:48 pm
John Hardin (12/30/2012)
On page 27 I posted a suggested version of this that avoided CHARINDEX() and used an indexed table-type variable and it showed performance gains over the pure-CTE version, but I don't think anyone has seen fit to comment on that suggested version... 🙁
You'll have to forgive me a bit on that. If you look at even the most recent pages of this thread, there are still people saying "Try this" or "How about this"? Even you said...
How about this to get rid of the CHARINDEX() string operation:
{snip}
...how does that affect performance?
Two days later (lots of water had flowed under the bridge by then and we're not notified by edits), you added...
edited to add: Actually performance goes down a bit because the subquery in making cteStartEnd is not indexed. This version does exhibit performance gains in my very modest testing:
...and followed that up with...
(Note: I'm doing my version as a SP so the above exact code hasn't actually been tested by me...)
I just didn't (and still don't) have the time to test everyone's suggestion. That's why I spent so much time on building a test harness for everyone to use.
If you think your method has merit in the area of performance, please test it and post the test results. Who knows? You might be on to something.
--Jeff Moden
December 31, 2012 at 10:24 am
SQL Kiwi (12/30/2012)
There is a 2006 active Connect suggestion by Adam Machanic for a hint. Though some duplicates of that item have been closed as Won't Fix (= perhaps for the time being) my overall impression from public Microsoft comments is that something along these lines is being actively considered for a future release. We will have to wait and see.
Well, I've added my vote to that. It will probably make no difference, but that isn't an excuse for not doing so whem I think it's right.
Related: more generalized subtree matching and reuse has been trialled in the product, see this Microsoft Research paper (PDF).
For some reason the link doesn't work. But looking in the obvious place indicated by your link (http://research.microsoft.com/en-us/um/people/jrzhou/publications.htm) finds it and the link from there, although it appears to be identical to your link, works fine. I have no idea what's happening there.
Edit: yes, I see what was happening: your link is relative, not absolute, so it tries to connect to qa.sqlservercentral.com/Forums/research.microsoft.com/en-us/um/people/jrzhou/pub/cse.pdf instead of to the real target. I suspect I've done that sometimes too - probably every time I've typed the link instead of copyying and pasting from the address field of my browser.
Tom
December 31, 2012 at 4:52 pm
L' Eomot Inversé (12/31/2012)
For some reason the link doesn't work. But looking in the obvious place indicated by your link (http://research.microsoft.com/en-us/um/people/jrzhou/publications.htm) finds it and the link from there, although it appears to be identical to your link, works fine. I have no idea what's happening there.Edit: yes, I see what was happening: your link is relative, not absolute, so it tries to connect to qa.sqlservercentral.com/Forums/research.microsoft.com/en-us/um/people/jrzhou/pub/cse.pdf instead of to the real target. I suspect I've done that sometimes too - probably every time I've typed the link instead of copyying and pasting from the address field of my browser.
Thanks Tom, I have corrected the link in the original post.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 1, 2013 at 6:24 pm
Nice update to this article Jeff!
I especially liked your mysteriously labelled black line (????). Nice trick with the 0 based Tally table too - will need to remember that one.
I have already replaced the old DelimitedSplit8K in my sandbox and now I must seek out other places where it may have been used and do the same.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 2, 2013 at 8:02 am
Interesting stuff, thank you.
Table type parameters for stored procedures have eliminated much of my need for doing this within SQL, but it's always nice to have options.
January 3, 2013 at 7:26 am
Jeff or Paul:
I didn't see the source code for the CLR function. I also didn't read all 473 posts before mine, so maybe there's a link to it somewhere..
Paul, would you mind sharing that with us? I would love to see its guts.
Kurt
January 3, 2013 at 9:42 am
January 4, 2013 at 7:12 am
hi sorry i haven't tested for performance or sure if this has been done but removed the ctelen
IF OBJECT_ID('dbo.DelimitedSplit8K') IS NOT NULL
DROP FUNCTION [dbo].[DelimitedSplit8K]
go
CREATE FUNCTION [dbo].[DelimitedSplit8K]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
, cteStart(N1,L1) AS (
SELECT 1,CHARINDEX(@pDelimiter,@pString,1) -1
UNION ALL
SELECT t.N+1
, ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,t.N+1),0)-t.N,8001) -1
FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
)
--select * from cteStart
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteStart l
January 4, 2013 at 10:30 am
telcogod (1/4/2013)
hi sorry i haven't tested for performance or sure if this has been done but removed the ctelenIF OBJECT_ID('dbo.DelimitedSplit8K') IS NOT NULL
DROP FUNCTION [dbo].[DelimitedSplit8K]
go
CREATE FUNCTION [dbo].[DelimitedSplit8K]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
, cteStart(N1,L1) AS (
SELECT 1,CHARINDEX(@pDelimiter,@pString,1) -1
UNION ALL
SELECT t.N+1
, ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,t.N+1),0)-t.N,8001) -1
FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
)
--select * from cteStart
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteStart l
Pretty good performance compared to DelimitedSplit8K! However, when running the tests using Jeff Moden's test harness the function failed whenever there was only one element and was still not quite as fast. Very close in performance though, so the change you made didn't really make much difference.
RowNum SplitterName NumberOfRows NumberOfElements MinElementLength MaxElementLength Duration MinLength AvgLength MaxLength
1 DelimitedSplit8K 1000 1 1 10 0.12300 1 5 10
2 DelimitedSplit_TELCO 1000 1 1 10 0.07600 1 5 10
3 DelimitedSplit8K 1000 2 1 10 0.15300 3 11 21
4 DelimitedSplit_TELCO 1000 2 1 10 0.18600 3 11 21
5 DelimitedSplit8K 1000 4 1 10 0.26000 9 24 43
6 DelimitedSplit_TELCO 1000 4 1 10 0.15000 9 24 43
7 DelimitedSplit8K 1000 8 1 10 0.33000 21 50 73
8 DelimitedSplit_TELCO 1000 8 1 10 0.13000 21 50 73
9 DelimitedSplit8K 1000 16 1 10 0.34600 64 102 140
10 DelimitedSplit_TELCO 1000 16 1 10 0.46600 64 102 140
11 DelimitedSplit8K 1000 32 1 10 0.33300 158 206 268
12 DelimitedSplit_TELCO 1000 32 1 10 0.41600 158 206 268
13 DelimitedSplit8K 1000 64 1 10 0.53600 349 415 511
14 DelimitedSplit_TELCO 1000 64 1 10 0.48600 349 415 511
15 DelimitedSplit8K 1000 128 1 10 0.87600 701 830 920
16 DelimitedSplit_TELCO 1000 128 1 10 0.96600 701 830 920
17 DelimitedSplit8K 1000 256 1 10 1.59600 1530 1663 1835
18 DelimitedSplit_TELCO 1000 256 1 10 1.96000 1530 1663 1835
19 DelimitedSplit8K 1000 512 1 10 3.09300 3075 3323 3541
20 DelimitedSplit_TELCO 1000 512 1 10 3.32300 3075 3323 3541
21 DelimitedSplit8K 1000 1150 1 10 7.23000 7141 7466 7790
22 DelimitedSplit_TELCO 1000 1150 1 10 8.41600 7141 7466 7790
23 DelimitedSplit8K 1000 1 10 20 0.01000 10 14 20
24 DelimitedSplit_TELCO 1000 1 10 20 0.01000 10 14 20
25 DelimitedSplit8K 1000 2 10 20 0.03600 21 30 41
26 DelimitedSplit_TELCO 1000 2 10 20 0.05300 21 30 41
27 DelimitedSplit8K 1000 4 10 20 0.10000 47 62 82
28 DelimitedSplit_TELCO 1000 4 10 20 0.08000 47 62 82
29 DelimitedSplit8K 1000 8 10 20 0.12000 93 126 155
30 DelimitedSplit_TELCO 1000 8 10 20 0.12600 93 126 155
31 DelimitedSplit8K 1000 16 10 20 0.20000 219 254 292
32 DelimitedSplit_TELCO 1000 16 10 20 0.23600 219 254 292
33 DelimitedSplit8K 1000 32 10 20 0.36000 458 511 559
34 DelimitedSplit_TELCO 1000 32 10 20 0.39300 458 511 559
35 DelimitedSplit8K 1000 64 10 20 0.71000 931 1021 1099
36 DelimitedSplit_TELCO 1000 64 10 20 0.76000 931 1021 1099
37 DelimitedSplit8K 1000 128 10 20 1.45300 1943 2047 2158
38 DelimitedSplit_TELCO 1000 128 10 20 1.48300 1943 2047 2158
39 DelimitedSplit8K 1000 256 10 20 2.92000 3946 4095 4238
40 DelimitedSplit_TELCO 1000 256 10 20 2.98300 3946 4095 4238
41 DelimitedSplit8K 1000 480 10 20 5.37000 7437 7676 7900
42 DelimitedSplit_TELCO 1000 480 10 20 5.66300 7437 7676 7900
43 DelimitedSplit8K 1000 1 20 30 0.05000 20 25 30
44 DelimitedSplit_TELCO 1000 1 20 30 0.01000 20 25 30
45 DelimitedSplit8K 1000 2 20 30 0.02300 41 51 61
46 DelimitedSplit_TELCO 1000 2 20 30 0.05600 41 51 61
47 DelimitedSplit8K 1000 4 20 30 0.07000 84 102 120
48 DelimitedSplit_TELCO 1000 4 20 30 0.10300 84 102 120
49 DelimitedSplit8K 1000 8 20 30 0.13600 179 206 232
50 DelimitedSplit_TELCO 1000 8 20 30 0.16300 179 206 232
51 DelimitedSplit8K 1000 16 20 30 0.26600 372 414 449
52 DelimitedSplit_TELCO 1000 16 20 30 0.32600 372 414 449
53 DelimitedSplit8K 1000 32 20 30 0.52600 768 830 896
54 DelimitedSplit_TELCO 1000 32 20 30 0.57000 768 830 896
55 DelimitedSplit8K 1000 64 20 30 1.04300 1593 1663 1733
56 DelimitedSplit_TELCO 1000 64 20 30 1.09600 1593 1663 1733
57 DelimitedSplit8K 1000 128 20 30 2.09000 3182 3328 3441
58 DelimitedSplit_TELCO 1000 128 20 30 2.20000 3182 3328 3441
59 DelimitedSplit8K 1000 256 20 30 4.24000 6497 6654 6803
60 DelimitedSplit_TELCO 1000 256 20 30 4.30300 6497 6654 6803
61 DelimitedSplit8K 1000 290 20 30 4.75300 7331 7538 7696
62 DelimitedSplit_TELCO 1000 290 20 30 4.88600 7331 7538 7696
63 DelimitedSplit8K 1000 1 30 40 0.01300 30 34 40
64 DelimitedSplit_TELCO 1000 1 30 40 0.01000 30 34 40
65 DelimitedSplit8K 1000 2 30 40 0.02600 61 71 81
66 DelimitedSplit_TELCO 1000 2 30 40 0.03000 61 71 81
67 DelimitedSplit8K 1000 4 30 40 0.09300 126 143 161
68 DelimitedSplit_TELCO 1000 4 30 40 0.12300 126 143 161
69 DelimitedSplit8K 1000 8 30 40 0.18300 263 287 315
70 DelimitedSplit_TELCO 1000 8 30 40 0.23300 263 287 315
71 DelimitedSplit8K 1000 16 30 40 0.35300 529 575 619
72 DelimitedSplit_TELCO 1000 16 30 40 0.38300 529 575 619
73 DelimitedSplit8K 1000 32 30 40 0.69300 1098 1150 1212
74 DelimitedSplit_TELCO 1000 32 30 40 0.73300 1098 1150 1212
75 DelimitedSplit8K 1000 64 30 40 1.38300 2219 2301 2391
76 DelimitedSplit_TELCO 1000 64 30 40 1.43300 2219 2301 2391
77 DelimitedSplit8K 1000 128 30 40 2.87300 4496 4605 4736
78 DelimitedSplit_TELCO 1000 128 30 40 2.83300 4496 4605 4736
79 DelimitedSplit8K 1000 210 30 40 4.57600 7412 7561 7710
80 DelimitedSplit_TELCO 1000 210 30 40 4.67000 7412 7561 7710
81 DelimitedSplit8K 1000 1 40 50 0.01600 40 44 50
82 DelimitedSplit_TELCO 1000 1 40 50 0.01000 40 44 50
83 DelimitedSplit8K 1000 2 40 50 0.04300 81 91 101
84 DelimitedSplit_TELCO 1000 2 40 50 0.03000 81 91 101
85 DelimitedSplit8K 1000 4 40 50 0.14000 167 182 201
86 DelimitedSplit_TELCO 1000 4 40 50 0.14300 167 182 201
87 DelimitedSplit8K 1000 8 40 50 0.22000 342 366 395
88 DelimitedSplit_TELCO 1000 8 40 50 0.25300 342 366 395
89 DelimitedSplit8K 1000 16 40 50 0.43600 693 735 779
90 DelimitedSplit_TELCO 1000 16 40 50 0.49000 693 735 779
91 DelimitedSplit8K 1000 32 40 50 0.86000 1399 1470 1530
92 DelimitedSplit_TELCO 1000 32 40 50 0.91300 1399 1470 1530
93 DelimitedSplit8K 1000 64 40 50 1.71600 2852 2942 3022
94 DelimitedSplit_TELCO 1000 64 40 50 1.81300 2852 2942 3022
95 DelimitedSplit8K 1000 128 40 50 3.56600 5767 5888 6012
96 DelimitedSplit_TELCO 1000 128 40 50 3.59000 5767 5888 6012
97 DelimitedSplit8K 1000 165 40 50 4.44600 7465 7590 7717
98 DelimitedSplit_TELCO 1000 165 40 50 4.55000 7465 7590 7717
January 4, 2013 at 8:43 pm
ah well, I have seen that before where you think removing the extra select will improve performance but it doesn't. Anyway I am working on a look ma no ddl version as well. This is brilliant stuff I always knew had to be a better way than select substring(string,charindex(delimiter,string), substring(string(charindex(delimiter,string,(charindex(delimiter,string))) etc. Thanks for checking I do not have a sandbox at work where is ok to freeproccache.
January 8, 2013 at 3:48 am
SQL Kiwi (12/30/2012)
Related: more generalized subtree matching and reuse has been trialled in the product, see this Microsoft Research paper (PDF).
More generalized matching is exactly what I always wanted, and would hate to see a query hint specific for subexpression reuse. CTE's expanding as they do now is logical as on each site, the "inline view" defined in the CTE can be used differently and thus must be optimised seperately as well. A more generalised optimisation as in the paper will exploit the similarities still found afer that process.
As for the paper, I only quickly scanned it to see what they were up to, and hope their table signatures are based on what is there after expanding contributing code to a query/batch. You want CTE's and non inline views expanded to have a wide as possible base for finding matches. Thus the optimisation should be 100% seperate from the SQL code and no query hint should be wished for!
January 16, 2013 at 9:22 am
The performance of this is amazing, as is the thought behind the code. These kind of posts really make you think and learn a whole new set of features in SQL
Thanks Jeff.
A couple of questions (sorry if it has been addressed - I didn't see it mentioned in the thread):
I would not expect a record back if the string being sent in is NULL, a single space or all spaces. In my case, I have records where the string being split is often blank and I would want no records coming back from the CROSS APPLY. I could filter them out using in the WHERE clause. I fixed this by adding LTRIM and RTRIM to the source string in the function.
Also, some of my "not so good" data also had multiple spaces in between the values causing numerous records returned that were blank.
For example:
dbo.DelimitedSplit8k('123 467 789', ' ') -- There are two spaces between 467 and 789
ItemNumber Item
1 123
2 467
3 <-- Empty String
4 789
Thanks Again,
Anton
Viewing 15 posts - 466 through 480 (of 981 total)
You must be logged in to reply to this topic. Login to reply