Help on Performance issue on Recursive CTE

Question

Post reply

Help on Performance issue on Recursive CTE

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

mister.magoo SSC-Forever Points: 47068 More actions · Answer 1

Have you got a select that produces more rows? I hadn't spotted that one was only 25 rows, which is far too small a sample.

We need one that produces about 25,000 rows judging by previous runs...

And measure the performance for the select into variables (or bitbuckets) and for the insert separately.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

haiao2000 Hall of Fame Points: 3734 More actions · Answer 2

MM

Here it is

SQL Server parse and compile time:



   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 1 ms.

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

(25491 row(s) affected)

(1 row(s) affected)

 SQL Server Execution Times:

   CPU time = 188 ms,  elapsed time = 427 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

mister.magoo (10/11/2012)

Have you got a select that produces more rows? I hadn't spotted that one was only 25 rows, which is far too small a sample.

We need one that produces about 25,000 rows judging by previous runs...

And measure the performance for the select into variables (or bitbuckets) and for the insert separately.

mister.magoo SSC-Forever Points: 47068 More actions · Answer 3

Really baffling why the inserts are so slow during the run...

Can you try switching the index on #results to a clustered one?

CREATE CLUSTERED INDEX #IXS ON #RESULTS(LEVEL);

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

haiao2000 Hall of Fame Points: 3734 More actions · Answer 4

MM,

I switched to clustered Index it runs about the same amount of time. Could you think of something else I can try to see if it makes any different? If you want to IM me quicker, I am on yahoo messager: haiao2000@yahoo.com

Thanks for helping out!

mister.magoo (10/11/2012)

Really baffling why the inserts are so slow during the run...

Can you try switching the index on #results to a clustered one?

CREATE CLUSTERED INDEX #IXS ON #RESULTS(LEVEL);

haiao2000 Hall of Fame Points: 3734 More actions · Answer 5

Maybe this will give another angle to look at. There are 2 ways I can benefit from this query if I can get it runs fast enough.

1) get all the tree nodes (which we have been trying, so far it is not fast enough)

2) just end as soon as at least one of the nodes has a modification status of 1 (IsDirty=1)? Which is the best way to handle this?



CREATE TABLE #Result(ID INT NOT NULL, ParentID UNIQUEIDENTIFIER NOT NULL, ChildID UNIQUEIDENTIFIER, IsDirty Bit, LEVEL INT IDENTITY(1,1) NOT NULL);

CREATE INDEX #IXS1 ON #Result(LEVEL) INCLUDE(ChildID);

CREATE INDEX #IXS2 ON #Result(IsDirty); 

SET IDENTITY_INSERT #Result ON;

--pop the top level node to the temp table

INSERT #Result(ID,ParentID,ChildID,IsDirty,LEVEL)

SELECT ID,ParentID,ChildID,IsDirty,1 FROM MyTable WHERE ParentID = @ObjectID;

--recursive until no more children

WHILE @@ROWCOUNT > 0 BEGIN

	INSERT #Result(ID,ParentID,ChildID,IsDirty, LEVEL)

	SELECT S.ID, S.ParentID,S.ChildID,S.IsDirty, SCOPE_IDENTITY()+1

	FROM #Result AS R

	INNER JOIN MyTable  S ON S.ParentID = R.ChildID

	WHERE LEVEL = SCOPE_IDENTITY(); 

	--for item #2 above, I am thinking to add this, but I am sure it is not the best idea. anythought?

             --the problem with this is when non of the nodes with IsDirty=1 then it takes at minimum, the same amount of time to traverse all nodes

	IF EXISTS(SELECT ChildID FROM #Result WHERE LEVEL=SCOPE_IDENTITY() AND IsDirty=1)

	    BREAK;

END

SET IDENTITY_INSERT #Result OFF;

SELECT * FROM #Result

mister.magoo SSC-Forever Points: 47068 More actions · Answer 6

haiao2000 (10/11/2012)

Maybe this will give another angle to look at. There are 2 ways I can benefit from this query if I can get it runs fast enough.

1) get all the tree nodes (which we have been trying, so far it is not fast enough)

2) just end as soon as at least one of the nodes has a modification status of 1 (IsDirty=1)? Which is the best way to handle this?

CREATE TABLE #Result(ID INT NOT NULL, ParentID UNIQUEIDENTIFIER NOT NULL, ChildID UNIQUEIDENTIFIER, IsDirty Bit, LEVEL INT IDENTITY(1,1) NOT NULL); CREATE INDEX #IXS1 ON #Result(LEVEL) INCLUDE(ChildID); CREATE INDEX #IXS2 ON #Result(IsDirty); SET IDENTITY_INSERT #Result ON; --pop the top level node to the temp table INSERT #Result(ID,ParentID,ChildID,IsDirty,LEVEL) SELECT ID,ParentID,ChildID,IsDirty,1 FROM MyTable WHERE ParentID = @ObjectID; --recursive until no more children WHILE @@ROWCOUNT > 0 BEGIN INSERT #Result(ID,ParentID,ChildID,IsDirty, LEVEL) SELECT S.ID, S.ParentID,S.ChildID,S.IsDirty, SCOPE_IDENTITY()+1 FROM #Result AS R INNER JOIN MyTable S ON S.ParentID = R.ChildID WHERE LEVEL = SCOPE_IDENTITY(); --for item #2 above, I am thinking to add this, but I am sure it is not the best idea. anythought? --the problem with this is when non of the nodes with IsDirty=1 then it takes at minimum, the same amount of time to traverse all nodes IF EXISTS(SELECT ChildID FROM #Result WHERE LEVEL=SCOPE_IDENTITY() AND IsDirty=1) BREAK; END SET IDENTITY_INSERT #Result OFF; SELECT * FROM #Result

Maybe...if you do try that, do it like this, in the WHILE...and add an index on ISDIRTY in #RESULTS



WHILE @@ROWCOUNT>0 AND NOT EXISTS(SELECT 1 FROM #RESULTS WHERE ISDIRTY=1)

	INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY)

	SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1,S.ISDIRTY

	FROM #RESULTS AS R

	JOIN #SOURCE AS S

	ON S.PARENTID = R.CHILDID

	WHERE LEVEL = SCOPE_IDENTITY();

What is bugging me is why you are seeing such long delays in inserting into #results - that is really odd.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

haiao2000 Hall of Fame Points: 3734 More actions · Answer 7

mister.magoo (10/11/2012)

haiao2000 (10/11/2012)

Maybe this will give another angle to look at. There are 2 ways I can benefit from this query if I can get it runs fast enough.

1) get all the tree nodes (which we have been trying, so far it is not fast enough)

2) just end as soon as at least one of the nodes has a modification status of 1 (IsDirty=1)? Which is the best way to handle this?

CREATE TABLE #Result(ID INT NOT NULL, ParentID UNIQUEIDENTIFIER NOT NULL, ChildID UNIQUEIDENTIFIER, IsDirty Bit, LEVEL INT IDENTITY(1,1) NOT NULL); CREATE INDEX #IXS1 ON #Result(LEVEL) INCLUDE(ChildID); CREATE INDEX #IXS2 ON #Result(IsDirty); SET IDENTITY_INSERT #Result ON; --pop the top level node to the temp table INSERT #Result(ID,ParentID,ChildID,IsDirty,LEVEL) SELECT ID,ParentID,ChildID,IsDirty,1 FROM MyTable WHERE ParentID = @ObjectID; --recursive until no more children WHILE @@ROWCOUNT > 0 BEGIN INSERT #Result(ID,ParentID,ChildID,IsDirty, LEVEL) SELECT S.ID, S.ParentID,S.ChildID,S.IsDirty, SCOPE_IDENTITY()+1 FROM #Result AS R INNER JOIN MyTable S ON S.ParentID = R.ChildID WHERE LEVEL = SCOPE_IDENTITY(); --for item #2 above, I am thinking to add this, but I am sure it is not the best idea. anythought? --the problem with this is when non of the nodes with IsDirty=1 then it takes at minimum, the same amount of time to traverse all nodes IF EXISTS(SELECT ChildID FROM #Result WHERE LEVEL=SCOPE_IDENTITY() AND IsDirty=1) BREAK; END SET IDENTITY_INSERT #Result OFF; SELECT * FROM #Result

Maybe...if you do try that, do it like this, in the WHILE...and add an index on ISDIRTY in #RESULTS

WHILE @@ROWCOUNT>0 AND NOT EXISTS(SELECT 1 FROM #RESULTS WHERE ISDIRTY=1) INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY) SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1,S.ISDIRTY FROM #RESULTS AS R JOIN #SOURCE AS S ON S.PARENTID = R.CHILDID WHERE LEVEL = SCOPE_IDENTITY();

What is bugging me is why you are seeing such long delays in inserting into #results - that is really odd.

Great that works for (2). I am not sure why there is such delay either , but I thought it is reasonable amount of time since it inserted roughly 35k records also it needs to insert into index tables. the biggest delay is on the index seek for source table?

Any other mehod beside using temp table? I understand that we want to take advance of indexing on temp table

haiao2000 Hall of Fame Points: 3734 More actions · Answer 8

haiao2000

Hall of Fame

Points: 3734

October 11, 2012 at 6:48 pm

#1548402

I upload new execution plan . if someone can help identify one more break thru like MM did would be a home run for me.

Thanks in advance.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

haiao2000 (10/11/2012)

I upload new execution plan . if someone can help identify one more break thru like MM did would be a home run for me.

Thanks in advance.

Is this the same data as on your other post???

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

haiao2000 Hall of Fame Points: 3734 More actions · Answer 10

Jeff Moden (10/11/2012)

haiao2000 (10/11/2012)

I upload new execution plan . if someone can help identify one more break thru like MM did would be a home run for me.

Thanks in advance.

Is this the same data as on your other post???

Jeff,

Yes it is the same, it just sometime I ran agains small dataset vs. a larger one. the last execution plan was ran against the large dataset. This one has 4 levels. With 34,689 records return in around 49 seconds. The table has 10,831,321 records

mister.magoo SSC-Forever Points: 47068 More actions · Answer 11

Hi, I am going to post a test script so we can be sure we are comparing apples with apples...

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mister.magoo SSC-Forever Points: 47068 More actions · Answer 12

Here is my sample data creation: 10,000,000 + rows in tempdb - you may want to put it somewhere else...it's about 1GB of data.

Oh, and it relies on a TALLY table called TALLY with a column called N.

See here for one if you need it : http://qa.sqlservercentral.com/scripts/Advanced+SQL/62486/

-- Include this if you want to rebuild the source table later - takes about 2 to 3 minutes



-- DROP TABLE #SOURCE;

IF OBJECT_ID('TEMPDB..#SOURCE') IS NULL

BEGIN

	CREATE TABLE #SOURCE(ID INT IDENTITY(1,1) NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, ISDIRTY CHAR(1) DEFAULT('N'));

	CREATE UNIQUE CLUSTERED INDEX #IX_CLUST ON #SOURCE (PARENTID,CHILDID);

	-- CREATE LEVEL 1 <- 100 ROWS

	INSERT #SOURCE(PARENTID,CHILDID)

	SELECT TOP 100 NEWID(),NEWID()

	FROM TALLY

	-- CREATE LEVEL 2 <- ~ 10,000 ROWS

	INSERT #SOURCE(PARENTID,CHILDID)

	SELECT CHILDID,NEWID()

	FROM #SOURCE,TALLY

	WHERE N<100

	-- CREATE LEVEL 3 <- ~ 100,000 ROWS

	INSERT #SOURCE(PARENTID,CHILDID)

	SELECT CHILDID,NEWID()

	FROM #SOURCE,TALLY

	WHERE N<100

	AND ID >100

	-- CREATE LEVEL 4 <- ~ 10,000,000 ROWS

	INSERT #SOURCE(PARENTID,CHILDID,ISDIRTY)

	SELECT CHILDID,NEWID(),'Y'

	FROM #SOURCE,TALLY

	WHERE N<10

	AND ID >10100

	-- Add an index just to make the selection of a starting point quicker

	CREATE INDEX #IX_ID ON #SOURCE (ID);

END

And the results table :

-- Prepare a temp table to hold the result set



IF OBJECT_ID('TEMPDB..#RESULTS') IS NOT NULL

	DROP TABLE #RESULTS;

CREATE TABLE #RESULTS(ID INT NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, LEVEL INT IDENTITY(1,1) NOT NULL,ISDIRTY CHAR(1)) ;

-- Add an index to aid with recursion

CREATE INDEX #IXS ON #RESULTS(LEVEL) INCLUDE(CHILDID);

-- Add an index to help with the decision of when to stop - when ISDIRTY="Y"

CREATE INDEX #IXDIRTY ON #RESULTS(ISDIRTY);

-- We need this to be able to insert the LEVEL value

SET IDENTITY_INSERT #RESULTS ON;

Now add the first level to the results table:



-- Get the starting point, arbritrarily chosen as row with ID=1 - they all have the same number of children, so it doesn't matter.

DECLARE @START UNIQUEIDENTIFIER;

SELECT TOP 1 @START = PARENTID FROM #SOURCE ORDER BY ID OPTION(FAST 1);

-- INSERT the first row in our hierarchy - level 1

INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY)

SELECT ID,PARENTID,CHILDID,1,ISDIRTY

FROM #SOURCE

WHERE PARENTID=@START;

And process the rest...

-- Do a dirty old loop (only iterates once per level in the hierarchy) while we haven't hit out stop condition and there were rows found



WHILE @@ROWCOUNT>0 AND NOT EXISTS(SELECT 1 FROM #RESULTS WHERE ISDIRTY='Y')

	-- pick up the last inserted set and go find their children to insert to the table with an incremented LEVEL

	INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY)

	SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1,S.ISDIRTY

	FROM #RESULTS AS R

	JOIN #SOURCE AS S

	ON S.PARENTID = R.CHILDID

	WHERE LEVEL = SCOPE_IDENTITY()

	OPTION( RECOMPILE );

-- the WHILE loop only repeats the INSERT, nothing else

SET IDENTITY_INSERT #RESULTS OFF;

With this data, you will see the highest wait times at level 3 and 4, but if you stop the query at level 3 by adding

AND SCOPE_IDENTITY()<3

to the WHILE condition, then run this - the level 4 SELECT:



	SELECT S.ID,S.PARENTID,S.CHILDID,S.ISDIRTY

	FROM #RESULTS AS R

	JOIN #SOURCE AS S

	ON S.PARENTID = R.CHILDID

	WHERE LEVEL = 3

You will see that it takes a long time (15-20 secs) just to select the data.

I get this execution plan: http://qa.sqlservercentral.com/Forums/Attachment12601.aspx

If something can be done to speed up this select, then we've cracked it - but at the moment I cannot see how.

The fact that that last SELECT takes 15-20 secs with a clean buffer, but zero seconds once the data is cached kind of points towards there just being a physical limit on the speed of reading that many rows from this sort of data? maybe? Or is it just a matter of the join being slow? I can't see the wood for the trees now, so maybe one of the more intelligent beings can help...

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

haiao2000 Hall of Fame Points: 3734 More actions · Answer 13

mister.magoo (10/12/2012)

Here is my sample data creation: 10,000,000 + rows in tempdb - you may want to put it somewhere else...it's about 1GB of data.

Oh, and it relies on a TALLY table called TALLY with a column called N.

See here for one if you need it : http://qa.sqlservercentral.com/scripts/Advanced+SQL/62486/

-- Include this if you want to rebuild the source table later - takes about 2 to 3 minutes
-- DROP TABLE #SOURCE; IF OBJECT_ID('TEMPDB..#SOURCE') IS NULL BEGIN CREATE TABLE #SOURCE(ID INT IDENTITY(1,1) NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, ISDIRTY CHAR(1) DEFAULT('N')); CREATE UNIQUE CLUSTERED INDEX #IX_CLUST ON #SOURCE (PARENTID,CHILDID); -- CREATE LEVEL 1 <- 100 ROWS INSERT #SOURCE(PARENTID,CHILDID) SELECT TOP 100 NEWID(),NEWID() FROM TALLY -- CREATE LEVEL 2 <- ~ 10,000 ROWS INSERT #SOURCE(PARENTID,CHILDID) SELECT CHILDID,NEWID() FROM #SOURCE,TALLY WHERE N<100 -- CREATE LEVEL 3 <- ~ 100,000 ROWS INSERT #SOURCE(PARENTID,CHILDID) SELECT CHILDID,NEWID() FROM #SOURCE,TALLY WHERE N<100 AND ID >100 -- CREATE LEVEL 4 <- ~ 10,000,000 ROWS INSERT #SOURCE(PARENTID,CHILDID,ISDIRTY) SELECT CHILDID,NEWID(),'Y' FROM #SOURCE,TALLY WHERE N<10 AND ID >10100 -- Add an index just to make the selection of a starting point quicker CREATE INDEX #IX_ID ON #SOURCE (ID); END

And the results table :

-- Prepare a temp table to hold the result set
IF OBJECT_ID('TEMPDB..#RESULTS') IS NOT NULL DROP TABLE #RESULTS; CREATE TABLE #RESULTS(ID INT NOT NULL, PARENTID UNIQUEIDENTIFIER NOT NULL, CHILDID UNIQUEIDENTIFIER, LEVEL INT IDENTITY(1,1) NOT NULL,ISDIRTY CHAR(1)) ; -- Add an index to aid with recursion CREATE INDEX #IXS ON #RESULTS(LEVEL) INCLUDE(CHILDID); -- Add an index to help with the decision of when to stop - when ISDIRTY="Y" CREATE INDEX #IXDIRTY ON #RESULTS(ISDIRTY); -- We need this to be able to insert the LEVEL value SET IDENTITY_INSERT #RESULTS ON;

Now add the first level to the results table:

-- Get the starting point, arbritrarily chosen as row with ID=1 - they all have the same number of children, so it doesn't matter. DECLARE @START UNIQUEIDENTIFIER; SELECT TOP 1 @START = PARENTID FROM #SOURCE ORDER BY ID OPTION(FAST 1); -- INSERT the first row in our hierarchy - level 1 INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY) SELECT ID,PARENTID,CHILDID,1,ISDIRTY FROM #SOURCE WHERE PARENTID=@START;

And process the rest...

-- Do a dirty old loop (only iterates once per level in the hierarchy) while we haven't hit out stop condition and there were rows found
WHILE @@ROWCOUNT>0 AND NOT EXISTS(SELECT 1 FROM #RESULTS WHERE ISDIRTY='Y') -- pick up the last inserted set and go find their children to insert to the table with an incremented LEVEL INSERT #RESULTS(ID,PARENTID,CHILDID,LEVEL,ISDIRTY) SELECT S.ID,S.PARENTID,S.CHILDID,SCOPE_IDENTITY()+1,S.ISDIRTY FROM #RESULTS AS R JOIN #SOURCE AS S ON S.PARENTID = R.CHILDID WHERE LEVEL = SCOPE_IDENTITY() OPTION( RECOMPILE ); -- the WHILE loop only repeats the INSERT, nothing else
SET IDENTITY_INSERT #RESULTS OFF;

With this data, you will see the highest wait times at level 3 and 4, but if you stop the query at level 3 by adding

AND SCOPE_IDENTITY()<3

to the WHILE condition, then run this - the level 4 SELECT:

SELECT S.ID,S.PARENTID,S.CHILDID,S.ISDIRTY FROM #RESULTS AS R JOIN #SOURCE AS S ON S.PARENTID = R.CHILDID WHERE LEVEL = 3

You will see that it takes a long time (15-20 secs) just to select the data.

I get this execution plan: http://qa.sqlservercentral.com/Forums/Attachment12601.aspx

If something can be done to speed up this select, then we've cracked it - but at the moment I cannot see how.

The fact that that last SELECT takes 15-20 secs with a clean buffer, but zero seconds once the data is cached kind of points towards there just being a physical limit on the speed of reading that many rows from this sort of data? maybe? Or is it just a matter of the join being slow? I can't see the wood for the trees now, so maybe one of the more intelligent beings can help...

Hi MM.Thank you for continue looking into this, I was experiencing another method,

-I created a separated table to hold materialized path with 2 fields (ID, Path) reason I used separated table because a child can have multiple parents

-added 2 indexes to both fields ID and Path

-then use your identity hack method to populate the materialized path table by passing the top most ParentId to it.It runs remarkably fast.

-now with materialized path i can query the source table using ID (indexed) which yield good performance definitely beat the other method.



SELECT @PATH = Path from MaterializedPath WHERE ID=@ID)

SELECT ParentID, ChildID

FROM #Source Where ID in

(Select ID from MaterializedPath where Path like @PATH+'%')