Recursion with a Twist

Question

Post reply

Recursion with a Twist

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

Phil Parkin (11/29/2012)

I wrote the above response before trying Craig's solution, which certainly does the business - thanks! Maybe this is one instance where a loop is the best solution. I'll leave it a while longer (no pun intended) before I implement it, to see whether anyone comes up with a set-based solution.

I poked at it a bit more yesterday and while I can make that process look cleaner (it's a bit disorganized) it's the best solution I have available without digging into indexes and possible shortcuts for some of the sub-data, like the BookingID list. The killer is in the self-referencing bookingIDs and the bi-directional hierarchy, where you're going both 'up and down' the chain simultaneously.

Glad to help, though. 🙂

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

mickyT SSChampion Points: 10360 More actions · Answer 2

Hi

The following recursive option might work. It walks up and down from the queried invoice and joins the results.



DECLARE @invoiceID int = 3

;with rListUp as (

	select a.invoiceid invoiceid, a.invoiceid currentid, a.bookingid, b.invoiceid nextinvoiceid

	from #IDs a

		CROSS APPLY (SELECT invoiceid, bookingid FROM #IDs m WHERE m.bookingid = a.bookingid ) b

	where b.invoiceid > a.invoiceid

	union all

	select a.invoiceid, a.nextinvoiceid, a.bookingid, c.invoiceid

	from rListUp a

		CROSS APPLY (SELECT l.invoiceid, l.bookingid FROM #IDs l WHERE l.invoiceID = a.nextinvoiceid) b

		CROSS APPLY (SELECT m.invoiceid, m.bookingid FROM #IDs m WHERE m.bookingid = b.bookingid and m.invoiceid > a.nextinvoiceid ) c

	),

	rListDown as (

	select a.invoiceid invoiceid, a.invoiceid currentid, a.bookingid, b.invoiceid nextinvoiceid

	from #IDs a

		CROSS APPLY (SELECT invoiceid, bookingid FROM #IDs m WHERE m.bookingid = a.bookingid ) b

	where b.invoiceid < a.invoiceid

	union all

	select a.invoiceid, a.nextinvoiceid, a.bookingid, c.invoiceid

	from rListDown a

		CROSS APPLY (SELECT l.invoiceid, l.bookingid FROM #IDs l WHERE l.invoiceID = a.nextinvoiceid) b

		CROSS APPLY (SELECT m.invoiceid, m.bookingid FROM #IDs m WHERE m.bookingid = b.bookingid and m.invoiceid < a.nextinvoiceid ) c

	),

	rList AS (

	SELECT invoiceid, nextinvoiceid FROM rListUp

	UNION

	SELECT invoiceid, nextinvoiceid FROM rListDown

	)

select invoiceid, bookingid

from #IDs

where invoiceid in (

	select nextinvoiceid from rlist where invoiceID = @InvoiceID

	union

	select @InvoiceID

)

It could probably be prettied up a bit more and there is still a chance that recursion limits will get hit if there is a long change of invoice/bookings

Dwain Camps SSC Guru Points: 86913 More actions · Answer 3

mickyT (11/29/2012)

dwain.c (11/29/2012)

To all who participated in this thread:

You may wish to review this new article:

http://qa.sqlservercentral.com/articles/String+Manipulation/94365/

It provides a good utility function for solving this case and many other similar ones. Thanks to the OP for being the inspiration for the article!

Thanks Dwain

I've already had a look at it and I am going to have a really good read through it soon. Even though I haven't had a chance to dig into the functionality and digest it properly I was impressed with the methods you came up with. Great article:-)

Thanks for noticing and reading it, and the praise!

By all means leave some comments in the discussion thread and don't forget to rate the article!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86913 More actions · Answer 4

Hi Phil - allow me to propose something for you.



DECLARE @RowCount INT

DECLARE @IDs TABLE

    (InvoiceId int not null

    ,BookingId int not null)

INSERT INTO @IDs

SELECT InvoiceID, BookingID

FROM #IDs

WHERE InvoiceID = 1

SET @RowCount = @@ROWCOUNT

WHILE @RowCount <> 0

BEGIN

    INSERT INTO @IDs

    SELECT a.InvoiceID, a.BookingID

    FROM #IDs a

    INNER JOIN @IDs b ON a.BookingID = b.BookingID

    UNION ALL

    SELECT a.InvoiceID, a.BookingID

    FROM #IDs a

    INNER JOIN @IDs b ON a.InvoiceID = b.InvoiceID

    EXCEPT

    SELECT a.InvoiceID, a.BookingID

    FROM @IDs a

    SET @RowCount = @@ROWCOUNT

END

SELECT * FROM @IDs

I think this resolves the original and the alternate data setup (from Kraig?) properly. As it turns out, I recently solved an problem quite similar to this and this solution has its basis in that.

Not to disappoint you Alan (as this uses a loop), it is at least a set-based loop and in the similar problem I solved any attempt at a recursive solution did not perform nearly as well. I have not done that testing here so I can't say for sure whether it will be faster or not.

The other advantage to this approach is that it eliminates feedback loops which will completely disintegrate any effort to use a rCTE.

Let me know of any cases it doesn't resolve. It is possible that another UNION ALL amidst the loop will take care of it.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86913 More actions · Answer 5

Post deleted - SSC hung and posted this one to the wrong thread.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 6

No loops, no cursors, no cte



IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL

   DROP TABLE #TempTable

CREATE TABLE #TempTable

    (

     InvoiceId INT NOT NULL

    ,BookingId INT NOT NULL

    ,PRIMARY KEY (InvoiceId,BookingId)

    )

INSERT #TempTable

    (

     InvoiceId

    ,BookingId

    )

    SELECT

        *

    FROM

        ( VALUES ( 1, 9), ( 1, 10), ( 1, 11), ( 2, 11), ( 3, 11), ( 3, 12), ( 3, 13), ( 4, 14), ( 5, 14) ) data (InvoiceId,BookingId)

SELECT DISTINCT

    t2.InvoiceId

   ,t2.BookingId

FROM

    (

    SELECT

        MAX(BookingId) AS BookingId

    FROM

        #TempTable

    GROUP BY

        BookingId

    ) AS Sub1

INNER JOIN

    #TempTable AS t1

    ON Sub1.BookingId = t1.BookingId

INNER JOIN

    #TempTable AS t2

    ON t1.BookingId = t2.BookingId + 1

CapnHector SSCoach Points: 16143 More actions · Answer 7

Steven Willis (11/30/2012)

No loops, no cursors, no cte

IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL DROP TABLE #TempTable CREATE TABLE #TempTable ( InvoiceId INT NOT NULL ,BookingId INT NOT NULL ,PRIMARY KEY (InvoiceId,BookingId) ) INSERT #TempTable ( InvoiceId ,BookingId ) SELECT * FROM ( VALUES ( 1, 9), ( 1, 10), ( 1, 11), ( 2, 11), ( 3, 11), ( 3, 12), ( 3, 13), ( 4, 14), ( 5, 14) ) data (InvoiceId,BookingId) SELECT DISTINCT t2.InvoiceId ,t2.BookingId FROM ( SELECT MAX(BookingId) AS BookingId FROM #TempTable GROUP BY BookingId ) AS Sub1 INNER JOIN #TempTable AS t1 ON Sub1.BookingId = t1.BookingId INNER JOIN #TempTable AS t2 ON t1.BookingId = t2.BookingId + 1

The problem with your code is that if the value (3,14) is present it does not get picked up and neither does (4,14) or (5,14)

IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL



   DROP TABLE #TempTable

CREATE TABLE #TempTable

    (

     InvoiceId INT NOT NULL

    ,BookingId INT NOT NULL

    ,PRIMARY KEY (InvoiceId,BookingId)

    )

INSERT #TempTable

    (

     InvoiceId

    ,BookingId

    )

    SELECT

        *

    FROM

        ( VALUES ( 1, 9), ( 1, 10), ( 1, 11), ( 2, 11), ( 3, 11), ( 3, 12), ( 3, 13), (3,14), ( 4, 14), ( 5, 14) ) data (InvoiceId,BookingId)

so far an extension of my adjacency list by searching up and down (not the fastest thanks Mickey T. by the way) and dwains loop solution are the only ones that seem to work fully on the set of data we have.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 8

capnhector (11/30/2012)

so far an extension of my adjacency list by searching up and down (not the fastest thanks Mickey T. by the way) and dwains loop solution are the only ones that seem to work fully on the set of data we have.

Erm, that was mine, as well. Dwain's is basically a variation. I still need a chance to check performance between the two.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 9

Steven Willis (11/30/2012)

No loops, no cursors, no cte

You've answered the data, not the problem, with this. You cannot recurse in both directions, you cannot deal with more than 2 layers of heirarchy. You need to be able to traverse bi-directionally across bookingIDs and InvoiceIDs as well as deal with multiple invoice-booking-invoice-booking-invoice chains.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 10

Steven Willis

SSCrazy Eights

Points: 9893

November 30, 2012 at 8:51 pm

#1564616

OK, sorry. I only read the first few posts and made some faulty assumptions. After considering the last few comments I can see that it really is more complicated than it first appeared. Mea culpa.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 11

Alan.B (11/28/2012)

Evil Kraig F (11/28/2012)

This is NOT pretty, but it IS functional.

I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.

If the problem requires that, no matter which InvoiceID you pick, all associated rows for all associated InvoiceIDs will be returned, then it actually can't be done in what most folks consider to be a "set based" manner. There needs to be some form of recursion either as a WHILE loop or as an rCTE (Recursive CTE).

I will say that the WHILE loop and the rCTE method can more closely come to "set based" in that sets of rows can be returned for each iteration but it can't be done in a single itermation (what most folks think of as "set based") without giving up on the trait of "unknown depth" as some of the set based solutions on this thread have.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 12

Post withdrawn. I made a mistake in the code somewhere. I'll post it as a new post if I can figure out what the heck I did wrong and fix it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 13

*wipes away a tear from his eye* Damn, Jeff... that's... that's just beautiful. *sniff*

I couldn't for the life of me figure out how to get out of the IN (SELECT InvoiceID From rCTE) situation.

Only one problem.

It doesn't recurse past the first series of booking IDs. Check this out:



if object_id('tempdb..#IDs') is not null

drop table #IDs

create table #IDs (

InvoiceId int not null

,BookingId int not null

) on [PRIMARY]

go

alter table #IDs add constraint PK_IDs primary key clustered (

	InvoiceId

	,BookingId

	)

	with (

			STATISTICS_NORECOMPUTE = off

			,IGNORE_DUP_KEY = off

			,ALLOW_ROW_LOCKS = on

			,ALLOW_PAGE_LOCKS = on

			) on [PRIMARY]

go

INSERT #IDs VALUES (1,9)

INSERT #IDs VALUES  (1,10)

INSERT #IDs VALUES  (1,11)

INSERT #IDs VALUES  (2,11)

INSERT #IDs VALUES  (2,12)

INSERT #IDs VALUES  (3,12)

INSERT #IDs VALUES  (3,13)

INSERT #IDs VALUES  (4,14)

INSERT #IDs VALUES  (5,14)

GO

--insert #IDs ( InvoiceId, BookingId)

--select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)

select * from #IDs

--=================================================================================

--      Solve the problem using some "standard" "Traveling Salesman" or "Airline

--      Flight Schedule" code.  I believe you'll be absolutely amazed as the

--      performance of this code.

--      This code could easily be turned into an iTVF to be used in CROSS APPLY as

--      if it were a parameterized view.

--=================================================================================

--===== Start a timer because I don't trust SET STATISTICS any more (1).

     -- Ref (1): http://qa.sqlservercentral.com/articles/T-SQL/91724/

DECLARE @StartTime DATETIME;

 SELECT @StartTime = GETDATE();

--===== Declare a variable for the original InvoiceID that we want to find and

     -- stack the others against.  This is like identifying the "Start City"

     -- in a "Traveling Salesman" problem.

DECLARE @StartInvoiceID INT;

 SELECT @StartInvoiceID = 1; --<<< Change this to do other tests.

                                  --<<< Any number from 1 to 300,000 will do.

--===== Solve the problem as if it were a "Traveling Salesman" problem.  The key

     -- to this is to leave "breadcrumbs" as to where we've been by using a

     -- "Materialized Hierarchical Path" (the "BeenThere" column).

WITH

cteBuildPath AS

( --=== Find the "top" ID as we would in any hierarchy or net.

 SELECT a.InvoiceId, a.BookingID,

        BeenThere = CAST(QUOTENAME(a.InvoiceID,'"') AS VARCHAR(8000))

   FROM #IDs AS a --as in "anchor"

  WHERE InvoiceID = @StartInvoiceID

  UNION ALL

 --==== "Loop" through all of the possibilities with some "stop code" to prevent

     -- the inevitable cycles that will appear because it's not a "DAG".

 SELECT b.InvoiceID, b.BookingID,

        BeenThere = r.BeenThere + CAST(QUOTENAME(b.InvoiceID,'"') AS VARCHAR(8000))

   FROM #IDs b          --as in "base table"

   JOIN cteBuildPath r  --as in "recursive part"

     ON r.BookingID = b.BookingId

  WHERE r.BeenThere NOT LIKE '%'+QUOTENAME(b.InvoiceId,'"')+'%'

) 

SELECT * FROM cteBuildPath

--

----=== Find the full data for the InvoiceID's that we've touched at least once.

--     -- Note that the WHERE IN acts as if it were a DISTINCT so no need to explicity

--     -- use the word DISTINCT.

-- SELECT o.*

--   FROM #IDs o --as in "original"

--  WHERE o.InvoiceID IN (SELECT InvoiceId FROM cteBuildPath)

--  ORDER BY o.InvoiceID, o.BookingID

--;

--===== Let's see how long that took.

  PRINT 'Total Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114)

GO

The result set stops at InvoiceID 2, and doesn't chain to 3 after picking up the BookingIDs for Invoice 2. Sorry for the horrible INSERT statement but I've only got 2k5 at the house.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 14

Heh... dammit. I had it working and made a final tweek to the code. Now I have to go back and figure out what the heck I changed. Sorry for the mistake. I'll fix it (I think) and will repost it. I took down my original post because it was wrong.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 15

Looking at the data that you've included, Craig, I'm thinking that maybe I didn't have it working at all when it comes to "disconnected" data like you did with the 2,12 entry. Thanks for the test. Back to the drawing board. :blush:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions