Recursive CTE: Is it possible to count records without an aggregate function?

  • sgmunson (2/1/2010)


    Matt,

    I really thought I had it licked when I read your post. I added a ROW_NUMBER() function (alias RN) based on your post to the original selection table, so that I could take the NOT EXISTS clause and compare the Prefs table's RN values to ensure no record existed with the same values of OId and AId but a larger RN. It seems I'm still in an infinite loop, and I'm not quite sure why... If you have any further general advice, it would be appreciated.

    Here's the updated code base for the concept:

    ;WITH MATCHES AS (

    SELECT OId, MIN(AId) AS AId, MIN(RN) AS RN

    FROM PREFS

    GROUP BY OId

    HAVING COUNT(DISTINCT AId) = 1

    UNION ALL

    SELECT P.OId, P.AId

    FROM PREFS P INNER JOIN MATCHES M

    ON P.OId <> M.OId AND P.AId <> M.AId

    WHERE EXISTS (

    SELECT 1

    FROM PREFS P2

    WHERE P2.OId = P.OId

    AND P2.AId <> P.AId

    AND NOT EXISTS (

    SELECT 1

    FROM PREFS P3

    WHERE P3.OId = P2.OId

    AND P3.AId = P2.AId

    AND P3.RN > P2.RN

    )

    )

    )

    SELECT ODATA.Name AS OName, ADATA.Name As AName

    FROM MATCHES M1 INNER JOIN ODATA ON M.OId = ODATA.OId

    INNER JOIN ADATA ON M.AId = ADATA.AId

    ORDER BY M.OId

    OPTION (MAXRECURSION 0)

    Steve

    (aka sgmunson)

    :-):-):-)

    Matt Miller (#4) (2/1/2010)


    I'm having a bit of a low-brainpower day, but I think you can achieve what you're looking for using ROW_NUMBER(). If you use whatever you were passing as the GROUP BY clause in the PARTITION BY portiong of the ROW_NUMBER definition, you can then look for anything with ROW_NUMBER()>1 (for duplicates within that group based on the PARTITION lineup).

    I think you're running into an infinite loop because of the <>'s. Since you're linking the table to itself several times, you likely want to only use unidirectional's (so that you force it to keep "going up" the OID's, and not go up, then jump back down then go up again.

    Also - if you use a CTE, it occurs to me that you can find the one value in the duplicate bunch you want without it having to be recursive. I haven't quite gathered EXACTLY what you're trying to accomplish, so perhaps state the criterion you're seeking in english to see if there is a pitfall there.

    Finally - this part is going to cause you trouble:

    FROM PREFS P INNER JOIN MATCHES M

    ON P.OId <> M.OId AND P.AId <> M.AId

    That will likely create a HUGE set to work off of, which also why you might be plowing (that's going to be pretty close in cardinality to a straight CROSS JOIN, or n*m rows)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Is this based on the coding contest they announced here last week? It sure sounds like it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I guess I'm going to have to let my cat out of the bag, as you're the 2nd person to mention the TSQL Challenge #22. However, my reason for being so gung-ho on finding a recursive solution is the fact that such a solution could solve a rather broad range of similar problems that are traditionally difficult to deal with in set-based code. My guess is that it would be a high-value solution.

    Here's my recursive attempt as of my last go round with it - it seems to be failing to get past the 2nd recursion, and is also duplicating records, and I'm sure the cross join is in part responsible, but outside of a prohibited outer join, I can't find a way to eliminate the dupes the cross join is producing.

    ;WITH BOX_MATCHES AS (

    SELECT BoxId, MIN(BallId) AS BallId, MIN(PreferenceId) AS PreferenceId

    FROM TC22_Preferences

    GROUP BY BoxId

    HAVING COUNT(DISTINCT BallId) = 1

    UNION ALL

    SELECT P.BoxId, P.BallId, P.PreferenceId

    FROM TC22_Preferences P CROSS APPLY BOX_MATCHES BM

    WHERE P.PreferenceId <> BM.PreferenceId

    AND EXISTS (

    SELECT 1

    FROM TC22_Preferences P2

    WHERE P2.BallId = P.BallId

    AND NOT EXISTS (

    SELECT 1

    FROM TC22_Preferences P3

    WHERE P3.BallId = P2.BallId

    --AND P3.BoxId <> P2.BoxId

    AND P3.PreferenceId <> P2.PreferenceId

    )

    )

    )

    SELECT BOX.BoxName AS Box, BALL.BallName As Ball

    FROM BOX_MATCHES BM INNER JOIN TC22_Boxes BOX ON BM.BoxId = BOX.BoxId

    INNER JOIN TC22_Balls BALL ON BM.BallId = BALL.BallId

    --ORDER BY BM.BoxId

    OPTION (MAXRECURSION 0)

    Steve

    (aka sgmunson)

    :-):-):-)

  • My solution on the SQL challenge would be to dynamically generate a series of inner joins, one for each ball ID. Building the query string would take a few milliseconds, then the query itself would take a variable amount of time based on the number of boxes/balls, but it's quite fast.

    May not be the most performant solution, but it's easy to document and easy to read, and it'll perform well enough.

    That'd be much simpler, and almost certainly more performant than a complex recursive CTE.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • sgmunson (2/2/2010)


    ...such a solution could solve a rather broad range of similar problems that are traditionally difficult to deal with in set-based code. My guess is that it would be a high-value solution...

    Steve

    (aka sgmunson)

    :-):-):-)

    The classroom / class challenge is the same idea. I've never come across a real-life scenario which matches this logic but surely they are out there.

    1. Save off the ID's of the boxes which can have only one ball, in the available set.

    2. Remove those boxes, and balls, from the available set - using the saved-off ID's to derive an exclusion list.

    Cycle back to 1 until there are no rows left in the available set, then output the saved-off ID's.

    Since AFAIK each iteration of a recursive CTE only exposes the result of the last iteration to the current iteration, you can't save off the ID's cumulatively to derive an exclusion list. It wouldn't look nice, but I bet you could save off the ID's cumulatively in a comma-delimited list using code similar to the running total algorithm.

    Cheers

    ChrisM@home


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • I'm going to give up until I see SQL Server relax some of the constraints on recursive CTEs. I either can't use an aggregate or can't make multiple recursive references even though I would want all such references to refer to the same set. That's just too restrictive, as the ideal would be to just keep the anchor member and the recursive member darn near identical. Oh well... I'll just submit my procedural code. It just repeats an insert into a solutions table until there have been as many inserts as there are boxes.

    Steve

    (aka sgmunson)

    :-):-):-)

  • Hi Steve

    I've had a play with this one as well - obviously - and from what I remember you can get it out in 4 passes: 2, 2, 1, 1.

    The only way I can currently solve it is same as you, pour the results of each pass (box/ball combo) into a temp table and use this as an exclusion list.

    I made a point of designing the query with recursive CTE's in mind - no GROUP BY, no outer joins, no subqueries etc.

    It gets around the above limitations of recursive CTE's, but it still doesn't work because of the last iteration only thing.

    I'll send you the code tomorrow, I don't have it here.

    Best wishes

    ChrisM@home


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Hi Steve

    As promised.

    Here's the multistatement code:

    -- create table for results

    CREATE TABLE #Iteration (PreferenceId INT, BoxId INT, BallId INT)

    -- four iterations to solve:

    INSERT INTO #Iteration

    SELECT PreferenceId, BoxId, BallId

    FROM ( -- x

    SELECT BoxCount = COUNT(p.BoxId) OVER(PARTITION BY p.BoxId), p.*

    FROM #TC22_Preferences p

    INNER JOIN ( -- ex

    SELECT PreferenceId FROM #TC22_Preferences

    EXCEPT

    (SELECT p.PreferenceId

    FROM #TC22_Preferences p

    INNER JOIN #Iteration i ON i.Ballid = p.BallId OR i.Boxid = p.BoxId)

    ) ex ON ex.PreferenceId = p.PreferenceId

    ) x WHERE BoxCount = 1

    -- display result

    SELECT p.*

    FROM #TC22_Preferences p

    INNER JOIN #Iteration i ON i.PreferenceId = p.PreferenceId

    ORDER BY p.PreferenceId

    An anchor for a recursive CTE could be the first and therefore simplest iteration of the solving query:

    SELECT PreferenceId, BoxId, BallId

    FROM (

    SELECT BoxCount = COUNT(p.BoxId) OVER(PARTITION BY p.BoxId), p.*

    FROM #TC22_Preferences p

    ) x WHERE BoxCount = 1

    Giving the following, which looks superficially like it might work:

    ;WITH SuperSelect (PreferenceId, BoxId, BallId) AS (

    SELECT PreferenceId, BoxId, BallId

    FROM (

    SELECT BoxCount = COUNT(p.BoxId) OVER(PARTITION BY p.BoxId), p.*

    FROM #TC22_Preferences p

    ) x WHERE BoxCount = 1

    UNION ALL

    SELECT PreferenceId, BoxId, BallId

    FROM ( -- x

    SELECT BoxCount = COUNT(p.BoxId) OVER(PARTITION BY p.BoxId), p.*

    FROM #TC22_Preferences p

    INNER JOIN ( -- ex

    SELECT PreferenceId FROM #TC22_Preferences

    EXCEPT

    (SELECT p.PreferenceId

    FROM #TC22_Preferences p

    INNER JOIN SuperSelect i ON i.Ballid = p.BallId OR i.Boxid = p.BoxId)

    ) ex ON ex.PreferenceId = p.PreferenceId

    ) x WHERE BoxCount = 1

    ) -- WITH SuperSelect

    SELECT * FROM SuperSelect

    The results are rubbish from the perspective of the challenge: PreferenceID 1 and 14 come out from the anchor, then 1 and 12 - which makes it look like only PreferenceID 14 is excluded in the first iteration of the recursive part. This is repeated until MAXRECURSION is reached.

    Hope this gives you some ideas Steve.

    Cheers

    ChrisM

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • That gave me at least an idea on how to use the windowed COUNT aggregate. Here's my latest attempt, and it's results. I don't quite understand how a duplicate record appears, however. Anyone that can explain, please do so.

    ;WITH BOX_MATCHES AS (

    SELECT BoxId, MIN(BallId) AS BallId, MIN(PreferenceId) AS PreferenceId

    FROM TC22_Preferences

    GROUP BY BoxId

    HAVING COUNT(DISTINCT BallId) = 1

    UNION ALL

    SELECT P.BoxId, P.BallId, P.PreferenceId

    FROM TC22_Preferences P

    INNER JOIN (

    SELECT BoxId, BallId, PreferenceId, COUNT(PreferenceId) OVER (PARTITION BY BallId) AS COUNTER

    FROM (

    SELECT BoxId, BallId, PreferenceId

    FROM TC22_Preferences

    EXCEPT

    SELECT BoxId, BallId, PreferenceId

    FROM BOX_MATCHES

    ) X

    ) BM

    ON P.BoxId = BM.BoxId AND P.BallId = BM.BallId

    WHERE BM.COUNTER = 1

    )

    SELECT BOX.BoxName AS Box, BALL.BallName As Ball

    FROM BOX_MATCHES BM INNER JOIN TC22_Boxes BOX ON BM.BoxId = BOX.BoxId

    INNER JOIN TC22_Balls BALL ON BM.BallId = BALL.BallId

    --ORDER BY BM.BoxId

    OPTION (MAXRECURSION 0)

    Results:

    Box Ball

    ------ ------

    Box 1 Ball 1

    Box 6 Ball 5

    Box 4 Ball 6

    Box 4 Ball 6

    Steve

    (aka sgmunson)

    :-):-):-)

  • Might have a play with this at home tonight, it looks promising. This method works as a single-query solution but it looks horrible.

    Cheers

    ChrisM

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Might be of interest

    https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=318084

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thanks for this, Mark. Interesting note - the example query actually does run in compatibility 100 mode.

    Edit: On second thoughts, that's quite scary, expecially considering the simplicity of the query.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I'd really love to understand the reasoning that went into that post's comment that "the result is not the expected one". Here's what SQL Server 2005 Developer Edition 64-Bit running on my Windows 7 Ultimate 64-Bit system has to say for the results of that query:

    id parentid name ord lvl

    ------ ------ ------- ------ ------

    1 NULL root 1 0

    2 1 .1.2. 1 1

    3 1 .1.3. 2 1

    7 3 .1.3.7. 1 2

    8 3 .1.3.8. 2 2

    9 3 .1.3.9. 3 2

    4 2 .1.2.4. 1 2

    5 2 .1.2.5. 2 2

    6 2 .1.2.6. 3 2

    Why, exactly, was that "not expected" ? Here's my output from SELECT @@VERSION:

    Microsoft SQL Server 2005 - 9.00.4053.00 (X64) May 26 2009 14:13:01 Copyright (c) 1988-2005 Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.1 (Build 7600: )

    Steve

    (aka sgmunson)

    :-):-):-)

  • The lvl value should be 3 where the parentid value is 3. The windowing function worked fine, but the lvl increment in the recursive part of the CTE didn't.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Why? That's not the path this is going to take... The level is only going to be relative to the root level, not to any other value. The hierarchy shown only has 3 levels - the root, the 1st level and the 2nd level. It traversed the hierarchy quite correctly, from my perspective. Also, the root level was designated as level 0, so you have a 0-based setup here. The 3rd level is thus level 2 instead of 3. If I'm somehow confused here, please point out where and why... I'm always eager to learn...

    Steve

    (aka sgmunson)

    :-):-):-)

    Chris Morris-439714 (2/3/2010)


    The lvl value should be 3 where the parentid value is 3. The windowing function worked fine, but the lvl increment in the recursive part of the CTE didn't.

Viewing 15 posts - 16 through 30 (of 37 total)

You must be logged in to reply to this topic. Login to reply