Recursive CTE: Is it possible to count records without an aggregate function?

  • Here's the scenario - I have a recursive CTE, where the anchor member uses GROUP BY and COUNT, and the ideal would be if I could just repeat an ever so slightly different version of that anchor as the recursive piece. The idea is to select items from a table by limiting the selection to any rows that meet the criteria but have not already been selected, such that each recursion eliminates enough records from the source by virtue of the criteria, which prevents selection of items already in the set, but gets items that would represent unique additions to the overall set.

    This is something that would be a piece of cake with procedural code, but the objective is to do it with entirely set-based code. Any ideas? Any way to use EXCEPT or INTERSECT in some unique way to determine whether a subquery returns more than one record or not - again, without using any aggregate functions or methods?

    Steve

    (aka sgmunson)

    :-):-):-)

  • Care to provide an example of what you are talking about? Table def(s) (CREATE TABLE statement(s)), sample data (series of INSERT INTO statements for the table(s)), expected results based on the sample data, and the code you currently have written would go a long way to getting you a lot of help.

  • Well... let's see... I can't tear out the base logic from the actual code, so the best I can do is this:

    Assume you have a table with two values, each of which is used as a foreign key into a separate table. Those separate tables have an identical number of records, and they exist solely to tie a character value (a name, or label) associated with the foreign key value. Then you have a selection table that identifies the which records from one table are allowed to match up with a record from the other table - it's sole contents are the two keys for those other two tables. There can be multiple possible matches for a given key value of the first table to match with values from the second table, however, ultimately, only one possible selection of values from each table will satisfy the "conditions" specified by the selection table. I'm looking at a recursive CTE like this:

    ;WITH MATCHES AS (

    SELECT OId, MIN(Ad) AS AId

    FROM Prefs

    GROUP BY OId

    HAVING COUNT(DISTINCT AId) = 1

    UNION ALL

    SELECT P.OId, P.AId

    FROM Prefs P INNER JOIN MATCHES M

    ON P.OId <> M.OId AND P.AId <> M.AId

    WHERE EXISTS (

    SELECT 1

    FROM Prefs T

    WHERE T.OId = P.OId

    AND T.AId <> P.AId

    AND NOT EXISTS (

    SELECT 1

    FROM Prefs T2

    WHERE T2.OId = P.OId

    AND T2.AId <> P.AId

    AND T2.AId <> T.AId

    )

    )

    )

    SELECT ODATA.Name AS OName, ADATA.Name As AName

    FROM MATCHES M1 INNER JOIN ODATA ON M.OId = ODATA.OId

    INNER JOIN ADATA ON M.AId = ADATA.AId

    ORDER BY M.OId

    OPTION (MAXRECURSION 0)

    The problem with this code is that it will fail because the NOT EXISTS isn't going to quite operate like a cross apply. I just need some kind of condition such that the first EXISTS clause can be restricted to only those situations where that query returns only one record. If it can be done, let me know... My gut is starting to think it just isn't possible.

    Steve

    (aka sgmunson)

    :-):-):-)

    Lynn Pettis (2/1/2010)


    Care to provide an example of what you are talking about? Table def(s) (CREATE TABLE statement(s)), sample data (series of INSERT INTO statements for the table(s)), expected results based on the sample data, and the code you currently have written would go a long way to getting you a lot of help.

  • I'm having a bit of a low-brainpower day, but I think you can achieve what you're looking for using ROW_NUMBER(). If you use whatever you were passing as the GROUP BY clause in the PARTITION BY portiong of the ROW_NUMBER definition, you can then look for anything with ROW_NUMBER()>1 (for duplicates within that group based on the PARTITION lineup).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I'm sorry, but I really need something to work with to help. Tables, sample data, expected results all would help greatly with understanding the problem.

  • Matt,

    I really thought I had it licked when I read your post. I added a ROW_NUMBER() function (alias RN) based on your post to the original selection table, so that I could take the NOT EXISTS clause and compare the Prefs table's RN values to ensure no record existed with the same values of OId and AId but a larger RN. It seems I'm still in an infinite loop, and I'm not quite sure why... If you have any further general advice, it would be appreciated.

    Here's the updated code base for the concept:

    ;WITH MATCHES AS (

    SELECT OId, MIN(AId) AS AId, MIN(RN) AS RN

    FROM PREFS

    GROUP BY OId

    HAVING COUNT(DISTINCT AId) = 1

    UNION ALL

    SELECT P.OId, P.AId

    FROM PREFS P INNER JOIN MATCHES M

    ON P.OId <> M.OId AND P.AId <> M.AId

    WHERE EXISTS (

    SELECT 1

    FROM PREFS P2

    WHERE P2.OId = P.OId

    AND P2.AId <> P.AId

    AND NOT EXISTS (

    SELECT 1

    FROM PREFS P3

    WHERE P3.OId = P2.OId

    AND P3.AId = P2.AId

    AND P3.RN > P2.RN

    )

    )

    )

    SELECT ODATA.Name AS OName, ADATA.Name As AName

    FROM MATCHES M1 INNER JOIN ODATA ON M.OId = ODATA.OId

    INNER JOIN ADATA ON M.AId = ADATA.AId

    ORDER BY M.OId

    OPTION (MAXRECURSION 0)

    Steve

    (aka sgmunson)

    :-):-):-)

    Matt Miller (#4) (2/1/2010)


    I'm having a bit of a low-brainpower day, but I think you can achieve what you're looking for using ROW_NUMBER(). If you use whatever you were passing as the GROUP BY clause in the PARTITION BY portiong of the ROW_NUMBER definition, you can then look for anything with ROW_NUMBER()>1 (for duplicates within that group based on the PARTITION lineup).

  • Again, if you would please post table defs (CREATE TABLE statements), sample data (series of INSERT INTO statements for the tables), and expected results that properly illustrate what it is you are trying to accomplish we could provide you with much better assistance.

  • Isn't this just a simple paging problem - return page 'n' from a set of ordered results?

    Hard to tell without a clearer explanation, sample data, and expected output, as Lynn keeps requesting.

    Happy to help if you make it a little easier to do so πŸ™‚

  • Matt Miller (#4) (2/1/2010)


    I'm having a bit of a low-brainpower day, but I think you can achieve what you're looking for using ROW_NUMBER(). If you use whatever you were passing as the GROUP BY clause in the PARTITION BY portiong of the ROW_NUMBER definition, you can then look for anything with ROW_NUMBER()>1 (for duplicates within that group based on the PARTITION lineup).

    COUNT(column) OVER(PARTITION BY column)

    will identify dupes in the recursive part of a recursive CTE where GROUP BY and subqueries can't be used.

    This looks like a valiant attempt at the balls / boxes quiz;-)

    Cheers

    ChrisM

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Nice idea on the windowed aggregate. Unfortunately, I don't see how you can use that kind of count to return a single record when you need to count how many records you're getting that are having a windowed aggregate computed.

    I've got code now that does this task using a procedural methodology, but it would be a lot more useful to get the recursion to work. If you can modify my provided base code to do what you're suggesting, please do. The objective is to limit the recursive selection to rows that aren't already in our selection, and as a result of looking at the PREFS table and eliminating all previous selections, find rows where only one record for a given AId exists.

    Steve

    (aka sgmunson)

    :-):-):-)

    Chris Morris-439714 (2/2/2010)


    Matt Miller (#4) (2/1/2010)


    I'm having a bit of a low-brainpower day, but I think you can achieve what you're looking for using ROW_NUMBER(). If you use whatever you were passing as the GROUP BY clause in the PARTITION BY portiong of the ROW_NUMBER definition, you can then look for anything with ROW_NUMBER()>1 (for duplicates within that group based on the PARTITION lineup).

    COUNT(column) OVER(PARTITION BY column)

    will identify dupes in the recursive part of a recursive CTE where GROUP BY and subqueries can't be used.

    This looks like a valiant attempt at the balls / boxes quiz;-)

    Cheers

    ChrisM

  • <<limit the recursive selection to rows that aren't already in our selection>>

    Hi Steve

    I'm not sure a recursive CTE will easily lend itself to this, because what your recursive query 'sees' as the 'content' of the CTE in any recursive iteration is the result of the last iteration. To get a visual of this, here's a CTE version of running totals:

    CREATE TABLE #Test ([ID] INT, val NUMERIC(10,2), rtot NUMERIC(10,2))

    INSERT INTO #Test VALUES (1, 1.2, NULL), (3, 3.1, NULL), (19, 7.5, NULL), (20, 0.21, NULL)

    ;WITH

    OrderedTest AS (SELECT RowID = ROW_NUMBER() OVER (ORDER BY [ID]), * FROM #Test),

    RunningTotal AS (

    SELECT RowID, [ID], val, CAST(val AS NUMERIC(10,2)) AS rt -- get the 'first' iteration

    FROM OrderedTest

    WHERE RowID = 1

    UNION ALL -- get a new iteration based upon the last iteration

    SELECT t.RowID, t.[ID], t.val, CAST(t.val + rt.rt AS NUMERIC(10,2))

    FROM OrderedTest t

    INNER JOIN RunningTotal rt ON t.RowID = rt.RowID + 1)

    UPDATE t SET rtot = r.rt

    FROM #Test t

    INNER JOIN RunningTotal r ON r.[ID] = t.[ID]

    SELECT * FROM #Test

    You can see from this, that for any one iteration, only the result of the last iteration is available to the next iteration. Otherwise there would be rather more rows in the output.

    If there's another way of putting together a CTE in the way you are describing, then I'd be interested too. Something like

    SELECT * FROM MyTable

    EXCEPT SELECT * FROM MyCTE (so far)

    Cheers

    ChrisM

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Actually, that concept is exactly what I'm hoping for - just eliminating the addition of any records from the PREFS table on each iteration unless there's only one record for a given AId that hasn't already been added. My procedural code can do this in 4 iterations with the data I'm working with, and I was hoping to just use recursion to do the same thing. I have yet to see how to limit the records coming in from PREFS to those not already in the set and that are then the remaining unique record(s) within PREFS based on AId. It just seems like such a perfect candidate for recursion, that I'm having a hard time accepting that there just might not be any way to do it.

    Steve

    (aka sgmunson)

    :-):-):-)

  • sgmunson (2/2/2010)


    I've got code now that does this task using a procedural methodology

    Given a procedural method, couldn't a CLR UDA be made to work quite nicely? If so, I'd bet a beer it'd blow the socks off a recursive CTE...

  • CLR is prohibited on the server in question, but even so, I'm not entirely sure that's true, as each iteration further limits the record selection until finally there aren't any more records to be added. Even a much larger dataset than I'm using might well have a relatively small number of iterations, as the conditions of the data guarantee that a solution exists that's unique.

    Steve

    (aka sgmunson)

    :-):-):-)

    Paul White (2/2/2010)


    sgmunson (2/2/2010)


    I've got code now that does this task using a procedural methodology

    Given a procedural method, couldn't a CLR UDA be made to work quite nicely? If so, I'd bet a beer it'd blow the socks off a recursive CTE...

  • sgmunson (2/2/2010)


    CLR is prohibited on the server in question, but even so, I'm not entirely sure that's true, as each iteration further limits the record selection until finally there aren't any more records to be added. Even a much larger dataset than I'm using might well have a relatively small number of iterations, as the conditions of the data guarantee that a solution exists that's unique.

    Ok Steve, I'll take your word for it. Unless the OP comes back and poses the question in a form comprhensible to me without spending an hour on it - then I might take another look, just for giggles.

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply