Recursive CTE: Is it possible to count records without an aggregate function?

Question

Post reply

Recursive CTE: Is it possible to count records without an aggregate function?

sgmunson

SSC Guru

Points: 110588
More actions
February 1, 2010 at 2:34 pm

#216067

Here's the scenario - I have a recursive CTE, where the anchor member uses GROUP BY and COUNT, and the ideal would be if I could just repeat an ever so slightly different version of that anchor as the recursive piece. The idea is to select items from a table by limiting the selection to any rows that meet the criteria but have not already been selected, such that each recursion eliminates enough records from the source by virtue of the criteria, which prevents selection of items already in the set, but gets items that would represent unique additions to the overall set.

This is something that would be a piece of cake with procedural code, but the objective is to do it with entirely set-based code. Any ideas? Any way to use EXCEPT or INTERSECT in some unique way to determine whether a subquery returns more than one record or not - again, without using any aggregate functions or methods?

Steve

(aka sgmunson)

:-):-):-)

Steve?(aka sgmunson)?:) 🙂 :)?
Health & Nutrition
Rent Servers for Income (picks and shovels strategy)

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 1

Lynn Pettis

SSC Guru

Points: 442447

February 1, 2010 at 2:40 pm

#1111959

Care to provide an example of what you are talking about? Table def(s) (CREATE TABLE statement(s)), sample data (series of INSERT INTO statements for the table(s)), expected results based on the sample data, and the code you currently have written would go a long way to getting you a lot of help.

sgmunson SSC Guru Points: 110588 More actions · Answer 2

Well... let's see... I can't tear out the base logic from the actual code, so the best I can do is this:

Assume you have a table with two values, each of which is used as a foreign key into a separate table. Those separate tables have an identical number of records, and they exist solely to tie a character value (a name, or label) associated with the foreign key value. Then you have a selection table that identifies the which records from one table are allowed to match up with a record from the other table - it's sole contents are the two keys for those other two tables. There can be multiple possible matches for a given key value of the first table to match with values from the second table, however, ultimately, only one possible selection of values from each table will satisfy the "conditions" specified by the selection table. I'm looking at a recursive CTE like this:



;WITH MATCHES AS (

	SELECT OId, MIN(Ad) AS AId

	FROM Prefs

	GROUP BY OId

	HAVING COUNT(DISTINCT AId) = 1

	UNION ALL

	SELECT P.OId, P.AId

	FROM Prefs P INNER JOIN MATCHES M

		ON P.OId <> M.OId AND P.AId <> M.AId

	WHERE EXISTS (

			SELECT 1

			FROM Prefs T

			WHERE T.OId = P.OId

				AND T.AId <> P.AId

				AND NOT EXISTS (

					SELECT 1

					FROM Prefs T2

					WHERE T2.OId = P.OId

						AND T2.AId <> P.AId

						AND T2.AId <> T.AId

				)

		)

)

SELECT ODATA.Name AS OName, ADATA.Name As AName

FROM MATCHES M1 INNER JOIN ODATA ON M.OId = ODATA.OId

	INNER JOIN ADATA ON M.AId = ADATA.AId

ORDER BY M.OId

OPTION (MAXRECURSION 0)

The problem with this code is that it will fail because the NOT EXISTS isn't going to quite operate like a cross apply. I just need some kind of condition such that the first EXISTS clause can be restricted to only those situations where that query returns only one record. If it can be done, let me know... My gut is starting to think it just isn't possible.

Steve

(aka sgmunson)

:-):-):-)

Lynn Pettis (2/1/2010)

Care to provide an example of what you are talking about? Table def(s) (CREATE TABLE statement(s)), sample data (series of INSERT INTO statements for the table(s)), expected results based on the sample data, and the code you currently have written would go a long way to getting you a lot of help.

Steve?(aka sgmunson)?:) 🙂 :)?
Health & Nutrition
Rent Servers for Income (picks and shovels strategy)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 3

I'm having a bit of a low-brainpower day, but I think you can achieve what you're looking for using ROW_NUMBER(). If you use whatever you were passing as the GROUP BY clause in the PARTITION BY portiong of the ROW_NUMBER definition, you can then look for anything with ROW_NUMBER()>1 (for duplicates within that group based on the PARTITION lineup).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 4

Lynn Pettis

SSC Guru

Points: 442447

February 1, 2010 at 3:31 pm

#1111990

I'm sorry, but I really need something to work with to help. Tables, sample data, expected results all would help greatly with understanding the problem.

sgmunson SSC Guru Points: 110588 More actions · Answer 5

Matt,

I really thought I had it licked when I read your post. I added a ROW_NUMBER() function (alias RN) based on your post to the original selection table, so that I could take the NOT EXISTS clause and compare the Prefs table's RN values to ensure no record existed with the same values of OId and AId but a larger RN. It seems I'm still in an infinite loop, and I'm not quite sure why... If you have any further general advice, it would be appreciated.

Here's the updated code base for the concept:



;WITH MATCHES AS (

        SELECT OId, MIN(AId) AS AId, MIN(RN) AS RN

        FROM PREFS

        GROUP BY OId

        HAVING COUNT(DISTINCT AId) = 1

        UNION ALL

        SELECT P.OId, P.AId

        FROM PREFS P INNER JOIN MATCHES M

                ON P.OId <> M.OId AND P.AId <> M.AId

        WHERE EXISTS (

                        SELECT 1

                        FROM PREFS P2

                        WHERE P2.OId = P.OId

                                AND P2.AId <> P.AId

                                AND NOT EXISTS (

                                        SELECT 1

                                        FROM PREFS P3

                                        WHERE P3.OId = P2.OId

                                                AND P3.AId = P2.AId

                                                AND P3.RN > P2.RN

                                )

                )

)

SELECT ODATA.Name AS OName, ADATA.Name As AName

FROM MATCHES M1 INNER JOIN ODATA ON M.OId = ODATA.OId

        INNER JOIN ADATA ON M.AId = ADATA.AId

ORDER BY M.OId

OPTION (MAXRECURSION 0)

Steve

(aka sgmunson)

:-):-):-)

Matt Miller (#4) (2/1/2010)

I'm having a bit of a low-brainpower day, but I think you can achieve what you're looking for using ROW_NUMBER(). If you use whatever you were passing as the GROUP BY clause in the PARTITION BY portiong of the ROW_NUMBER definition, you can then look for anything with ROW_NUMBER()>1 (for duplicates within that group based on the PARTITION lineup).

Steve?(aka sgmunson)?:) 🙂 :)?
Health & Nutrition
Rent Servers for Income (picks and shovels strategy)

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 6

Lynn Pettis

SSC Guru

Points: 442447

February 1, 2010 at 6:21 pm

#1112035

Again, if you would please post table defs (CREATE TABLE statements), sample data (series of INSERT INTO statements for the tables), and expected results that properly illustrate what it is you are trying to accomplish we could provide you with much better assistance.

Paul White SSC Guru Points: 150457 More actions · Answer 7

Paul White

SSC Guru

Points: 150457

February 2, 2010 at 4:00 am

#1112164

Isn't this just a simple paging problem - return page 'n' from a set of ordered results?

Hard to tell without a clearer explanation, sample data, and expected output, as Lynn keeps requesting.

Happy to help if you make it a little easier to do so 🙂

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

Matt Miller (#4) (2/1/2010)

I'm having a bit of a low-brainpower day, but I think you can achieve what you're looking for using ROW_NUMBER(). If you use whatever you were passing as the GROUP BY clause in the PARTITION BY portiong of the ROW_NUMBER definition, you can then look for anything with ROW_NUMBER()>1 (for duplicates within that group based on the PARTITION lineup).

COUNT(column) OVER(PARTITION BY column)

will identify dupes in the recursive part of a recursive CTE where GROUP BY and subqueries can't be used.

This looks like a valiant attempt at the balls / boxes quiz;-)

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

sgmunson SSC Guru Points: 110588 More actions · Answer 9

Nice idea on the windowed aggregate. Unfortunately, I don't see how you can use that kind of count to return a single record when you need to count how many records you're getting that are having a windowed aggregate computed.

I've got code now that does this task using a procedural methodology, but it would be a lot more useful to get the recursion to work. If you can modify my provided base code to do what you're suggesting, please do. The objective is to limit the recursive selection to rows that aren't already in our selection, and as a result of looking at the PREFS table and eliminating all previous selections, find rows where only one record for a given AId exists.

Steve

(aka sgmunson)

:-):-):-)

Chris Morris-439714 (2/2/2010)

Matt Miller (#4) (2/1/2010)

I'm having a bit of a low-brainpower day, but I think you can achieve what you're looking for using ROW_NUMBER(). If you use whatever you were passing as the GROUP BY clause in the PARTITION BY portiong of the ROW_NUMBER definition, you can then look for anything with ROW_NUMBER()>1 (for duplicates within that group based on the PARTITION lineup).

COUNT(column) OVER(PARTITION BY column)

will identify dupes in the recursive part of a recursive CTE where GROUP BY and subqueries can't be used.

This looks like a valiant attempt at the balls / boxes quiz;-)

Cheers

ChrisM

Steve?(aka sgmunson)?:) 🙂 :)?
Health & Nutrition
Rent Servers for Income (picks and shovels strategy)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 10

<<limit the recursive selection to rows that aren't already in our selection>>

Hi Steve

I'm not sure a recursive CTE will easily lend itself to this, because what your recursive query 'sees' as the 'content' of the CTE in any recursive iteration is the result of the last iteration. To get a visual of this, here's a CTE version of running totals:

CREATE TABLE #Test ([ID] INT, val NUMERIC(10,2), rtot NUMERIC(10,2))

INSERT INTO #Test VALUES (1, 1.2, NULL), (3, 3.1, NULL), (19, 7.5, NULL), (20, 0.21, NULL)

;WITH

OrderedTest AS (SELECT RowID = ROW_NUMBER() OVER (ORDER BY [ID]), * FROM #Test),

RunningTotal AS (

SELECT RowID, [ID], val, CAST(val AS NUMERIC(10,2)) AS rt -- get the 'first' iteration

FROM OrderedTest

WHERE RowID = 1

UNION ALL -- get a new iteration based upon the last iteration

SELECT t.RowID, t.[ID], t.val, CAST(t.val + rt.rt AS NUMERIC(10,2))

FROM OrderedTest t

INNER JOIN RunningTotal rt ON t.RowID = rt.RowID + 1)

UPDATE t SET rtot = r.rt

FROM #Test t

INNER JOIN RunningTotal r ON r.[ID] = t.[ID]

SELECT * FROM #Test

You can see from this, that for any one iteration, only the result of the last iteration is available to the next iteration. Otherwise there would be rather more rows in the output.

If there's another way of putting together a CTE in the way you are describing, then I'd be interested too. Something like

SELECT * FROM MyTable

EXCEPT SELECT * FROM MyCTE (so far)

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

sgmunson SSC Guru Points: 110588 More actions · Answer 11

Actually, that concept is exactly what I'm hoping for - just eliminating the addition of any records from the PREFS table on each iteration unless there's only one record for a given AId that hasn't already been added. My procedural code can do this in 4 iterations with the data I'm working with, and I was hoping to just use recursion to do the same thing. I have yet to see how to limit the records coming in from PREFS to those not already in the set and that are then the remaining unique record(s) within PREFS based on AId. It just seems like such a perfect candidate for recursion, that I'm having a hard time accepting that there just might not be any way to do it.

Steve

(aka sgmunson)

:-):-):-)

Steve?(aka sgmunson)?:) 🙂 :)?
Health & Nutrition
Rent Servers for Income (picks and shovels strategy)

Paul White SSC Guru Points: 150457 More actions · Answer 12

Paul White

SSC Guru

Points: 150457

February 2, 2010 at 10:42 am

#1112417

sgmunson (2/2/2010)

I've got code now that does this task using a procedural methodology

Given a procedural method, couldn't a CLR UDA be made to work quite nicely? If so, I'd bet a beer it'd blow the socks off a recursive CTE...

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

sgmunson SSC Guru Points: 110588 More actions · Answer 13

CLR is prohibited on the server in question, but even so, I'm not entirely sure that's true, as each iteration further limits the record selection until finally there aren't any more records to be added. Even a much larger dataset than I'm using might well have a relatively small number of iterations, as the conditions of the data guarantee that a solution exists that's unique.

Steve

(aka sgmunson)

:-):-):-)

Paul White (2/2/2010)

sgmunson (2/2/2010)

I've got code now that does this task using a procedural methodology

Given a procedural method, couldn't a CLR UDA be made to work quite nicely? If so, I'd bet a beer it'd blow the socks off a recursive CTE...

Steve?(aka sgmunson)?:) 🙂 :)?
Health & Nutrition
Rent Servers for Income (picks and shovels strategy)

Paul White SSC Guru Points: 150457 More actions · Answer 14

sgmunson (2/2/2010)

CLR is prohibited on the server in question, but even so, I'm not entirely sure that's true, as each iteration further limits the record selection until finally there aren't any more records to be added. Even a much larger dataset than I'm using might well have a relatively small number of iterations, as the conditions of the data guarantee that a solution exists that's unique.

Ok Steve, I'll take your word for it. Unless the OP comes back and poses the question in a form comprhensible to me without spending an hour on it - then I might take another look, just for giggles.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi