Linking to the Previous Row

Question

Post reply

Linking to the Previous Row

Viewing 15 posts - 61 through 75 (of 147 total)

You must be logged in to reply to this topic. Login to reply

David McKinney SSChampion Points: 10469 More actions · Answer 1

David McKinney

SSChampion

Points: 10469

February 12, 2009 at 9:45 am

#943002

Paul Paiva (2/12/2009)

Excellent and clearly written!

I'm always reading CTE articles and this is one of the best, because it explains a situation which is very commonly encountered.

- Paul

Thanks, Paul. I'm glad you liked it!

David McKinney SSChampion Points: 10469 More actions · Answer 2

David McKinney

SSChampion

Points: 10469

February 12, 2009 at 9:49 am

#943007

mishaluba (1/29/2009)

Absolutely loved the article. I am sure everyone had to do something like this at one time or another. Thank you for sharing this technique and nice explanations.

My pleasure - thank you for your kind words.

David.

Ashu Mozh SSC Eights! Points: 820 More actions · Answer 3

Ashu Mozh

SSC Eights!

Points: 820

May 13, 2009 at 8:39 am

#993784

Very helpful article. Thank you.:-)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

peter-757102 (1/23/2009)

I understand your point of view as well as the theory you put forward as to why it might be different in case of my proof. Therefore I did the query once more on my modified view to see if the statistics and query plan where any different from the original. They were not and that is 100% proof that the original also generates no intermediate results.

The reasons I feel it is important to change:

1. Half the discussion goes about using CTEs and you confirmed it is one of the key ingredients that you build your case upon and is in fact a focus.

2. The article is very easy to correct without negative effects.

If I am right and I think I am, people new to CTE get put on the wrong track from the start. When they think a CTE is a means to get an intermediate result they will code with that in mind. Also, this forum is about learning and sharing knowledge, so it is important the information is as correct as possible. Therefore I hope you will reconsidder after taking a look at the code I posted. There is a case to be made to keep the article as is and add a correction at the bottom or top. This way the following discussion keeps making sense for new readers.

Other then that, enjoy your weekend....I sure will 🙂

Nice work, Peter, but I'm curious - if a CTE doesn't generate any kind of intermediate result, then how might a recursive CTE work? Bear in mind that for each recursion, a different rowset from the "intermediate result" may be exposed to the source table. I don't think this necessarily clashes with your evidence that CTE's could be views by another name.

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

peter-757102 SSCertifiable Points: 6877 More actions · Answer 5

Chris Morris-439714 (2/8/2010)

peter-757102 (1/23/2009)

I understand your point of view as well as the theory you put forward as to why it might be different in case of my proof. Therefore I did the query once more on my modified view to see if the statistics and query plan where any different from the original. They were not and that is 100% proof that the original also generates no intermediate results.

The reasons I feel it is important to change:

1. Half the discussion goes about using CTEs and you confirmed it is one of the key ingredients that you build your case upon and is in fact a focus.

2. The article is very easy to correct without negative effects.

If I am right and I think I am, people new to CTE get put on the wrong track from the start. When they think a CTE is a means to get an intermediate result they will code with that in mind. Also, this forum is about learning and sharing knowledge, so it is important the information is as correct as possible. Therefore I hope you will reconsidder after taking a look at the code I posted. There is a case to be made to keep the article as is and add a correction at the bottom or top. This way the following discussion keeps making sense for new readers.

Other then that, enjoy your weekend....I sure will 🙂

Nice work, Peter, but I'm curious - if a CTE doesn't generate any kind of intermediate result, then how might a recursive CTE work? Bear in mind that for each recursion, a different rowset from the "intermediate result" may be exposed to the source table. I don't think this necessarily clashes with your evidence that CTE's could be views by another name.

Cheers

ChrisM

As with any query, with or without views and/or CTEs, SQL Sever uses RAM for intermediate results and falls back to tempdb as needed (similar to table variables and derived tables). For small sets of data even complex constructs such as 3 nested derived feel very fast. I never did formal testing on this however, the situations I used this in just performed way better as I would expect with storage taking place.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

peter-757102 (2/8/2010)

Chris Morris-439714 (2/8/2010)

peter-757102 (1/23/2009)

Nice work, Peter, but I'm curious - if a CTE doesn't generate any kind of intermediate result, then how might a recursive CTE work? Bear in mind that for each recursion, a different rowset from the "intermediate result" may be exposed to the source table. I don't think this necessarily clashes with your evidence that CTE's could be views by another name.

Cheers

ChrisM

As with any query, with or without views and/or CTEs, SQL Sever uses RAM for intermediate results and falls back to tempdb as needed (similar to table variables and derived tables). For small sets of data even complex constructs such as 3 nested derived feel very fast. I never did formal testing on this however, the situations I used this in just performed way better as I would expect with storage taking place.

So you are still saying that it's incorrect to refer to an intermediate result set from a CTE? The reason I'm asking, to be clear, is because I don't know what to call the result set which is returned by a CTE in a recursive step, when each recursive call is quite likely to return a different result from the CTE.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

peter-757102 SSCertifiable Points: 6877 More actions · Answer 7

Look at the recursive part as just as another expression that is transplanted everywhere it is being used. Just as would be the case if the CTE would be located in a view that is used by a query. It is a functional description of what you want to do, no different from a select, an update or any SQL statement or expression and thus treated identical.

The optimizing process first transplants any view/CTE in the statement as if it is fully written out, then optimize the statement itself and then execute according to the compiled plan. Note that during compilation, optimizations might result in different execution plans for the same CTE if that CTE is used different in different parts of the consuming query (just as a view would).

For a demonstration, any condition that changes the effective output of the CTE in the spot where it is used should do the trick. Like a top 1 in a select or when you use a field generated in the CTE by a sub select that you do not use in one spot, but do in the other. Such cases should become visible as different operations in the query plan.

Gursoy Yerli Valued Member Points: 51 More actions · Answer 8

This might be brought up earlier but same task can be accomplished by using simple VIEW without using CTE. In any case using row_num() in a separate query and recursive use of that query is a good idea.

CREATE VIEW abc

AS

SELECT i.Item,

ph.ItemId,

ph.PriceStartDate,

ph.Price,

ROW_NUMBER() OVER ( PARTITION BY ph.ItemId

ORDER BY PriceStartDate ) AS rownum

FROM Items i

INNER JOIN PriceHistory ph

ON i.ItemId = ph.ItemId

GO

SELECT currow.Item,

prevrow.Price AS OldPrice,

currow.Price AS RangePrice,

currow.PriceStartDate AS StartDate,

nextrow.PriceStartDate AS EndDate

FROM abc currow

LEFT JOIN abc nextrow

ON currow.rownum = nextrow.rownum - 1

AND currow.ItemId = nextrow.ItemId

LEFT JOIN abc prevrow

ON currow.rownum = prevrow.rownum + 1

AND currow.ItemId = prevrow.ItemId

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

Gursoy Yerli (8/17/2010)

This might be brought up earlier but same task can be accomplished by using simple VIEW without using CTE. In any case using row_num() in a separate query and recursive use of that query is a good idea.

CREATE VIEW abc

AS

SELECT i.Item,

ph.ItemId,

ph.PriceStartDate,

ph.Price,

ROW_NUMBER() OVER ( PARTITION BY ph.ItemId

ORDER BY PriceStartDate ) AS rownum

FROM Items i

INNER JOIN PriceHistory ph

ON i.ItemId = ph.ItemId

GO

SELECT currow.Item,

prevrow.Price AS OldPrice,

currow.Price AS RangePrice,

currow.PriceStartDate AS StartDate,

nextrow.PriceStartDate AS EndDate

FROM abc currow

LEFT JOIN abc nextrow

ON currow.rownum = nextrow.rownum - 1

AND currow.ItemId = nextrow.ItemId

LEFT JOIN abc prevrow

ON currow.rownum = prevrow.rownum + 1

AND currow.ItemId = prevrow.ItemId

Just keep in mind that whether it's a CTE or a view, since you're calling a query (view) twice, it will be executed twice. It may be a whole lot less expensive to dump to a temp table, first.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Stuart Davis Old Hand Points: 342 More actions · Answer 10

Stuart Davis

Old Hand

Points: 342

August 20, 2010 at 2:57 am

#1209529

deleted

peter-757102 SSCertifiable Points: 6877 More actions · Answer 11

CTE's cannot be replaced by mere views, see below why:

1. If you need to query specific operations, that do not make any sense when viewed separately from the main query, then using a view would be silly.

2. Considder ad-hoc / dynamic queries from applications, you don't want to create a view for each and every one of those first before executing the query, would you?

3. A CTE can be used to implement recursion, expensive but faster then building your own procedure or non-inline function every time you need it.

4. A CTE can just like any other select statement as part of a batch or function body, use parameters or local variables. This cannot be said for code in a view, with the exception of equal parameterized views, which are in fact inline user defined functions.

A CTE is a very useful construct, which once you understand it, will be experienced as indispensible.

niyinks SSC Enthusiast Points: 190 More actions · Answer 12

Thanks very much for the article I found it very useful,

I use data which has to do with readmissions, and I can figure out how to use the code for that, what I would like to use it for is getting enddate (discharge date) and subtracting it from a new start date(admission date ) but where the start date occurs after the end date .

It gets worse for people with multiple readmissions

charles.gildawie Grasshopper Points: 22 More actions · Answer 13

It's a good introduction for ROW_NUMBER but the query plan it generates is a little horrible. I wouldn't recommend ROW_NUMBER() for this task.

With the ROW_NUMBER() approach you can't use any INDEX when doing your LEFT JOINS back to your CTE.

So what happens is that you end up performing at least 2 CI SCANS on priceHistory table, regardless of how selective the rest of your query is. The example you gave actually uses 3 CI scans -- one because you are using the whole of priceHistory and 2 from the LEFT JOINS back to the CTE.

Perhaps a better approach would be to use another 2005+ construct -- OUTER APPLY

Example:



SELECT

	i.[item] AS [Item]

	, previousPrice.[Price] AS [Old Price]

	, ph.[price] AS [RangePrice]

	, ph.[priceStartDate] AS [Startdate]

	, nextPriceStart.[nextPriceStartDate] AS [EndDate]

FROM

	items AS i

	JOIN priceHistory ph ON ph.[itemId] = i.[itemID]

	OUTER APPLY (

		SELECT TOP 1

			phNext.[priceStartDate] AS [nextPriceStartDate]

		FROM

			priceHistory AS phNext

		WHERE

			phNext.[itemId] = ph.[itemID]

			AND phNext.[priceStartDate] > ph.[priceStartDate]

		ORDER BY

			phNext.[priceStartDate] ASC

		)

		AS nextPriceStart

	OUTER APPLY (

		SELECT TOP 1

			phPrev.[price] AS [Price]

		FROM

			priceHistory phPrev

		WHERE

			phPrev.[itemId] = ph.[ItemID]

			AND phPrev.[priceStartDate] < ph.[priceStartDate]

		ORDER BY

			phPrev.[priceStartDate] DESC

		)

		AS previousPrice

Which generates the same results but only does 1 CI scan (because we are using every row in priceHistory. If we were more selective (for example only doing this for vacuum cleaner's then there would be no scans and all seeks)

Give it a whirl and compare the execution plans. I bet OUTER APPLY would be a lot faster on a large dataSet (and where you are being selective on the products / dates you want to bring back).

Regards,

Transact_Charlie.

David McKinney SSChampion Points: 10469 More actions · Answer 14

charles.gildawie (8/20/2010)

It's a good introduction for ROW_NUMBER but the query plan it generates is a little horrible. I wouldn't recommend ROW_NUMBER() for this task.

With the ROW_NUMBER() approach you can't use any INDEX when doing your LEFT JOINS back to your CTE.

So what happens is that you end up performing at least 2 CI SCANS on priceHistory table, regardless of how selective the rest of your query is. The example you gave actually uses 3 CI scans -- one because you are using the whole of priceHistory and 2 from the LEFT JOINS back to the CTE.

Perhaps a better approach would be to use another 2005+ construct -- OUTER APPLY

Thanks, Charlie,

..or as Jeff Moden suggests (and my experience has borne this out) create an index a temp table, and do a self join on that. The CTE results in pretty, compact and elegant looking code, and I do like and use them a lot. It enables you to write code quickly to do certain operations which would otherwise be very tricky. But on more than one occasion, I've been forced to replace CTEs with more 'traditional' approaches, such as the temp table one.

Indeed, at the time of writing the article (early 2008), I was extremely enthusiastic and evangelical about CTEs. Thanks to the comments in this thread over the last 2 1/2 years, and my own experience over the same period, my enthusiasm is more guarded, and my advice more qualified. However the article was always intended as more of a tutorial and scenario on the use of CTEs, and to provide some insight into when you might find them applicable and useful.

Regards,

David.

Cade Roux Ten Centuries Points: 1326 More actions · Answer 15

I scrolled through the discussion and I didn't see any mention that CTEs can be stacked. This makes some operations a lot easier to read and maintain (format appropriately into a stack):

WITH A AS (), B AS (SELECT * FROM A), C AS (SELECT * FROM B INNER JOIN A) SELECT * FROM C

Yes, it may be slower than intermediate tables, but of course if you tweak something in an intermediate result, it's a lot less work since you don't have to go change the table structure.