Linking to the Previous Row

  • Paul Paiva (2/12/2009)


    Excellent and clearly written!

    I'm always reading CTE articles and this is one of the best, because it explains a situation which is very commonly encountered.

    - Paul

    Thanks, Paul. I'm glad you liked it!

  • mishaluba (1/29/2009)


    Absolutely loved the article. I am sure everyone had to do something like this at one time or another. Thank you for sharing this technique and nice explanations.

    My pleasure - thank you for your kind words.

    David.

  • Very helpful article. Thank you.:-)

  • peter-757102 (1/23/2009)


    I understand your point of view as well as the theory you put forward as to why it might be different in case of my proof. Therefore I did the query once more on my modified view to see if the statistics and query plan where any different from the original. They were not and that is 100% proof that the original also generates no intermediate results.

    The reasons I feel it is important to change:

    1. Half the discussion goes about using CTEs and you confirmed it is one of the key ingredients that you build your case upon and is in fact a focus.

    2. The article is very easy to correct without negative effects.

    If I am right and I think I am, people new to CTE get put on the wrong track from the start. When they think a CTE is a means to get an intermediate result they will code with that in mind. Also, this forum is about learning and sharing knowledge, so it is important the information is as correct as possible. Therefore I hope you will reconsidder after taking a look at the code I posted. There is a case to be made to keep the article as is and add a correction at the bottom or top. This way the following discussion keeps making sense for new readers.

    Other then that, enjoy your weekend....I sure will 🙂

    Nice work, Peter, but I'm curious - if a CTE doesn't generate any kind of intermediate result, then how might a recursive CTE work? Bear in mind that for each recursion, a different rowset from the "intermediate result" may be exposed to the source table. I don't think this necessarily clashes with your evidence that CTE's could be views by another name.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris-439714 (2/8/2010)


    peter-757102 (1/23/2009)


    I understand your point of view as well as the theory you put forward as to why it might be different in case of my proof. Therefore I did the query once more on my modified view to see if the statistics and query plan where any different from the original. They were not and that is 100% proof that the original also generates no intermediate results.

    The reasons I feel it is important to change:

    1. Half the discussion goes about using CTEs and you confirmed it is one of the key ingredients that you build your case upon and is in fact a focus.

    2. The article is very easy to correct without negative effects.

    If I am right and I think I am, people new to CTE get put on the wrong track from the start. When they think a CTE is a means to get an intermediate result they will code with that in mind. Also, this forum is about learning and sharing knowledge, so it is important the information is as correct as possible. Therefore I hope you will reconsidder after taking a look at the code I posted. There is a case to be made to keep the article as is and add a correction at the bottom or top. This way the following discussion keeps making sense for new readers.

    Other then that, enjoy your weekend....I sure will 🙂

    Nice work, Peter, but I'm curious - if a CTE doesn't generate any kind of intermediate result, then how might a recursive CTE work? Bear in mind that for each recursion, a different rowset from the "intermediate result" may be exposed to the source table. I don't think this necessarily clashes with your evidence that CTE's could be views by another name.

    Cheers

    ChrisM

    As with any query, with or without views and/or CTEs, SQL Sever uses RAM for intermediate results and falls back to tempdb as needed (similar to table variables and derived tables). For small sets of data even complex constructs such as 3 nested derived feel very fast. I never did formal testing on this however, the situations I used this in just performed way better as I would expect with storage taking place.

  • peter-757102 (2/8/2010)


    Chris Morris-439714 (2/8/2010)


    peter-757102 (1/23/2009)


    Nice work, Peter, but I'm curious - if a CTE doesn't generate any kind of intermediate result, then how might a recursive CTE work? Bear in mind that for each recursion, a different rowset from the "intermediate result" may be exposed to the source table. I don't think this necessarily clashes with your evidence that CTE's could be views by another name.

    Cheers

    ChrisM

    As with any query, with or without views and/or CTEs, SQL Sever uses RAM for intermediate results and falls back to tempdb as needed (similar to table variables and derived tables). For small sets of data even complex constructs such as 3 nested derived feel very fast. I never did formal testing on this however, the situations I used this in just performed way better as I would expect with storage taking place.

    So you are still saying that it's incorrect to refer to an intermediate result set from a CTE? The reason I'm asking, to be clear, is because I don't know what to call the result set which is returned by a CTE in a recursive step, when each recursive call is quite likely to return a different result from the CTE.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Look at the recursive part as just as another expression that is transplanted everywhere it is being used. Just as would be the case if the CTE would be located in a view that is used by a query. It is a functional description of what you want to do, no different from a select, an update or any SQL statement or expression and thus treated identical.

    The optimizing process first transplants any view/CTE in the statement as if it is fully written out, then optimize the statement itself and then execute according to the compiled plan. Note that during compilation, optimizations might result in different execution plans for the same CTE if that CTE is used different in different parts of the consuming query (just as a view would).

    For a demonstration, any condition that changes the effective output of the CTE in the spot where it is used should do the trick. Like a top 1 in a select or when you use a field generated in the CTE by a sub select that you do not use in one spot, but do in the other. Such cases should become visible as different operations in the query plan.

  • This might be brought up earlier but same task can be accomplished by using simple VIEW without using CTE. In any case using row_num() in a separate query and recursive use of that query is a good idea.

    CREATE VIEW abc

    AS

    SELECT i.Item,

    ph.ItemId,

    ph.PriceStartDate,

    ph.Price,

    ROW_NUMBER() OVER ( PARTITION BY ph.ItemId

    ORDER BY PriceStartDate ) AS rownum

    FROM Items i

    INNER JOIN PriceHistory ph

    ON i.ItemId = ph.ItemId

    GO

    SELECT currow.Item,

    prevrow.Price AS OldPrice,

    currow.Price AS RangePrice,

    currow.PriceStartDate AS StartDate,

    nextrow.PriceStartDate AS EndDate

    FROM abc currow

    LEFT JOIN abc nextrow

    ON currow.rownum = nextrow.rownum - 1

    AND currow.ItemId = nextrow.ItemId

    LEFT JOIN abc prevrow

    ON currow.rownum = prevrow.rownum + 1

    AND currow.ItemId = prevrow.ItemId

  • Gursoy Yerli (8/17/2010)


    This might be brought up earlier but same task can be accomplished by using simple VIEW without using CTE. In any case using row_num() in a separate query and recursive use of that query is a good idea.

    CREATE VIEW abc

    AS

    SELECT i.Item,

    ph.ItemId,

    ph.PriceStartDate,

    ph.Price,

    ROW_NUMBER() OVER ( PARTITION BY ph.ItemId

    ORDER BY PriceStartDate ) AS rownum

    FROM Items i

    INNER JOIN PriceHistory ph

    ON i.ItemId = ph.ItemId

    GO

    SELECT currow.Item,

    prevrow.Price AS OldPrice,

    currow.Price AS RangePrice,

    currow.PriceStartDate AS StartDate,

    nextrow.PriceStartDate AS EndDate

    FROM abc currow

    LEFT JOIN abc nextrow

    ON currow.rownum = nextrow.rownum - 1

    AND currow.ItemId = nextrow.ItemId

    LEFT JOIN abc prevrow

    ON currow.rownum = prevrow.rownum + 1

    AND currow.ItemId = prevrow.ItemId

    Just keep in mind that whether it's a CTE or a view, since you're calling a query (view) twice, it will be executed twice. It may be a whole lot less expensive to dump to a temp table, first.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • deleted

  • CTE's cannot be replaced by mere views, see below why:

    1. If you need to query specific operations, that do not make any sense when viewed separately from the main query, then using a view would be silly.

    2. Considder ad-hoc / dynamic queries from applications, you don't want to create a view for each and every one of those first before executing the query, would you?

    3. A CTE can be used to implement recursion, expensive but faster then building your own procedure or non-inline function every time you need it.

    4. A CTE can just like any other select statement as part of a batch or function body, use parameters or local variables. This cannot be said for code in a view, with the exception of equal parameterized views, which are in fact inline user defined functions.

    A CTE is a very useful construct, which once you understand it, will be experienced as indispensible.

  • Thanks very much for the article I found it very useful,

    I use data which has to do with readmissions, and I can figure out how to use the code for that, what I would like to use it for is getting enddate (discharge date) and subtracting it from a new start date(admission date ) but where the start date occurs after the end date .

    It gets worse for people with multiple readmissions

  • It's a good introduction for ROW_NUMBER but the query plan it generates is a little horrible. I wouldn't recommend ROW_NUMBER() for this task.

    With the ROW_NUMBER() approach you can't use any INDEX when doing your LEFT JOINS back to your CTE.

    So what happens is that you end up performing at least 2 CI SCANS on priceHistory table, regardless of how selective the rest of your query is. The example you gave actually uses 3 CI scans -- one because you are using the whole of priceHistory and 2 from the LEFT JOINS back to the CTE.

    Perhaps a better approach would be to use another 2005+ construct -- OUTER APPLY

    Example:

    SELECT

    i.[item] AS [Item]

    , previousPrice.[Price] AS [Old Price]

    , ph.[price] AS [RangePrice]

    , ph.[priceStartDate] AS [Startdate]

    , nextPriceStart.[nextPriceStartDate] AS [EndDate]

    FROM

    items AS i

    JOIN priceHistory ph ON ph.[itemId] = i.[itemID]

    OUTER APPLY (

    SELECT TOP 1

    phNext.[priceStartDate] AS [nextPriceStartDate]

    FROM

    priceHistory AS phNext

    WHERE

    phNext.[itemId] = ph.[itemID]

    AND phNext.[priceStartDate] > ph.[priceStartDate]

    ORDER BY

    phNext.[priceStartDate] ASC

    )

    AS nextPriceStart

    OUTER APPLY (

    SELECT TOP 1

    phPrev.[price] AS [Price]

    FROM

    priceHistory phPrev

    WHERE

    phPrev.[itemId] = ph.[ItemID]

    AND phPrev.[priceStartDate] < ph.[priceStartDate]

    ORDER BY

    phPrev.[priceStartDate] DESC

    )

    AS previousPrice

    Which generates the same results but only does 1 CI scan (because we are using every row in priceHistory. If we were more selective (for example only doing this for vacuum cleaner's then there would be no scans and all seeks)

    Give it a whirl and compare the execution plans. I bet OUTER APPLY would be a lot faster on a large dataSet (and where you are being selective on the products / dates you want to bring back).

    Regards,

    Transact_Charlie.

  • charles.gildawie (8/20/2010)


    It's a good introduction for ROW_NUMBER but the query plan it generates is a little horrible. I wouldn't recommend ROW_NUMBER() for this task.

    With the ROW_NUMBER() approach you can't use any INDEX when doing your LEFT JOINS back to your CTE.

    So what happens is that you end up performing at least 2 CI SCANS on priceHistory table, regardless of how selective the rest of your query is. The example you gave actually uses 3 CI scans -- one because you are using the whole of priceHistory and 2 from the LEFT JOINS back to the CTE.

    Perhaps a better approach would be to use another 2005+ construct -- OUTER APPLY

    Thanks, Charlie,

    ..or as Jeff Moden suggests (and my experience has borne this out) create an index a temp table, and do a self join on that. The CTE results in pretty, compact and elegant looking code, and I do like and use them a lot. It enables you to write code quickly to do certain operations which would otherwise be very tricky. But on more than one occasion, I've been forced to replace CTEs with more 'traditional' approaches, such as the temp table one.

    Indeed, at the time of writing the article (early 2008), I was extremely enthusiastic and evangelical about CTEs. Thanks to the comments in this thread over the last 2 1/2 years, and my own experience over the same period, my enthusiasm is more guarded, and my advice more qualified. However the article was always intended as more of a tutorial and scenario on the use of CTEs, and to provide some insight into when you might find them applicable and useful.

    Regards,

    David.

  • I scrolled through the discussion and I didn't see any mention that CTEs can be stacked. This makes some operations a lot easier to read and maintain (format appropriately into a stack):

    WITH A AS (), B AS (SELECT * FROM A), C AS (SELECT * FROM B INNER JOIN A) SELECT * FROM C

    Yes, it may be slower than intermediate tables, but of course if you tweak something in an intermediate result, it's a lot less work since you don't have to go change the table structure.

Viewing 15 posts - 61 through 75 (of 147 total)

You must be logged in to reply to this topic. Login to reply