Usage of CTE - Trick with Dates

  • ChrisM@Work (6/18/2013)


    sagesmith (6/18/2013)


    Please post the times each approach takes, this is the only basis for a conclusion. As others have already mentioned, in many scenarios the cursor far outperforms the recursive cte.

    Over to you then. Sample data setup is a page or two back. I've compared iTally vs rCTE here:

    http://qa.sqlservercentral.com/Forums/FindPost1464526.aspx

    Feel free to write your fastest cursor-based solution πŸ˜‰

    πŸ˜€ That was really funny.

    But, I think( and truly hope ) it was a typo by the poster when he said "cursor far outperforms the recursive cte".

    If in case he really meant it, even I would love to be enlightened πŸ™‚


    Kingston Dhasian

    How to post data/code on a forum to get the best help - Jeff Moden
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • ChrisM@Work (6/18/2013)


    sagesmith (6/18/2013)


    Please post the times each approach takes, this is the only basis for a conclusion. As others have already mentioned, in many scenarios the cursor far outperforms the recursive cte.

    Over to you then. Sample data setup is a page or two back. I've compared iTally vs rCTE here:

    http://qa.sqlservercentral.com/Forums/FindPost1464526.aspx

    Feel free to write your fastest cursor-based solution πŸ˜‰

    My comment was for the author of the article who was comparing rCTE to Cursor. Your Tally approach is great for that sample scenario, no argument. I have found, as others have mentioned, that as you attempt to scale the rCTE approach the performance seems to go in an inverse relationship to the amount of rows. In these scenarios a Cursor or the Tally approach (inline or maybe faster as a Numbers table on disk?) does much better than the rCTE.

    I have seen these enlightened articles on rCTE before and I think they should come with a word of caution (unless I'm the one who is not enlightened? please prove me wrong).

  • If I may make a fine point:

    It is more accurate to say that in a multi-statement batch, the statement (query) preceding a CTE should be terminated by a semi-colon.

    I find the following just a little easier to grasp: (Change MonthCount to int)

    With CTE_Base (SeqNo, Begin_Date, Month_Count, Payment)

    as

    (select SeqNo, Date_Field AS Begin_Date,Month_Count, Payment from dbo.abc

    UNION all

    select SeqNo, dateadd(mm, 1, Begin_Date), Month_Count -1, Payment from CTE_Base

    where Month_Count > 1

    )

    SELECT SeqNo, Begin_Date, Month_Count, Payment

    FROM CTE_Base

    order by SeqNo, Begin_Date

  • your Tally kills the rCTE and hybrid Cursor/Numbers as you can see. On my box:

    Inline Tally = 146 milliseconds

    rCTE = 2113 milliseconds

    Cursor/Numbers = 1183 milliseconds

    The Cursor based solution is only partially so, I still couldn't bring myself to copy the author's while inner loop...yikes. But the point I'm trying to make is with bashing cursors:

    http://stackoverflow.com/questions/287445/why-do-people-hate-sql-cursors-so-much

    see most popular answer under The Size Issue heading.

    --If the table exists, drop it.

    if exists (select * from sys.tables

    where name = 'abc'

    and Schema_Name(schema_id) = 'dbo'

    and [type] = 'U')

    drop table dbo.abc

    go

    --Create the source table, dbo.abc.

    create table dbo.abc

    (SeqNo smallint

    ,Date_Field datetime

    ,Month_Count int

    ,Payment decimal(10,2))

    go

    --Populate the source table, dbo.abc

    insert into dbo.abc (SeqNo, Date_Field, Month_Count, Payment)

    values (1, '20090101', 95000, 100)

    ,(2, '20100101', 7000, 200)

    ,(3, '20110101', 5000, 300)

    go

    --If exists, drop the destination table dbo.def

    if exists (select * from sys.tables

    where name = 'def'

    and Schema_NAME(schema_id) = 'dbo'

    and [type] = 'U')

    drop table dbo.def

    go

    --Create the destination table, dbo.def

    create table dbo.def

    (SeqNo smallint

    ,Date_Field smalldatetime

    ,Payment decimal(10,2))

    go

    ------------ Inline Tally version ------------------

    -- black hole variables:

    DECLARE @SeqNo SMALLINT, @Date_Field DATETIME, @Payment DECIMAL (10,2)

    DECLARE @StartTime DATETIME;

    -- time store

    SET @StartTime = GETDATE()

    ;WITH E1(n) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 10 rows

    E2(n) AS (SELECT 1 FROM E1 a, E1 b), -- 10x10 rows

    E3(n) AS (SELECT 1 FROM E2 a, E2 b, E2 c) -- 100x100x100 rows

    SELECT

    @SeqNo = SeqNo,

    @Date_Field = DATEADD(MONTH,x.n,Date_Field),

    @Payment = Payment

    FROM dbo.abc

    CROSS APPLY(SELECT TOP(Month_Count) n = -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E3) x

    SELECT 'Inline Tally = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

    PRINT '-------------------------------------------------------------------------'

    -------------- rCTE version ------------------------------

    SET @StartTime = GETDATE()

    ;with CTE_Base (SeqNo, Date_Field, Month_Count, Payment, Begin_Date, End_Date, Frequency)

    as

    (select SeqNo, Date_Field, Month_Count, Payment, Date_Field, dateadd(mm, Month_Count-1, Date_Field), 1 from dbo.abc

    union all

    select SeqNo, dateadd(mm, Frequency, Date_Field), Month_Count, Payment, Begin_Date, End_Date, Frequency

    from CTE_Base

    where dateadd(mm, Frequency, Date_Field) between Begin_Date and End_Date)

    select

    @SeqNo = SeqNo,

    @Date_Field = Date_Field,

    @Payment = Payment

    from CTE_Base

    where Date_Field between Begin_Date and End_Date

    order by SeqNo, Date_Field

    OPTION(MAXRECURSION 0)

    SELECT 'rCTE = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

    PRINT '-------------------------------------------------------------------------'

    -------------- CURSOR version ------------------------------

    SET @StartTime = GETDATE()

    declare @l_SeqNo int

    ,@l_date_field DATETIME

    ,@l_Month_Count int

    ,@l_Payment decimal(10, 2)

    ,@l_counter SMALLINT

    ,@l_max_month_count INT = (SELECT MAX(Month_Count) FROM dbo.abc);

    SELECT TOP (@l_max_month_count) IDENTITY(int,1,1) AS number

    into #Numbers

    FROM sys.objects s1

    CROSS JOIN sys.objects s2

    select @l_counter = 0

    set nocount on;

    declare i_Cursor CURSOR FAST_FORWARD

    FOR

    select SeqNo, Date_Field, Month_Count, Payment from dbo.abc

    open i_Cursor

    fetch next from i_Cursor into

    @l_SeqNo

    ,@l_date_field

    ,@l_Month_Count

    ,@l_Payment

    while @@fetch_status = 0

    begin

    select number, dateadd(mm, number-1, @l_date_field), @l_Payment

    FROM #Numbers

    WHERE number<=@l_Month_Count

    fetch next from i_Cursor into

    @l_SeqNo

    ,@l_date_field

    ,@l_Month_Count

    ,@l_Payment

    end

    close i_Cursor

    deallocate i_Cursor

    set nocount off;

    SELECT 'Cursor/Numbers = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

    PRINT '-------------------------------------------------------------------------'

    DROP TABLE #Numbers;

  • sagesmith (6/18/2013)


    ChrisM@Work (6/18/2013)


    sagesmith (6/18/2013)


    Please post the times each approach takes, this is the only basis for a conclusion. As others have already mentioned, in many scenarios the cursor far outperforms the recursive cte.

    Over to you then. Sample data setup is a page or two back. I've compared iTally vs rCTE here:

    http://qa.sqlservercentral.com/Forums/FindPost1464526.aspx

    Feel free to write your fastest cursor-based solution πŸ˜‰

    My comment was for the author of the article who was comparing rCTE to Cursor. Your Tally approach is great for that sample scenario, no argument. I have found, as others have mentioned, that as you attempt to scale the rCTE approach the performance seems to go in an inverse relationship to the amount of rows. In these scenarios a Cursor or the Tally approach (inline or maybe faster as a Numbers table on disk?) does much better than the rCTE.

    I have seen these enlightened articles on rCTE before and I think they should come with a word of caution (unless I'm the one who is not enlightened? please prove me wrong).

    I can show you rCTE's scaling well and performing remarkably well too - threads and articles are linked in Dwain's article in my signature below. One or two will surprise you including a super-fast distinct documented by Paul White, and very fast running totals. I can tell you haven't yet read it πŸ˜‰

    I've seen rCTE's performing poorly against other coding methods too. Even posted one or two. What I haven't seen is this "inverse relationship" i.e. where the rCTE fails to scale. Do you have an example?

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jonathan AC Roberts (6/18/2013)


    It can also be done even more simply with an INNER JOIN and tally table (and is faster than the rCTE version)

    SELECT A.Seqno,

    DATEADD(month, T.N, A.Date_field) Date_Field,

    A.Payment

    FROM dbo.abc A

    INNER JOIN dbo.tsqlc_Tally T

    ON T.N < A.Month_Count

    I use this tally table it starts from zero and has just over a million rows.

    That's the solution I was thinking about while reading this scenario.

    I was also wondering if this was close to real code or an entirely contrived example - I don't understand the business case for creating data like this.

    Also, what is the impact on logs for potentially large (10M+) insert-all-at-once versus the cursor solution? Was the target table being used in a transactional system? If yes, the locks held to do a large insert might be felt by transaction customers while the row-locks for a single insert might not. Another extreme use-case would be cross-server inserts - distributed transaction overhead can become prohibitive... and sometimes is sidestepped by calling a 4part name SP with cursor variables.

    fwiw: Of course I'm an advocate of set-based solutions. These are just a few examples of why the default answer for almost any performance question is "it depends" πŸ™‚

  • If "Show Actual Execution Plan" is set prior to executing the code and then both the queries (the CURSOR based one and the recursive CTE based one) are executed, you will note that the query cost of the conventional CURSOR based approach to this problem's solution is around 91% where as it is around 9% with the recursive CTE based solution. That is almost a ten fold savings in resource utilization with the recursive CTE based solution.

    As far as I understand, query costs are only comparable within the single execution plan. You can compare execution plans to determine how the optimizer handles the different queries, but the percent values are only relative to a single execution plan. 9% of a 10 minute query is more than 91% of a 10 second query. To compare different queries, you need to use more absolute metrics such as I/O or CPU time.

    Wes
    (A solid design is always preferable to a creative workaround)

  • One other thing:

    Please note that I have used TOP 100 PERCENT in the SELECT statement in this function because I want to retain the ORDER BY clause, which is otherwise not allowed in InLine table-values functions.

    Top 100 Percent... Order by no longer has any effect on the result. SQL 2008 ignores that combination. You can read more here.

    http://blog.sqlauthority.com/2009/11/24/sql-server-interesting-observation-top-100-percent-and-order-by/

    Wes
    (A solid design is always preferable to a creative workaround)

  • whenriksen (6/18/2013)


    If "Show Actual Execution Plan" is set prior to executing the code and then both the queries (the CURSOR based one and the recursive CTE based one) are executed, you will note that the query cost of the conventional CURSOR based approach to this problem's solution is around 91% where as it is around 9% with the recursive CTE based solution. That is almost a ten fold savings in resource utilization with the recursive CTE based solution.

    As far as I understand, query costs are only comparable within the single execution plan. You can compare execution plans to determine how the optimizer handles the different queries, but the percent values are only relative to a single execution plan. 9% of a 10 minute query is more than 91% of a 10 second query. To compare different queries, you need to use more absolute metrics such as I/O or CPU time.

    Comparing query costs in this manner simply doesn't work with rCTE's, the relative cost is almost always wildly underestimated. In the example I posted a few back, the inline tally solution and the rCTE solution were costed at very roughly 50% each - about 30-fold out.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • What I haven't seen is this "inverse relationship" i.e. where the rCTE fails to scale. Do you have an example?

    The first article by Jeff Moden in your signature makes the point pretty well.

  • sagesmith (6/18/2013)


    What I haven't seen is this "inverse relationship" i.e. where the rCTE fails to scale. Do you have an example?

    The first article by Jeff Moden in your signature makes the point pretty well.

    Forum etiquette? :blush: have I upset someone?

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Mike,

    Thanks for the nice discussion.

    This was indeed a real life example in Finance domain. I just broke down its various components for creating a simple example. In the real scenario, we had around 63 million records to process like this. In other words, there were 63 million distinct SeqNo's in the source table, dbo.abc. But the value of month count would never go above 12, the number of months in a financial year. This was just a one time process though. Of course, we had other fields in our source table that made the logic more complex. But for this article, all I did was remove the unnecessary complexity out so that the concept could be brought forward. In our case, the cursor based apporach took roughly six hours for execution. On the other hand, the rCTE based solution took only 3 minutes. I agree that I could have used the Tally table as well. In our real case, I was satisfied with the rCTE usage as a solution to this problem. Tally would have been even better performance-wise but I did not want to create another physical table or a temp table to store the Tally records. My self-imposed goal was to use only existing tables.

    ------Thanks and Best Regards,

    Suresh.


    Kindest Regards,

    M Suresh Kumar

  • I think you missed the point that chris was making, if you look at his example he uses an Inline CTE to act as a Tally table.

    As you have a fixed number of Periods (12) you could have done something like

    SELECT

    Tabe.columnlist

    , DATEADD(MONTH,y.N,Table.aDate) Period

    From

    Table

    CROSS APPLY (SELECT TOP(MONTH_COUNT) N-1

    FROM (VALUES(1),(2),(3),(4),(5),(6)

    ,(7),(8),(9),(10),(11),(12))

    x (N)) y

    if you really wanted you could get rid of the N-1 by starting the VALUES seqence from 0-11.

    If you want a large tally table then Chris's CROSS JOIN CTE method works well.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jason-299789 (6/18/2013)


    I think you missed the point that chris was making, if you look at his example he uses an Inline CTE to act as a Tally table.

    As you have a fixed number of Periods (12) you could have done something like

    SELECT

    Tabe.columnlist

    , DATEADD(MONTH,y.N,Table.aDate) Period

    From

    Table

    CROSS APPLY (SELECT TOP(MONTH_COUNT) N-1

    FROM (VALUES(1),(2),(3),(4),(5),(6)

    ,(7),(8),(9),(10),(11),(12))

    x (N)) y

    if you really wanted you could get rid of the N-1 by starting the VALUES seqence from 0-11.

    If you want a large tally table then Chris's CROSS JOIN CTE method works well.

    Hi Jason, I completely agree with your method above and Chris's method with Tally inside CTE. There is no doubt that both the methods would be faster. I was just replying to Mike's query above where the Tally table is being separately built and populated prior to being used in the main query.

    Another reason why I chose a set-based approach rather than a row-by-row insert in the cursor-based approach is that in my scenario, I did not want to lock the destination table dbo.def around seven hundred million times, once per insert (I had 63 million distinct rows in dbo.abc, the source table). In the set-based approach, it would lock dbo.def only once and then release the lock post the insert operation. I preferred a minor inconvenience for 3 minutes over a nagging problem for 6 hours. I am sure that with Tally, the 3 minutes would get even further reduced. Great solution. Thanks to you and Chris for the insights.

    ----Thanks and Best Regards,

    Suresh.


    Kindest Regards,

    M Suresh Kumar

  • Please forgive my lack of etiquette. I meant to reference this article by Jeff Moden[/url].

    I am humbly submitting this code for review by the gurus here on this forum. I think it might show that the CURSOR might be unfairly cast as the villain for this problem. By using the CURSOR in a similar way and just changing the INNER WHILE loop I am able to halve the time it takes the rCTE on my box. Is the CURSOR, as it is used in the example, the bottleneck?

Viewing 15 posts - 16 through 30 (of 68 total)

You must be logged in to reply to this topic. Login to reply