Usage of CTE - Trick with Dates

  • sagesmith (6/18/2013)


    Please forgive my lack of etiquette.

    I am humbly submitting this code for review by the gurus here on this forum. I think it might show that the CURSOR might be unfairly cast as the villain for this problem. By using the CURSOR in a similar way and just changing the INNER WHILE loop I am able to halve the time it takes the rCTE on my box. Is the CURSOR, as it is used in the example, the bottleneck?

    Yes, but I would like to point out that your cursor is only selecting 3 rows from a table with only 3 rows. So the overall processing time of the cursor is tiny.

    Would your solution scale out if the number of rows in table abc were massively increased as in the real case given by Suresh Kumar?

  • Yes, but I would like to point out that your cursor is only selecting 3 rows from a table with only 3 rows. So the overall processing time of the cursor is tiny.

    Would your solution scale out if the number of rows in table abc were massively increased as in the real case given by Suresh Kumar?

    You're absolutely right. I am working with the data as provided in the example (I've taken the liberty to up the Month count to a ridiculous level like ChrisM@Work). Now that we are learning more about the problem then what was in the original article, the analysis might make a little more sense. As written it seemed to be a confusing indictment of cursors.

    Regardless, the 3 rows are the same for the rCTE being used, so on a relative basis I think the point stands.

  • 999,999 rows in abc to preserve the inline CTE method as is. Enforced the max 12 month count we now know as well as the insert.

    results on my machine:

    Inline Tally = 12.97 seconds

    rCTE = 187.983 seconds

    Cursor/Classic Tally= 69.66 seconds

    on a relative basis the spreads aren't getting more compelling for rCTE as the row count goes up.

    --If the table exists, drop it.

    IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results

    go

    --If the table exists, drop it.

    IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL DROP TABLE #Numbers

    go

    --If the table exists, drop it.

    if exists (select * from sys.tables

    where name = 'abc'

    and Schema_Name(schema_id) = 'dbo'

    and [type] = 'U')

    drop table dbo.abc

    go

    --Populate and create the source table, dbo.abc

    SELECT TOP (999999) IDENTITY(int,1,1) AS SeqNo,

    CAST('20090101' AS SMALLDATETIME) AS Date_Field, 12 AS Month_Count, 100.00 AS Payment

    INTO dbo.abc

    FROM sys.objects s1

    CROSS JOIN sys.objects s2

    go

    --If exists, drop the destination table dbo.def

    if exists (select * from sys.tables

    where name = 'def'

    and Schema_NAME(schema_id) = 'dbo'

    and [type] = 'U')

    drop table dbo.def

    go

    --Create the destination table, dbo.def

    create table dbo.def

    (SeqNo smallint

    ,Date_Field smalldatetime

    ,Payment decimal(10,2))

    go

    ------------ Inline Tally version ------------------

    -- black hole variables:

    DECLARE @SeqNo INT, @Date_Field DATETIME, @Payment DECIMAL (10,2)

    DECLARE @StartTime DATETIME;

    -- time store

    SET @StartTime = GETDATE()

    ;WITH E1(n) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 10 rows

    E2(n) AS (SELECT 1 FROM E1 a, E1 b), -- 10x10 rows

    E3(n) AS (SELECT 1 FROM E2 a, E2 b, E2 c) -- 100x100x100 rows

    SELECT

    SeqNo,

    DATEADD(MONTH,x.n,Date_Field) AS Date_Field,

    Payment

    INTO #Results

    FROM dbo.abc

    CROSS APPLY(SELECT TOP(Month_Count) n = -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E3) x

    SELECT 'Inline Tally = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

    PRINT '-------------------------------------------------------------------------'

    --If the table exists, drop it.

    IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results

    go

    -------------- rCTE version ------------------------------

    DECLARE @StartTime DATETIME = GETDATE()

    ;with CTE_Base (SeqNo, Date_Field, Month_Count, Payment, Begin_Date, End_Date, Frequency)

    as

    (select SeqNo, Date_Field, Month_Count, Payment, Date_Field, dateadd(mm, Month_Count-1, Date_Field), 1 from dbo.abc

    union all

    select SeqNo, dateadd(mm, Frequency, Date_Field), Month_Count, Payment, Begin_Date, End_Date, Frequency

    from CTE_Base

    where dateadd(mm, Frequency, Date_Field) between Begin_Date and End_Date)

    select

    SeqNo,

    Date_Field,

    Payment

    INTO #Results

    from CTE_Base

    where Date_Field between Begin_Date and End_Date

    order by SeqNo, Date_Field

    OPTION(MAXRECURSION 0)

    SELECT 'rCTE = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

    PRINT '-------------------------------------------------------------------------'

    --If the table exists, drop it.

    IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results

    go

    -------------- CURSOR version ------------------------------

    DECLARE @StartTime DATETIME = GETDATE()

    declare @l_SeqNo int

    ,@l_date_field DATETIME

    ,@l_Month_Count int

    ,@l_Payment decimal(10, 2)

    ,@l_counter SMALLINT

    ,@l_max_month_count INT = (SELECT MAX(Month_Count) FROM dbo.abc)

    CREATE TABLE #Results (SeqNo INT NOT NULL, Date_Field SMALLDATETIME, Payment DECIMAL(10,2));

    SELECT TOP (@l_max_month_count) IDENTITY(int,1,1) AS number

    into #Numbers

    FROM sys.objects s1

    CROSS JOIN sys.objects s2

    select @l_counter = 0

    set nocount on;

    declare i_Cursor CURSOR FAST_FORWARD

    FOR

    select SeqNo, Date_Field, Month_Count, Payment from dbo.abc

    open i_Cursor

    fetch next from i_Cursor into

    @l_SeqNo

    ,@l_date_field

    ,@l_Month_Count

    ,@l_Payment

    while @@fetch_status = 0

    BEGIN

    INSERT INTO #Results

    select number, dateadd(mm, number-1, @l_date_field), @l_Payment

    FROM #Numbers

    WHERE number<=@l_Month_Count

    fetch next from i_Cursor into

    @l_SeqNo

    ,@l_date_field

    ,@l_Month_Count

    ,@l_Payment

    end

    close i_Cursor

    deallocate i_Cursor

    set nocount off;

    SELECT 'Cursor/Numbers = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

    PRINT '-------------------------------------------------------------------------'

    DROP TABLE #Results

    GO

    DROP TABLE #Numbers

    GO

  • sagesmith (6/18/2013)

    on a relative basis the spreads aren't getting more compelling for rCTE as the row count goes up.

    Yes, thinking about it, the runtime for the all of the methods (cursor, rCTE and in-line tally version) should all go up with a linear relationship to the number of rows in the table abc as it is just doing the same work multiplied by the number of rows in table abc. Computer scientists would use big O notation and say it goes up Order n {O(n)} with the number of rows on table abc.

  • Suresh Kumar Maganti

    I agree the bulk insert method is the better way, especially with the One off lock on the destination table, rather than the 700 million lock and commits you were doing using a cursor.

    I suppose the angle Chris and others, myself included) are coming from are one of simplicity, and making queries as simple as possible helps those who take over support.

    Dont get me wrong its a creative solution, I just think the article could have been refined a little to show a better progression of SQL, such as Cursor vs Recursive CTE vs Tally.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Jason-299789 (6/19/2013)


    Suresh Kumar Maganti

    I agree the bulk insert method is the better way, especially with the One off lock on the destination table, rather than the 700 million lock and commits you were doing using a cursor.

    I suppose the angle Chris and others, myself included) are coming from are one of simplicity, and making queries as simple as possible helps those who take over support.

    Dont get me wrong its a creative solution, I just think the article could have been refined a little to show a better progression of SQL, such as Cursor vs Recursive CTE vs Tally.

    Hi Jason,

    Thanks for the constructive feedback. I really do appreciate the same. I certainly would keep the guidance in mind when writing next.

    --Thanks and Best Regards,

    Suresh.


    Kindest Regards,

    M Suresh Kumar

  • nycdotnet (6/18/2013)


    Interesting article. Regarding the paradox of semicolons on CTEs - the rule is "the statement before the CTE has to end with a semicolon". The reason is that the WITH keyword is overloaded in T-SQL. Without the semicolon, SQL can't tell if you're saying with as a statement start or qualifying something about the previous statement.

    People have gotten used to putting a semicolon in front of the WITH, but this is not required - you can just as easily put it at the end of your last statement.

    This also explains why this causes an error in an inline Table-value function. Inline table-value functions must be only a single statement. Therefore if you use a semi-colon in it (which is the statement terminator symbol), then you are causing the TVF to have two statements and it breaks.

    Thank you for your excellent explanation about the semi-colon and its purpose. (I have seen so many comments about putting the semi-colon at the beginning of a CTE expression.)

  • whenriksen (6/18/2013)


    One other thing:

    Please note that I have used TOP 100 PERCENT in the SELECT statement in this function because I want to retain the ORDER BY clause, which is otherwise not allowed in InLine table-values functions.

    Top 100 Percent... Order by no longer has any effect on the result. SQL 2008 ignores that combination. You can read more here.

    http://blog.sqlauthority.com/2009/11/24/sql-server-interesting-observation-top-100-percent-and-order-by/

    Absolutely true. They changed the optimizer to think that ORDER BY isn't needed when TOP 100% is used as an "optimization" so it doesn't work anymore. However, TOP 2147483647 (Largest number for an integer) still works.

    Even with that, I agree that if you need the final result set to be sorted, you cannot rely on the sort of a sub-query to do it for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • "Being a CTE, its code must be preceded by a semicolon."

    That's not quite true. Would rather suggest:

    Any preceding statement must be terminated by a semicolon.

    In in inline CTE, there is no preceding statement, consequently no semicolon,

    and no paradox either.

    Anyway, appreciate the lot of thought you have given to the subject.

  • In this case I would use a simple tally table:

    DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2))

    DECLARE @tally TABLE (N int)

    INSERT INTO @tally(N)

    SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1

    FROM sys.columns

    INSERT INTO @source

    values (1, '20090101', 10, 100)

    ,(2, '20100101', 7, 200)

    ,(3, '20110101', 5, 300)

    SELECT ss.SegNo, ca.Date_Field, ss.Payment

    FROM @source ss

    CROSS APPLY (SELECT DATEADD(MONTH, N, ss.Date_Field) FROM @tally WHERE N < ss.Month_Count) ca(Date_Field)

    ORDER BY ss.SegNo, ca.Date_Field

  • Interesting Solution way back i was looking such solution for printing bar Code Labels for given no of item for given number of times.

    some expert gentlemen on MSDN forum had helped me to solve it using CTE

    here is the code

    ALTER PROCEDURE [dbo].[sp_ItemLables]

    (

    @repeatlables numeric( 3,0),

    @blanklables int,

    @lbls2prn itemforlabels readonly

    )

    AS

    WITH

    Blanks_Lines(N)

    AS

    (

    SELECT

    1

    WHERE

    @blanklables >= 1

    UNION ALL

    SELECT

    N + 1

    FROM

    Blanks_Lines

    WHERE

    N < @blanklables

    ),

    Numbers(N)

    AS

    (

    SELECT

    1

    UNION ALL

    SELECT

    N + 1

    FROM

    Numbers

    WHERE

    N < @repeatlables

    )

    SELECT

    NULL as line1,

    NULL as line2,

    NULL as line3,

    NULL as line4,

    NULL as line5,

    NULL as line6,

    NULL as line7,

    NULL as line8,

    NULL as ui

    FROM

    Blanks_Lines

    UNION ALL

    SELECT

    ItemList.Line1,

    ItemList.Line2,

    ItemList.Line3,

    ItemList.Line4,

    ItemList.Line5,

    ItemList.Line6,

    ItemList.Line7,

    ItemList.Line8,

    itemlist.id

    FROM

    @lbls2prn b,

    dbo.Labels itemlist,

    Numbers

    WHERE

    -- Join itemlist to get the labels for the passed list.

    itemlist.id = b.id

    OPTION

    (MAXRECURSION 0);

  • asushil (5/15/2015)


    Interesting Solution way back i was looking such solution for printing bar Code Labels for given no of item for given number of times.

    some expert gentlemen on MSDN forum had helped me to solve it using CTE

    here is the code

    ALTER PROCEDURE [dbo].[sp_ItemLables]

    (

    @repeatlables numeric( 3,0),

    @blanklables int,

    @lbls2prn itemforlabels readonly

    )

    AS

    WITH

    Blanks_Lines(N)

    AS

    (

    SELECT

    1

    WHERE

    @blanklables >= 1

    UNION ALL

    SELECT

    N + 1

    FROM

    Blanks_Lines

    WHERE

    N < @blanklables

    ),

    Numbers(N)

    AS

    (

    SELECT

    1

    UNION ALL

    SELECT

    N + 1

    FROM

    Numbers

    WHERE

    N < @repeatlables

    )

    SELECT

    NULL as line1,

    NULL as line2,

    NULL as line3,

    NULL as line4,

    NULL as line5,

    NULL as line6,

    NULL as line7,

    NULL as line8,

    NULL as ui

    FROM

    Blanks_Lines

    UNION ALL

    SELECT

    ItemList.Line1,

    ItemList.Line2,

    ItemList.Line3,

    ItemList.Line4,

    ItemList.Line5,

    ItemList.Line6,

    ItemList.Line7,

    ItemList.Line8,

    itemlist.id

    FROM

    @lbls2prn b,

    dbo.Labels itemlist,

    Numbers

    WHERE

    -- Join itemlist to get the labels for the passed list.

    itemlist.id = b.id

    OPTION

    (MAXRECURSION 0);

    I'd be interested in seeing the DDL, some sample data, and expected results based on the sample data to see if the recursion could be eliminated.

  • asushil (5/15/2015)


    Interesting Solution way back i was looking such solution for printing bar Code Labels for given no of item for given number of times.

    some expert gentlemen on MSDN forum had helped me to solve it using CTE

    here is the code

    The same result could have been achieved in much more straight forward manner if you'd just used a tally table:

    ALTER PROCEDURE [dbo].[sp_ItemLables]

    (

    @repeatlables numeric(3,0),

    @blanklables int,

    @lbls2prn itemforlabels readonly

    )

    AS

    BEGIN

    SELECT NULL as line1,

    NULL as line2,

    NULL as line3,

    NULL as line4,

    NULL as line5,

    NULL as line6,

    NULL as line7,

    NULL as line8,

    NULL as ui

    FROM dbo.Tally

    WHERE N <= @blanklables

    UNION ALL

    SELECT ItemList.Line1,

    ItemList.Line2,

    ItemList.Line3,

    ItemList.Line4,

    ItemList.Line5,

    ItemList.Line6,

    ItemList.Line7,

    ItemList.Line8,

    itemlist.id

    FROM @lbls2prn b

    INNER JOIN dbo.Labels itemlist

    ON itemlist.id = b.id -- Join itemlist to get the labels for the passed list.

    CROSS APPLY (SELECT TOP(@repeatlables) * FROM dbo.Tally) AS A

    END

    GO

  • Thomas Schutte (5/15/2015)


    In this case I would use a simple tally table:

    DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2))

    DECLARE @tally TABLE (N int)

    INSERT INTO @tally(N)

    SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1

    FROM sys.columns

    INSERT INTO @source

    values (1, '20090101', 10, 100)

    ,(2, '20100101', 7, 200)

    ,(3, '20110101', 5, 300)

    SELECT ss.SegNo, ca.Date_Field, ss.Payment

    FROM @source ss

    CROSS APPLY (SELECT DATEADD(MONTH, N, ss.Date_Field) FROM @tally WHERE N < ss.Month_Count) ca(Date_Field)

    ORDER BY ss.SegNo, ca.Date_Field

    I love the way you think - a tally table-based solution will be much better, hands down. That was my first thought when I read this article. A couple things to point out though. ChrisM beat you to it a couple years ago (look at the first couple pages of this thread.) Second, there are a few issues with how you are using a tally table:

    First, I would not use a temp table to for a tally table like you did, you are slowing things down by inserting rows into it and then you're doing a table scan against it for your solution. Ideally you want a well-indexed tally table in your schema or you can create one on the fly using a CTE. Compare the query plan for the queries below....

    -- Your way:

    DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2))

    DECLARE @tally TABLE (N int)

    INSERT INTO @tally(N)

    SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1

    FROM sys.columns;

    SELECT TOP(12) N FROM @tally;

    -- CTE Tally

    WITH iTally(N) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 -- No Sort will be performed

    FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) x(N)

    )

    SELECT *

    FROM iTally;

    Your solution (Queries 1 & 2) are hugely expensive relative to the CTE tally. In this case, because we only need the numbers 0-12 rows, you don't even need to use ROW_NUMBER(); you could just so this:

    SELECT * FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) x(N);

    The performance improvement is minuscule but you'll get a cleaner query plan:

    Lastly, the ORDER BY is not necessary and adds a lot overhead for no benefit. Below is your solution and an improved version.

    DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2));

    DECLARE @tally TABLE (N int);

    INSERT INTO @tally(N)

    SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1

    FROM sys.columns;

    INSERT INTO @source

    values (1, '20090101', 10, 100)

    ,(2, '20100101', 7, 200)

    ,(3, '20110101', 5, 300)

    -- ORIGINAL SOLUTION

    SELECT ss.SegNo, ca.Date_Field, ss.Payment

    FROM @source ss

    CROSS APPLY (SELECT DATEADD(MONTH, N, ss.Date_Field) FROM @tally WHERE N < ss.Month_Count) ca(Date_Field)

    ORDER BY ss.SegNo, ca.Date_Field;

    -- IMPROVED SOLUTION: tally on-the-fly, no sort

    SELECT ss.SegNo, ca.Date_Field, ss.Payment

    FROM @source ss

    CROSS APPLY

    (

    SELECT DATEADD(MONTH, N, ss.Date_Field)

    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) Tally(N) -- here's you tally table

    WHERE N < ss.Month_Count

    ) ca(Date_Field)

    --ORDER BY ss.SegNo, ca.Date_Field;

    GO

    Forgive my unsolicited advice, just showing how to make your tally table faster. 😉

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • The "TOP 100 PERCENT" seems to be ignored by the optimizer. The "TOP verylargernumber" can produce the recordset in the "expected" order. However, this is incidental. If the plan has to sort the records for a selection, then that just happens to be the order at that time. A plan that has other following steps or a plan that goes parallel will not add an additional sort to insure the final order matches the initial selective order. Another, thought - what's the final order of a query with two sub-queries each with a selective order by? The answers is in whatever order that the optimizer selects as most efficient. After data growth, the optimizer might select a different index and the result could be a different order.

    Randy
    Helpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

Viewing 15 posts - 31 through 45 (of 68 total)

You must be logged in to reply to this topic. Login to reply