Usage of CTE - Trick with Dates

Question

Post reply

Usage of CTE - Trick with Dates

Viewing 15 posts - 31 through 45 (of 68 total)

You must be logged in to reply to this topic. Login to reply

Jonathan AC Roberts SSCoach Points: 17601 More actions · Answer 1

sagesmith (6/18/2013)

Please forgive my lack of etiquette.

I am humbly submitting this code for review by the gurus here on this forum. I think it might show that the CURSOR might be unfairly cast as the villain for this problem. By using the CURSOR in a similar way and just changing the INNER WHILE loop I am able to halve the time it takes the rCTE on my box. Is the CURSOR, as it is used in the example, the bottleneck?

Yes, but I would like to point out that your cursor is only selecting 3 rows from a table with only 3 rows. So the overall processing time of the cursor is tiny.

Would your solution scale out if the number of rows in table abc were massively increased as in the real case given by Suresh Kumar?

sagesmith SSC Veteran Points: 251 More actions · Answer 2

Yes, but I would like to point out that your cursor is only selecting 3 rows from a table with only 3 rows. So the overall processing time of the cursor is tiny.

Would your solution scale out if the number of rows in table abc were massively increased as in the real case given by Suresh Kumar?

You're absolutely right. I am working with the data as provided in the example (I've taken the liberty to up the Month count to a ridiculous level like ChrisM@Work). Now that we are learning more about the problem then what was in the original article, the analysis might make a little more sense. As written it seemed to be a confusing indictment of cursors.

Regardless, the 3 rows are the same for the rCTE being used, so on a relative basis I think the point stands.

sagesmith SSC Veteran Points: 251 More actions · Answer 3

999,999 rows in abc to preserve the inline CTE method as is. Enforced the max 12 month count we now know as well as the insert.

results on my machine:

Inline Tally = 12.97 seconds

rCTE = 187.983 seconds

Cursor/Classic Tally= 69.66 seconds

on a relative basis the spreads aren't getting more compelling for rCTE as the row count goes up.



--If the table exists, drop it.

IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results

go

--If the table exists, drop it.

IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL DROP TABLE #Numbers

go

--If the table exists, drop it.

if exists (select * from sys.tables

           where name = 'abc'

           and   Schema_Name(schema_id) = 'dbo'

           and   [type] = 'U')

drop table dbo.abc

go

--Populate and create the source table, dbo.abc

SELECT TOP (999999) IDENTITY(int,1,1) AS SeqNo,

			CAST('20090101' AS SMALLDATETIME) AS Date_Field, 12 AS Month_Count, 100.00 AS Payment

INTO dbo.abc

FROM sys.objects s1

CROSS JOIN sys.objects s2

go

--If exists, drop the destination table dbo.def

if exists (select * from sys.tables

           where name = 'def'

           and   Schema_NAME(schema_id) = 'dbo'

           and   [type] = 'U')

drop table dbo.def

go

--Create the destination table, dbo.def

create table dbo.def

(SeqNo smallint

,Date_Field smalldatetime

,Payment decimal(10,2))

go

------------ Inline Tally version ------------------

-- black hole variables:

DECLARE @SeqNo INT, @Date_Field DATETIME, @Payment DECIMAL (10,2)

DECLARE @StartTime DATETIME;

-- time store

SET @StartTime = GETDATE()

;WITH E1(n) AS (

	SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

	SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

	SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 10 rows

E2(n) AS (SELECT 1 FROM E1 a, E1 b),									-- 10x10 rows

E3(n) AS (SELECT 1 FROM E2 a, E2 b, E2 c)								-- 100x100x100 rows 

SELECT

	SeqNo,

	DATEADD(MONTH,x.n,Date_Field) AS Date_Field,

	Payment

INTO #Results

FROM dbo.abc

CROSS APPLY(SELECT TOP(Month_Count) n = -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E3) x

SELECT 'Inline Tally = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

PRINT '-------------------------------------------------------------------------'

--If the table exists, drop it.

IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results

go

-------------- rCTE version ------------------------------

DECLARE @StartTime DATETIME = GETDATE()

;with CTE_Base (SeqNo, Date_Field, Month_Count, Payment, Begin_Date, End_Date, Frequency)

as

(select SeqNo, Date_Field, Month_Count, Payment, Date_Field, dateadd(mm, Month_Count-1, Date_Field), 1 from dbo.abc

union all

select SeqNo, dateadd(mm, Frequency, Date_Field), Month_Count, Payment, Begin_Date, End_Date, Frequency

from CTE_Base

where dateadd(mm, Frequency, Date_Field) between Begin_Date and End_Date)

select

	SeqNo,

	Date_Field,

	Payment

INTO #Results

from CTE_Base

where Date_Field between Begin_Date and End_Date

order by SeqNo, Date_Field

OPTION(MAXRECURSION 0)

SELECT 'rCTE = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

PRINT '-------------------------------------------------------------------------'

--If the table exists, drop it.

IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results

go

-------------- CURSOR version ------------------------------

DECLARE @StartTime DATETIME = GETDATE() 

declare @l_SeqNo int

           ,@l_date_field DATETIME

           ,@l_Month_Count int

           ,@l_Payment decimal(10, 2)

           ,@l_counter SMALLINT

		   ,@l_max_month_count INT = (SELECT MAX(Month_Count) FROM dbo.abc)

CREATE TABLE #Results (SeqNo INT NOT NULL, Date_Field SMALLDATETIME, Payment DECIMAL(10,2));        

SELECT TOP (@l_max_month_count) IDENTITY(int,1,1) AS number

into #Numbers

FROM sys.objects s1

CROSS JOIN sys.objects s2

select @l_counter = 0

set nocount on;

declare i_Cursor CURSOR FAST_FORWARD

FOR

select SeqNo, Date_Field, Month_Count, Payment from dbo.abc

open i_Cursor

fetch next from i_Cursor into

                             @l_SeqNo

                            ,@l_date_field

                            ,@l_Month_Count

                            ,@l_Payment

while @@fetch_status = 0

BEGIN

		INSERT INTO #Results

		select number, dateadd(mm, number-1, @l_date_field), @l_Payment

		FROM #Numbers

		WHERE number<=@l_Month_Count

      fetch next from i_Cursor into

                                   @l_SeqNo

                                  ,@l_date_field

                                  ,@l_Month_Count

                                  ,@l_Payment

end

close i_Cursor

deallocate i_Cursor

set nocount off;

SELECT 'Cursor/Numbers = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

PRINT '-------------------------------------------------------------------------'

DROP TABLE #Results

GO

DROP TABLE #Numbers

GO

Jonathan AC Roberts SSCoach Points: 17601 More actions · Answer 4

sagesmith (6/18/2013)

on a relative basis the spreads aren't getting more compelling for rCTE as the row count goes up.

Yes, thinking about it, the runtime for the all of the methods (cursor, rCTE and in-line tally version) should all go up with a linear relationship to the number of rows in the table abc as it is just doing the same work multiplied by the number of rows in table abc. Computer scientists would use big O notation and say it goes up Order n {O(n)} with the number of rows on table abc.

Jason-299789 SSC-Insane Points: 21601 More actions · Answer 5

Suresh Kumar Maganti

I agree the bulk insert method is the better way, especially with the One off lock on the destination table, rather than the 700 million lock and commits you were doing using a cursor.

I suppose the angle Chris and others, myself included) are coming from are one of simplicity, and making queries as simple as possible helps those who take over support.

Dont get me wrong its a creative solution, I just think the article could have been refined a little to show a better progression of SQL, such as Cursor vs Recursive CTE vs Tally.

_________________________________________________________________________
SSC Guide to Posting and Best Practices

Suresh Kumar Maganti Default port Points: 1429 More actions · Answer 6

Jason-299789 (6/19/2013)

Suresh Kumar Maganti

I agree the bulk insert method is the better way, especially with the One off lock on the destination table, rather than the 700 million lock and commits you were doing using a cursor.

I suppose the angle Chris and others, myself included) are coming from are one of simplicity, and making queries as simple as possible helps those who take over support.

Dont get me wrong its a creative solution, I just think the article could have been refined a little to show a better progression of SQL, such as Cursor vs Recursive CTE vs Tally.

Hi Jason,

Thanks for the constructive feedback. I really do appreciate the same. I certainly would keep the guidance in mind when writing next.

--Thanks and Best Regards,

Suresh.

Kindest Regards,

M Suresh Kumar

SqlOnMyMind SSCertifiable Points: 5060 More actions · Answer 7

nycdotnet (6/18/2013)

Interesting article. Regarding the paradox of semicolons on CTEs - the rule is "the statement before the CTE has to end with a semicolon". The reason is that the WITH keyword is overloaded in T-SQL. Without the semicolon, SQL can't tell if you're saying with as a statement start or qualifying something about the previous statement.

People have gotten used to putting a semicolon in front of the WITH, but this is not required - you can just as easily put it at the end of your last statement.

This also explains why this causes an error in an inline Table-value function. Inline table-value functions must be only a single statement. Therefore if you use a semi-colon in it (which is the statement terminator symbol), then you are causing the TVF to have two statements and it breaks.

Thank you for your excellent explanation about the semi-colon and its purpose. (I have seen so many comments about putting the semi-colon at the beginning of a CTE expression.)

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 8

whenriksen (6/18/2013)

One other thing:

Please note that I have used TOP 100 PERCENT in the SELECT statement in this function because I want to retain the ORDER BY clause, which is otherwise not allowed in InLine table-values functions.

Top 100 Percent... Order by no longer has any effect on the result. SQL 2008 ignores that combination. You can read more here.

http://blog.sqlauthority.com/2009/11/24/sql-server-interesting-observation-top-100-percent-and-order-by/

Absolutely true. They changed the optimizer to think that ORDER BY isn't needed when TOP 100% is used as an "optimization" so it doesn't work anymore. However, TOP 2147483647 (Largest number for an integer) still works.

Even with that, I agree that if you need the final result set to be sorted, you cannot rely on the sort of a sub-query to do it for you.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

h.tobisch SSCommitted Points: 1671 More actions · Answer 9

"Being a CTE, its code must be preceded by a semicolon."

That's not quite true. Would rather suggest:

Any preceding statement must be terminated by a semicolon.

In in inline CTE, there is no preceding statement, consequently no semicolon,

and no paradox either.

Anyway, appreciate the lot of thought you have given to the subject.

Thomas Schutte SSChasing Mays Points: 620 More actions · Answer 10

In this case I would use a simple tally table:

DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2))



DECLARE @tally TABLE (N int)

INSERT INTO @tally(N)

SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1

FROM sys.columns

INSERT INTO @source

values (1, '20090101', 10, 100)

      ,(2, '20100101', 7, 200)

      ,(3, '20110101', 5, 300)

SELECT ss.SegNo, ca.Date_Field, ss.Payment

FROM @source ss

CROSS APPLY (SELECT DATEADD(MONTH, N, ss.Date_Field) FROM @tally WHERE N < ss.Month_Count) ca(Date_Field)

ORDER BY ss.SegNo, ca.Date_Field

asushil SSC Journeyman Points: 97 More actions · Answer 11

Interesting Solution way back i was looking such solution for printing bar Code Labels for given no of item for given number of times.

some expert gentlemen on MSDN forum had helped me to solve it using CTE

here is the code

ALTER PROCEDURE [dbo].[sp_ItemLables]

(

@repeatlables numeric( 3,0),

@blanklables int,

@lbls2prn itemforlabels readonly

)

AS

WITH

Blanks_Lines(N)

AS

(

SELECT

1

WHERE

@blanklables >= 1

UNION ALL

SELECT

N + 1

FROM

Blanks_Lines

WHERE

N < @blanklables

),

Numbers(N)

AS

(

SELECT

1

UNION ALL

SELECT

N + 1

FROM

Numbers

WHERE

N < @repeatlables

)

SELECT

NULL as line1,

NULL as line2,

NULL as line3,

NULL as line4,

NULL as line5,

NULL as line6,

NULL as line7,

NULL as line8,

NULL as ui

FROM

Blanks_Lines

UNION ALL

SELECT

ItemList.Line1,

ItemList.Line2,

ItemList.Line3,

ItemList.Line4,

ItemList.Line5,

ItemList.Line6,

ItemList.Line7,

ItemList.Line8,

itemlist.id

FROM

@lbls2prn b,

dbo.Labels itemlist,

Numbers

WHERE

-- Join itemlist to get the labels for the passed list.

itemlist.id = b.id

OPTION

(MAXRECURSION 0);

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 12

asushil (5/15/2015)

Interesting Solution way back i was looking such solution for printing bar Code Labels for given no of item for given number of times.

some expert gentlemen on MSDN forum had helped me to solve it using CTE

here is the code

ALTER PROCEDURE [dbo].[sp_ItemLables]

(

@repeatlables numeric( 3,0),

@blanklables int,

@lbls2prn itemforlabels readonly

)

AS

WITH

Blanks_Lines(N)

AS

(

SELECT

1

WHERE

@blanklables >= 1

UNION ALL

SELECT

N + 1

FROM

Blanks_Lines

WHERE

N < @blanklables

),

Numbers(N)

AS

(

SELECT

1

UNION ALL

SELECT

N + 1

FROM

Numbers

WHERE

N < @repeatlables

)

SELECT

NULL as line1,

NULL as line2,

NULL as line3,

NULL as line4,

NULL as line5,

NULL as line6,

NULL as line7,

NULL as line8,

NULL as ui

FROM

Blanks_Lines

UNION ALL

SELECT

ItemList.Line1,

ItemList.Line2,

ItemList.Line3,

ItemList.Line4,

ItemList.Line5,

ItemList.Line6,

ItemList.Line7,

ItemList.Line8,

itemlist.id

FROM

@lbls2prn b,

dbo.Labels itemlist,

Numbers

WHERE

-- Join itemlist to get the labels for the passed list.

itemlist.id = b.id

OPTION

(MAXRECURSION 0);

I'd be interested in seeing the DDL, some sample data, and expected results based on the sample data to see if the recursion could be eliminated.

Jonathan AC Roberts SSCoach Points: 17601 More actions · Answer 13

asushil (5/15/2015)

Interesting Solution way back i was looking such solution for printing bar Code Labels for given no of item for given number of times.

some expert gentlemen on MSDN forum had helped me to solve it using CTE

here is the code

The same result could have been achieved in much more straight forward manner if you'd just used a tally table:

ALTER PROCEDURE [dbo].[sp_ItemLables]



(

    @repeatlables numeric(3,0),

    @blanklables int,

    @lbls2prn  itemforlabels readonly

)

AS

BEGIN

    SELECT NULL as line1,

           NULL as line2,

           NULL as line3,

           NULL as line4,

           NULL as line5,

           NULL as line6,

           NULL as line7,

           NULL as line8,

           NULL as ui

      FROM dbo.Tally

     WHERE N <= @blanklables

    UNION ALL

    SELECT ItemList.Line1,

           ItemList.Line2,

           ItemList.Line3,

           ItemList.Line4,

           ItemList.Line5,

           ItemList.Line6,

           ItemList.Line7,

           ItemList.Line8,

           itemlist.id

      FROM @lbls2prn b

     INNER JOIN dbo.Labels itemlist

             ON itemlist.id = b.id -- Join itemlist to get the labels for the passed list.

     CROSS APPLY (SELECT TOP(@repeatlables) * FROM dbo.Tally) AS A

END

GO

Alan Burstein SSC Guru Points: 61136 More actions · Answer 14

Thomas Schutte (5/15/2015)

In this case I would use a simple tally table:

DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2))
DECLARE @tally TABLE (N int) INSERT INTO @tally(N) SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1 FROM sys.columns INSERT INTO @source values (1, '20090101', 10, 100) ,(2, '20100101', 7, 200) ,(3, '20110101', 5, 300) SELECT ss.SegNo, ca.Date_Field, ss.Payment FROM @source ss CROSS APPLY (SELECT DATEADD(MONTH, N, ss.Date_Field) FROM @tally WHERE N < ss.Month_Count) ca(Date_Field)
ORDER BY ss.SegNo, ca.Date_Field

I love the way you think - a tally table-based solution will be much better, hands down. That was my first thought when I read this article. A couple things to point out though. ChrisM beat you to it a couple years ago (look at the first couple pages of this thread.) Second, there are a few issues with how you are using a tally table:

First, I would not use a temp table to for a tally table like you did, you are slowing things down by inserting rows into it and then you're doing a table scan against it for your solution. Ideally you want a well-indexed tally table in your schema or you can create one on the fly using a CTE. Compare the query plan for the queries below....

-- Your way:



DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2))

DECLARE @tally TABLE (N int)

INSERT INTO @tally(N)

SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1

FROM sys.columns;

SELECT TOP(12) N FROM @tally;

-- CTE Tally

WITH iTally(N) AS

(

  SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 -- No Sort will be performed

  FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) x(N)

)

SELECT *

FROM iTally;

Your solution (Queries 1 & 2) are hugely expensive relative to the CTE tally. In this case, because we only need the numbers 0-12 rows, you don't even need to use ROW_NUMBER(); you could just so this:

SELECT * FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) x(N);

The performance improvement is minuscule but you'll get a cleaner query plan:

Lastly, the ORDER BY is not necessary and adds a lot overhead for no benefit. Below is your solution and an improved version.



DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2));

DECLARE @tally TABLE (N int);

INSERT INTO @tally(N)

SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1

FROM sys.columns;

INSERT INTO @source

values (1, '20090101', 10, 100)

      ,(2, '20100101', 7, 200)

      ,(3, '20110101', 5, 300)

-- ORIGINAL SOLUTION

SELECT ss.SegNo, ca.Date_Field, ss.Payment

FROM @source ss

CROSS APPLY (SELECT DATEADD(MONTH, N, ss.Date_Field) FROM @tally WHERE N < ss.Month_Count) ca(Date_Field)

ORDER BY ss.SegNo, ca.Date_Field;

-- IMPROVED SOLUTION: tally on-the-fly, no sort

SELECT ss.SegNo, ca.Date_Field, ss.Payment

FROM @source ss

CROSS APPLY

(

  SELECT DATEADD(MONTH, N, ss.Date_Field)

  FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) Tally(N) -- here's you tally table

  WHERE N < ss.Month_Count

) ca(Date_Field)

--ORDER BY ss.SegNo, ca.Date_Field;

GO

Forgive my unsolicited advice, just showing how to make your tally table faster. 😉

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

rstone SSCertifiable Points: 6086 More actions · Answer 15

The "TOP 100 PERCENT" seems to be ignored by the optimizer. The "TOP verylargernumber" can produce the recordset in the "expected" order. However, this is incidental. If the plan has to sort the records for a selection, then that just happens to be the order at that time. A plan that has other following steps or a plan that goes parallel will not add an additional sort to insure the final order matches the initial selective order. Another, thought - what's the final order of a query with two sub-queries each with a selective order by? The answers is in whatever order that the optimizer selects as most efficient. After data growth, the optimizer might select a different index and the result could be a different order.

Randy
Helpdesk: Perhaps Im not the only one that does not know what you are doing. 😉