Usage of CTE - Trick with Dates

Question

Post reply

Usage of CTE - Trick with Dates

Viewing 15 posts - 16 through 30 (of 68 total)

You must be logged in to reply to this topic. Login to reply

Kingston Dhasian SSCoach Points: 19795 More actions · Answer 1

ChrisM@Work (6/18/2013)

sagesmith (6/18/2013)

Please post the times each approach takes, this is the only basis for a conclusion. As others have already mentioned, in many scenarios the cursor far outperforms the recursive cte.

Over to you then. Sample data setup is a page or two back. I've compared iTally vs rCTE here:

http://qa.sqlservercentral.com/Forums/FindPost1464526.aspx

Feel free to write your fastest cursor-based solution 😉

😀 That was really funny.

But, I think( and truly hope ) it was a typo by the poster when he said "cursor far outperforms the recursive cte".

If in case he really meant it, even I would love to be enlightened 🙂

Kingston Dhasian

How to post data/code on a forum to get the best help - Jeff Moden
http://www.sqlservercentral.com/articles/Best+Practices/61537/

sagesmith SSC Veteran Points: 251 More actions · Answer 2

ChrisM@Work (6/18/2013)

sagesmith (6/18/2013)

Please post the times each approach takes, this is the only basis for a conclusion. As others have already mentioned, in many scenarios the cursor far outperforms the recursive cte.

Over to you then. Sample data setup is a page or two back. I've compared iTally vs rCTE here:

http://qa.sqlservercentral.com/Forums/FindPost1464526.aspx

Feel free to write your fastest cursor-based solution 😉

My comment was for the author of the article who was comparing rCTE to Cursor. Your Tally approach is great for that sample scenario, no argument. I have found, as others have mentioned, that as you attempt to scale the rCTE approach the performance seems to go in an inverse relationship to the amount of rows. In these scenarios a Cursor or the Tally approach (inline or maybe faster as a Numbers table on disk?) does much better than the rCTE.

I have seen these enlightened articles on rCTE before and I think they should come with a word of caution (unless I'm the one who is not enlightened? please prove me wrong).

Habib Salim Grasshopper Points: 18 More actions · Answer 3

If I may make a fine point:

It is more accurate to say that in a multi-statement batch, the statement (query) preceding a CTE should be terminated by a semi-colon.

I find the following just a little easier to grasp: (Change MonthCount to int)

With CTE_Base (SeqNo, Begin_Date, Month_Count, Payment)

as

(select SeqNo, Date_Field AS Begin_Date,Month_Count, Payment from dbo.abc

UNION all

select SeqNo, dateadd(mm, 1, Begin_Date), Month_Count -1, Payment from CTE_Base

where Month_Count > 1

)

SELECT SeqNo, Begin_Date, Month_Count, Payment

FROM CTE_Base

order by SeqNo, Begin_Date

sagesmith SSC Veteran Points: 251 More actions · Answer 4

your Tally kills the rCTE and hybrid Cursor/Numbers as you can see. On my box:

Inline Tally = 146 milliseconds

rCTE = 2113 milliseconds

Cursor/Numbers = 1183 milliseconds

The Cursor based solution is only partially so, I still couldn't bring myself to copy the author's while inner loop...yikes. But the point I'm trying to make is with bashing cursors:

http://stackoverflow.com/questions/287445/why-do-people-hate-sql-cursors-so-much

see most popular answer under The Size Issue heading.



--If the table exists, drop it.

if exists (select * from sys.tables

           where name = 'abc'

           and   Schema_Name(schema_id) = 'dbo'

           and   [type] = 'U')

drop table dbo.abc

go

--Create the source table, dbo.abc.

create table dbo.abc

(SeqNo smallint

,Date_Field datetime

,Month_Count int

,Payment decimal(10,2))

go

--Populate the source table, dbo.abc

insert into dbo.abc (SeqNo, Date_Field, Month_Count, Payment)

values (1, '20090101', 95000, 100)

      ,(2, '20100101', 7000, 200)

      ,(3, '20110101', 5000, 300)

go

--If exists, drop the destination table dbo.def

if exists (select * from sys.tables

           where name = 'def'

           and   Schema_NAME(schema_id) = 'dbo'

           and   [type] = 'U')

drop table dbo.def

go

--Create the destination table, dbo.def

create table dbo.def

(SeqNo smallint

,Date_Field smalldatetime

,Payment decimal(10,2))

go

------------ Inline Tally version ------------------

-- black hole variables:

DECLARE @SeqNo SMALLINT, @Date_Field DATETIME, @Payment DECIMAL (10,2)

DECLARE @StartTime DATETIME;

-- time store

SET @StartTime = GETDATE()

;WITH E1(n) AS (

	SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

	SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

	SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 10 rows

E2(n) AS (SELECT 1 FROM E1 a, E1 b),									-- 10x10 rows

E3(n) AS (SELECT 1 FROM E2 a, E2 b, E2 c)								-- 100x100x100 rows 

SELECT

	@SeqNo = SeqNo,

	@Date_Field = DATEADD(MONTH,x.n,Date_Field),

	@Payment = Payment

FROM dbo.abc

CROSS APPLY(SELECT TOP(Month_Count) n = -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E3) x

SELECT 'Inline Tally = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

PRINT '-------------------------------------------------------------------------'

-------------- rCTE version ------------------------------

SET @StartTime = GETDATE()

;with CTE_Base (SeqNo, Date_Field, Month_Count, Payment, Begin_Date, End_Date, Frequency)

as

(select SeqNo, Date_Field, Month_Count, Payment, Date_Field, dateadd(mm, Month_Count-1, Date_Field), 1 from dbo.abc

union all

select SeqNo, dateadd(mm, Frequency, Date_Field), Month_Count, Payment, Begin_Date, End_Date, Frequency

from CTE_Base

where dateadd(mm, Frequency, Date_Field) between Begin_Date and End_Date)

select

	@SeqNo = SeqNo,

	@Date_Field = Date_Field,

	@Payment = Payment

from CTE_Base

where Date_Field between Begin_Date and End_Date

order by SeqNo, Date_Field

OPTION(MAXRECURSION 0)

SELECT 'rCTE = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

PRINT '-------------------------------------------------------------------------'

-------------- CURSOR version ------------------------------

SET @StartTime = GETDATE()

declare @l_SeqNo int

           ,@l_date_field DATETIME

           ,@l_Month_Count int

           ,@l_Payment decimal(10, 2)

           ,@l_counter SMALLINT

		   ,@l_max_month_count INT = (SELECT MAX(Month_Count) FROM dbo.abc);        

SELECT TOP (@l_max_month_count) IDENTITY(int,1,1) AS number

into #Numbers

FROM sys.objects s1

CROSS JOIN sys.objects s2

select @l_counter = 0

set nocount on;

declare i_Cursor CURSOR FAST_FORWARD

FOR

select SeqNo, Date_Field, Month_Count, Payment from dbo.abc

open i_Cursor

fetch next from i_Cursor into

                             @l_SeqNo

                            ,@l_date_field

                            ,@l_Month_Count

                            ,@l_Payment

while @@fetch_status = 0

begin

		select number, dateadd(mm, number-1, @l_date_field), @l_Payment

		FROM #Numbers

		WHERE number<=@l_Month_Count

      fetch next from i_Cursor into

                                   @l_SeqNo

                                  ,@l_date_field

                                  ,@l_Month_Count

                                  ,@l_Payment

end

close i_Cursor

deallocate i_Cursor

set nocount off;

SELECT 'Cursor/Numbers = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'

PRINT '-------------------------------------------------------------------------'

DROP TABLE #Numbers;

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

sagesmith (6/18/2013)

ChrisM@Work (6/18/2013)

sagesmith (6/18/2013)

Please post the times each approach takes, this is the only basis for a conclusion. As others have already mentioned, in many scenarios the cursor far outperforms the recursive cte.

Over to you then. Sample data setup is a page or two back. I've compared iTally vs rCTE here:

http://qa.sqlservercentral.com/Forums/FindPost1464526.aspx

Feel free to write your fastest cursor-based solution 😉

My comment was for the author of the article who was comparing rCTE to Cursor. Your Tally approach is great for that sample scenario, no argument. I have found, as others have mentioned, that as you attempt to scale the rCTE approach the performance seems to go in an inverse relationship to the amount of rows. In these scenarios a Cursor or the Tally approach (inline or maybe faster as a Numbers table on disk?) does much better than the rCTE.

I have seen these enlightened articles on rCTE before and I think they should come with a word of caution (unless I'm the one who is not enlightened? please prove me wrong).

I can show you rCTE's scaling well and performing remarkably well too - threads and articles are linked in Dwain's article in my signature below. One or two will surprise you including a super-fast distinct documented by Paul White, and very fast running totals. I can tell you haven't yet read it 😉

I've seen rCTE's performing poorly against other coding methods too. Even posted one or two. What I haven't seen is this "inverse relationship" i.e. where the rCTE fails to scale. Do you have an example?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Mike Dougherty-384281 SSCrazy Points: 2794 More actions · Answer 6

Jonathan AC Roberts (6/18/2013)

It can also be done even more simply with an INNER JOIN and tally table (and is faster than the rCTE version)

SELECT A.Seqno,
DATEADD(month, T.N, A.Date_field) Date_Field, A.Payment FROM dbo.abc A INNER JOIN dbo.tsqlc_Tally T ON T.N < A.Month_Count

I use this tally table it starts from zero and has just over a million rows.

That's the solution I was thinking about while reading this scenario.

I was also wondering if this was close to real code or an entirely contrived example - I don't understand the business case for creating data like this.

Also, what is the impact on logs for potentially large (10M+) insert-all-at-once versus the cursor solution? Was the target table being used in a transactional system? If yes, the locks held to do a large insert might be felt by transaction customers while the row-locks for a single insert might not. Another extreme use-case would be cross-server inserts - distributed transaction overhead can become prohibitive... and sometimes is sidestepped by calling a 4part name SP with cursor variables.

fwiw: Of course I'm an advocate of set-based solutions. These are just a few examples of why the default answer for almost any performance question is "it depends" 🙂

whenriksen SSCarpal Tunnel Points: 4752 More actions · Answer 7

If "Show Actual Execution Plan" is set prior to executing the code and then both the queries (the CURSOR based one and the recursive CTE based one) are executed, you will note that the query cost of the conventional CURSOR based approach to this problem's solution is around 91% where as it is around 9% with the recursive CTE based solution. That is almost a ten fold savings in resource utilization with the recursive CTE based solution.

As far as I understand, query costs are only comparable within the single execution plan. You can compare execution plans to determine how the optimizer handles the different queries, but the percent values are only relative to a single execution plan. 9% of a 10 minute query is more than 91% of a 10 second query. To compare different queries, you need to use more absolute metrics such as I/O or CPU time.

Wes
(A solid design is always preferable to a creative workaround)

whenriksen SSCarpal Tunnel Points: 4752 More actions · Answer 8

One other thing:

Please note that I have used TOP 100 PERCENT in the SELECT statement in this function because I want to retain the ORDER BY clause, which is otherwise not allowed in InLine table-values functions.

Top 100 Percent... Order by no longer has any effect on the result. SQL 2008 ignores that combination. You can read more here.

http://blog.sqlauthority.com/2009/11/24/sql-server-interesting-observation-top-100-percent-and-order-by/

Wes
(A solid design is always preferable to a creative workaround)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

whenriksen (6/18/2013)

If "Show Actual Execution Plan" is set prior to executing the code and then both the queries (the CURSOR based one and the recursive CTE based one) are executed, you will note that the query cost of the conventional CURSOR based approach to this problem's solution is around 91% where as it is around 9% with the recursive CTE based solution. That is almost a ten fold savings in resource utilization with the recursive CTE based solution.

As far as I understand, query costs are only comparable within the single execution plan. You can compare execution plans to determine how the optimizer handles the different queries, but the percent values are only relative to a single execution plan. 9% of a 10 minute query is more than 91% of a 10 second query. To compare different queries, you need to use more absolute metrics such as I/O or CPU time.

Comparing query costs in this manner simply doesn't work with rCTE's, the relative cost is almost always wildly underestimated. In the example I posted a few back, the inline tally solution and the rCTE solution were costed at very roughly 50% each - about 30-fold out.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

sagesmith SSC Veteran Points: 251 More actions · Answer 10

sagesmith

SSC Veteran

Points: 251

June 18, 2013 at 9:04 am

#1625084

What I haven't seen is this "inverse relationship" i.e. where the rCTE fails to scale. Do you have an example?

The first article by Jeff Moden in your signature makes the point pretty well.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 11

sagesmith (6/18/2013)

What I haven't seen is this "inverse relationship" i.e. where the rCTE fails to scale. Do you have an example?

The first article by Jeff Moden in your signature makes the point pretty well.

Forum etiquette? :blush: have I upset someone?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Suresh Kumar Maganti Default port Points: 1429 More actions · Answer 12

Hi Mike,

Thanks for the nice discussion.

This was indeed a real life example in Finance domain. I just broke down its various components for creating a simple example. In the real scenario, we had around 63 million records to process like this. In other words, there were 63 million distinct SeqNo's in the source table, dbo.abc. But the value of month count would never go above 12, the number of months in a financial year. This was just a one time process though. Of course, we had other fields in our source table that made the logic more complex. But for this article, all I did was remove the unnecessary complexity out so that the concept could be brought forward. In our case, the cursor based apporach took roughly six hours for execution. On the other hand, the rCTE based solution took only 3 minutes. I agree that I could have used the Tally table as well. In our real case, I was satisfied with the rCTE usage as a solution to this problem. Tally would have been even better performance-wise but I did not want to create another physical table or a temp table to store the Tally records. My self-imposed goal was to use only existing tables.

------Thanks and Best Regards,

Suresh.

Kindest Regards,

M Suresh Kumar

Jason-299789 SSC-Insane Points: 21601 More actions · Answer 13

I think you missed the point that chris was making, if you look at his example he uses an Inline CTE to act as a Tally table.

As you have a fixed number of Periods (12) you could have done something like



SELECT

Tabe.columnlist

, DATEADD(MONTH,y.N,Table.aDate) Period

From

Table

CROSS APPLY (SELECT TOP(MONTH_COUNT) N-1

                        FROM (VALUES(1),(2),(3),(4),(5),(6)

                                            ,(7),(8),(9),(10),(11),(12))

                                  x (N)) y

if you really wanted you could get rid of the N-1 by starting the VALUES seqence from 0-11.

If you want a large tally table then Chris's CROSS JOIN CTE method works well.

_________________________________________________________________________
SSC Guide to Posting and Best Practices

Suresh Kumar Maganti Default port Points: 1429 More actions · Answer 14

Jason-299789 (6/18/2013)

I think you missed the point that chris was making, if you look at his example he uses an Inline CTE to act as a Tally table.

As you have a fixed number of Periods (12) you could have done something like

SELECT Tabe.columnlist , DATEADD(MONTH,y.N,Table.aDate) Period From Table CROSS APPLY (SELECT TOP(MONTH_COUNT) N-1 FROM (VALUES(1),(2),(3),(4),(5),(6) ,(7),(8),(9),(10),(11),(12)) x (N)) y

if you really wanted you could get rid of the N-1 by starting the VALUES seqence from 0-11.

If you want a large tally table then Chris's CROSS JOIN CTE method works well.

Hi Jason, I completely agree with your method above and Chris's method with Tally inside CTE. There is no doubt that both the methods would be faster. I was just replying to Mike's query above where the Tally table is being separately built and populated prior to being used in the main query.

Another reason why I chose a set-based approach rather than a row-by-row insert in the cursor-based approach is that in my scenario, I did not want to lock the destination table dbo.def around seven hundred million times, once per insert (I had 63 million distinct rows in dbo.abc, the source table). In the set-based approach, it would lock dbo.def only once and then release the lock post the insert operation. I preferred a minor inconvenience for 3 minutes over a nagging problem for 6 hours. I am sure that with Tally, the 3 minutes would get even further reduced. Great solution. Thanks to you and Chris for the insights.

----Thanks and Best Regards,

Suresh.

Kindest Regards,

M Suresh Kumar

sagesmith SSC Veteran Points: 251 More actions · Answer 15

Please forgive my lack of etiquette. I meant to reference this article by Jeff Moden[/url].

I am humbly submitting this code for review by the gurus here on this forum. I think it might show that the CURSOR might be unfairly cast as the villain for this problem. By using the CURSOR in a similar way and just changing the INNER WHILE loop I am able to halve the time it takes the rCTE on my box. Is the CURSOR, as it is used in the example, the bottleneck?