Eliminating Cursors

Question

Post reply

Eliminating Cursors

Viewing 15 posts - 46 through 60 (of 296 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110192 More actions · Answer 1

Sergiy

SSC Guru

Points: 110192

January 17, 2006 at 1:38 pm

#615279

What is this:

SET @loop_counter = ISNULL(SELECT COUNT(*) FROM @item_table),0) ?

I would like to see a case when SELECT COUNT(*) returns NULL.

_____________
Code for TallyGenerator

DCPeterson SSCoach Points: 15256 More actions · Answer 2

I guess proof is what you need...

create

table curstest (col1 int identity, col2 varchar(25))

GO

ALTER

TABLE curstest ADD CONSTRAINT PK_curstest PRIMARY KEY CLUSTERED (col1) ASC

GO

declare @counter int

set

@counter = 0

while

@counter < 500000

begin

insert

into curstest (col2)

select

NULL

set

@Counter = @counter + 1

end

Your method as shown below, took 8 minutes and 59 seconds to run against the table.

declare @counter int

,@rows int

set

@counter = -1

SET

@Rows = 1

WHILE

@Rows > 0

BEGIN

SELECT TOP 1

@Counter

= Col1

FROM CursTest

WHERE col1 > @Counter

ORDER BY Col1

SET @Rows = @@ROWCOUNT

IF @Rows > 0

print @Counter

END

This firehose cursor took 10 seconds to do the same thing...

DECLARE

@Counter int

DECLARE

c CURSOR FAST_FORWARD

FOR

SELECT

Col1 FROM CursTest

OPEN

c

FETCH

NEXT FROM c INTO @Counter

WHILE

@@Fetch_Status = 0

BEGIN

PRINT @Counter

FETCH NEXT FROM c INTO @Counter

END

CLOSE

c

DEALLOCATE

C

Now, what exactly is the benefit of "rolling your own" cursors? NOTHING, don't do it, it's a stupid idea.

/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

*****************/

BeerBeerBeer Hall of Fame Points: 3961 More actions · Answer 3

Because we don't know what complicated (or not) calculation is in the usp_generate_purchase_order stored procedure, it is difficult to guess at a more efficient solution.

However, if that generation code is merely some form of calculation, could it not be in a UDF that is used as part of an insert or update statement? This would surely be more of a set based approach than the solution provided (which is of course just more row by row processing).

It has been my experience that this type of approach comes from those without a strong set based back ground - basically application developers where by necessity things are approached row by row (displays, saving, scrolling).

An example from a past company I worked at was a set of stored procedures that had cursors within cursors (curses) accessing multi-million row tables. This SP would run for 48 hours and not complete. It took me the best part of two months to get it running correctly ($$$$$$$$$$) but in the end it ran in 12 minutes and was more accurate than the original. It was just that the original developer (from the clients in house team) was not used to the idea of set theory and could not conceptualise how to process all that information without doing it row by row.

Sorry, but for me that article was a bit of a dud. However the comments asssociated with it have been revealing.

Regards,

Steve

Life without beer is no life at all

All beer is good, some beers are just better than others

emmchild Ten Centuries Points: 1010 More actions · Answer 4

emmchild

Ten Centuries

Points: 1010

January 17, 2006 at 2:52 pm

#615300

Replacing a cursor with something equally memory intensive is not the optimal approach. Let the performance numbers speak for themselves. The technique I submitted has served me well over the years. I think that the writer could have come up with a simpler design had the amount of time the process ran been an issue.

Calvin Lawson SSChampion Points: 11030 More actions · Answer 5

Quote dcpeterson: I guess proof is what you need...

That was not an accurate test; you'd need to destroy and re-create the table for each test in order for it to be correct. The table gets cached, then read from memory, which is radically faster. Running the test this way shows they are fairly equivalent for recordsets under 10,000.

That said, using a fast_forward cursor is often faster; and definitely faster for large recordsets.

However, I never do cursors on large data sets, as I ALWAYS find a set based approach that works instead. The main reason I use loops is to throttle large updates, or for working with SQL objects dynamically. These always have a small recordset that's being looped through (even though the update may affect a bunch of rows, the table I loop on only has a few). Loops are easier to read, easier to nest, and unlike a cursor they can do pretty much anything. If there is no performance problems, use the clearest code, I say.

cl

Signature is NULL

DCPeterson SSCoach Points: 15256 More actions · Answer 6

I used dbcc freeproccache and dropcleanbuffers between tests. However, subsequent testing shows that they are actually much closer in speed than that initial test. The while loop is still about 15% slower.

I agree that you should always look for a set based solution, but that isn't always possible, even for large data sets.

I have seen a number of cases where programmers used the "roll your own" approach because they had been told to avoid cursors. In almost every case, when I have rewritten the code using cursors, the code is simpler and faster. Simple is in the eye of the beholder, I personally prefer to see the cursor syntax, but faster is faster...

/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

*****************/

luciano furlan SSC Enthusiast Points: 107 More actions · Answer 7

I compared the performance of the "while" solution with a fast_forward and a static cursor.

This are the results (average of 10 tests):

No.of records	While start	While loop end	Fast_Forward start	Fast_Forward loop end	Static start	Static loop end
1.000
Avg	51	187	19	21	17	13
Elapsed		238		40		30
10.000
Avg	193	14.093	63	253	73	141
Elapsed		14.287		317		214
100.000
Avg			352	1.832	457	1.065
Elapsed	Too	much		2.184		1.522

The difference is really big. The "while" solution is very slow.

The static cursor is faster then the forward_only.

Here are the script I used:

USE AdventureWorks2000

DBCC DROPCLEANBUFFERS

DBCC FREEPROCCACHE

--Declare variables

DECLARE @Name nvarchar(50),@X nvarchar(50)

declare @Time1 datetime, @TimeDiff int

declare @sPrint varchar(100)

Print 'Start';set @Time1=getdate();

--Declare a memory table

DECLARE @item_table TABLE

(primary_key INT IDENTITY(1,1) NOT NULL, --THE IDENTITY STATEMENT IS IMPORTANT!

Name nvarchar(50)

)

--now populate this table with the required item category values

INSERT INTO @item_table(Name)

SELECT top 10000 P.Name

FROM Product P

join TransactionHistory H

on H.ProductID=P.ProductID

DECLARE @item_category_counter INT

DECLARE @loop_counter INT

SET @loop_counter = ISNULL((SELECT COUNT(*) FROM @item_table),0) -- Set the @loop_counter to the total number of rows in the

set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'Insert and count elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

SET @item_category_counter = 1

WHILE @loop_counter > 0 AND @item_category_counter <= @loop_counter BEGIN

SELECT @Name = Name

FROM @item_table

WHERE primary_key = @item_category_counter

select @X=@Name

SET @item_category_counter = @item_category_counter + 1

END

set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'Loop elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

----------------------------------------------------------

USE AdventureWorks2000

DBCC DROPCLEANBUFFERS

DBCC FREEPROCCACHE

DECLARE @Name nvarchar(50),@X nvarchar(50)

declare @Time1 datetime, @TimeDiff int

declare @sPrint varchar(100)

Print 'Start';set @Time1=getdate();

DECLARE item_cursor CURSOR LOCAL fast_forward FOR

SELECT top 100000 P.Name

FROM Product P

join TransactionHistory H

on H.ProductID=P.ProductID

OPEN item_cursor

FETCH NEXT FROM item_cursor INTO @Name

set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'First fetch elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

WHILE @@FETCH_STATUS = 0 BEGIN

select @X=@Name

FETCH NEXT FROM item_cursor INTO @Name

END

CLOSE item_cursor

deallocate item_cursor

set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'Loop elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

-----------------------------------------------------------------------

USE AdventureWorks2000

DBCC DROPCLEANBUFFERS

DBCC FREEPROCCACHE

DECLARE @Name nvarchar(50),@X nvarchar(50)

declare @Time1 datetime, @TimeDiff int

declare @sPrint varchar(100)

Print 'Start';set @Time1=getdate();

DECLARE item_cursor CURSOR LOCAL static read_only forward_only FOR

SELECT top 100000 P.Name

FROM Product P

join TransactionHistory H

on H.ProductID=P.ProductID

OPEN item_cursor

FETCH NEXT FROM item_cursor INTO @Name

set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'First fetch elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

WHILE @@FETCH_STATUS = 0 BEGIN

select @X=@Name

FETCH NEXT FROM item_cursor INTO @Name

END

CLOSE item_cursor

deallocate item_cursor

set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'Loop elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 8

Just my humble opinion but I need to clear up some of the myths associated with Table Variables...

First, table variables don't live in memory anymore than temp tables do... they both start out in memory if they fit and then they BOTH move to TempDB if they don't. See the following URL for "proof"...

http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k

... which states, as part of answer A4...

Q4: Are table variables memory-only structures that are assured better performance as compared to temporary or permanent tables, because they are maintained in a database that resides on the physical disk?

A4: A table variable is not a memory-only structure. Because a table variable might hold more data than can fit in memory, it has to have a place on disk to store data. Table variables are created in the tempdb database similar to temporary tables. If memory is available, both table variables and temporary tables are created and processed while in memory (data cache).

Now, what you might not know, is that table variables do NOT create or use statistics. So, the use of table variables may actually be slower than using a TempTable on larger usages. I've been unfortunate enough to witness this particular phenomena. From the same previously referenced URL...

Q3: What are some of the drawbacks of table variables?

A3: These are some of the drawbacks as compared to temporary tables:

•	Non-clustered indexes cannot be created on table variables, other than the system indexes that are created for a PRIMARY or UNIQUE constraint. That can influence the query performance when compared to a temporary table with non-clustered indexes.
•	Table variables do not maintain statistics like temporary tables can. Statistics cannot be created on table variables through automatic creation or by using the CREATE STATISTICS statement. Therefore, for complex queries on large tables, the lack of statistics may deter the optimizer to determine the best plan for a query, thus affecting the performance of that query.

Ok, myths about table variables being so much better than temp tables are BUSTED... a little testing on your part will offer additional proof to skeptics... back to the original problem...

As some have identified in this thread, there is no difference between using a Static Cursor and using the combination of a Temp Table with a While Loop. There's no difference in resources or performance. The real problem is that if you have resorted to the use of any form of RBAR (a "Modenism" which mean "Row By Agonizing Row"), then you have failed as an SQL Developer unless a 3rd party requirement exists for data transmission.

The only way to eliminate the performance drain of Cursors is to rewrite the code to be set-based. That may mean that you also have to rewrite more than one level of procs being called in the entire tree of the proc but that's the only way. It's not that Cursors are "bad", per-se', it's that Cusors and any form of RBAR are a lot slower than set based.

Any suggestion to the contrary is likely to be shown to be absolutely full of hooey in the light of any proper testing.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

cadebryant Ten Centuries Points: 1035 More actions · Answer 9

cadebryant

Ten Centuries

Points: 1035

January 18, 2006 at 8:08 am

#615502

I have personally discovered that table variables can often be slower than temp tables. In addition, with temp tables you can create indexes, which often speed up performance significantly (minus the time it takes to create the index).

But for this type of operation, I still think xp_execresultset is the best solution.

ksh SSC Veteran Points: 298 More actions · Answer 10

In SQL Server I have found that using temp tables(non-memory) as a pre-processing staging area makes for more optimized code, reducing the burden on the optimizer(yes these do need help!) and overall making for more efficient use of resources (faster?). These temp tables can then be cursored and this type of configuration can also become a set-up to using just DML instead of cursors(when practical).

As for the idea of using SELECT instead of a cursor, once the temp table is established reminds me of the code that one would have to use if cursors were not available (in the SQL Server world when was this?) Does it work faster than cursors? 15 years ago Microsoft or Sybase would probably have claimed so. That is until cursors were introduced into SQL Server.

We need to remember that SQL Server maintains separate optimizers for cursor SELECTS and another for DML SELECTS. In the Oracle world there is only one optimzer, and so there is no speed difference between using a CURSOR or just using a SELECT. As a result(and some other reasns, too) CURSORs are the preferred way of writing code in the Oracle world.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 11

>>But for this type of operation, I still think xp_execresultset is the best solution.

The original post was for "Eliminating Cursors" so I have to ask... how do you use it to eliminate cursors? If that's not what you were speaking of, then what are you referring to as "this type of operation".

By the way... it's not xp_execresultset... it's sp_execresultset and it uses a While loop within...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 12

>>>These temp tables can then be cursored...

I don't think you understood... the whole idea is to get rid of cursors and other RBAR methods in SQL Server.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

cadebryant Ten Centuries Points: 1035 More actions · Answer 13

See my post on the 4th page of this thread, where I give more detail about using xp_execresultset in place of a cursor. In addition, there are many good examples on the web for using this proc in place of certain kinds of cursor operations.

It's true that the proc uses a loop internally. However, in the few times that I've used this proc in real life, there has always been a performance improvement over using a cursor or spinning my own WHILE loop.

BTW - it can be executed as either xp_execresultset or sp_execresultset.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 14

Ah. Got it... very nice work around and thanks for the clarification on the xp_/sp_. The proc is actually listed as an sp_ in the Master DB but I guess they made it convenient. Considering the verosity of sp_ExecResultSet, I am a bit surprised that it runs faster but its understandable because it's a system proc and it's pretty well written for what it does.

I'd still have to fire a Developer that writes RBAR code, though (certainly not the guy that came up with the type of work around you did). There's just no excuse.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 15

Mike C,

Exactly right and great call. In fact, there's documented proof that table variables do use TempDB at the following web site and that there are some serious DISADVANTAGES to using table variables instead of temp tables...

http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions