Eliminating Cursors

  • What is this:

    SET @loop_counter = ISNULL(SELECT COUNT(*) FROM @item_table),0) ?

    I would like to see a case when SELECT COUNT(*) returns NULL.

    _____________
    Code for TallyGenerator

  • I guess proof is what you need...

    create

    table curstest (col1 int identity, col2 varchar(25))

    GO

    ALTER

    TABLE curstest ADD CONSTRAINT PK_curstest PRIMARY KEY CLUSTERED (col1) ASC

    GO

    declare @counter int

    set

    @counter = 0

    while

    @counter < 500000

    begin

    insert

    into curstest (col2)

    select

    NULL

    set

    @Counter = @counter + 1

    end

    Your method as shown below, took 8 minutes and 59 seconds to run against the table.

    declare @counter int

    ,@rows int

    set

    @counter = -1

    SET

    @Rows = 1

    WHILE

    @Rows > 0

    BEGIN

    SELECT TOP 1

    @Counter

    = Col1

    FROM CursTest

    WHERE col1 > @Counter

    ORDER BY Col1

    SET @Rows = @@ROWCOUNT

    IF @Rows > 0

    print @Counter

    END

    This firehose cursor took 10 seconds to do the same thing...

    DECLARE

    @Counter int

    DECLARE

    c CURSOR FAST_FORWARD

    FOR

    SELECT

    Col1 FROM CursTest

    OPEN

    c

    FETCH

    NEXT FROM c INTO @Counter

    WHILE

    @@Fetch_Status = 0

    BEGIN

    PRINT @Counter

    FETCH NEXT FROM c INTO @Counter

    END

    CLOSE

    c

    DEALLOCATE

    C

    Now, what exactly is the benefit of "rolling your own" cursors?  NOTHING, don't do it, it's a stupid idea.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Because we don't know what complicated (or not) calculation is in the usp_generate_purchase_order stored procedure, it is difficult to guess at a more efficient solution.

    However, if that generation code is merely some form of calculation, could it not be in a UDF that is used as part of an insert or update statement? This would surely be more of a set based approach than the solution provided (which is of course just more row by row processing).

    It has been my experience that this type of approach comes from those without a strong set based back ground - basically application developers where by necessity things are approached row by row (displays, saving, scrolling).

    An example from a past company I worked at was a set of stored procedures that had cursors within cursors (curses) accessing multi-million row tables. This SP would run for 48 hours and not complete. It took me the best part of two months to get it running correctly ($$$$$$$$$$) but in the end it ran in 12 minutes and was more accurate than the original. It was just that the original developer (from the clients in house team) was not used to the idea of set theory and could not conceptualise how to process all that information without doing it row by row.

    Sorry, but for me that article was a bit of a dud. However the comments asssociated with it have been revealing.


    Regards,

    Steve

    Life without beer is no life at all

    All beer is good, some beers are just better than others

  • Replacing a cursor with something equally memory intensive is not the optimal approach.  Let the performance numbers speak for themselves.  The technique I submitted has served me well over the years.  I think that the writer could have come up with a simpler design had the amount of time the process ran been an issue.

  • Quote dcpeterson: I guess proof is what you need...

    That was not an accurate test; you'd need to destroy and re-create the table for each test in order for it to be correct. The table gets cached, then read from memory, which is radically faster. Running the test this way shows they are fairly equivalent for recordsets under 10,000.

    That said, using a fast_forward cursor is often faster; and definitely faster for large recordsets.

    However, I never do cursors on large data sets, as I ALWAYS find a set based approach that works instead. The main reason I use loops is to throttle large updates, or for working with SQL objects dynamically. These always have a small recordset that's being looped through (even though the update may affect a bunch of rows, the table I loop on only has a few). Loops are easier to read, easier to nest, and unlike a cursor they can do pretty much anything. If there is no performance problems, use the clearest code, I say.

    cl

    Signature is NULL

  • I used dbcc freeproccache and dropcleanbuffers between tests.  However, subsequent testing shows that they are actually much closer in speed than that initial test.  The while loop is still about 15% slower.

    I agree that you should always look for a set based solution, but that isn't always possible, even for large data sets. 

    I have seen a number of cases where programmers used the "roll your own" approach because they had been told to avoid cursors.  In almost every case, when I have rewritten the code using cursors, the code is simpler and faster.  Simple is in the eye of the beholder, I personally prefer to see the cursor syntax, but faster is faster...

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I compared the performance of the "while" solution with a fast_forward and a static cursor.

    This are the results (average of 10 tests):

    No.of records While start While loop end Fast_Forward start Fast_Forward loop end Static start Static loop end
    1.000            
    Avg 51 187 19 21 17 13
    Elapsed   238   40   30
    10.000            
    Avg 193 14.093 63 253 73 141
    Elapsed   14.287   317   214
    100.000            
    Avg     352 1.832 457 1.065
    Elapsed Too much   2.184   1.522

    The difference is really big. The "while" solution is very slow.

    The static cursor is faster then the forward_only.

    Here are the script I used:

    USE AdventureWorks2000

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

     --Declare variables

    DECLARE @Name nvarchar(50),@X nvarchar(50)

    declare @Time1 datetime, @TimeDiff int

    declare @sPrint varchar(100)

    Print 'Start';set @Time1=getdate();

    --Declare a memory table

    DECLARE @item_table TABLE

    (primary_key INT IDENTITY(1,1) NOT NULL, --THE IDENTITY STATEMENT IS IMPORTANT!

    Name nvarchar(50)

    )

    --now populate this table with the required item category values

    INSERT INTO @item_table(Name)

     SELECT top 10000 P.Name 

     FROM Product P

     join TransactionHistory H

     on H.ProductID=P.ProductID

    DECLARE @item_category_counter INT

    DECLARE @loop_counter INT

    SET @loop_counter = ISNULL((SELECT COUNT(*) FROM @item_table),0) -- Set the @loop_counter to the total number of rows in the

    set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'Insert and count elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

                                                                 

    SET @item_category_counter = 1

    WHILE @loop_counter > 0 AND @item_category_counter <= @loop_counter BEGIN

     SELECT @Name = Name

     FROM @item_table

     WHERE primary_key = @item_category_counter

     

     select @X=@Name

     

     SET @item_category_counter = @item_category_counter + 1

    END

    set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'Loop elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

     

    ----------------------------------------------------------

    USE AdventureWorks2000

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    DECLARE @Name nvarchar(50),@X nvarchar(50)

    declare @Time1 datetime, @TimeDiff int

    declare @sPrint varchar(100)

    Print 'Start';set @Time1=getdate();

    DECLARE item_cursor CURSOR LOCAL fast_forward FOR

     SELECT top 100000 P.Name 

     FROM Product P

     join TransactionHistory H

     on H.ProductID=P.ProductID

    OPEN item_cursor

    FETCH NEXT FROM item_cursor INTO @Name

    set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'First fetch elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

    WHILE @@FETCH_STATUS = 0 BEGIN

      select @X=@Name

      

      FETCH NEXT FROM item_cursor INTO @Name

    END

    CLOSE item_cursor

    deallocate item_cursor

    set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'Loop elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

    -----------------------------------------------------------------------

    USE AdventureWorks2000

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    DECLARE @Name nvarchar(50),@X nvarchar(50)

    declare @Time1 datetime, @TimeDiff int

    declare @sPrint varchar(100)

    Print 'Start';set @Time1=getdate();

    DECLARE item_cursor CURSOR LOCAL static read_only forward_only FOR

     SELECT top 100000 P.Name 

     FROM Product P

     join TransactionHistory H

     on H.ProductID=P.ProductID

    OPEN item_cursor

    FETCH NEXT FROM item_cursor INTO @Name

    set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'First fetch elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

    WHILE @@FETCH_STATUS = 0 BEGIN

      select @X=@Name

      

      FETCH NEXT FROM item_cursor INTO @Name

    END

    CLOSE item_cursor

    deallocate item_cursor

    set @TimeDiff=datediff(ms,@Time1,getdate());set @Time1=getdate();set @sPrint= 'Loop elapsed ' +cast(@TimeDiff as varchar(20))+' ms';Print @sPrint

     

  • Just my humble opinion but I need to clear up some of the myths associated with Table Variables...

    First, table variables don't live in memory anymore than temp tables do... they both start out in memory if they fit and then they BOTH move to TempDB if they don't.  See the following URL for "proof"...

    http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k

    ... which states, as part of answer A4...

    Q4: Are table variables memory-only structures that are assured better performance as compared to temporary or permanent tables, because they are maintained in a database that resides on the physical disk?

    A4: A table variable is not a memory-only structure. Because a table variable might hold more data than can fit in memory, it has to have a place on disk to store data. Table variables are created in the tempdb database similar to temporary tables. If memory is available, both table variables and temporary tables are created and processed while in memory (data cache).

    Now, what you might not know, is that table variables do NOT create or use statistics.  So, the use of table variables may actually be slower than using a TempTable on larger usages.  I've been unfortunate enough to witness this particular phenomena.  From the same previously referenced URL...

    Q3: What are some of the drawbacks of table variables?

    A3: These are some of the drawbacks as compared to temporary tables:

    Non-clustered indexes cannot be created on table variables, other than the system indexes that are created for a PRIMARY or UNIQUE constraint. That can influence the query performance when compared to a temporary table with non-clustered indexes.
    Table variables do not maintain statistics like temporary tables can. Statistics cannot be created on table variables through automatic creation or by using the CREATE STATISTICS statement. Therefore, for complex queries on large tables, the lack of statistics may deter the optimizer to determine the best plan for a query, thus affecting the performance of that query.

    Ok, myths about table variables being so much better than temp tables are BUSTED... a little testing on your part will offer additional proof to skeptics... back to the original problem...

    As some have identified in this thread, there is no difference between using a Static Cursor and using the combination of a Temp Table with a While Loop.  There's no difference in resources or performance.  The real problem is that if you have resorted to the use of any form of RBAR (a "Modenism" which mean "Row By Agonizing Row"), then you have failed as an SQL Developer unless a 3rd party requirement exists for data transmission.

    The only way to eliminate the performance drain of Cursors is to rewrite the code to be set-based.  That may mean that you also have to rewrite more than one level of procs being called in the entire tree of the proc but that's the only way.  It's not that Cursors are "bad", per-se', it's that Cusors and any form of RBAR are a lot slower than set based.

    Any suggestion to the contrary is likely to be shown to be absolutely full of hooey in the light of any proper testing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I have personally discovered that table variables can often be slower than temp tables.  In addition, with temp tables you can create indexes, which often speed up performance significantly (minus the time it takes to create the index).

    But for this type of operation, I still think xp_execresultset is the best solution.

  • In SQL Server I have found that using temp tables(non-memory) as a pre-processing staging area makes for more optimized code, reducing the burden on the optimizer(yes these do need help!) and overall making for more efficient use of resources (faster?).  These temp tables can then be cursored and this type of configuration can also become a set-up to using just DML instead of cursors(when practical).

    As for the idea of using SELECT instead of a cursor, once the temp table is established reminds me of the code that one would have to use if cursors were not available (in the SQL Server world when was this?)  Does it work faster than cursors?  15 years ago Microsoft or Sybase would probably have claimed so.  That is until cursors were introduced into SQL Server. 

    We need to remember that SQL Server maintains separate optimizers for cursor SELECTS and another for DML SELECTS.  In the Oracle world there is only one optimzer, and so there is no speed difference between using a CURSOR or just using a SELECT.  As a result(and some other reasns, too) CURSORs are the preferred way of writing code in the Oracle world. 

  • >>But for this type of operation, I still think xp_execresultset is the best solution.

    The original post was for "Eliminating Cursors" so I have to ask... how do you use it to eliminate cursors?  If that's not what you were speaking of, then what are you referring to as "this type of operation".

    By the way... it's not xp_execresultset... it's sp_execresultset and it uses a While loop within...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • >>>These temp tables can then be cursored...

    I don't think you understood... the whole idea is to get rid of cursors and other RBAR methods in SQL Server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • See my post on the 4th page of this thread, where I give more detail about using xp_execresultset in place of a cursor.  In addition, there are many good examples on the web for using this proc in place of certain kinds of cursor operations.

    It's true that the proc uses a loop internally.  However, in the few times that I've used this proc in real life, there has always been a performance improvement over using a cursor or spinning my own WHILE loop.

    BTW - it can be executed as either xp_execresultset or sp_execresultset. 

  • Ah. Got it... very nice work around and thanks for the clarification on the xp_/sp_.  The proc is actually listed as an sp_ in the Master DB but I guess they made it convenient.  Considering the verosity of sp_ExecResultSet, I am a bit surprised that it runs faster but its understandable because it's a system proc and it's pretty well written for what it does.

    I'd still have to fire a Developer that writes RBAR code, though (certainly not the guy that came up with the type of work around you did).  There's just no excuse.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Mike C,

    Exactly right and great call.  In fact, there's documented proof that table variables do use TempDB at the following web site and that there are some serious DISADVANTAGES to using table variables instead of temp tables...

    http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 46 through 60 (of 296 total)

You must be logged in to reply to this topic. Login to reply