Any alternatives for avoiding Cursors... ?

  • Hi All,

    This time I need few clarifications and suggestions about using Cursors.

    Me working in a Finance Domain for an ERP called Great Plains and most of our work will be in SQL.

    Very often we need to write various stored procedures especailly for:

         1) extracting data from different datasources (mainly tables and views) to show in the reports by implementing the required business logic. Especially we need to convert most of the values based on multiple currencies, units, etc. Most of our reports are complex with multiple sub reports according to the client requirements.

         2) for integrating data from different external sources into our ERP tables by following all necessary validations for the data consistency.

        3) For validating different kinds of process routines (Especially posting routines)  and also writes for other scenarios also

    In all the above mentioned kind of scenarios we are using Cursors (Read only) and also feel comfortable to implement our logic with them.

    But for improving the performance issues, how can we eliminate these cursors and what other alternatives can be used to impose our logic easily and comfortably.

    Can we completely eliminate these cursors with any other alternatives,

    Please suggest some better solutions for improving the performance.

    Thanks in advance,

    Subhash

  • Dear Subhash:

    In my case, I avoid cursors like they are a curse...

    I have tables with tens of millions of rows (daily security prices in multiple currencies) and row-by-row processing is out of the question.

    Please correct me if I'm wrong but I guess your cursors are being used in data validation (column by column ) or in reports where you do some column formatting row-by-row.

    My suggestion is for you to look at what your procedure or job is doing as a whole and try to group together similar occurences with the same processing consequences.

    For example, I use SELECT CASE a lot to do a single pass process...and combine it with nested subqueries...

    UPDATE MARKET_TABLE

    SET fxrate = IsNull((SELECT f.rate /CASE f.basis

    WHEN 'L' then 100000

    WHEN 'K' then 10000

    WHEN 'N' then 1000

    WHEN 'Y' THEN 100

    ELSE 1

    END

    FROM forexchange f

    WHERE f.currency_code=MARKET_TABLE.currency_code

    AND f.fx_date=(SELECT MAX(f1.fx_date)

    FROM forexchange f1

    WHERE f1.currency_code=f.currency_code AND

    f1.fx_date<= MARKET_TABLE.balancedt)),1)

    WHERE currency_code 'USD'

    In order to execute a repetitive process, I use WHILE LOOPS

    with conditions that can be shared by multiple rows.

    When validating certain occurences of specific data on certain columns, I use TEMP TABLES to hold the specific data and use equijoins with my base table.

    I replaced a cursor within a cursor situation in one developer procedure by using a join with a derived table:

    SELECT BR.broker_id, BR.broker_name,SA.sec_count

    FROM BROKER AS BR,

    (SELECT broker_id, COUNT( security_id) AS sec_count

    FROM sales

    GROUP BY broker_id

    ) AS SA

    WHERE BR.broker_id = SA.broker_id

    The original had a cursor on the sales table within a loop on the cursor of the broker table.

    I hope this helps a bit...

    ---melvynpatrick

  • Hi Subhash,

     

    Great Plains huh?  I've worked with them since version 3.15 and most of their legacy code is still written like they are using Btrieve.   All of their examples use cursors.  Yuck.

    To give you an example of the performance benefit from NOT using cursors, run this query against the TWO company database. 

    The first output will give your the begin and end time using a cursor and the second output will give you the begin and end time using set based queries.   The awesome difference is that the second query (Set based actually went through 41 million records, while the first query (Cursor based) had to be limited to 1 million in order to prevent taking hours to run.

    Once you go set, you never go back.  I've not knowingly used a cursor in 5 years.   Sure, Dexterity uses cursors in the background, but I didn't knowingly use one.

    Part of the problem with getting away from cursors is that our brains can wrap around the 1 record, one update concept.  The whole concept of grabbing 3 tables and then comparing the resultant dataset against 2 other tables as a whole just blows our minds.  I still can't truly grasp why it works, it still makes me kind of starry eyed.

     

    Here's the code I promised:

    A quick rundown on what I'm doing.  I'm using a cross join on the GL20000 table to generate enough records to make it really obvious which works best.  I have to  clear the buffers so this test can be run multiple times without the benefit of SQL Server caching the data and skewing the results.   You shouldn't see more than a few milliseconds difference as long as you aren't doing anything else on the computer while it's running.

    I limit the Cursor test to 1,000,000 rows(if you don't want to wait 4 minutes, limit it to 100,000 rows) because it takes 4 minutes to run 1 Million, it would take almost 3 hours to run 40,000,000 rows.   I don't have the same problem with the Set based script.  It takes 330 ms to run 1000 records and 660ms to run 41,000,000!  The incredible speed of Set based operations strikes again.  I'm never underwhelmed by the difference.

    --START CODE

    USE TWO

    DBCC DROPCLEANBUFFERS

    go

    DECLARE @Reference char(30)

    DECLARE @NameCountA int

    DECLARE @NameCountB int

    DECLARE @NameCountC int

    DECLARE @RecordCount int

    DECLARE @BeginTime datetime

    DECLARE @EndTime datetime

    DECLARE @CursorTime int

    DECLARE @SETTIme int

    SET @BeginTime = getdate()

    PRINT convert(varchar(50),@BeginTime,9)

    SET @NameCountA = 0

    SET @NameCountB = 0

    SET @NameCountC = 0

    SET @RecordCount = 0

    DECLARE test_Reference cursor

    FOR

    SELECT TOP 1000000 GL20000.REFRENCE

    FROM GL20000,GL20000 as CrossJoined

    OPEN test_Reference

    FETCH NEXT FROM test_Reference

    INTO @Reference

    WHILE @@FETCH_STATUS = 0

    BEGIN

     SET @RecordCount = @RecordCount + 1

     IF LEFT(@Reference,1) = 'A'

      SET @NameCountA= @NameCountA + 1

     IF LEFT(@Reference,1) = 'B'

      SET @NameCountB= @NameCountB + 1

     IF LEFT(@Reference,1) = 'C'

      SET @NameCountC = @NameCountC + 1

     FETCH NEXT FROM test_Reference  

     INTO @Reference 

    END

    CLOSE  test_Reference

    DEALLOCATE test_Reference

    PRINT  'Number of Reference columns beginning with A = ' + cast(@NameCountA as varchar(16))

    PRINT  'Number of Reference columns beginning with B = ' + cast(@NameCountB as varchar(16))

    PRINT  'Number of Reference columns beginning with C = ' + cast(@NameCountC as varchar(16))

    PRINT  'Number of Total Records ' + cast(@RecordCount as varchar(16))

    SET  @EndTime = getdate()

    PRINT  convert(varchar(50), @EndTime,9)

    SET  @CursorTime = DATEDIFF(ms,@BeginTime,@EndTime)

    USE TWO

    DBCC DROPCLEANBUFFERS

    SET @BeginTime = getdate()

    PRINT convert(varchar(50), @BeginTime,9)

    SET @NameCountA = 0

    SET @NameCountB = 0

    SET @NameCountC = 0

    SET @RecordCount = 0

    SELECT @NameCountA =  sum(CASE LEFT(GL20000.REFRENCE,1)

       WHEN 'A' THEN 1

       ELSE 0

       END),

     @NameCountB =  sum(CASE LEFT(GL20000.REFRENCE,1)

       WHEN 'B' THEN 1

       ELSE 0

       END),

     @NameCountC =  sum(CASE LEFT(GL20000.REFRENCE,1)

       WHEN 'C' THEN 1

       ELSE 0

       END),

     @RecordCount = count(GL20000.REFRENCE)

    FROM GL20000,GL20000 as CrossJoined

    /*SELECT @NameCountA = COUNT(REFRENCE)

    FROM GL20000

    WHERE left(REFRENCE,1) = 'A'

    SELECT @NameCountB = COUNT(REFRENCE)

    FROM GL20000

    WHERE left(REFRENCE,1) = 'B'

    SELECT @NameCountC = COUNT(REFRENCE)

    FROM GL20000

    WHERE left(REFRENCE,1) = 'C'

    SELECT @RecordCount = count(REFRENCE)

    FROM GL20000

    */

    PRINT 'Number of Reference columns beginning with A = ' + cast(@NameCountA as varchar(16))

    PRINT 'Number of Reference columns beginning with B = ' + cast(@NameCountB as varchar(16))

    PRINT 'Number of Reference columns beginning with C = ' + cast(@NameCountC as varchar(16))

    PRINT 'Number of Total Records ' + cast(@RecordCount as varchar(16))

    SET @EndTime = getdate()

    PRINT convert(varchar(50), @EndTime,9)

    SET  @SetTime = DATEDIFF(ms,@BeginTime,@EndTime)

    PRINT  'Number of milliseconds to run using cursor: ' + cast(@CursorTime as varchar(16))

    PRINT  'Number of milliseconds to run using Set: ' + cast(@SetTime as varchar(16))

    --END CODE

     

  • I use SERVERFILTER in my Access (project) client:

    http://www.winnetmag.com/Articles/Print.cfm?ArticleID=9037

     

  • Hi All,

    Thanks for all your suggestions.

    I will try all of them and get back to you.

     

    Subhash

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply