Cursors Be Gone!

  • There seems to be a lot of confusion between table variables and temporary tables.

    For us, table variables have proved to be of massive benefit, outperforming cursors in every instance (approximately 100 tables and several million total records in the system we're using). Cursors have always proved to be much, much slower than table variables.

    In one instance we need to calculate the result of some microbiological testing; this result can be determined by up to 8 separate results, each of which may have other determining factors. Cursors used to be used, and they slowed things down and sometimes produced locks. When we rewrote the stored procedures using table variables, the speed increased and the locks pretty much vanished too. Some of our stored procedures have 5 or 6 table variables within them, some with 30+ fields. As long as you get the ids and links between the table variables right, you can process them very efficiently, and then simply link back to your real tables at the end to update the data (a minimal sketch of this pattern appears at the end of this post).

    As I mentioned previously, some cursor-based procedures improved from taking minutes down to a few seconds once they were rewritten with table variables.

    I can't easily provide many useful examples, since you'd need to understand the calculations etc. behind the code (plus I'd probably get sacked!).

    I can understand a lot of the "why bother?" type arguments relating to this article and some of the other example code that has been provided, though, since in most cases the real-world benefits would not be seen in the examples. I've only read the first ten or so pages though, since I don't have time!
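
    A minimal sketch of the pattern described above - all table, column, and rule names are hypothetical placeholders rather than the actual system:

    -- Hypothetical names throughout; the point is the shape of the cursor-free approach.
    declare @Results table (SampleID int, TestCode varchar(10), Result decimal(9,2))
    declare @Final   table (SampleID int primary key, FinalResult varchar(10))

    -- 1) Pull the raw rows into a table variable instead of opening a cursor.
    insert @Results (SampleID, TestCode, Result)
    select SampleID, TestCode, Result
    from dbo.MicroTestResults           -- hypothetical source table

    -- 2) Derive the combined result per sample in one set-based statement.
    insert @Final (SampleID, FinalResult)
    select SampleID,
           case when max(Result) > 100 then 'FAIL' else 'PASS' end   -- placeholder rule
    from @Results
    group by SampleID

    -- 3) Link back to the real table at the end to update the data.
    update m
    set m.FinalResult = f.FinalResult
    from dbo.MicroSamples m             -- hypothetical target table
    join @Final f on f.SampleID = m.SampleID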

  • darren.flynn (1/6/2009)


    There seems to be a lot of confusion between table variables and temporary tables.

    For us, table variables have proved to be of massive benefit, outperforming cursors in every instance (approximately 100 tables and several million total records in the system we're using). Cursors have always proved to be much, much slower than table variables.

    And herein lies some of the confusion, I think. AFAIK, no one here is saying that table variables are not better than cursors; they are. What we are saying is that in almost every performance aspect, temporary tables are as good as or better than table variables.

    Note the "performance" qualification above; it's important. Semantically, table variables are desirable because they are very "clean": they have very explicit local scoping, and they do their own housekeeping. That means SQL developers do not have to worry about cleaning them up, nor about unintended functional side effects with called functions and sProcs (a sketch of the scoping difference follows below). This is why I think that developers find them preferable.

    Of course, those same side-effects in temporary tables are why they can be used for so many "tricks" that you cannot do with anything else.
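
    A minimal sketch of that scoping difference (the inner procedure names are hypothetical):

    -- A temp table created in an outer scope IS visible to the procedures it calls,
    -- and it lingers until it is dropped or the creating scope/session ends.
    create table #Scratch (ID int)
    exec dbo.SomeInnerProc        -- hypothetical proc: it can read and modify #Scratch
    drop table #Scratch

    -- A table variable is visible only to the batch or procedure that declares it:
    -- called procedures cannot see it, and it cleans itself up when it goes out of scope.
    declare @Scratch table (ID int)
    exec dbo.SomeOtherInnerProc   -- hypothetical proc: it cannot see @Scratch at all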

    -- RBarryYoung, (302)375-0451, blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

    1) You cannot have a regular old non-unique/non-PK index on a table variable.

    2) You can, however, do this to create a multiple-column PK:

    declare @tab table (a tinyint, b tinyint, primary key nonclustered (a, b))
    insert @tab values (1, 1)
    select * from @tab

    3) Someone posted an example of how a table variable was faster than a temporary table. It was a meaningless example, however, because you didn't DO anything with the table! My statement, which is incontrovertible, was that it is the OPTIMIZER that gets hosed by table variables, since it cannot have statistics. I have lost track of the number of times I have seen a table variable (or table UDF, btw) used in joins where the optimizer gives out an estimated number of rows of ONE and then does a nested loop plan which goes out to lunch because of the number of rows ACTUALLY hit. For temp tables the optimizer can create its own stats, see that when it joins it will actually hit an estimated 500K rows or whatever, and pick a MUCH more efficient plan. Or you can create your own explicit stats to help the optimizer out as well. Or index(es) - a sketch of this appears after the example below.

    When you are dealing with situations where there truly ARE small numbers of rows ALWAYS hit when table vars are used/joined, there is absolutely NO problem in using them, and there are potential benefits such as faster net performance, fewer recompiles, etc. But such perf benefits get completely overwhelmed if you get into the dreaded nested-loop-plan-on-a-kajillion-rows situation. 🙂

    Oh, and NEVER lose sight of the fact that when you join, the number of rows in the table var is NOT the whole story!! What matters is how many rows come out of the join.

    Here is a quickie example. As the number of rows (or joins) goes up, it gets much worse:

    use adventureworks
    go

    dbcc freeproccache
    set statistics IO on
    set statistics time on
    go

    declare @tab table (ProductID int)

    insert @tab values (870)
    insert @tab values (873)
    insert @tab values (921)
    insert @tab values (712)
    insert @tab values (707)
    insert @tab values (711)

    select ActualCost
    from Production.TransactionHistory th
    inner join @tab t on t.ProductID = th.ProductID

    (16895 row(s) affected)

    Nested loop bookmark lookup plan; note the estimated rows in the join are 257, actual 16895.
    420 milliseconds average duration, 51789 total logical reads.

    dbcc freeproccache
    go

    create table #tab (ProductID int)

    insert #tab values (870)
    insert #tab values (873)
    insert #tab values (921)
    insert #tab values (712)
    insert #tab values (707)
    insert #tab values (711)

    select ActualCost
    from Production.TransactionHistory th
    inner join #tab t on t.ProductID = th.ProductID

    (16895 row(s) affected)

    Hash join plan; estimated and actual rows are equal.
    320 milliseconds duration, 793 total logical reads.
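
    For reference, a minimal sketch of the "explicit stats / index" option mentioned in point 3, reusing the #tab temp table from the example above (the index and statistics names are made up):

    -- Made-up names; both of these are possible on a temp table but not on a table variable.
    create index IX_tab_ProductID on #tab (ProductID)
    create statistics st_tab_ProductID on #tab (ProductID)

    select ActualCost
    from Production.TransactionHistory th
    inner join #tab t on t.ProductID = th.ProductID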

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/6/2009)


    someone posted an example of how a table variable was faster than a temporary table. It was a meaningless example however

    That someone was me, and I did in fact say that it was a mundane example.

    However, all I was trying to do was point out situations in which they could be faster - which they can, and which I am happy you have acknowledged. (Second edit: decided what I said sounded too harsh.)

    Anyway, here is a slightly less mundane example that still performs faster on my box.

    1.093 secs:

    CREATE TABLE #temptable (cardid int primary key clustered)

    INSERT INTO #temptable
    select top 100 cardid
    from (select cardid from tblLogTransactions GROUP BY [CardID]) t

    select sum(pointsissued)
    from tblLogTransactions
    INNER JOIN #temptable t on [tblLogTransactions].[CardID] = t.[CardID]

    0.843 secs:

    DECLARE @temptable table (cardid int primary key clustered)

    INSERT INTO @temptable
    select top 100 cardid
    from (select cardid from tblLogTransactions GROUP BY [CardID]) t

    select sum(pointsissued)
    from tblLogTransactions
    INNER JOIN @temptable t on [tblLogTransactions].[CardID] = t.[CardID]

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Matt, can you please run your last example without the PK clustered?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    Thank you, TheSQLGuru. That explanation makes perfect sense. I was very stuck on why we saw such improvement while others did not.

    I had not realized the statistics difference issue; that will be very helpful if we run into problems.

    I wasn't born stupid - I had to study.

  • Yep - no worries.

    Temp table: 1.218

    Table variable: 0.921

    I think probably what you wanted me to do instead was to change the 100 to 1000. Then it moves heavily in the other direction:

    Temp table: 1.328

    Table variable: 5.671

    HOWEVER - and please do read this bit and understand where I'm coming from - I'm not saying that table variables are better. I am just saying they have their uses, can perform comparably with temp tables in the right circumstances, and therefore shouldn't be dismissed out of hand.

    As a point of interest - I think it would be incredibly useful for this 'interesting behaviour' of table variables to be addressed - as it makes TV Functions reasonably useless. What are your thoughts?

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Matt Whitfield (1/6/2009)


    Yep - no worries.

    Temp table: 1.218

    Table variable: 0.921

    I think probably what you wanted me to do instead was to change the 100 to 1000. Then it moves heavily in the other direction:

    Temp table: 1.328

    Table variable: 5.671

    HOWEVER - and please do read this bit and understand where I'm coming from - I'm not saying that table variables are better. I am just saying they have their uses, can perform comparably with temp tables in the right circumstances, and therefore shouldn't be dismissed out of hand.

    As a point of interest - I think it would be incredibly useful for this 'interesting behaviour' of table variables to be addressed - as it makes TV Functions reasonably useless. What are your thoughts?

    1) I wanted to see if the clustered index had any 'extra' overhead for the temp table - it appears it does not, at least for 1000 rows.

    2) I never said table vars don't have their place - just that (in my experience) they are overused and cause more (and more severe) performance problems than they solve. I also made sure to mention a situation or two where they SHOULD be considered.

    3) TVFs are already reasonably useless, and that has nothing to do with table variables. They fall prey to the exact same problem - the optimizer cannot do the 'optimal' thing when they are in play in many scenarios (a sketch of such a TVF follows below). But hey, I get paid a lot of money each year to clean up the performance messes created by developers who wanted code reuse instead of good code. So in reality, I LOVE TVFs and table vars!!! 😎
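
    A minimal sketch of a multi-statement TVF - it returns its rows through a table variable, which is why joins against it hit the same row-estimate problem described earlier. The function name and filter are made up; the tables are the AdventureWorks ones from the earlier example:

    -- Hypothetical multi-statement TVF
    create function dbo.fn_ExpensiveProducts (@minListPrice money)
    returns @result table (ProductID int primary key)
    as
    begin
        insert @result (ProductID)
        select ProductID
        from Production.Product
        where ListPrice >= @minListPrice
        return
    end
    go

    -- Joining to it: the optimizer cannot see how many rows the function will return.
    select th.ActualCost
    from Production.TransactionHistory th
    inner join dbo.fn_ExpensiveProducts(1000) f on f.ProductID = th.ProductID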

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks 🙂 it's been good to learn from you!

    I'm pretty much way off topic now, but do you think in reality there should be a 'compilation' step that refactors TVFs into temp-table-based operations when creating stored procs that use them? Ideally some sort of mechanism whereby in production and test you would have optimised procedures, whereas in development you could make use of the code re-use that TVFs afford?

    It would be a big undertaking, but a product that achieved that would surely be worth its weight in gold?

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • TheSQLGuru (1/6/2009)


    .... here is a quickie example. as the number of rows (or joins) goes up it gets much worse

    ....

    This example may be rather misleading, depending on what your hardware configuration is. dbcc freeproccache clears stuff from the query plan cache, but the rows read from disc may still be in the disc drive's cache, a raid controller's cache, an operating system cache (unlikely - I think SQLS probably doesn't permit that to happen) or indeed in SQLS managed storage other than the query plan cache. The numbers are small enough that many systems could cope with them in cache. If so (and it could be guaranteed on just about any of my customers' systems that the data would be cached in raid controller store, and I'm pretty sure SQLS would be caching it too) whichever of the two queries runs second has a big advantage. Have you tried running your test with the two queries in the opposite order and seeing if you get the same results?

    One question to which I don't know the answer for the latest versions of SQL Server: does the lazy writer have some mechanism for reducing the probability of writing to disc dirty pages which contain rows from temp tables, and if so, does the transaction log writing mechanism have mechanisms for taking (safe) advantage of this? I've worked on RDBMSs where those things were not true; in at least one case it was because the lead architect thought the effort wasn't worthwhile, since it would only have an effect for small temporary tables (large ones hit the disc anyway). That suggests that in terms of lazy writer IO and transaction log IO, table variables for "small" tables have a built-in advantage for disc IO (especially significant in cheap systems where the log file is held on the same physical media as the data). Obviously unindexed table variables can be a problem, but primary key and unique constraints (both single-column and multi-column) can be included in the definition so that indexes do exist.
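
    A minimal sketch of that constraint-based indexing (the columns are made up):

    -- Made-up columns; each constraint below creates an index under the covers,
    -- which is the only way to get an index on a table variable.
    declare @t table
    (
        a int not null primary key,    -- single-column primary key index
        b int not null,
        c int not null,
        unique (b, c)                  -- multi-column unique constraint index
    )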

    I suppose I could try to measure those things myself, but today is Los Reyes (roughly equivalent here to 25 Dec where I imagine you are) and I'm feeling lazy. (That's a lousy excuse, actually I feel lazy every day, like all good developers - the real reason is that my hardware is probably nothing much like yours so comparisons would be meaningless.)

    Tom

  • Tom.Thomson (1/6/2009)


    TheSQLGuru (1/6/2009)


    .... here is a quickie example. as the number of rows (or joins) goes up it gets much worse

    ....

    This example may be rather misleading, depending on what your hardware configuration is. dbcc freeproccache clears stuff from the query plan cache, but the rows read from disc may still be in the disc drive's cache, a raid controller's cache, an operating system cache (unlikely - I think SQLS probably doesn't permit that to happen) or indeed in SQLS managed storage other than the query plan cache. The numbers are small enough that many systems could cope with them in cache. If so (and it could be guaranteed on just about any of my customers' systems that the data would be cached in raid controller store, and I'm pretty sure SQLS would be caching it too) whichever of the two queries runs second has a big advantage. Have you tried running your test with the two queries in the opposite order and seeing if you get the same results?

    One question to which I don't know the answer for the latest versions of SQL Server: does the lazy writer have some mechanism for reducing the probability of writing to disc dirty pages which contain rows from temp tables, and if so, does the transaction log writing mechanism have mechanisms for taking (safe) advantage of this? I've worked on RDBMSs where those things were not true; in at least one case it was because the lead architect thought the effort wasn't worthwhile, since it would only have an effect for small temporary tables (large ones hit the disc anyway). That suggests that in terms of lazy writer IO and transaction log IO, table variables for "small" tables have a built-in advantage for disc IO (especially significant in cheap systems where the log file is held on the same physical media as the data). Obviously unindexed table variables can be a problem, but primary key and unique constraints (both single-column and multi-column) can be included in the definition so that indexes do exist.

    I suppose I could try to measure those things myself, but today is Los Reyes (roughly equivalent here to 25 Dec where I imagine you are) and I'm feeling lazy. (That's a lousy excuse, actually I feel lazy every day, like all good developers - the real reason is that my hardware is probably nothing much like yours so comparisons would be meaningless.)

    1) You are new here so I will spot you some slack. 😀 I didn't run dbcc dropcleanbuffers because I wanted to ENSURE the rows were in cache so that it was apples-to-apples. Those sets of code were run many times in a row with pure logical reads so I could give an average time. I also wanted to give the table var a fighting chance. Doing both queries with buffers flushed gives the table var a 1.4 sec runtime and the temp table 0.7 sec runtime. 😉

    2) Log writing and lazy writing are way beyond the scope of this thread. But I will say that tempdb logging is quite different from normal logging.

    3) You can always make perf comparisons on your own hardware, because each run uses the exact same machine. So you CAN get valid perf metrics - you just cannot compare those metrics with other posters' metrics.
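
    For anyone repeating the comparison from a truly cold cache, a minimal sketch of the commands being discussed (test systems only - this flushes the buffer pool and the plan cache):

    -- Test systems only: write out dirty pages, drop the clean buffers, clear cached plans.
    checkpoint
    go
    dbcc dropcleanbuffers
    dbcc freeproccache
    go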

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    Heh... actually... we're all new here... I learn something new every day.

    Welcome aboard and happy Los Reyes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • RBarryYoung (1/5/2009)


    Farrell Keough (1/5/2009)


    We have one last cursor used for an unusual "counting" routine we must follow. I attempted to replace it with this method, but it is slower than using the cursor.

    OK, if I understand what this is trying to do, then the following routine is approximately 10x faster:

    declare @dat datetime; set @dat = getdate()
    print 'start: ' + convert(varchar(23), @dat, 121)

    update c
    set [count] = 0
    From (Select id, alpha, beta, gamma, [type], [count],
                 ROW_NUMBER() OVER (Partition By [count]
                                    Order By [Type], [Alpha], [Beta], [Gamma], [id]) AS KeyRowCount
          From ccount) c
    Where KeyRowCount > 1
    -- And [COUNT] <> 0 --this MAY make it faster...

    print 'done: ' + convert(varchar(23), getdate(), 121)
        + ', ms: ' + cast(datediff(ms, @dat, getdate()) as varchar)

    It appears to produce the correct results on my DB, but you should check it to be sure.

    Not sure why, but ROW_NUMBER() is not recognized and I am getting an error on the syntax around OVER. I have not used these before.

    I wasn't born stupid - I had to study.

  • Farrell Keough (1/7/2009)


    It appears to produce the correct results on my DB, but you should check it to be sure.

    Not sure why, but ROW_NUMBER() is not recognized and I am getting an error on the syntax around OVER. I have not used these before.

    You on 2000 by any chance? That's why I posted a different version to achieve the same thing - because OVER clauses are 2005+.

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Matt Whitfield (1/7/2009)


    You on 2000 by any chance? That's why I posted a different version to achieve the same thing - because OVER clauses are 2005+.

    That's correct. I will have to go back and look for your solution...

    We have not fully migrated yet.

    Thanks

    I wasn't born stupid - I had to study.
