TempDB locking question

  • I've recently encountered a severe performance ceiling on a SQL 2000 SP2 server. When this ceiling is reached, response times are 10x their normal values, processor utilization is pegged, and Page lock timeouts occur hundreds of times each second.

    I've determined that one of the pages most contended for belongs to Tempdb.dbo.sysindexes. Out of 400 normal connections to the server, ~70 show this page as their waitresource during normal times. At peak (problem) times, this jumps to hundreds.

    Also, dbcc checkdb and count(*) show no corruption in tempdb and a reasonable number of rows in sysindexes, sysobjects, and syscolumns. However, select rowcnt from sysindexes where indid = 1 and id in (1,2,3) shows (respectively) 25 million, 25 million, and 50 million. Auto update statistics is ON on tempdb.

    Any ideas?

    Dave

  • Have you been able to determine what the users are doing when this timeout starts occurring??

     

     

  • tempDB contention may can alleviate by setting flag -T1118 with mutiple file for the tempdb. I had the same problem and was alleviate by adopting the change. You may still have a better on what's your system is doing anyway.

    Below is the KB link for your reference.

    http://support.microsoft.com/default.aspx?scid=kb;en-us;328551&Product=sql2k

  • Thats a very interesting article that may apply to a problem we are having. It refers to the resources 2:1:1 and 2:1:3 and gives a little info on what they are. We are seeing alot of PAGELATCH_UP waittypes on waitresource 2:1:128. How can I find out what waitresource this is?

     

    Thanks.


    Bob
    SuccessWare Software

  • I've looked over this KB article and splitting the TempDB into multiple files could definately help performance. My only concern is that I haven't been seeing 2:1:1 or 2:1:3 as the waitresource. Like Bob, I've been seeing a substantial amount of PAGEIOLATCH_SH and PAGELATCH_UP waits. These two waits are the bottleneck for 1/3 of my processes.

    Dave

  • I was able to figure out that 2=tempdb, 1=file id, 128=page id.


    Bob
    SuccessWare Software

  • latching may can alleviate by re-org your cluster index.When is last time you drop/recreate your cluster index.

  • I don't usually drop and rebuild system indexes. The page in question belongs to TempDB.dbo.sysindexes. My uptime is 35 days, so that would be the last time the index was rebuilt.

    Table: 'sysindexes' (2); index ID: 1, database ID: 1

    TABLE level scan performed.

    - Pages Scanned................................: 4

    - Extents Scanned..............................: 4

    - Extent Switches..............................: 3

    - Avg. Pages per Extent........................: 1.0

    - Scan Density [Best Count:Actual Count].......: 25.00% [1:4]

    - Logical Scan Fragmentation ..................: 25.00%

    - Extent Scan Fragmentation ...................: 75.00%

    - Avg. Bytes Free per Page.....................: 3131.5

    - Avg. Page Density (full).....................: 61.31%

    I'm leaning towards the issue being a hotspot or locking issue, rather than being physical IO related.

    Dave

  • Is your APP using Global temp table or local temp table ? My experience sometime Global temp table may yield a lot better performance.

  • Thanks John, that's definately something to look into. We're currently using local temp tables, as they may be populated with different information depending on the options the SP was executed with. We've recognized that this is not an optimized SP. I'd like to split this into 20 or so SP's, so that the Query Optimizer can reuse the execution plans more often, but this would be a major code change. It's a long term solution we're considering; however, I need to determine how to best enhance performance in the short term.

    Any ideas on the weird rowcount reporting of sysindexes? For some reason, this strikes me as quite odd.

    Thanks,

    Dave

  • My experience tell me when temp table created/dropped on the fly. The sysindexes and others... statistics did not get update accordingly. Query plan may still using the wrong statiscs, which yiled the poor performance. This is also why you get wrong row count on the sysindexes. I'm curious if you run an update stats, and get the count, did you still get wrong/high row counts ?

    Since the update stats. is not alway running, wrong stats play BIG role here. the correct way is change the way temp table is been used. The short way, try run update stats. more often. if is a SP, try recompile each time it get executed. Be creative, keep in mind 'how can I get system statistics correct' is the key to solve the problem.

  • I've updated statistics and I'm now correctly reporting the rowcounts of sysindexes, sysobjects, and syscolumns.

    Unfortunately, today is relatively light volume, so I wasn't experiencing any issues and can't determine whether these statistics were throwing off execution plans.

    Thanks for the help, John.

    Dave

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply