Blocking TempDB: (15ms)PAGELATCH_UP:tempdb:8(PFS)

  • So, I seem to be having a good opportunity to learn about tempdb contention.

    Here's what I've got:

    SQL Server 2005 running on a 2-processor/8 core machine with a 64-bit install of Windows Server 2008 R2 with access to 8 GB of RAM.

    When I arrived on this contract there were chronic daily restarts of the SQL Server (if someone complained they bounced the service). TempDB was a single file on the same drive as all the database (& log) files. TempDB contention was reduced by moving it to another drive and splitting it into multiple files (8). However I am still seeing contention and I am now aware that the web app that is hitting this db is making very intense use of tempdb, building intermediary tables for business logic.

    Multiple times per day I end up with 40 or more processes blocked due to tempdb contention. Usually this clears out within 15 minutes but it often causes quite a distraction for a few minutes. Often the sys admin just bounces the server as a course of habit before I have a chance to look into what is going on. When I do get to check or get a report back of the object that is listed in the wait stats it is always tempdb, typically sysallocunits or sysfiles1.

    Suggestions? Thanks!

  • Locks or latches?

    Is there any chance of tuning the web app to not use TempDB so heavily?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • What are the processes waiting on? If the web app is creating lots of temp tables you may actually benefit from creating additional data files. No matter how many you create, just make sure they are all sized the same. Without more info its tough to say anything.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I just read the title of the post again. Page latch (not IO) wait on tempdb file 8...chances are adding more data files will help. Gail, please correct if I am off base.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Yep, PageLatch and PFS indicated as the issue. I can certainly try creating more files. Should be fairly easy to see if there is improvement. Anyone want to take a swag at a good number to try? Right now I have 8 that start out at 500 MB each (yeah! that was based on the size they were running in prod).

    So, what do you think of a first try with 16 files with initial size of 250/300 MB?

  • Fly Girl (7/9/2012)


    Yep, PageLatch and PFS indicated as the issue. I can certainly try creating more files. Should be fairly easy to see if there is improvement. Anyone want to take a swag at a good number to try? Right now I have 8 that start out at 500 MB each (yeah! that was based on the size they were running in prod).

    So, what do you think of a first try with 16 files with initial size of 250/300 MB?

    Too many tempdb files can be bad as well. I do not have a ton of experience in this area. I have had to add files to tempdb to reduce contnetion, but have not dealt with anything to the level of what you're seeing. So, I would go slower than simply adding 8 more, even though the math may get nasty. Gail may be able more certain about the safety of doubling the files. I would maybe add 2 more files of the same size (if you have space). Then if that does not help add a couple more while reducing the size of each file equally. Balance the size of each file with the largest expected operation you see in tempdb...spilling is bad too.

    From A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor core:

    Too many tempdb data files can cause performance problems for another reason. If you have a workload that uses query plan operators that require lots of memory (e.g. sorts), the odds are that there won't be enough memory on the server to accomodate the operation, and it will spill out to tempdb. If there are too many tempdb data files, then the writing out of the temporarily-spilled data can be really slowed down while the allocation system does round-robin allocation. The same thing can happen with very large temp tables in tempdb too.

    A reboot in between file additions may help avoid SQL pounding the new files as soon as they are added. Without knowing everything about the workload you're dealing with it's hard to say the best move.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Double the files. You want to have them in proportion to the CPUs.

    That said, it's rare for files > CPUs to help much. There are cases where it does, not common though. I would honestly look at trying to tune/fix that web app. Maybe see if there's hardware budget for SSDs for TempDB, but I'm not sure if that will help.

    If the contention is on the system tables (allocunits, etc) rather than an allocation page, extra files won't help. There's still only one sysallocunits no matter how many files there are.

    Also worth doing some more detailed monitoring, locks, temp table size, creation frequency, other server stuff (chapter 1 of http://www.simple-talk.com/books/sql-books/troubleshooting-sql-server-a-guide-for-the-accidental-dba/)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Is it a 2-proc/8-core machine meaning 16 ways? Or 2 procs w/ 4 cores each, i.e. 8 ways?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • It is a machine with 2, 4-core processors so I already have tempdb files = cores. If I'm following along correctly then that would be a 'no' for adding more files.

    Note that during the worst of the contention bouts I have seen allocunits table involved as well. At those times the site is reported (by users) as 'down' rather than 'slow'. From Gail's post that also supports that it would not help to add more files. FYI I'm here on a development contract and not as a DBA so I'm just pitching in to cover the lack of anyone with better DBA skills than I have.

    The web app is under redevelopment but it will be a minimum of 4-5 months before there is anything ready to go into production. The current app is from an outside vendor and cannot be modified by current staff. The relationship with the web app vendor is not good at this time. The main point of the redevelopment is to fix the poorly performing web app for a retail company.

    So, just keep bouncing the server?

  • GilaMonster (7/9/2012)


    Double the files. You want to have them in proportion to the CPUs.

    This is the first time I have heard this. Do you have a reference by chance where I could see more?

    Fly Girl (7/9/2012)


    It is a machine with 2, 4-core processors so I already have tempdb files = cores. If I'm following along correctly then that would be a 'no' for adding more files.

    Thanks for confirming it is 8, not 16. With files already equal to the number cores it is unlikely to help.

    Note that during the worst of the contention bouts I have seen allocunits table involved as well. At those times the site is reported (by users) as 'down' rather than 'slow'. From Gail's post that also supports that it would not help to add more files. FYI I'm here on a development contract and not as a DBA so I'm just pitching in to cover the lack of anyone with better DBA skills than I have.

    The web app is under redevelopment but it will be a minimum of 4-5 months before there is anything ready to go into production. The current app is from an outside vendor and cannot be modified by current staff. The relationship with the web app vendor is not good at this time. The main point of the redevelopment is to fix the poorly performing web app for a retail company.

    So, just keep bouncing the server?

    You say you caught PFS contention. Do you know if you are having SGAM contention too? If so, trace flag trace flag 1118 may be worth looking into. There is a lot to consider before enabling it, but I have heard of it helping on SQL 2005 despite the caching improvements over SQL 2000 and beyond having already added more tempdb data files.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Nasty.

    Try going to 16 files, might help a bit. You can always remove the extras if they don't help.

    I might also suggest more memory for the server. Again, might help a little.

    If there are other user databases that can be moved elsewhere that might alleviate the load.

    The fix is to re-architect the app I'm afraid. Am doing that for a client at the moment, TempDB when I started was 3 times the size of the user database, it now is 20% of the size of the user DB.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • opc.three (7/9/2012)


    GilaMonster (7/9/2012)


    Double the files. You want to have them in proportion to the CPUs.

    This is the first time I have heard this. Do you have a reference by chance where I could see more?

    Cannot find it now, was article I read back around 2006 or so. Not rule, just recommendation. Is why it's 1/4 or 1/2 CPUs as a starting recommendation.

    If all CPUs are writing to TempDB at the same time (using proportional fill) you don't really want some CPUs having more contention than others

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/9/2012)


    opc.three (7/9/2012)


    GilaMonster (7/9/2012)


    Double the files. You want to have them in proportion to the CPUs.

    This is the first time I have heard this. Do you have a reference by chance where I could see more?

    Cannot find it now, was article I read back around 2006 or so. Not rule, just recommendation. Is why it's 1/4 or 1/2 CPUs as a starting recommendation.

    If all CPUs are writing to TempDB at the same time (using proportional fill) you don't really want some CPUs having more contention than others

    Thanks. I can see even numbers (divisible) making sense. 3 data files in an 8 way box might make for an irregular contention pattern.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • So far I have not caught SGAM contention--so I'm holding off on that flag. I was looking at that as well but it seems to have some risks and I don't have the specific indicator--or so I gather.

    From what I've been learning in my reluctant dba-dom on this contract, tempdb files equaling the count of processors/cores is in line with best practices. I'll let someone who is better at this than me defend the point, though.

  • Just figured out the page 2 thing.

    No other dbs that can be removed from this server.

    The server is hosted at ViaWest which would dearly like the client to upgrade to 2008. That will be the point at which more memory becomes available, moving up to 32GB and full dba support at ViaWest. This move has been spec'd out for more than a year. However, the company has been purchased and is desperately trying to integrate with the new parent company. As a result there are no resources available to oversee and test the parallel run in order to actually get this done right now.

    I'm trying to get them to commit to a date but they want to align this roll out to use the transition for some testing for the new web app--hence everything may wait until the web app is rewritten. They do understand that the app is the problem and at this point are OK with just bouncing the server.

    Reality is rarely what you want it to be.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply