A strange performance issue

  • Exactly what I was thinking.

    some IO concurrency issues....

    gaj

    Gregory A Jackson MBA, CSM

  • Just an opinion... auto grow of 1MB is too small and auto grow of 10% will create 73 fragments just getting to 1 gig from the default starting size of 1MB. On small systems, I'd set auto grow to a minimum of 250MB and then I'd never let it catch me by surprise. I'd monitor it's growth for a month and then, while continuing to monitor, plan the grow to manually occur at quiet times. I'd also do the same with TempDB... plan it's size and set it up for that size +20% on boot. It's just not worth having auto grow take you buy surprise.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • jeff.williams3188 (5/12/2008)


    Off the top of my head, I would suspect that you are seeing the checkpoint operation in action.

    Sounds very plausable.

    At the times that the disk queue length goes up, what happens to the following counters?

    Physical Disk reads/sec

    Physical Disk writes/sec

    Physical Disk avg sec/read

    Physical Disk avg sec/write

    Physical disk % idle time.

    Track them separatly for different disks.

    What kind of IO system are you on?

    How many physical devices?

    Where are the following:

    TempDB data file?

    User DB data file?

    User DB log file?

    Windows swap file? (probably C: )

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks to all of you who have shared your ideas and suggestions over the last few hours.

    Some comments:

    1. Some of you suggested that perhaps an external process is responsible for the IO problem, which would then affect SQL Server's performance. This is not the case. I have used Sysinternal's Process Monitor to see which process is creating the disk activity and it is definitely sqlservr.exe (see http://docs.google.com/Doc?id=df68tp5_3dgq2d4kx). Note the 7 minute interval between the spikes. Also, as noted in the original post, this interval becomes shorter in proportion to the amount of DB activity. It's as if SQL Server is buffering writes of some sort and when something fills up it goes to the disk in a burst, which causes the spike.

    2. Gail was interested in more numbers from Perfmon. You can find them here: http://spreadsheets.google.com/pub?key=pwKKIs_WnfpEI_7FYaZo1AQ. I have marked the spikes in yellow. You will see an increase in the number of writes per sec. As for the hardware configuration. This is a simple Windows 2003 server with no special IO system (i.e., a single disk).

    3. Some of you suggested that the Profiler could tell me which SQL statement is causing the problem. I know from my own logs that it is not a specific statement. I have the same statements executing very routinely without problem. Then when the IO problem happens, it will affect the performance of whichever SQL statement just happens to be executing.

    4. As for the Autogrow numbers. The sizes of the files are currently as follows:

    MDF - 1.83 GB

    LDF - 344 MB

    Both are set to grow by 10% as needed. If this is not wise, please advise me otherwise.

    Bottom Line:

    The issue is internal to SQL Server itself (i.e., something in SQL Server is performing the write bursts). If there is some IO buffering parameter I should be checking please let me know. Process Monitor seems to also show that when the burst happens, the target file is the MDF file as opposed to the LDF one.

    Thanks,

    Jamie

  • It's very likely the checkpoint, which writes dirty pages to disk. The frequency that the checkpoint runs at depends on the frequency of changes to the database

    It looks like you're overloading the IO subsystem. I would strongly recommend more disks.

    Split TempDB out onto it's own physical volume. Split the data files from the log files for your user database.

    Split the OS and the swap file out onto its own volume

    What's probably happening is, as SQL runs the checkpoint, so writes to the transaction log get slowed down (same disk so contention)

    Log records must be hardened on disk before the transaction can be completed. the log records take longer to write, because of the O load, and hence the transactions slow down.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail,

    Thanks for your reply. Do you know if there is any logging of the checkpoints, which would allow me to verify that indeed they are associated with what I'm seeing?

    Thanks,

    Jamie

  • They should show up when you run profiler against your db.

    @SQLvariant

  • Checkpoints are internal operations. They won't show up in profiler

    You can log the output of:

    SELECT * from sys.dm_exec_requests where session_id < 50

    Log it every few sec. You should see some evidence of the checkpoint running

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Bingo!

    I found the following perfmon counter:

    SQLServer:Buffer Manager\Checkpoint pages/sec

    It correlates exactly with the disk writes.

    So indeed my problem seems to be related to the checkpoints.

    Thanks to those of you who pointed me in this direction!

    Jamie

  • Not sure why it took all of those posts to get to checkpoint as the likely cause!! 🙂

    You could test this by setting the recovery interval to a very large value and see if the frequency shifts to much longer. Note: this is not a permenant suggestion, btw.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I had anj idea it was the problem at the beginning, wasn't sure.

    Jamie: The root problem is that your IO sustem is not adequate for the load you're putting on it. I would strongly suggest (at a minimum) you add 2 more disks, put your data file on one of them and your log file on the other.

    Are you using RAID arrays, or just single disks?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for this info.

    We're currently on a single disk. We can live with the current situation as is, but if things get worse, I'll follow up on your advice. Thanks!

    I anticipate an increase in read activity (SELECT statements) in the future, but do not anticipate an increase in write activity (INSERT, UPDATE, DELETE).

    If I understand correctly the nature of the checkpoints, they are a result of write activity only. More reads won't affect them. Given this, we can probably live with things as is (at least for a while).

    Jamie

  • One more question to help me understand what's happening.

    The checkpoint process kicks in as a result of accumulated write activity, which needs to be flushed to the disk. The act of flushing occupies the disk so much that it prevents concurrent queries from executing normally.

    Is this so, because every concurrent query must at least write to the log file even though the data is cached in memory?

    Is this why you suggested that separating the MDF and LDF files to separate disks would improve performance?

    Thanks for your time.

    Jamie

  • I had an idea it was the problem at the beginning, wasn't sure.

    The keys from the original post were that it was repeating every so many minutes and the period went down as the number of database modifications went up. A good note was that he disabled autoupdate stats, which could well have been a culprit here to. Subsequent points were made (such as autogrowth/shrink) to add evidence to the checkpoint hypothesis.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • GilaMonster (5/13/2008)


    Checkpoints are internal operations. They won't show up in profiler

    You can log the output of:

    SELECT * from sys.dm_exec_requests where session_id < 50

    Log it every few sec. You should see some evidence of the checkpoint running

    Good Point.

    My bad for not double-checking that one first.

    @SQLvariant

Viewing 15 posts - 16 through 30 (of 41 total)

You must be logged in to reply to this topic. Login to reply