A strange performance issue

Question

Post reply

A strange performance issue

Viewing 15 posts - 16 through 30 (of 41 total)

You must be logged in to reply to this topic. Login to reply

GregoryAJackson SSCrazy Points: 2794 More actions · Answer 1

GregoryAJackson

SSCrazy

Points: 2794

May 12, 2008 at 6:13 pm

#814170

Exactly what I was thinking.

some IO concurrency issues....

gaj

Gregory A Jackson MBA, CSM

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 2

Just an opinion... auto grow of 1MB is too small and auto grow of 10% will create 73 fragments just getting to 1 gig from the default starting size of 1MB. On small systems, I'd set auto grow to a minimum of 250MB and then I'd never let it catch me by surprise. I'd monitor it's growth for a month and then, while continuing to monitor, plan the grow to manually occur at quiet times. I'd also do the same with TempDB... plan it's size and set it up for that size +20% on boot. It's just not worth having auto grow take you buy surprise.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 3

jeff.williams3188 (5/12/2008)

Off the top of my head, I would suspect that you are seeing the checkpoint operation in action.

Sounds very plausable.

At the times that the disk queue length goes up, what happens to the following counters?

Physical Disk reads/sec

Physical Disk writes/sec

Physical Disk avg sec/read

Physical Disk avg sec/write

Physical disk % idle time.

Track them separatly for different disks.

What kind of IO system are you on?

How many physical devices?

Where are the following:

TempDB data file?

User DB data file?

User DB log file?

Windows swap file? (probably C: )

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jamie Julius Right there with Babe Points: 726 More actions · Answer 4

Thanks to all of you who have shared your ideas and suggestions over the last few hours.

Some comments:

1. Some of you suggested that perhaps an external process is responsible for the IO problem, which would then affect SQL Server's performance. This is not the case. I have used Sysinternal's Process Monitor to see which process is creating the disk activity and it is definitely sqlservr.exe (see http://docs.google.com/Doc?id=df68tp5_3dgq2d4kx). Note the 7 minute interval between the spikes. Also, as noted in the original post, this interval becomes shorter in proportion to the amount of DB activity. It's as if SQL Server is buffering writes of some sort and when something fills up it goes to the disk in a burst, which causes the spike.

2. Gail was interested in more numbers from Perfmon. You can find them here: http://spreadsheets.google.com/pub?key=pwKKIs_WnfpEI_7FYaZo1AQ. I have marked the spikes in yellow. You will see an increase in the number of writes per sec. As for the hardware configuration. This is a simple Windows 2003 server with no special IO system (i.e., a single disk).

3. Some of you suggested that the Profiler could tell me which SQL statement is causing the problem. I know from my own logs that it is not a specific statement. I have the same statements executing very routinely without problem. Then when the IO problem happens, it will affect the performance of whichever SQL statement just happens to be executing.

4. As for the Autogrow numbers. The sizes of the files are currently as follows:

MDF - 1.83 GB

LDF - 344 MB

Both are set to grow by 10% as needed. If this is not wise, please advise me otherwise.

Bottom Line:

The issue is internal to SQL Server itself (i.e., something in SQL Server is performing the write bursts). If there is some IO buffering parameter I should be checking please let me know. Process Monitor seems to also show that when the burst happens, the target file is the MDF file as opposed to the LDF one.

Thanks,

Jamie

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 5

It's very likely the checkpoint, which writes dirty pages to disk. The frequency that the checkpoint runs at depends on the frequency of changes to the database

It looks like you're overloading the IO subsystem. I would strongly recommend more disks.

Split TempDB out onto it's own physical volume. Split the data files from the log files for your user database.

Split the OS and the swap file out onto its own volume

What's probably happening is, as SQL runs the checkpoint, so writes to the transaction log get slowed down (same disk so contention)

Log records must be hardened on disk before the transaction can be completed. the log records take longer to write, because of the O load, and hence the transactions slow down.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jamie Julius Right there with Babe Points: 726 More actions · Answer 6

Jamie Julius

Right there with Babe

Points: 726

May 13, 2008 at 5:15 am

#814341

Hi Gail,

Thanks for your reply. Do you know if there is any logging of the checkpoints, which would allow me to verify that indeed they are associated with what I'm seeing?

Thanks,

Jamie

Aaron Nelson Ten Centuries Points: 1118 More actions · Answer 7

Aaron Nelson

Ten Centuries

Points: 1118

May 13, 2008 at 5:36 am

#814354

They should show up when you run profiler against your db.

@SQLvariant

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 8

Checkpoints are internal operations. They won't show up in profiler

You can log the output of:

SELECT * from sys.dm_exec_requests where session_id < 50

Log it every few sec. You should see some evidence of the checkpoint running

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jamie Julius Right there with Babe Points: 726 More actions · Answer 9

Jamie Julius

Right there with Babe

Points: 726

May 13, 2008 at 7:08 am

#814426

Bingo!

I found the following perfmon counter:

SQLServer:Buffer Manager\Checkpoint pages/sec

It correlates exactly with the disk writes.

So indeed my problem seems to be related to the checkpoints.

Thanks to those of you who pointed me in this direction!

Jamie

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 10

Not sure why it took all of those posts to get to checkpoint as the likely cause!! 🙂

You could test this by setting the recovery interval to a very large value and see if the frequency shifts to much longer. Note: this is not a permenant suggestion, btw.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 11

I had anj idea it was the problem at the beginning, wasn't sure.

Jamie: The root problem is that your IO sustem is not adequate for the load you're putting on it. I would strongly suggest (at a minimum) you add 2 more disks, put your data file on one of them and your log file on the other.

Are you using RAID arrays, or just single disks?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jamie Julius Right there with Babe Points: 726 More actions · Answer 12

Thanks for this info.

We're currently on a single disk. We can live with the current situation as is, but if things get worse, I'll follow up on your advice. Thanks!

I anticipate an increase in read activity (SELECT statements) in the future, but do not anticipate an increase in write activity (INSERT, UPDATE, DELETE).

If I understand correctly the nature of the checkpoints, they are a result of write activity only. More reads won't affect them. Given this, we can probably live with things as is (at least for a while).

Jamie

Jamie Julius Right there with Babe Points: 726 More actions · Answer 13

One more question to help me understand what's happening.

The checkpoint process kicks in as a result of accumulated write activity, which needs to be flushed to the disk. The act of flushing occupies the disk so much that it prevents concurrent queries from executing normally.

Is this so, because every concurrent query must at least write to the log file even though the data is cached in memory?

Is this why you suggested that separating the MDF and LDF files to separate disks would improve performance?

Thanks for your time.

Jamie

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 14

I had an idea it was the problem at the beginning, wasn't sure.

The keys from the original post were that it was repeating every so many minutes and the period went down as the number of database modifications went up. A good note was that he disabled autoupdate stats, which could well have been a culprit here to. Subsequent points were made (such as autogrowth/shrink) to add evidence to the checkpoint hypothesis.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Aaron Nelson Ten Centuries Points: 1118 More actions · Answer 15

GilaMonster (5/13/2008)

Checkpoints are internal operations. They won't show up in profiler

You can log the output of:

SELECT * from sys.dm_exec_requests where session_id < 50

Log it every few sec. You should see some evidence of the checkpoint running

Good Point.

My bad for not double-checking that one first.

@SQLvariant