Re: Intermittent slow database resposne

Question

Post reply

Re: Intermittent slow database resposne

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply

Sam Greene SSCertifiable Points: 5252 More actions · Answer 1

Sam Greene

SSCertifiable

Points: 5252

August 27, 2008 at 9:07 am

#863798

Just for my own information, why did you choose 7168 as the maximum server memory?

7168MB / 1024 = 7GB

You should have at least 1GB for the OS. Gila recommended 1.5.

Any interesting on the performance counters?

jlp3630 Hall of Fame Points: 3243 More actions · Answer 2

DavidB (8/27/2008)

Very sorry about that. The link I was referring to was http://support.microsoft.com/kb/918483.

Not so sure that applies based on the errors that you provided.

Have you installed the performance dashboard as recommended by an earlier response? I would be interested in seeing what you are seeing as your primary wait types.

select * from sys.dm_os_wait_stats order by wait_time_ms desc

I logged into the server this morning, and everything seems to be running ok. In other words, SQL server queries are running fast, and RDC is running at normal speed consistently. I didn't do anything to fix the problem, so I guess it "magically" fixed itself.

Here's a snapshot summary of the current task manager/performance tab:

PF Usage: 8.06GB

Physical Memory (K)

Total: 8383116

Available: 208096

System Cache: 676940

Commit Charge (K)

Total: 8458284

Limit: 11761148

Peak: 11022620

Here's the current wait times from the performance dashboard:

Wait Category Number of Waits Wait Time (sec) % Wait Time

Sleep 92258898 132896.686 51.79%

Parallelism 36760626 57328.218 22.34%

Other 203254943 33374.994 13.01%

Buffer IO 2558454 24789.561 9.66%

Logging 3022637 2843.078 1.11%

Lock 84328 2281.701 0.89%

Latch 869246 1280.718 0.50%

Scheduler Yield 16314131 1243.343 0.48%

Network IO 26328 362.187 0.14%

Buffer Latch 3763003 180.328 0.07%

Memory 5061 1.625 0.00%

Here's the currents results from sys.dm_os_wait_stats:

wait_type waiting_tasks_count wait_time_ms max_wait_time_ms signal_wait_time_ms

SQLTRACE_BUFFER_FLUSH 16216 64860578 4281 546

LAZYWRITER_SLEEP 66871 64822687 1296 3078

CXPACKET 36751505 57303687 532390 1913500

PAGEIOLATCH_SH 2515588 23759968 4015 132578

BROKER_TASK_STOP 2764 13695843 10015 500

BROKER_RECEIVE_WAITFOR 36 7487453 600000 15

IO_COMPLETION 597881 5658250 2406 10203

WRITELOG 2936245 2669671 4437 116312

OLEDB 201284544 1756562 52687 0

LCK_M_U 83012 1738359 20000 2328

BACKUPIO 1142998 1623281 2000 1500

LATCH_EX 868884 1280375 906 186546

SOS_SCHEDULER_YIELD 16044649 1237187 750 1234000

SLEEP_TASK 92094968 1191093 437 1142906

ASYNC_IO_COMPLETION 100 1147593 349718 0

BACKUPBUFFER 222418 1116890 953 1890

PAGEIOLATCH_EX 39569 904937 2421 203

BACKUPTHREAD 999 814796 364468 15

Sam Greene SSCertifiable Points: 5252 More actions · Answer 3

I logged into the server this morning, and everything seems to be running ok. In other words, SQL server queries are running fast, and RDC is running at normal speed consistently. I didn't do anything to fix the problem, so I guess it "magically" fixed itself.

Sounds like a good time to dig into looking at memory and disk performance.

Also, what is the wait in the parallelism?

jlp3630 Hall of Fame Points: 3243 More actions · Answer 4

sam (8/27/2008)

I logged into the server this morning, and everything seems to be running ok. In other words, SQL server queries are running fast, and RDC is running at normal speed consistently. I didn't do anything to fix the problem, so I guess it "magically" fixed itself.

Sounds like a good time to dig into looking at memory and disk performance.

Also, what is the wait in the parallelism?

Here's the info on parallelism:

Wait Type Number of Waits Wait Time (sec) % Wait time Max Wait Time (ms) Avg Wait Time (ms)

CXPACKET 36847923 57496.859 100.00% 532390 1.6

jlp3630 Hall of Fame Points: 3243 More actions · Answer 5

sam (8/26/2008)

http://www.google.com/search?q=troubleshooting+sql+memory+problems&sourceid=ie7&rls=com.microsoft:en-US&ie=utf8&oe=utf8

The first link has a few counters you could look at.

Is this the website that you were referring to regarding performance counters:

http://sqlserver-qa.net/blogs/perftune/archive/2007/06/19/memory-issues.aspx

Thanks again for all your input.

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 6

jlp3630 (8/27/2008)

I'm running RAID-0 on my c: drive (contains Windows, SQL server, master, model and msdb databases and log files).

I'm running RAID-50 on my f: drive (tempdb, user databases and log files).

RAID 50? That's the first time I've heard of anyone using that Raid level.

Just curious, why RAID 50?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

jlp3630 Hall of Fame Points: 3243 More actions · Answer 7

GilaMonster (8/27/2008)

jlp3630 (8/27/2008)

I'm running RAID-0 on my c: drive (contains Windows, SQL server, master, model and msdb databases and log files).

I'm running RAID-50 on my f: drive (tempdb, user databases and log files).

RAID 50? That's the first time I've heard of anyone using that Raid level.

Just curious, why RAID 50?

The server was setup before I started working for this client. I have no idea why it's setup that way. When I find out, I'll let you know.

Also, my c: drive is RAID-1 not RAID-0.

Cliff Jones SSChampion Points: 10547 More actions · Answer 8

On our 64 bit servers we set the memory to allow 1.5 - 2 GB for the OS as Gail recommended and set Maxdop at 1 or 2. If you leave Maxdop at 0 or set it to some higher value (than 1) then set the cost threshold up to a higher value (10 or more).

It looks like parallelism might be a problem based on your waitstats (did you reset them?).

jlp3630 Hall of Fame Points: 3243 More actions · Answer 9

Cliff Jones (8/28/2008)

On our 64 bit servers we set the memory to allow 1.5 - 2 GB for the OS as Gail recommended and set Maxdop at 1 or 2. If you leave Maxdop at 0 or set it to some higher value (than 1) then set the cost threshold up to a higher value (10 or more).

It looks like parallelism might be a problem based on your waitstats (did you reset them?).

Cliff,

Just for my own knowledge, could you explain the logic behind these two suggestions:

1. set Maxdop to 1 or 2

2. maxdop at 0 or higher than 1 & cost threshold > 10

I didn't reset my wait stats when I posted these sys.dm_os_wait_stats results:

wait_type waiting_tasks_count wait_time_ms max_wait_time_ms signal_wait_time_ms

SQLTRACE_BUFFER_FLUSH 16216 64860578 4281 546

LAZYWRITER_SLEEP 66871 64822687 1296 3078

CXPACKET 36751505 57303687 532390 1913500

PAGEIOLATCH_SH 2515588 23759968 4015 132578

BROKER_TASK_STOP 2764 13695843 10015 500

BROKER_RECEIVE_WAITFOR 36 7487453 600000 15

IO_COMPLETION 597881 5658250 2406 10203

I just reset them for further monitoring.

Thanks for your input.

Jon

Cliff Jones SSChampion Points: 10547 More actions · Answer 10

We have found on our 64 bit servers (especially our Itanium) that if we leave Maxdop at 0, the server spends a lot of CPU time transferring the load from one CPU to another (context switching). We actually achieved better performance by setting Maxdop to 1 (which was actually what our HP vendor reccommended).

On some of our other non-Itanium servers I found that I was able to home in on a proper value for Maxdop by tinkering with the Maxdop and the threshold values during times of high CPU utilization to home in on a better value. I did this by resetting the waitstats, let the server run for an hour or so and then rechecking the CXPacket waits and reducing the signal wait percentage. Start at Maxdop 1 for a benchmark, then Maxdop 2. At Maxdop = 2 increase the threshold to see if you can improve on the baseline.

Did you install SQL 2005 from a fresh install (rather than an in place upgrade)?

Cliff Jones SSChampion Points: 10547 More actions · Answer 11

I don't remember where I got this but here is the query I use to get the Signal Wait percentages:

create table #waitstats (

waittype varchar(80),

requests numeric(20,1),

waittime numeric (20,1),

signalwaittime numeric(20,1))

insert into

#waitstats (waittype, requests, waittime, signalwaittime)

select

wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms

from

sys.dm_os_wait_stats

declare @totalwait numeric(20,1),

@totalsignalwait numeric(20,1) ,

@endtime datetime,

@begintime datetime

--get the totals

select

@totalwait=sum(waittime), @totalsignalwait=sum(signalwaittime)

from

#waitstats

--- subtract waitfor, sleep, and resource_queue from Total

select

@totalwait = @totalwait - sum(waittime), @totalsignalwait = @totalsignalwait - sum(signalwaittime)

from

#waitstats

where

waittype in ('Waitfor','Sleep','Resource_Queue')

-- insert adjusted totals, rank by percentage descending

insert into #waitstats select '***total***', 0, @totalwait, @totalsignalwait

select

waittype,

waittime,

percentage = cast(100 * waittime / @totalwait as numeric(20,1)),

signalwaittime,

percentagesw = cast(100 * signalwaittime / @totalsignalwait as numeric(20,1))

from

#waitstats

where

waittype not in ('waitfor','sleep','resource_queue','total')

order by percentage desc

select '%signal'= 100 * (@totalsignalwait / @totalwait)

drop table #waitstats

jlp3630 Hall of Fame Points: 3243 More actions · Answer 12

Cliff Jones (8/28/2008)

We have found on our 64 bit servers (especially our Itanium) that if we leave Maxdop at 0, the server spends a lot of CPU time transferring the load from one CPU to another (context switching). We actually achieved better performance by setting Maxdop to 1 (which was actually what our HP vendor reccommended).

On some of our other non-Itanium servers I found that I was able to home in on a proper value for Maxdop by tinkering with the Maxdop and the threshold values during times of high CPU utilization to home in on a better value. I did this by resetting the waitstats, let the server run for an hour or so and then rechecking the CXPacket waits and reducing the signal wait percentage. Start at Maxdop 1 for a benchmark, then Maxdop 2. At Maxdop = 2 increase the threshold to see if you can improve on the baseline.

Did you install SQL 2005 from a fresh install (rather than an in place upgrade)?

Our current database server was loaded via a fresh SQL 2005 install. However, the majority of the databases were migrated from a SQL 2000 database server.

Cliff Jones SSChampion Points: 10547 More actions · Answer 13

That's a good answer. If you upgraded then some of your default values may not be correct (like Max Worker Threads).

My guess is that if you set both your min and max memory to allow 1.5 GB or more to the OS then your problem will go away, as long as you don't have some other application running on the server. When SQL Server gobbles up the memory beyond a certain point the server begins to thrash and CPU hits the roof.

I have seen this several times with our clients servers.