WAITSTATS - CXPACKET query

Question

Post reply

WAITSTATS - CXPACKET query

Randolph West

SSCommitted

Points: 1575
More actions
June 17, 2009 at 3:57 am

#131255

Hi all

I've posted this on Serverfault.com as well, just in case.

Platform: SQL 2000 Enterprise with SP4, running on Windows 2003 Enterprise, 8 cores, 8GB RAM, and internal (!) storage. Databases are roughly 100GB in total.

Scenario: The waitstats are heavily skewed to hardware:

CXPACKET - 35.9%

WRITELOG - 18.4%

LCK_M_S - 13.2%

PAGEIOLATCH_SH - 10.9%

NETWORKIO - 10.4%

LATCH_EX - 5.6%

PAGEIOLATCH_EX - 2.6%

LCK_M_IX - 1.5%

LCK_M_IS - 1.2%

PAGELATCH_UP - 0.1%

I am ignoring the LATCH and LCK types for now, because we're moving to SAN shortly during a SQL 2005 upgrade, and the internal drives are heavily fragmented anyway.

However, I'm concerned about the CXPACKET type. In some cases I've had 80% wait times relating to this. Does the parallel execution plan setting need to be reviewed? (it is set to 5 at the moment). Will moving to SAN fix this as well? Should I even worry in the first place?

Thanks in advance!

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Randolph West SSCommitted Points: 1575 More actions · Answer 1

Randolph West

SSCommitted

Points: 1575

June 17, 2009 at 5:23 am

#1010788

A response on my post on Serverfault.com indicated that this is I/O related as well, which means I can re-benchmark after SAN and SQL 2005 migration.

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 2

A CXPacket wait is due to a parallel skew. SQL has decided it will parallel one or more operators in a query and one of the threads finishes before the other and has to wait for the second thread to catch up. That wait is a CXPacket wait.

One of the suggested ways to reduce this is to reduce the MaxDop of the server. It's not a suitable solution in all cases though. Depends what kind of queries are running on the server and how often they parallel.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Randolph West SSCommitted Points: 1575 More actions · Answer 3

Randolph West

SSCommitted

Points: 1575

June 17, 2009 at 9:58 am

#1011094

Dankie, Gail 🙂

I've read up on it before posting, so *theoretically* I know what it means. That said, your advice and suggestions are welcome. I've not seen the wait number this high in a production environment before, and was wondering whether moving the data files to SAN will fix it?

Thanks again!

Randolph

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 4

CXPacket wait is a CPU-related wait, not an IO-related wait. It's from queries processing in parallel. There's a number of things that can cause that, from poor statistics (SQL makes a bad choice about splitting data), to hyperthreading, to other processes interfering (often non-SQL processes)

As for SANs, just moving data files to a SAN won't necessarily make things faster. It's easy for a SAN to be slower than direct-attached storage (if it's configured poorly). A friend talks about 'magic SAN dust' - the myth that SAN automatically = fast.

This is a good podcast - http://www.sqldownunder.com/SDU34FullShow.mp3

Moving the data files to faster drives, to separate drives, may help with the LogWrite waits and the PageIOLatch waits. Those are both IO-related. If the SAN's poorly configured, it could make things worse.

The only way I can see an IO-relation to CXPacket is if it's the index seek/scan that's paralleling and the data is not in memory and reading some of the index off disk is slower than reading other parts of the index off disk. It's possible.

How many processors in this server? Are they hyperthreaded? Is this an OLTP or OLAP type database?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Randolph West SSCommitted Points: 1575 More actions · Answer 5

Hi Gail

I inherited it two weeks ago and I've made it go much faster with query optimisations, indexes, etc., but I'm now trying to squeeze a little bit more performance out of it.

From what I can tell, it's a 4-way dual-core Xeon. There are 3 x 200GB SATA disks dedicated to SQL (no RAID AFAIK), and it's OLTP with long-running transactions (procs within cursors within triggers), and some reporting bits.

I've also tried to optimise the indexes, and need to run a long-term analysis on the box, but they shut down SQL every night to do a filesystem backup. I'm already on their case about proper backups, but that's next week's problem (apparently).

What else? Massive fragmentation on the drives, which I'm (slowly) sorting out.

That's about it, really. The biggest complaint is that every so often, fast-running queries (~400ms) can take up to 30 seconds to run, and I've narrowed it down to environment - blocking is minimal since I did the indexing and query optimisations, and I/O *is* slow. Hence the wait stats and now my CX Packet question. I've tried playing with the parallel setting, with no visible effect. I even played around with RAM, but it's 32-bit running AWE, so I'm fairly restricted there.

As far as SAN is concerned, they have another DB running very nicely with good I/O, so I'm expecting some improvement.

I appreciate your input in this.

Randolph

Randolph West SSCommitted Points: 1575 More actions · Answer 6

One more thing - the system is distributed between three cities and headoffice. I'm discussing the headoffice box specifically. I've taken into account various external factors (which also accounts for the high network I/O stat in my OP).

The box usually sits at 25 - 30% CPU usage, and under 200MB RAM (which I don't believe, because of AWE, but Process Explorer doesn't lie).

Cheers.

Randy-574768 Ten Centuries Points: 1355 More actions · Answer 7

Randy-574768

Ten Centuries

Points: 1355

June 17, 2009 at 12:04 pm

#1011185

We were talking about CXPACKETS and MaxDop on this post , as well. Might want to check it out. I've wrestled my CXPACKETS down from being a major problem to a non-issue working with the Maximum Degree of Parallelism and Cost Threshold of Parallelism server settings.

Randy

Eddie Wuerch SSChampion Points: 12504 More actions · Answer 8

The high CXPACKET waits, combined with the PAGEIOLATCH_SH and PAGEIOLATCH_EX waits, likely indicate table scanning. Including the PAGEIOLATCH waits shows that you're scanning from disk and incurring additional cost on the scan.

Keep an eye on sysprocesses for specific queries hitting those waits, and then check the plans for table scans that can be turned into seeks with proper indexing (and loop joins that can be turned into merge joins, etc.). Fiddling with MAXDOP in those cases simply chokes the big queries out of the way so other queries have resources, but does not reduce the amount of data that must be processed.

-Eddie

Eddie Wuerch
MCM: SQL

Randolph West SSCommitted Points: 1575 More actions · Answer 9

Thanks, Eddie and Randy. I'll look into it during daylight hours tomorrow.

To be honest, I wouldn't be surprised if there's a lot of table scanning happening. Some of the big stuff is dynamically generated. I wanted to run Profiler over a couple of days, or even a week, to allow me to run the tuning wizards on proper samples (I miss SQL 2005 so much!), but they shut down SQL every night, which doesn't help things.

I'll let you know what I find.

R

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 10

If it's OLTP I'd consider dropping the MAXDOP to 4 or even 2. Combine that with optimising the worst offending queries, and you may get this under control fairly well.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 11

Randolph Potter (6/17/2009)

To be honest, I wouldn't be surprised if there's a lot of table scanning happening. Some of the big stuff is dynamically generated. I wanted to run Profiler over a couple of days, or even a week, to allow me to run the tuning wizards on proper samples (I miss SQL 2005 so much!), but they shut down SQL every night, which doesn't help things.

I wouldn't recommend running profiler for days on end. It just leads to information overload, too much data to process. I've found that profiler traces of max an hour work very well.

http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1/

http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-2/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Randolph West SSCommitted Points: 1575 More actions · Answer 12

Randolph West

SSCommitted

Points: 1575

June 17, 2009 at 2:09 pm

#1011260

Much obliged again, Gail.

Randolph West SSCommitted Points: 1575 More actions · Answer 13

Max degree of parallelism: 2 sent CPU through the roof - 4 seems to be a much better setting, and CXPACKET wait is way down

Cost threshold for parallelism: 25 made no change from 5 - 75 seems to have a dramatic difference

Now they are putting a freeze on production because of an application upgrade, so I cannot make any more changes until July. However, I think with the query and index optimisations I've done, and your guys' help this week with CXPACKET, the environment is vastly improved.

Many thanks, all.

jcosta SSC Veteran Points: 241 More actions · Answer 14

jcosta

SSC Veteran

Points: 241

September 10, 2009 at 12:08 pm

#1051858

Randolph,

How did you obtain the wait states by percentage breakdown in this post?