CXPACKET/Parallelism question

Question

Post reply

CXPACKET/Parallelism question

mazzz

SSCertifiable

Points: 6139
More actions
February 24, 2011 at 10:21 am

#235735

I have a server with a significant amount of CXPACKET waits. It has only one processor (dual core).

Does the possible max DOP depend on physical or logical cores?

Can you get CXPACKET waits that aren't related to parallelism?

------------------------------------------------------------------------
Bite-sized fiction (with added teeth) [/url]

Viewing 15 posts - 1 through 15 (of 44 total)

You must be logged in to reply to this topic. Login to reply

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 1

mazzz (2/24/2011)

I have a server with a significant amount of CXPACKET waits. It has only one processor (dual core).

Does the possible max DOP depend on physical or logical cores?

Possible maxdop is equal to the # of Processors allocated to SQL Server. BOL: http://msdn.microsoft.com/en-us/library/ms189094.aspx

Can you get CXPACKET waits that aren't related to parallelism?

As far as I know the answer is No. CXPACKET is specific to parallelism where a query was parallelized and one or more threads are waiting on a thread to complete "its piece" before it can proceed with the next step of query execution.

You did not mention it specifically but I was managing a server where CXPACKET was the #1 wait type and turning off Hyperthreading (per a SQL Server Pro's recommendation http://blogs.msdn.com/b/jimmymay/archive/2008/11/28/case-study-part-1-cxpacket-wait-stats-max-degree-of-parallelism-option-introduction-to-using-wait-stats-to-identify-remediate-query-parallelism-bottlenecks.aspx) actually improved performance and reduced CXPACKET waits.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Jim Murphy SSCertifiable Points: 6263 More actions · Answer 2

CXPacket waits are always parallelism related. It is based on logical CPU's. SS addresses the CPUs as seen in Windows. So a dual core will show up as 2 CPUs in Windows and SS will use both (by default) for scheduling, and also use both for parallel queries (by default). If Hyperthreading is enabled, it'll show in Windows as 4 CPUs and SS will try to schedule on all 4 'CPUs'

If this is a dev box, then the current config probably won't make much impact on your life. If this is production or a staging server of sorts, then it is best to disable hyper threading entirely, and you also may choose to disable DOP if you only have 1 dual core. Balancing, of course, your own load depending on if you have lots of small OLTP transactions (disable DOP) or if you have larger reports or complicated queries (enable DOP).

There is a little more to it, but these are some very basic guidelines.

Jim

Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph

mazzz SSCertifiable Points: 6139 More actions · Answer 3

Thank you both for your responses.

As far as I knew CXPACKET waits were specifically parallelism related, good to know I haven't been wrong all along.

I've left the office now so can't check on hyperthreading till tomorrow.

The issue I have is with a 3rd party application which recently had an upgrade onto a new server with two CPUs (the old server is the one with the cxpacket waits and single CPU). I was on holiday when this happened, but apparently some reports weren't working (due to a "lock not having enough memory"(1)) and the 3rd party asked for the 2nd CPU to be removed to fix the issue.

(1)I have hardly any information on what actually happened at the time and am trying to troubleshoot retrospectively... joy.

The CPU was removed, and the reports worked, but 2GB RAM were also added at the same time so that could have been the saviour. Things like page life expectancy and Available MBytes are still rather on the low side, so maybe it's a very memory hungry application...

The application is struggling quite a bit with just one CPU, and the 3rd party think setting MAXDOP to 1 will solve the issue if we add it back in. I'm just not convinced that parallelism is necessarily the issue here. The new server is also on dual core CPU but I see no CXPACKET waits like I did on the old one - EDIT: which is why I was wondering about total cores vs number of CPUs

The 3rd party seems to think that parallelism could cause the query to fail due to the two parts it is splitting into both trying to access the same object at the same time - that doesn't happen, surely?

I realise I'm rambling a bit here, but this is me after today :crazy:

------------------------------------------------------------------------
Bite-sized fiction (with added teeth) [/url]

Jim Murphy SSCertifiable Points: 6263 More actions · Answer 4

Ha! Another fun day in the sun.

Doesn't sound like a CPU issue to me. I have seen CPU deadlocks before, but only as a SS bug which was fixed by a hot fix. I have not seen pllsm paralyze itself by deadlocking with another db resource, like a table/index.

Memory appears to be the concern here. Even adding more ram, SS is probably not using more than 2GB unless it is 64 bit.

There are lots of things to look at from here. Are lock escalations disabled? This can cause a ton of memory to be consumed from locks. The low page life indicates that ram is too low so SS has a small buffer cache - again, going back to the 2GB limit on 32 but/standard edition.

I would say to enjoy some quality time with perfmon or a tool like Spotlight, etc.

Jim

Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph

mazzz SSCertifiable Points: 6139 More actions · Answer 5

Jim Murphy (SQLWatchmen.com) (2/24/2011)

Ha! Another fun day in the sun.

I deal with a lot of 3rd party apps, so I have plenty of those!

Doesn't sound like a CPU issue to me. I have seen CPU deadlocks before, but only as a SS bug which was fixed by a hot fix. I have not seen pllsm paralyze itself by deadlocking with another db resource, like a table/index.

Memory appears to be the concern here. Even adding more ram, SS is probably not using more than 2GB unless it is 64 bit.

It is 64bit so no worries there.

There are lots of things to look at from here. Are lock escalations disabled? This can cause a ton of memory to be consumed from locks. The low page life indicates that ram is too low so SS has a small buffer cache - again, going back to the 2GB limit on 32 but/standard edition.

I'll checkon the lock escalations being disabled, but it is not something I have done and I doubt the vendor would have set that up - then again you never know.

I would say to enjoy some quality time with perfmon or a tool like Spotlight, etc.

Jim

I will be doing just that, as soon as my customers can get their urgent reports for the month out of the way now that the system works (albeit slowly), and hopeully we can test this next week. I'm almost looking forward to Monday, the curiosity is killing me!

Thans again for your time, Jim!

------------------------------------------------------------------------
Bite-sized fiction (with added teeth) [/url]

Jim Murphy SSCertifiable Points: 6263 More actions · Answer 6

Jim Murphy

SSCertifiable

Points: 6263

February 24, 2011 at 12:49 pm

#1291235

Glad to help.

Please do post the results here so we get a nice googlable thread with an answer.

Jim

Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph

mazzz SSCertifiable Points: 6139 More actions · Answer 7

mazzz

SSCertifiable

Points: 6139

February 24, 2011 at 12:58 pm

#1291242

Oh, absolutely. Well, so long as I manage to get to the bottom of the issue!

------------------------------------------------------------------------
Bite-sized fiction (with added teeth) [/url]

MikaS Ten Centuries Points: 1210 More actions · Answer 8

Hi,

Parallelism can cause interesting issues in your environment. I've seen it cause rather severe blocking (and Im quite certain it caused deadlock situation once) in some databases myself, even though these servers had been correctly configured for parallelism and the hardware has been properly set up (disabling HT etc).

While I've seen numerous improvements in SQL Server engine from the earlier versions, there's very little SQL Server can do if the database or the query design itself is poor. Sure it'll try it's best trying to optimize the queries, but there's only so much it can do 🙂

It'd be nice to take advantage of the multiple CPU cores we can get these days for affordable prices, but unfortunately I find myself often setting MAXDOP to 1 because all the benefits of using multiple CPUs disappear under loads of CXPACKETs.

Mika Sutinen, Senior DBA
@SQLFinn on Twitter
Blogging at SQLStarters[/url]

Jim Murphy SSCertifiable Points: 6263 More actions · Answer 9

Parallelism can indeed cause CPU blocking; so SS won't schedule another task on a CPU which is waiting to sync with the other CPUs. meanwhile, that CPU is at 0%. Nice waste of CPU! That said, you will ALWAYS have cxpacket waits when parallelism is enabled because SS doesn't do a perfect job splitting the query into equal parts and each individual part may get blocked by locks or any other contention (disk, nic, data not in cache like the other streams, and on and on). So there is no need to fear the cxpacket wait type UNLESS parallelism is configured to it's default setting of zero (use all CPUs for parallelism) which hinders concurrency on systems with a nontrivial load.

So it takes a little investigation to determine if your cxpacket waits are a normal part of parallelism and are not hindering the scheduling of additional queries, or if cxpacket is indicating poor server configuration options depending on the number of CPUs available, the load of the CPUs and the amount of concurrent activity. Frankly, the maxdop setting should be fairly low or disabled on a heavy oltp system, but possibly raised higher on a reporting system with fewer users.

Having said that, I have seen cases where a complex query causes a CPU deadlock and hung the system. SS bug. But this was way back in the SQL 2000 days. A work around (besides a hot fix) was to set maxdop to 1 for that query and everything worked. This appeared after the SQL programmer modified the already complex query which worked with maxdop>1 to add another joint to an additional innocent table. Upon releasing this new stored proc, the query hung when executed. Setting maxdop=1 worked around the issue.

Jim

Jim Murphy
http://www.sqlwatchmen.com
@SQLMurph

mazzz SSCertifiable Points: 6139 More actions · Answer 10

Hi folks, thanks again for the discussion.

Well, I'm back. I don't think the extra CPU/parallelism has anything to do with it.

I've been running some tests on the test machine and managed to reproduce the error, which was 1204 ("The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions."

In Profiler I saw an absolute ton of 1222 errors reported from spid 22 ("Lock request timeout period exceeded")

I monitored sys.dm_tran_locks via:

select request_session_id, COUNT(*)

from sys.dm_tran_locks

group by request_session_id

and over one period a spid running one stored proc had a count of 3,800,000.

I ran sp_lock and it returned 1,500,000 rows (all key locks)

Afte reading this article:

http://blogs.msdn.com/b/grahamk/archive/2009/06/02/sql-server-2005-error-701-severity-17-state-193.aspx

I checked DBCC MEMORYSTATUS and saw these:



OBJECTSTORE_LOCK_MANAGER (Total)         KB

---------------------------------------- -----------

VM Reserved                              4096

VM Committed                             4096

Locked Pages Allocated                   0

SM Reserved                              0

SM Committed                             0

SinglePage Allocator                     442904

MultiPage Allocator                      0

(7 row(s) affected)

OBJECTSTORE_LOCK_MANAGER (node 0)        KB

---------------------------------------- -----------

VM Reserved                              4096

VM Committed                             4096

Locked Pages Allocated                   0

SM Reserved                              0

SM Committed                             0

SinglePage Allocator                     442880

MultiPage Allocator                      0

(7 row(s) affected)

So I'm guessing memory is the issue here, rather than the parallel running of the query. These locks and tran_lock counts are excessive, aren't they? Am I in my rights to ask the 3rd party vendor to tune that procedure?

------------------------------------------------------------------------
Bite-sized fiction (with added teeth) [/url]

UMG Developer SSChampion Points: 13482 More actions · Answer 11

mazzz (2/25/2011)

I've been running some tests on the test machine and managed to reproduce the error, which was 1204 ("The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions."

So I'm guessing memory is the issue here, rather than the parallel running of the query. These locks and tran_lock counts are excessive, aren't they? Am I in my rights to ask the 3rd party vendor to tune that procedure?

It sounds like maybe lock escalation is failing, or has been disabled at the table level.

Have you checked to see if the locks option has been changed from the default of 0?

EXEC sys.sp_configure @configname = 'locks'

Are there a lot of active sessions when this happens? Long running transactions? (Maybe a transaction is one COMMIT short of being committed. 😉

mazzz SSCertifiable Points: 6139 More actions · Answer 12

Thanks for responding. I think it's still the default? config_value presumably?



name minimum    maximum      config_value   run_value

locks 5000         2147483647  0                  0

There's only one active session, all the locks, tran_locks are coming from one SPID.

------------------------------------------------------------------------
Bite-sized fiction (with added teeth) [/url]

UMG Developer SSChampion Points: 13482 More actions · Answer 13

Yes, a config_value of 0 is the default.

SPID 22 is probably a system session, so it shouldn't be causing the problem. (SPIDs under 50 are almost always system processes.)

I think you need to run a trace and capture what the one active session is doing that causes all the locks. (Using Adam Machanic's WhoIsActive can be helpful while things are running to get a snapshot and execution plans.)

mazzz SSCertifiable Points: 6139 More actions · Answer 14

UMG Developer (2/25/2011)

Yes, a config_value of 0 is the default.

SPID 22 is probably a system session, so it shouldn't be causing the problem. (SPIDs under 50 are almost always system processes.)

Thanks, I am aware of the system spids 🙂

I think you need to run a trace and capture what the one active session is doing that causes all the locks. (Using Adam Machanic's WhoIsActive can be helpful while things are running to get a snapshot and execution plans.)

I have already identified the offending session. It's one stored procedure that does quite a lot of things, and runs for a good 2-3 hours to produce some sort of report. (I've only had a quick look, it's a 3rd party application)

My question at this point is, do I ask the software vendor to take a good look at their procedure? Or could something be wrong at system level that could be causing this excessive locking? We've established that lock escalation isn't disabled - anything else I can check?

Unfortunately I can't spend much time myself looking into the query itself, my boss would not be pleased with not having anyone to charge my time to, and the users of the app would not want to take on the cost either!

------------------------------------------------------------------------
Bite-sized fiction (with added teeth) [/url]