Max degree of parallelism

  • Hi All,

    I was wondering what value you would advise for max degree of parallelism at the server level on an OLTP server with 24 logical CPUs and 256 GB RAM. It is a pretty busy server, averaging 13k batch requests per second.

    Non-NUMA architecture.

    -Roy

  • On a personal level, I'd probably start at server-wide MAXDOP 8 with a cost threshold setting of ~10-15 and then monitor the heck out of it looking for the pain points. I assume you're moving an existing server/instance onto the new hardware. What did the old machine look like?
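
    (Purely as a sketch of that suggestion, here is roughly how those two server-wide settings could be applied with sp_configure; the 8 and 15 are just the starting points mentioned above, not a recommendation for your particular box.)

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- starting points only; both settings take effect without a restart
    EXEC sp_configure 'max degree of parallelism', 8;
    EXEC sp_configure 'cost threshold for parallelism', 15;
    RECONFIGURE;

    EXEC sp_configure 'show advanced options', 0;
    RECONFIGURE;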


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Roy Ernest (7/26/2011)


    I was wondering what value you would advise for max degree of parallelism at the server level on an OLTP server with 24 logical CPUs and 256 GB RAM. It is a pretty busy server, averaging 13k batch requests per second. Non-NUMA architecture.

    If it is *truly* an OLTP workload - primarily nested loops joins, very few sorts, single-row lookups, optimal indexes - you might choose server DOP 1. This assumes you have enough query traffic to keep 24 workers busy (i.e. RUNNING or RUNNABLE) most of the time. Any odd queries that would benefit from parallelism, during quieter times, could use an explicit MAXDOP > 1 query hint.

    It's tough to give blanket recommendations though, as so much depends on your particular workload. Other people are happy to set server MAXDOP to 2, 4, or even higher, perhaps along with an elevated cost threshold. 35-50 is a popular number, but not necessarily with me 🙂
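
    For illustration, under server DOP 1 an odd reporting-style query could still be allowed to go parallel by carrying the hint explicitly - a made-up example (the table and column names are hypothetical), just to show where the hint goes:

    SELECT c.Region, SUM(o.Amount) AS RegionTotal
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c
        ON c.CustomerID = o.CustomerID
    GROUP BY c.Region
    OPTION (MAXDOP 4);   -- this query may use parallelism even though the server default is 1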

  • This is on existing hardware. When we moved onto this server, we had two queries that had parallel plans; I gave hints for just those two. Over the past 2 years I started noticing the CXPACKET wait stats going up. The DBAs usually review all objects that are released to this server (we check for missing indexes, race conditions and even object qualifiers). Somehow I missed some SPs that had parallelism, and now I see a higher wait on CXPACKET. I was able to identify all queries that have parallelism. It is just a handful. I can explicitly specify MAXDOP for these, but I am not sure that is the way to go.

    The max CPU usage is around 40% when it is really busy. Usually it stays around 15%.

    -Roy

  • Let me point out something very important... I've never tuned a server with anywhere near that much traffic.

    That being said, here are a few ideas:

    Here's a little script I wrote to find out if the current cost threshold might be too low (causing extra CXPACKET waits). While we're at it, what do the CXPACKET waits look like (http://www.sqlskills.com/BLOGS/PAUL/post/Wait-statistics-or-please-tell-me-where-it-hurts.aspx)?

    SET IMPLICIT_TRANSACTIONS OFF
    GO

    CREATE TABLE #configs
    (
        name nvarchar(35),
        minimum INT,
        maximum INT,
        config_value INT,
        run_value INT
    )
    GO

    EXEC sp_configure 'show advanced options', 1;
    GO
    RECONFIGURE;
    GO

    INSERT INTO #configs (name, minimum, maximum, config_value, run_value)
    EXEC sp_configure 'cost threshold for parallelism'
    GO

    EXEC sp_configure 'show advanced options', 0;
    GO
    RECONFIGURE;
    GO

    SELECT
        optz.counter
        , optz.occurrence
        , CONVERT(DECIMAL(18,2), optz.value) AS AvgValue
        , conf.name
        , conf.config_value
        , conf.run_value
        , Uptime.DaysUptime AS [Days Server UPTIME & Last Stats Reset]
        , CASE WHEN Uptime.DaysUptime < 45 THEN 'You may not have very meaningful stats because of a recent restart' ELSE NULL END AS [Stats Warning]
        , CASE WHEN optz.value > conf.config_value THEN 'Cost threshold for parallelism might be too low' ELSE NULL END AS [Cost Threshold Warning]
        , CASE WHEN conf.run_value <> conf.config_value THEN 'Server needs to be restarted for the setting to take effect' ELSE NULL END AS [Restart Warning]
    FROM
        sys.dm_exec_query_optimizer_info optz
        CROSS JOIN #configs conf
        OUTER APPLY (
            SELECT
                CONVERT(DECIMAL(18, 2), DATEDIFF(hh, create_date, GETDATE()) / 24.0) AS DaysUptime
            FROM
                sys.databases
            WHERE
                name = 'tempdb'
        ) Uptime
    WHERE
        optz.counter = 'final cost'
    GO

    DROP TABLE #configs

    I've also just toyed for about an hour to tweak this script. It finds the plans that require parallelism to run. Using the temp table you can get weighted averages. It goes without saying that this is a really heavy query to run. It takes over 1 minute to process 10K plans on our little prod server (2 Xeon CPUs, 4 GB RAM).

    This version will get you the weighted average estimated cost of the queries with parallelism. You can compare that with the previous one, which is server-wide, to make a really good guess for the "correct" cost threshold to set on the server.

    A final note: as much as I tried, I don't have any plans with any info in DegreeOfParallelism. I'm 99.9% certain my code is correct, but that's not 100%.

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT
        query_plan AS CompleteQueryPlan,
        n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,
        n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,
        n.value('(@StatementOptmEarlyAbortReason)[1]', 'VARCHAR(25)') AS StatementOptmEarlyAbortReason,
        n.value('(@StatementSubTreeCost)[1]', 'DECIMAL(18, 6)') AS StatementSubTreeCost,
        n.value('(QueryPlan/@CompileCPU)[1]', 'DECIMAL(18, 0)') AS CompileCPU,
        n.value('(QueryPlan/@CompileTime)[1]', 'DECIMAL(18, 0)') AS CompileTime,
        n.value('(QueryPlan/@CompileMemory)[1]', 'DECIMAL(18, 0)') AS CompileMemory,
        n.value('(QueryPlan/@DegreeOfParallelism)[1]', 'DECIMAL(18, 0)') AS DegreeOfParallelism,
        n.query('.').exist('//MissingIndexes') AS HasMissingIndexes,
        n.query('.') AS ParallelSubTreeXML,
        ecp.usecounts,
        ecp.size_in_bytes
    INTO #X
    FROM sys.dm_exec_cached_plans AS ecp
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp
    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
    WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1

    SELECT * FROM #X

    SELECT SUM(usecounts * DegreeOfParallelism) / SUM(usecounts) AS Weighted_Avg_DOP,
           SUM(usecounts * StatementSubTreeCost) / SUM(usecounts) AS Weighted_Avg_SubTreeCost
    FROM #X

  • Maybe the replies by Grant on another thread can get you started:

    http://qa.sqlservercentral.com/Forums/FindPost1123444.aspx

    I did Grant wrong here by originally telling you to read just one reply of his :blush:

    I'm sorry Grant :crying:

    I should have said "replies" ... I corrected it.

    Priceless info, Grant :smooooth: Thank you once more.

    Johan

    Learn to play, play to learn !

    Don't drive faster than your guardian angel can fly ...
    but keeping both feet on the ground won't get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (7/26/2011)


    Maybe the reply by Grant on another thread can get you started:

    http://qa.sqlservercentral.com/Forums/FindPost1123444.aspx

    Yup, I got the inspiration for my script from that thread. Very nice info.

  • Your second query is still running, but here is the output of your first query.

    counter                                  final cost
    occurrence                               3189815
    AvgValue                                 2.77
    name                                     cost threshold for parallelism
    config_value                             5
    run_value                                5
    Days Server UPTIME & Last Stats Reset    167.50
    Stats Warning                            NULL
    Cost Threshold Warning                   NULL
    Restart Warning                          NULL

    And here are the top 10 wait type stats.

    WAIT TYPE PERCENTAGE

    CXPACKET 57.72

    LATCH_EX 11.88

    CMEMTHREAD 4.74

    SOS_SCHEDULER_YIELD 4.71

    ASYNC_NETWORK_IO 1172668.81 3.67 82.72

    MSQL_XP 916031.63 2.86

    PREEMPTIVE_OS_GETPROCADDRESS 2.86

    OLEDB 2.02

    TRACEWRITE 2.01

    BACKUPBUFFER 1.19

    -Roy

  • See what happens when you raise the cost threshold for parallelism, and be surprised like I was.

    Johan

    Learn to play, play to learn !

    Don't drive faster than your guardian angel can fly ...
    but keeping both feet on the ground won't get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Roy Ernest (7/26/2011)


    Your second query is still running. But your first query here is the output.

    counter final cost

    occurrence 3189815

    AvgValue 2.77

    name cost threshold for parallelism

    config_value 5

    run_value 5

    Days Server UPTIME & Last Stats Reset 167.50

    Stats Warning NULL

    Cost Threshold Warning NULL

    Restart Warning NULL

    And here is the top 10 Wait type stats.

    WAIT TYPE PERCENTAGE

    CXPACKET 57.72

    LATCH_EX 11.88

    CMEMTHREAD 4.74

    SOS_SCHEDULER_YIELD 4.71

    ASYNC_NETWORK_IO 1172668.81 3.67 82.72

    MSQL_XP 916031.63 2.86

    PREEMPTIVE_OS_GETPROCADDRESS 2.86

    OLEDB 2.02

    TRACEWRITE 2.01

    BACKUPBUFFER 1.19

    What's the avg wait time in ms for CXPACKET?

    BTW, how come you got 57% for CXPACKET and 82.72% for ASYNC_NETWORK_IO???? That doesn't seem to add up really well!

    Here's the modified version I use to include the averages:

    WITH Waits AS
    (SELECT
        wait_type,
        wait_time_ms / 1000.0 AS WaitS,
        (wait_time_ms - signal_wait_time_ms) / 1000.0 AS ResourceS,
        signal_wait_time_ms / 1000.0 AS SignalS,
        waiting_tasks_count AS WaitCount,
        100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage,
        ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS RowNum
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN (
        'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',
        'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',
        'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',
        'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
        'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER',
        'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', 'BROKER_RECEIVE_WAITFOR')
    )
    SELECT
        W1.wait_type AS WaitType,
        CAST (W1.WaitS AS DECIMAL(14, 2)) AS Wait_S,
        CAST (W1.WaitS * 1000 / W1.WaitCount AS DECIMAL(14, 2)) AS AvgWait_MS,
        CAST (W1.ResourceS AS DECIMAL(14, 2)) AS Resource_S,
        CAST (W1.SignalS AS DECIMAL(14, 2)) AS Signal_S,
        W1.WaitCount AS WaitCount,
        CAST (W1.Percentage AS DECIMAL(4, 2)) AS Percentage
    FROM Waits AS W1
    INNER JOIN Waits AS W2
        ON W2.RowNum <= W1.RowNum
    GROUP BY W1.RowNum, W1.wait_type, W1.WaitS, W1.ResourceS, W1.SignalS, W1.WaitCount, W1.Percentage
    HAVING SUM (W2.Percentage) - W1.Percentage < 97 -- percentage threshold
        AND W1.WaitCount > 0;
    GO

  • ALZDBA (7/26/2011)


    See what happens when you raise the cost threshold for parallelism, and be surprised like I was.

    Surprised in a good way or a bad way? What did you trace, and how did you trace the changes?

  • I thought I edited it properly. But I guess not.

    wait_type                      wait_time_s    pct
    CXPACKET                       18468243.63    57.72
    LATCH_EX                        3802397.93    11.88
    CMEMTHREAD                      1516464.84     4.74
    SOS_SCHEDULER_YIELD             1508456.11     4.71
    ASYNC_NETWORK_IO                1172668.81     3.67
    MSQL_XP                          916031.63     2.86
    PREEMPTIVE_OS_GETPROCADDRESS     915633.18     2.86
    OLEDB                            646687.82     2.02
    TRACEWRITE                       641578.03     2.01
    BACKUPBUFFER                     381806.38     1.19

    Your second query ran in 19 min.

    Weighted_Avg_DOP    Weighted_Avg_SubTreeCost
    NULL                17.060217

    Total of 117 rows. Two out of the 117 have timeout as the reason for StatementOptmEarlyAbortReason. There are quite a few that have missing indexes. The DegreeOfParallelism column is NULL. I still have to look at the SPs with missing indexes and see why they got missed at the time of review.

    EDIT (re: how come you got 57% for CXPACKET and 82.72% for ASYNC_NETWORK_IO): the 82.72% was actually the Running_pct column. The query I used to get the waits was written by Glenn Berry - one of his 5 Super DMV queries.

    -Roy

  • Hey Roy,

    I don't think there's much to choose here between DOP 1 and query hints on the very few queries that would benefit, versus DOP 4 (say) and a higher cost threshold. The effect is likely to be very similar, from what you've told us. If I could see your system and queries for a few hours, I might sway one way or the other, but as it is I say 'meh'. If you have a number of queries that would meet a much higher cost threshold, then by all means go the DOP + cost route. If you have a stable system now where almost all queries are simple and run acceptably fast, I might not take the (small) risk. Only you can really make this decision, though as I say, I don't think there's too much in it either way.

    I do want to respond to the comments about CXPACKET though. Very few people understand what CXPACKET truly means (and I do not number storage engine experts among them). For almost all intents and purposes, CXPACKET should be ignored. It is a consequence of running a parallel query, nothing more. People do tend to see CXPACKET 'waits' and assume they mean something similar to PAGEIOLATCH_* or whatever - but they really really don't. Yes, if you run fewer parallel queries, and/or fewer parallel regions, you will see CXPACKET numbers decline. No surprises there. It does not mean your server has become more efficient, or that response time has improved, or much of anything else.
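
    One practical way to act on that is simply to look at the wait picture with CXPACKET excluded and see what is left. A minimal sketch along those lines (the exclusion list is only an illustrative subset, not the full benign-wait list from the script posted earlier):

    SELECT TOP (10)
        wait_type,
        wait_time_ms / 1000.0 AS wait_time_s,
        CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS DECIMAL(5, 2)) AS pct
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN ('CXPACKET', 'LAZYWRITER_SLEEP', 'SLEEP_TASK',
                            'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'CHECKPOINT_QUEUE')
    ORDER BY wait_time_ms DESC;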

  • Edit: Saw Paul's answer after I posted. I'd defer to his experience rather than my very limited knowledge with this.

    I'm sure you're on top of this, but stale stats will increase CXPACKET waits as well.

    From what you've sent so far I probably wouldn't go to 30-50 for the cost threshold; I'd actually try something much lower, like 10-15, so as NOT to penalize the queries that are using parallelism at the moment.

    You seem to have either very very low cost queries or relatively cheap "big" queries.

    That's why I wanted you to see the real average and then the average of the bigger queries. Of course, if you use recompile hints you're not going to see those in the cache. At 13K queries/sec, saving the actual plan with stats is obviously out of the question :w00t:.

    The other question is also how long the plans stay in the cache... that would be another good piece of info to have in those queries.
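
    A minimal sketch of one way to check that, assuming the plans are still cached; it just reads creation_time from sys.dm_exec_query_stats, so recompiles and cache evictions will reset the clock:

    SELECT TOP (50)
        qs.creation_time,
        DATEDIFF(hour, qs.creation_time, GETDATE()) AS hours_in_cache,
        qs.execution_count,
        st.text AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.creation_time ASC;   -- oldest plans first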

    It bugs me that you don't see anything for DegreeOfParallelism either. Looks like there's something I don't understand in there (either the caching or the XML query). :unsure:

  • SQLkiwi (7/26/2011)


    I do want to respond to the comments about CXPACKET though. Very few people understand what CXPACKET truly means (and I do not number storage engine experts among them). For almost all intents and purposes, CXPACKET should be ignored. It is a consequence of running a parallel query, nothing more. People do tend to see CXPACKET 'waits' and assume they mean something similar to PAGEIOLATCH_* or whatever - but they really really don't. Yes, if you run fewer parallel queries, and/or fewer parallel regions, you will see CXPACKET numbers decline. No surprises there. It does not mean your server has become more efficient, or that response time has improved, or much of anything else.

    I'm glad you clarified this Paul. You'd made the comment before on another thread and I had become concerned I'd missed something hugely important. I'd been meaning to do some research but you just eased my mind.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA
