Parallel execution killing server sometimes

  • I have not done enough research on this yet to be too precise, but I think I know where our server hangs are coming from.

    Once in a while, maybe once or twice a week, our main data warehouse server gets so slow it appears hung. You can't connect with Query analyzer, jobs abort due to timeouts, etc. It is the SQL Server process that is buried.

    However, nothing is "wrong" -- the current queries complete, and all returns to normal. Brief looks at statistics show no obvious tuning problems-- low page rates, disks busy but not saturated, just lots and lots of CPU time being consumed. And it is a busy server.

    However, at other times the server is also completely busy, but is responsive -- you can connect, your query runs slow but it doesn't appear (to the outside) dead.

    The common threat here (pun intended) seems to be that when it appears more hung, many of the processes have LARGE numbers of parallel execution threads (e.g. multiple rows in an SP_WHO2). I saw one yesterday that a rough count probably had 16, and the query was being run twice so about 32 threads were in two queries.

    THis is a quad processor machine, and we have not changed these parameters. we're set at "Max worker threads" of 255 and use all processors for parallel execution. I've experimented a bit with the max parallel hint and usually things run slower if I reduce the parallelism on isolated queries.

    Is the "max worker thread" perhaps the issue, that I get a bunch of parllel threads and all together we hit this limit so nothing else can start?

    Or is it a good idea to limit how many processors are used for parallel execution? (What exactly does this mean, the number used on one SPID max? The number used at any given time among all SPIDs? (If the latter what do the other processors do, nothing in SQL?))

    The BOL just repeates the label on the dialog box, it doesn't really give any insight into how these function.

  • I've experienced as well SOMETIMES (and this is the problem, that it's not always) a paralell running query is slower the a query running on a single processor (ie. using the MAXDOP=1 hint)

    This is a known bug/issue in SQLServer.

    You have to check the number of context switches and if it's high then you could experience tu reduce the max degree of paralellism to 1.

    You will still use your 4 procs for the concurrent queries but you will use just one per query.

    I think it's worth to try it out

    Bye

    Gabor



    Bye
    Gabor

  • I will agree with Gabor. Reduce the degree of parallelism. We have seen this issue quite often and it has pretty much triggered post SP3 and probably seems to be bug/performance enhancement.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply