Join strategy

  • I've been called to take a look at a server that's performing poorly on some queries. First of all, let's clarify one thing: it's an Oracle server, so I have to admit I know nearly nothing about it. There was an Oracle DBA working on the problem and he said that tha main issue is lack of CPU.

    Looking at the execution plan I noticed that the optimizer was going for HASH joins, probably because the number of rows in the two tables were very different (100K VS 90 million). Since CPU seems to be the bottleneck, I suggested him to try to turn the HASH join into a MERGE join, that would be less CPU-intensive, changing the sort order of the index.

    He laughed out loud at me and he said that HASH join is less CPU-intensive than MERGE join. I was unsure, so I set up some tests:

    -- CREATE TEST TABLES

    IF OBJECT_ID('tempdb..#test1') IS NOT NULL DROP TABLE #test1

    CREATE TABLE #Test1 (

    num int

    )

    CREATE CLUSTERED INDEX IX_num ON #Test1(num)

    INSERT INTO #Test1

    SELECT TOP 5000000 ROW_NUMBER() OVER (ORDER BY(SELECT NULL))

    FROM Tally AS A

    CROSS JOIN Tally AS B

    IF OBJECT_ID('tempdb..#test2') IS NOT NULL DROP TABLE #test2

    CREATE TABLE #Test2 (

    num int

    )

    CREATE CLUSTERED INDEX IX_num ON #Test2(num)

    INSERT INTO #Test2

    SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY(SELECT NULL))

    FROM Tally AS A

    CROSS JOIN Tally AS B

    SET STATISTICS TIME ON

    SET STATISTICS IO ON

    IF OBJECT_ID('tempdb..#res') IS NOT NULL DROP TABLE #res

    SELECT A.num

    INTO #res

    FROM #Test1 AS A

    INNER HASH JOIN #Test2 AS B

    ON A.num = B.num

    -- SQL Server Execution Times:

    -- CPU time = 21406 ms, elapsed time = 2113 ms.

    SET STATISTICS TIME ON

    SET STATISTICS IO ON

    IF OBJECT_ID('tempdb..#res') IS NOT NULL DROP TABLE #res

    SELECT A.num

    INTO #res

    FROM #Test1 AS A

    INNER MERGE JOIN #Test2 AS B

    ON A.num = B.num

    -- SQL Server Execution Times:

    -- CPU time = 7752 ms, elapsed time = 15467 ms.

    So, which one uses less CPU??

    -- Gianluca Sartori

  • Hey Gianluca,

    The answer is that it depends.

    First, though, let me compare my results to yours.

    Using HASH, I get 6s CPU and 26s elapsed.

    Using MERGE, I get 1.5s CPU and 1.6s elapsed.

    I am quite surprised that you managed to get such different results for merge - though you should probably disable parallelism with OPTION (MAXDOP 1) to (a) get comparable results; and (b) ensure that the CPU times reported with STATISTICS TIME are accurate.

    I'm not going to go on about the tests too much (clearing caches and so on) because there is no answer to this question. Which one uses more CPU depends on the data - width, distribution, cardinality, indexes, and the number of rows feeding into each input of the join.

    In your test, with simple consecutive integers, and input sets that are not so very different in size, about the only way HASH can ever 'win' is if it triggers parallelism where the MERGE does not - and by 'win' here I mean in terms of elapsed time.

    Not specifying the clustered index as UNIQUE on the outer input to the MERGE does hamper it very slightly (your test produces a many-to-many MERGE, which requires a worktable for potential rewinds) - but in the event there are no duplicates in either input set, so no rewinds are required.

    The HASH is the wrong choice for this data set, because it has the overhead of creating the hash buckets and values from the build input rows, and then computing hash values and probing into the buckets on the probe input. None of this extra work adds any value in this case, because we are dealing with unique integers.

    Another thing, while we are here, is to note that INNER HASH|MERGE JOIN also forces the order of the join: from table A to table B. In fact, the reverse is optimal for both join types: table B should be the outer input since is contains fewer rows. This is important for both, but especially so for the HASH, which will be building the hash structures on the larger table, and probing from the smaller input. To be fair, you should replace the join hint with an OPTION (HASH|MERGE JOIN) - this does not force the join order.

    If you don't specify a join type, SQL Server will pick the HASH - it thinks that the many-to-many merge would be more expensive. If we let the optimizer know that table B contains unique values, it correctly chooses a MERGE plan, with table B as the outer input.

    MERGE is generaly less efficient than HASH with very large inputs, when parallelism is used a high DOP, when a many-many operation is required, or if pre-sorted order on one or both inputs is not available (producing a Sort iterator). A MERGE join does not require a memory grant, but may use a worktable (many-many merge only).

    HASH excels at processing very large inputs, uses parallelism extremely well, and handles large data types (particularly long strings) much better than MERGE. It benefits from one input being much smaller than the other, but this is not essential. A large HASH might have to wait for a memory grant, though - and might spill to disk if the estimated memory grant requirement turns out to be much too small at run time.

    The behaviour of the two join types is so very different - and it depends on the data so much - that it is impossible to state that one or the other 'uses less CPU'.

    Paul

  • I dont know why your numbers are so strange. Perhaps your system was heavily loaded ?

    I ran the same test on my unloaded 2-CPU laptop and I got the following reults:

    -- HASH JOIN

    -- SQL Server Execution Times:

    -- CPU time = 1201 ms, elapsed time = 684 ms.

    -- MERGE JOIN

    --SQL Server Execution Times:

    -- CPU time = 765 ms, elapsed time = 556 ms.

    So, in this case it seems that MERGE is about twice as CPU efficient as HASH.

    HASH also uses much more memory.

    I would say that as a general rule MERGE is more efficient than HASH, but the difference is usually not very big.

    I have no idea how this translates to Oracle...

    /SG

  • First of all, thank you for you reply, Paul.

    Very informative and precise, as usual.

    I forgot to mention that I ran these tests on my dev server (8 dual core CPUs, 16 Gb RAM, attached to a old slow SAN) and not on my laptop. This is probably why we get different exec times.

    Taking parallelism into consideration, everything makes much more sense. That Oracle server uses 3 quad-core CPUs, but the consultant is telling us we should have at least 12. Wow.

    I repeated the tests including you recommendations (join order, maxdop, unique index) and, without any other hint, the optimizer decides to go for a HASH join. If I force MAXDOP=1 I get a MERGE join.

    These are the statistics I get:

    MERGE JOIN: CPU time = 953 ms, elapsed time = 972 ms.

    HASH JOIN MAXDOP=1: CPU time = 4547 ms, elapsed time = 4563 ms.

    HASH JOIN MAXDOP=0: CPU time = 6282 ms, elapsed time = 992 ms.

    The scenario here is a quite large (approx. 1TB) database holding BI data. There are no cubes defined, everything is strictly relational. High CPU load is due to all the aggregations in the queries generated by the reporting engine, so I supposed that trying to force MERGE joins would have left CPU to aggregations, taking other free resources (IO is quite low, at least not problematic).

    Thank you for clarifying this could not be the case. What I would have given it a try, if I were the Oracle DBA.

    I think this is a very hard environment to tune: large tables, small hardware and no way to change the way queries are generated.

    The only chance I see is working on the physical layer: materialized views and query rewrites, buffer cache and that kind of stuff.

    -- Gianluca Sartori

  • Stefan_G (4/13/2010)


    I dont know why your numbers are so strange. Perhaps your system was heavily loaded ?

    I ran the same test on my unloaded 2-CPU laptop and I got the following reults:

    -- HASH JOIN

    -- SQL Server Execution Times:

    -- CPU time = 1201 ms, elapsed time = 684 ms.

    -- MERGE JOIN

    --SQL Server Execution Times:

    -- CPU time = 765 ms, elapsed time = 556 ms.

    So, in this case it seems that MERGE is about twice as CPU efficient as HASH.

    HASH also uses much more memory.

    I would say that as a general rule MERGE is more efficient than HASH, but the difference is usually not very big.

    I have no idea how this translates to Oracle...

    /SG

    Thank you for your reply, Stefan.

    My test server is attached to a slow SAN, that could be the bottleneck.

    The main problem here is that I know very little about Oracle. And turns out that I know very little about SQL Server as well...:blush:

    -- Gianluca Sartori

  • The general rule of thumb is (and always has been, as far back as I can recall) is that as long as at least one side has unique matching, Merge Joins are the most efficient, IF the data is already sorted in the necessary order, otherwise Hash join is the fastest, IF the appropriately sized hash table can be allocated efficiently in memory.

    Of course, they are called "general Rules of Thumb" (or ROTs) for a reason, your mileage may vary for various causes as Paul points out.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (4/13/2010)


    The general rule of thumb is (and always has been, as far back as I can recall) is that as long as at least one side has unique matching, Merge Joins are the most efficient, IF the data is already sorted in the necessary order, otherwise Hash join is the fastest, IF the appropriately sized hash table can be allocated efficiently in memory.

    I would agree with that ROT - for medium-sized inputs, and especially if sorted output is important to the overall plan.

  • Gianluca Sartori (4/13/2010)


    I repeated the tests including you recommendations (join order, maxdop, unique index) and, without any other hint, the optimizer decides to go for a HASH join. If I force MAXDOP=1 I get a MERGE join.

    Yep. The optimizer costs a parallel MERGE join quite high - if the sort order must be preserved. While a MERGE does preserve sort order in general, enforcing this in a parallel plan requires a 'merging' exchange (gather streams or repartition streams) - meaning that the parallelism iterator will have an ORDER BY: attribute.

    When preserving order, the exchange must process parallel threads in order - one packet of rows from thread 1, one packet of rows from thread 2...and so on. If one of the threads is slower than the others (perhaps because the scheduler it is running on has other work to do), it slows down the whole operation.

    Another factor is that bitmap filters (an important optimisation in many data-warehouse queries) are only supported on a MERGE if there is a Sort iterator immediately prior to the MERGE's outer input. By default, MERGE is non-blocking (HASH is blocking on its build input while it constructs the hash table) so a bitmap cannot be built in advance.

    The scenario here is a quite large (approx. 1TB) database holding BI data. There are no cubes defined, everything is strictly relational.

    Hash join is pretty much always the best option with large warehouse queries - bitmap filters and scalable parallelism being the main reasons. HASH is also scarily efficient when the build input is small, and the probe contains many duplicates.

    High CPU load is due to all the aggregations in the queries generated by the reporting engine...materialized views and query rewrites, buffer cache and that kind of stuff.

    I would absolutely agree that indexed views are probably the single best optimization you could make here.

  • Paul, I find your knowledge of the internal workings of SQL Server very impressive.

    How do you know all this internal stuff ?

  • Stefan_G (4/14/2010)


    Paul, I find your knowledge of the internal workings of SQL Server very impressive.

    How do you know all this internal stuff ?

    Thanks very much, Stefan. For anyone interested in this sort of thing, there's a lot of very good stuff blogged by the various SQL Server teams, and many of the same people have contributed to books published by Microsoft Press.

    Just to give one example that is relevant to this thread, take a look at http://blogs.msdn.com/craigfr/archive/2007/04/17/parallel-query-execution-presentation.aspx

  • Paul White NZ (4/14/2010)


    Stefan_G (4/14/2010)


    Paul, I find your knowledge of the internal workings of SQL Server very impressive.

    How do you know all this internal stuff ?

    Thanks very much, Stefan. For anyone interested in this sort of thing, there's a lot of very good stuff blogged by the various SQL Server teams, and many of the same people have contributed to books published by Microsoft Press.

    Just to give one example that is relevant to this thread, take a look at http://blogs.msdn.com/craigfr/archive/2007/04/17/parallel-query-execution-presentation.aspx

    Don't try to fool us, Emperor Paulpatine, we know that all this expertise comes from the dark side of the force... πŸ˜›

    As a side note, this is my first (answered) question ever on SSC! Thanks for answering, Senator!

    -- Gianluca Sartori

  • Stefan_G (4/14/2010)


    Paul, I find your knowledge of the internal workings of SQL Server very impressive.

    How do you know all this internal stuff ?

    Some people don't need sleep, Stefan πŸ˜‰

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply