Join Types & Execution Plans

  • Hi,

    I have spent some time recently tweaking a pretty intensive procedure, one thing i noticed that raised a question is with regards to Hash matches in the execution plan, do these always require space to be used on tempdb (and then given back) for the hash table, or does the execution plan ever do it in memory.

    Does anyone have a link for some detailed information regarding all the join types/Execution methods?

    Thanks,

    Nic

  • Hash matches happen in memory, unless the tash table gets too big. In that case, it spils to disk. It's called a hash bailout. There's an event in profiler that records this.

    SQL 2000 Books Online


    In-Memory Hash Join

    The hash join first scans or computes the entire build input and then builds a hash table in memory. Each row is inserted into a hash bucket depending on the hash value computed for the hash key. If the entire build input is smaller than the available memory, all rows can be inserted into the hash table. This build phase is followed by the probe phase. The entire probe input is scanned or computed one row at a time, and for each probe row, the hash key's value is computed, the corresponding hash bucket is scanned, and the matches are produced.

    Grace Hash Join

    If the build input does not fit in memory, a hash join proceeds in several steps. Each step has a build phase and probe phase. Initially, the entire build and probe inputs are consumed and partitioned (using a hash function on the hash keys) into multiple files. The number of such files is called the partitioning fan-out. Using the hash function on the hash keys guarantees that any two joining records must be in the same pair of files. Therefore, the task of joining two large inputs has been reduced to multiple, but smaller, instances of the same tasks. The hash join is then applied to each pair of partitioned files.

    Recursive Hash Join

    If the build input is so large that inputs for a standard external merge sorts would require multiple merge levels, multiple partitioning steps and multiple partitioning levels are required. If only some of the partitions are large, additional partitioning steps are used for only those specific partitions. In order to make all partitioning steps as fast as possible, large, asynchronous I/O operations are used so that a single thread can keep multiple disk drives busy.

    As for the multiple join types, there's a summary here[/url] (Blatent self-promotion) with links to far more in depth articles. If you have questions after, please post them here.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail,

    Thats answered the question, and given me some additional learning material.

    Greatly appreciated,

    Nic

  • You might also check out this blog:http://blogs.msdn.com/craigfr/

    Great information in there.

  • Thanks Steve,

    Nic

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply