Query plan question

  • All,

    I'm trying to increase my knowledge of query plans on large databases. This question is purely a learning exercise for me. I'm also aware that there may not be a clear answer as it might depend on too many other contributing factors.

    The DDL for the tables is:

    /****** Object:  Table [dbo].[Posts]    Script Date: 24/10/2019 15:41:07 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[Posts](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [AcceptedAnswerId] [int] NULL,
    [AnswerCount] [int] NULL,
    [Body] [nvarchar](max) NOT NULL,
    [ClosedDate] [datetime] NULL,
    [CommentCount] [int] NULL,
    [CommunityOwnedDate] [datetime] NULL,
    [CreationDate] [datetime] NOT NULL,
    [FavoriteCount] [int] NULL,
    [LastActivityDate] [datetime] NOT NULL,
    [LastEditDate] [datetime] NULL,
    [LastEditorDisplayName] [nvarchar](40) NULL,
    [LastEditorUserId] [int] NULL,
    [OwnerUserId] [int] NULL,
    [ParentId] [int] NULL,
    [PostTypeId] [int] NOT NULL,
    [Score] [int] NOT NULL,
    [Tags] [nvarchar](150) NULL,
    [Title] [nvarchar](250) NULL,
    [ViewCount] [int] NOT NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

    /****** Object: Table [dbo].[Comments] Script Date: 24/10/2019 15:57:05 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[Comments](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [CreationDate] [datetime] NOT NULL,
    [PostId] [int] NOT NULL,
    [Score] [int] NULL,
    [Text] [nvarchar](700) NOT NULL,
    [UserId] [int] NULL,
    CONSTRAINT [PK_Comments__Id] PRIMARY KEY CLUSTERED
    (
    [Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO


    Two of the indexes are as follows:

    /****** Object:  Index [idxPostID]    Script Date: 24/10/2019 15:40:17 ******/
    CREATE CLUSTERED INDEX [idxPostID] ON [dbo].[Posts]
    (
    [Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

    /****** Object: Index [idxPostCUI] Script Date: 24/10/2019 15:40:40 ******/
    CREATE NONCLUSTERED INDEX [idxPostCUI] ON [dbo].[Posts]
    (
    [OwnerUserId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

    My query is:

    Select id from posts where id in (select postid from comments)

    The execution plan is:

    <?xml version="1.0" encoding="utf-16"?>
    <ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.481" Build="14.0.3192.2" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
    <BatchSequence>
    <Batch>
    <Statements>
    <StmtSimple StatementCompId="1" StatementEstRows="24484400" StatementId="1" StatementOptmLevel="FULL" CardinalityEstimationModelVersion="70" StatementSubTreeCost="1294.62" StatementText="Select id from posts where id in (select postid from comments)" StatementType="SELECT" QueryHash="0x3EFF698920163089" QueryPlanHash="0xB9DF43C412BAC1AD" RetrievedFromCache="true" SecurityPolicyApplied="false">
    <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
    <QueryPlan CachedPlanSize="48" CompileTime="4" CompileCPU="4" CompileMemory="296">
    <ThreadStat Branches="2" />
    <MemoryGrantInfo SerialRequiredMemory="1024" SerialDesiredMemory="4511592" />
    <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="104857" EstimatedPagesCached="13107" EstimatedAvailableDegreeOfParallelism="2" MaxCompileMemory="1309920" />
    <RelOp AvgRowSize="11" EstimateCPU="22.9642" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484400" LogicalOp="Gather Streams" NodeId="1" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="1294.62">
    <OutputList>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
    </OutputList>
    <Parallelism>
    <RelOp AvgRowSize="11" EstimateCPU="286.071" EstimateIO="394.003" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484400" LogicalOp="Inner Join" NodeId="2" Parallel="true" PhysicalOp="Hash Match" EstimatedTotalSubtreeCost="1271.65">
    <OutputList>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
    </OutputList>
    <MemoryFractions Input="1" Output="1" />
    <Hash>
    <DefinedValues />
    <HashKeysBuild>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </HashKeysBuild>
    <HashKeysProbe>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
    </HashKeysProbe>
    <RelOp AvgRowSize="11" EstimateCPU="16.9169" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484800" LogicalOp="Bitmap Create" NodeId="3" Parallel="true" PhysicalOp="Bitmap" EstimatedTotalSubtreeCost="452.424">
    <OutputList>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </OutputList>
    <Bitmap>
    <DefinedValues>
    <DefinedValue>
    <ColumnReference Column="Bitmap1005" />
    </DefinedValue>
    </DefinedValues>
    <HashKeys>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </HashKeys>
    <RelOp AvgRowSize="11" EstimateCPU="16.9169" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484800" LogicalOp="Aggregate" NodeId="4" Parallel="true" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="452.424">
    <OutputList>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </OutputList>
    <StreamAggregate>
    <DefinedValues />
    <GroupBy>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </GroupBy>
    <RelOp AvgRowSize="11" EstimateCPU="267.68" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="43182600" LogicalOp="Repartition Streams" NodeId="5" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="435.507">
    <OutputList>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </OutputList>
    <Parallelism PartitioningType="Hash">
    <PartitionColumns>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </PartitionColumns>
    <OrderBy>
    <OrderByColumn Ascending="true">
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </OrderByColumn>
    </OrderBy>
    <RelOp AvgRowSize="11" EstimateCPU="29.6551" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="43182600" LogicalOp="Aggregate" NodeId="6" Parallel="true" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="167.827">
    <OutputList>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </OutputList>
    <StreamAggregate>
    <DefinedValues />
    <GroupBy>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </GroupBy>
    <RelOp AvgRowSize="11" EstimateCPU="41.4909" EstimateIO="96.6809" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="75437800" EstimatedRowsRead="75437800" LogicalOp="Index Scan" NodeId="7" Parallel="true" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="138.172" TableCardinality="75437800">
    <OutputList>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </OutputList>
    <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
    <DefinedValues>
    <DefinedValue>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
    </DefinedValue>
    </DefinedValues>
    <Object Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Index="[idxCommentPostID]" IndexKind="NonClustered" Storage="RowStore" />
    </IndexScan>
    </RelOp>
    </StreamAggregate>
    </RelOp>
    </Parallelism>
    </RelOp>
    </StreamAggregate>
    </RelOp>
    </Bitmap>
    </RelOp>
    <RelOp AvgRowSize="11" EstimateCPU="55.0462" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="45919800" LogicalOp="Repartition Streams" NodeId="8" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="139.154">
    <OutputList>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
    </OutputList>
    <Parallelism PartitioningType="Hash">
    <PartitionColumns>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
    </PartitionColumns>
    <RelOp AvgRowSize="11" EstimateCPU="25.256" EstimateIO="58.852" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="45919800" EstimatedRowsRead="45919800" LogicalOp="Index Scan" NodeId="9" Parallel="true" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="84.108" TableCardinality="45919800">
    <OutputList>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
    </OutputList>
    <IndexScan Ordered="false" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
    <DefinedValues>
    <DefinedValue>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
    </DefinedValue>
    </DefinedValues>
    <Object Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Index="[idxPostCUI]" IndexKind="NonClustered" Storage="RowStore" />
    <Predicate>
    <ScalarOperator ScalarString="PROBE([Bitmap1005],[StackOverflow].[dbo].[Posts].[Id],N'[IN ROW]')">
    <Intrinsic FunctionName="PROBE">
    <ScalarOperator>
    <Identifier>
    <ColumnReference Column="Bitmap1005" />
    </Identifier>
    </ScalarOperator>
    <ScalarOperator>
    <Identifier>
    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
    </Identifier>
    </ScalarOperator>
    <ScalarOperator>
    <Const ConstValue="N'[IN ROW]'" />
    </ScalarOperator>
    </Intrinsic>
    </ScalarOperator>
    </Predicate>
    </IndexScan>
    </RelOp>
    </Parallelism>
    </RelOp>
    </Hash>
    </RelOp>
    </Parallelism>
    </RelOp>
    </QueryPlan>
    </StmtSimple>
    </Statements>
    </Batch>
    </BatchSequence>
    </ShowPlanXML>

    I'm wondering why the optimiser would use the idxPostCUI non clustered index rather than the idxPostID clustered index to get the list of IDs from the post table? The only reason I could thing of is because it's a smaller index and the non clustered index will have links to the clustered index?

    If anyone is interested in which database I'm using it's this one https://www.brentozar.com/archive/2018/06/new-stack-overflow-public-database-available-2018-06/

    Thanks

  • I believe that the optimizer will see the estimated count from the subquery and determine that the list of postid from the Comments table won't be selective enough to justify doing seeks against the clustered index.

    Then, since the column in the query is just id, and it is included in the nonclustered index because it's the clustered index key, then the cheapest way to get a list of all the ids that match the Comments table is to use the smaller index idxPostCUI and build a hash table using multiple CPU cores to filter the matches.

  • Thanks for your help.

     

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply