Strange Query Optimization

  • I have a query that performs as well as I need it to, but I would like to understand why the criteria I have added to the query makes it so quick.  Here is the query:

    DECLARE @batchuid VARCHAR(40)

    SELECT tDocArchive.arcuid, tOCRTokens.oct_token FROM tOCRTokens

    INNER JOIN tDocOCRTokens ON tDocOCRTokens.ocd_octuid = tOCRTokens.octuid

    INNER JOIN tDocArchive ON tDocArchive.arcuid = tDocOCRTokens.ocd_arcuid

    WHERE tDocArchive.doa_batchuid = @batchuid

    AND tDocArchive.doa_batchorder%1 < 1

    ORDER BY tDocArchive.doa_batchorder

    When I run this query it takes 5 seconds to return 24599 rows but, if I remove the "tDocArchive.doa_batchorder%1 < 1" piece of criteria it takes 24 seconds to run.  How is this possible given the following index details:

    tDocArchive.arcuid - clustered primary key

    tOCRTokens.octuid - clusterd primary key

    tDocOCRTokens.ocd_octuid - indexed foreign key

    tDocOCRTokens.ocd_arcuid - indexed foreign key

    The doa_batchorder field does not have an index!

    The only thing I can think of is that the ordering is helping the optimization.  The query execution plan shows that the sort on doa_batchorder happens first.

    Thanks, any insight is appreciated.

    Tony

  • How many rows do you get if you remove that condition?

  • I get the same number of rows, the results are the same.

  • I would imagine the same number of results, as any integer will return true for n%1<1...

  • Another question... what is this supposed to filter?

    AND tDocArchive.doa_batchorder%1 < 1

    Can you rerun the queries and send us the execution plans

    SET SHOWPLAN_TEXT ON

    GO

    Query1

    GO

    Query2

    GO

    SET SHOWPLAN_TEXT OFF

  • Ya... it's so obvious (when you think about it) .

  • I am not attempting to filter with that filter, I only noticed that the query runs faster.  Here is the plan you requested.

      |--Sort(ORDER BY: ([tDocArchive].[doa_batchorder] ASC))

           |--Merge Join(Inner Join, MERGE: ([tOCRTokens].[octuid])=([tDocOCRTokens].[ocd_octuid]), RESIDUAL: ([tOCRTokens].[octuid]=[tDocOCRTokens].[ocd_octuid]))

                |--Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tOCRTokens].[pk_tOCRTokens]), ORDERED FORWARD)

                |--Sort(ORDER BY: ([tDocOCRTokens].[ocd_octuid] ASC))

                     |--Bookmark Lookup(BOOKMARK: ([Bmk1002]), OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens]))

                          |--Nested Loops(Inner Join, OUTER REFERENCES: ([tDocArchive].[arcuid]) WITH PREFETCH)

                               |--Filter(WHERE: ([tDocArchive].[doa_batchorder]%1<1))

                               |    |--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive]))

                               |         |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive].[idx07_tDocArchive]), SEEK: ([tDocArchive].[doa_batchuid]='ddf66fe212a14c2191d6ee43b43f762b') ORDERED FORWARD)

                               |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens].[idx03_tDocOCRTokens]), SEEK: ([tDocOCRTokens].[ocd_arcuid]=[tDocArchive].[arcuid]) ORDERED FORWARD)

    Thanks for taking a look at this.

  • where's the second plan?

  • Oops, I guess that would help

    With: tDocArchive.doa_batchorder%1 < 1

      |--Sort(ORDER BY: ([tDocArchive].[doa_batchorder] ASC))

           |--Merge Join(Inner Join, MERGE: ([tOCRTokens].[octuid])=([tDocOCRTokens].[ocd_octuid]), RESIDUAL: ([tOCRTokens].[octuid]=[tDocOCRTokens].[ocd_octuid]))

                |--Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tOCRTokens].[pk_tOCRTokens]), ORDERED FORWARD)

                |--Sort(ORDER BY: ([tDocOCRTokens].[ocd_octuid] ASC))

                     |--Bookmark Lookup(BOOKMARK: ([Bmk1002]), OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens]))

                          |--Nested Loops(Inner Join, OUTER REFERENCES: ([tDocArchive].[arcuid]) WITH PREFETCH)

                               |--Filter(WHERE: ([tDocArchive].[doa_batchorder]%1<1))

                               |    |--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive]))

                               |         |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive].[idx07_tDocArchive]), SEEK: ([tDocArchive].[doa_batchuid]='ddf66fe212a14c2191d6ee43b43f762b') ORDERED FORWARD)

                               |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens].[idx03_tDocOCRTokens]), SEEK: ([tDocOCRTokens].[ocd_arcuid]=[tDocArchive].[arcuid]) ORDERED FORWARD)

     

    Without: tDocArchive.doa_batchorder%1 < 1

      |--Sort(ORDER BY: ([tDocArchive].[doa_batchorder] ASC))

           |--Merge Join(Inner Join, MERGE: ([tOCRTokens].[octuid])=([tDocOCRTokens].[ocd_octuid]), RESIDUAL: ([tOCRTokens].[octuid]=[tDocOCRTokens].[ocd_octuid]))

                |--Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tOCRTokens].[pk_tOCRTokens]), ORDERED FORWARD)

                |--Sort(ORDER BY: ([tDocOCRTokens].[ocd_octuid] ASC))

                     |--Hash Match(Inner Join, HASH: ([tDocArchive].[arcuid])=([tDocOCRTokens].[ocd_arcuid]), RESIDUAL: ([tDocArchive].[arcuid]=[tDocOCRTokens].[ocd_arcuid]))

                          |--Bookmark Lookup(BOOKMARK: ([Bmk1004]), OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive]))

                          |    |--Index Seek(OBJECT: ([SecureDocs_v1].[dbo].[tDocArchive].[idx07_tDocArchive]), SEEK: ([tDocArchive].[doa_batchuid]='ddf66fe212a14c2191d6ee43b43f762b') ORDERED FORWARD)

                          |--Clustered Index Scan(OBJECT: ([SecureDocs_v1].[dbo].[tDocOCRTokens].[pk_tDocOCRTokens]))

    Thanks

  • the where seems to force and index seek (removing a scan.. which is probabely the longest operation of the query).

    However I can't really be sure without the relative cost to the batch... and most importantly why???

  • for real, why why why

  • I wish I knew...

  • I would agree that the index seek is the most significant difference between the two queries.  The tDocOcrTokens table has the greatest number of rows of any table in the database, over 10,000,000, all the other tables have less than 500,000 so this is the most costly operation in the query, is that what you mean when you meant by "the relative cost to the batch?"

    Anyhow, Thanks a ton.  If I find out why I'll post.  I am off for the weekend.

  • Hi

    1. With: tDocArchive.doa_batchorder%1 < 1 :: It uses "Nested Loops" JOIN whereas without ...%1 <1 it uses "Hash Match" JOIN.   HASH JOIN uses more memory w.r.t. Nested LOOP JOIN. The execution plan is different & so is the result.

    2. INDEX on tDocArchive.doa_batchuid  & tDocArchive.doa_batchorder are missing.

    3. JOIN tables with small records in the begning of the SQL SCRIPT.

    4 UPDATE the statistics for all the indexes.

    5. Preferably filter the records in the INNER JOIN.   e.g.  INNER JOIN TABLE1 on ( TABLE1.field10 = 'value' AND TABLE1.field1 = TABLE2.field2 )

     

    regds/ramanuj


    🙂

  • I have changed the query to read as follows, performance seems to be the same.

    SELECT tDocArchive.arcuid, tOCRTokens.oct_token

    FROM tOCRTokens

    INNER JOIN tDocOCRTokens ON tDocOCRTokens.ocd_octuid = tOCRTokens.octuid

    INNER JOIN tDocArchive ON (tDocArchive.doa_batchorder%1 < 1 AND

        tDocArchive.doa_batchuid = @batchuid AND

        tDocArchive.arcuid = tDocOCRTokens.ocd_arcuid)

    ORDER BY tDocArchive.doa_batchorder

    The doa_batchuid does have an index, just not doa_batchorder, until this query I have only used doa_batchorder to order the batch, it has never been used as a filter before.  I added an index to doa_batchorder but it did not seem to make a difference so I removed it.  Is there a better way to convince this query to use a NESTED LOOP JOIN instead of a HASH JOIN?  I have looked into query hints but I don't seem to see a hint that will do this.

    Also, I have a job that updates the statistics and defrags indexes early every morning.

    Thanks, Tony

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply