Index scan vs. seek

  • I've a unique non-clustered partitioned index which time to time gets scanned which I believe causes performance issues every time that happens. When checking for fragmentation, it shows .1 to .2 for just a handful of partitions. If the index is not fragmented what causes the scans?

  • Estimated data volume and cost of traversing just the leaf level vs. running the tree repeatedly.

    One example cause: http://sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (10/27/2011)


    Estimated data volume and cost of traversing just the leaf level vs. running the tree repeatedly.

    One example cause: http://sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx

    I don't think this is the case in my situation. I've a proc that takes way too long to exectute, an average 3 seconds and all it is doing:

    select f1 from t1 where f2 = @val

    The index in question is on f2 with an include on a couple other fields. So it basically returns only 1 value. And this is where I see some scans but no fragmentation...

  • Index fragmentation shouldn't have an impact on whether the query optimizer chooses a scan over a seek; the optimizer tries to minimize reads when choosing a query plan, but index fragmentation is outside the scope of what it looks at. A highly fragmented index can cause a scan against it to perform worse than expected, but it won't cause the optimizer to scan if it would normally have used a seek.

  • Lexa (10/27/2011)


    Evil Kraig F (10/27/2011)


    Estimated data volume and cost of traversing just the leaf level vs. running the tree repeatedly.

    One example cause: http://sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx

    I don't think this is the case in my situation. I've a proc that takes way too long to exectute, an average 3 seconds and all it is doing:

    select f1 from t1 where f2 = @val

    The index in question is on f2 with an include on a couple other fields. So it basically returns only 1 value. And this is where I see some scans but no fragmentation...

    I would bet that the statistics for this index are out of date. The statistics are only updated when 20% of the records in the index (plus another 500 records) have been added or updated. That's really not often enough.

    The query optimizer uses these statistics to calculate how efficient a given query plan might be. An index seek is not always faster than an index scan. Let's say, for example that you are querying a table of sales orders for Amazon.com. If you want to find a list of all sales shipped to Zimbabwe, that is going to be a relatively short list. The query optimizer will use an index seek for that query. If on the other hand, you want a list of all sales shipped in the United States, it just might be more efficient to do an index scan.

    Here is why that is so. An index is a B-Tree (or Balanced-Tree) structure. It is like a big triangle. (Right now I wish I could draw a picture for you.) When you do an index seek, you start at the top of the triangle. The tip of the triangle tells you which direction to go (left or right) to find the record you want. The next level down will tell you the same thing (left or right) until you get to the bottom level (also called the Leaf-Level) of the index. The leaf-level contains a link to the record you want.

    I have heard (but I have not confirmed) that the typical index has five to seven levels in it. So, retrieving a single record via an index seek actually requires five to seven reads (depending). If your index has millions of records, and you only want a few of them, then this is a very efficient operation. However, if you are retrieving a large percentange of the records in the index, it is more efficient to simply scan the leaf-level of the index without navigating your way through the rest of the tree. If you scan the leaf level, then one record requires one read.

    So, back to your example. Do this on your system. Run your query and get the Actual Execution plan. After your query finishes, check the properties for the SELECT operator. (That's the one on the far left.) Compare the Estimated Rows in the SELECT operator with the actual number of records that the query retrieved. From what you tell me, I will bet that there is a large discrepancy. The estimate is based on the index statistics. The query plan is based on the estimate. I am sure that you can see where I am going with this.

    After you do this little test, update the statistics for that index. The command is UPDATE STATISTICS, and it would be good for you to read the details about the command in Books Online. Once you have updated the statistics, run the query again. (Make sure that you are getting the Actual Execution Plan again.) Compare the plan, the estimated vs. actual rows and the performance with your first test run. I believe you will be quite surprised.

    Oh, and come back to tell us about it (especially if I happen to be wrong).

  • David Moutray (10/27/2011)


    Lexa (10/27/2011)


    Evil Kraig F (10/27/2011)


    Estimated data volume and cost of traversing just the leaf level vs. running the tree repeatedly.

    One example cause: http://sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx

    I don't think this is the case in my situation. I've a proc that takes way too long to exectute, an average 3 seconds and all it is doing:

    select f1 from t1 where f2 = @val

    The index in question is on f2 with an include on a couple other fields. So it basically returns only 1 value. And this is where I see some scans but no fragmentation...

    I would bet that the statistics for this index are out of date. The statistics are only updated when 20% of the records in the index (plus another 500 records) have been added or updated. That's really not often enough.

    The query optimizer uses these statistics to calculate how efficient a given query plan might be. An index seek is not always faster than an index scan. Let's say, for example that you are querying a table of sales orders for Amazon.com. If you want to find a list of all sales shipped to Zimbabwe, that is going to be a relatively short list. The query optimizer will use an index seek for that query. If on the other hand, you want a list of all sales shipped in the United States, it just might be more efficient to do an index scan.

    Here is why that is so. An index is a B-Tree (or Balanced-Tree) structure. It is like a big triangle. (Right now I wish I could draw a picture for you.) When you do an index seek, you start at the top of the triangle. The tip of the triangle tells you which direction to go (left or right) to find the record you want. The next level down will tell you the same thing (left or right) until you get to the bottom level (also called the Leaf-Level) of the index. The leaf-level contains a link to the record you want.

    I have heard (but I have not confirmed) that the typical index has five to seven levels in it. So, retrieving a single record via an index seek actually requires five to seven reads (depending). If your index has millions of records, and you only want a few of them, then this is a very efficient operation. However, if you are retrieving a large percentange of the records in the index, it is more efficient to simply scan the leaf-level of the index without navigating your way through the rest of the tree. If you scan the leaf level, then one record requires one read.

    So, back to your example. Do this on your system. Run your query and get the Actual Execution plan. After your query finishes, check the properties for the SELECT operator. (That's the one on the far left.) Compare the Estimated Rows in the SELECT operator with the actual number of records that the query retrieved. From what you tell me, I will bet that there is a large discrepancy. The estimate is based on the index statistics. The query plan is based on the estimate. I am sure that you can see where I am going with this.

    After you do this little test, update the statistics for that index. The command is UPDATE STATISTICS, and it would be good for you to read the details about the command in Books Online. Once you have updated the statistics, run the query again. (Make sure that you are getting the Actual Execution Plan again.) Compare the plan, the estimated vs. actual rows and the performance with your first test run. I believe you will be quite surprised.

    Oh, and come back to tell us about it (especially if I happen to be wrong).

    Ok, thanks for such a detailed reply.

  • JonFox (10/27/2011)


    Index fragmentation shouldn't have an impact on whether the query optimizer chooses a scan over a seek; the optimizer tries to minimize reads when choosing a query plan, but index fragmentation is outside the scope of what it looks at. A highly fragmented index can cause a scan against it to perform worse than expected, but it won't cause the optimizer to scan if it would normally have used a seek.

    Sounds like a bit of contradictions here. As far as I know, the query optimizer does use the estimated number of pages to read as a part of the optimization. If the index is (heavily) fragmented, the number of pages in the index will increase, and that can in turn affect the execution plan chosen. However, this would make it more likely to see index seeks, not scans.



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • Can you post the index definition please?

    Can you post the execution plan please?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • David Moutray (10/27/2011)


    I would bet that the statistics for this index are out of date. The statistics are only updated when 20% of the records in the index (plus another 500 records) have been added or updated. That's really not often enough.

    Possible, but usually stale stats has the opposite effect, seeks and key lookups where a scan would be appropriate.

    The query optimizer uses these statistics to calculate how efficient a given query plan might be. An index seek is not always faster than an index scan. Let's say, for example that you are querying a table of sales orders for Amazon.com. If you want to find a list of all sales shipped to Zimbabwe, that is going to be a relatively short list. The query optimizer will use an index seek for that query. If on the other hand, you want a list of all sales shipped in the United States, it just might be more efficient to do an index scan.

    Hmmm, that example sounds rather familiar.... 😀

    I have heard (but I have not confirmed) that the typical index has five to seven levels in it.

    5-7 is pretty deep, especially for a nonclustered index. 3-4's more the usual for NC indexes, unless the index key is huge and there are millions of row

    I have a table with 40 million rows in (avg 600 bytes/row). The clustered index is 4 levels deep. The fan-out for b-trees is large, that's what makes them efficient.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • okbangas (10/28/2011)


    JonFox (10/27/2011)


    Index fragmentation shouldn't have an impact on whether the query optimizer chooses a scan over a seek; the optimizer tries to minimize reads when choosing a query plan, but index fragmentation is outside the scope of what it looks at. A highly fragmented index can cause a scan against it to perform worse than expected, but it won't cause the optimizer to scan if it would normally have used a seek.

    Sounds like a bit of contradictions here. As far as I know, the query optimizer does use the estimated number of pages to read as a part of the optimization. If the index is (heavily) fragmented, the number of pages in the index will increase, and that can in turn affect the execution plan chosen. However, this would make it more likely to see index seeks, not scans.

    That's a great point, I hadn't thought about it that way. Thanks!

  • JonFox (10/28/2011)


    okbangas (10/28/2011)


    JonFox (10/27/2011)


    Index fragmentation shouldn't have an impact on whether the query optimizer chooses a scan over a seek; the optimizer tries to minimize reads when choosing a query plan, but index fragmentation is outside the scope of what it looks at. A highly fragmented index can cause a scan against it to perform worse than expected, but it won't cause the optimizer to scan if it would normally have used a seek.

    Sounds like a bit of contradictions here. As far as I know, the query optimizer does use the estimated number of pages to read as a part of the optimization. If the index is (heavily) fragmented, the number of pages in the index will increase, and that can in turn affect the execution plan chosen. However, this would make it more likely to see index seeks, not scans.

    That's a great point, I hadn't thought about it that way. Thanks!

    Basically the optimiser doesn't care about fragmentation, but it does note page density.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hmmm, that example sounds rather familiar.... 😀

    Yes, I attended both of your presentations at PASS. They were both very enlightening. 🙂

  • GilaMonster (10/28/2011)


    Can you post the index definition please?

    Can you post the execution plan please?

    CREATE UNIQUE NONCLUSTERED INDEX my_index ON table1

    (

    [f1] ASC,

    [f2] ASC

    )

    INCLUDE ( [f3],

    [f4]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80)

    GO

    Execution plans is 100% index seek on my_index

  • Lexa (10/28/2011)


    Execution plans is 100% index seek on my_index

    In your initial post you said you had a scan.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Execution plans is 100% index seek on my_index

    We would like to see the actual execution plan, if you wouldn't mind. (The ACTUAL plan, not the estimated plan.)

    Right click anywhere on your execution plan, and select show execution plan xml. The XML will open in a new query window. Then just copy and paste the xml into a post here. (You can put xml tags around it. See the IFCode Shortcuts to the left. <--)

    We can then save the xml you post as a file with a .sqlplan extension. When we double-click on the file the actual execution plan will open in management studio for us. We'll be able to see what you see.

    There is a LOT of information in the execution plan. We'd like to see it all! 🙂

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply