Index scan vs. seek

Question

Post reply

Index scan vs. seek

Lexa

Hall of Fame

Points: 3314
More actions
October 27, 2011 at 6:15 pm

#250902

I've a unique non-clustered partitioned index which time to time gets scanned which I believe causes performance issues every time that happens. When checking for fragmentation, it shows .1 to .2 for just a handful of partitions. If the index is not fragmented what causes the scans?

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

Estimated data volume and cost of traversing just the leaf level vs. running the tree repeatedly.

One example cause: http://sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Lexa Hall of Fame Points: 3314 More actions · Answer 2

Evil Kraig F (10/27/2011)

Estimated data volume and cost of traversing just the leaf level vs. running the tree repeatedly.

One example cause: http://sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx

I don't think this is the case in my situation. I've a proc that takes way too long to exectute, an average 3 seconds and all it is doing:

select f1 from t1 where f2 = @val

The index in question is on f2 with an include on a couple other fields. So it basically returns only 1 value. And this is where I see some scans but no fragmentation...

JonFox SSCrazy Points: 2157 More actions · Answer 3

Index fragmentation shouldn't have an impact on whether the query optimizer chooses a scan over a seek; the optimizer tries to minimize reads when choosing a query plan, but index fragmentation is outside the scope of what it looks at. A highly fragmented index can cause a scan against it to perform worse than expected, but it won't cause the optimizer to scan if it would normally have used a seek.

David Moutray SSCertifiable Points: 5704 More actions · Answer 4

Lexa (10/27/2011)

Evil Kraig F (10/27/2011)

Estimated data volume and cost of traversing just the leaf level vs. running the tree repeatedly.

One example cause: http://sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx

I don't think this is the case in my situation. I've a proc that takes way too long to exectute, an average 3 seconds and all it is doing:

select f1 from t1 where f2 = @val

The index in question is on f2 with an include on a couple other fields. So it basically returns only 1 value. And this is where I see some scans but no fragmentation...

I would bet that the statistics for this index are out of date. The statistics are only updated when 20% of the records in the index (plus another 500 records) have been added or updated. That's really not often enough.

The query optimizer uses these statistics to calculate how efficient a given query plan might be. An index seek is not always faster than an index scan. Let's say, for example that you are querying a table of sales orders for Amazon.com. If you want to find a list of all sales shipped to Zimbabwe, that is going to be a relatively short list. The query optimizer will use an index seek for that query. If on the other hand, you want a list of all sales shipped in the United States, it just might be more efficient to do an index scan.

Here is why that is so. An index is a B-Tree (or Balanced-Tree) structure. It is like a big triangle. (Right now I wish I could draw a picture for you.) When you do an index seek, you start at the top of the triangle. The tip of the triangle tells you which direction to go (left or right) to find the record you want. The next level down will tell you the same thing (left or right) until you get to the bottom level (also called the Leaf-Level) of the index. The leaf-level contains a link to the record you want.

I have heard (but I have not confirmed) that the typical index has five to seven levels in it. So, retrieving a single record via an index seek actually requires five to seven reads (depending). If your index has millions of records, and you only want a few of them, then this is a very efficient operation. However, if you are retrieving a large percentange of the records in the index, it is more efficient to simply scan the leaf-level of the index without navigating your way through the rest of the tree. If you scan the leaf level, then one record requires one read.

So, back to your example. Do this on your system. Run your query and get the Actual Execution plan. After your query finishes, check the properties for the SELECT operator. (That's the one on the far left.) Compare the Estimated Rows in the SELECT operator with the actual number of records that the query retrieved. From what you tell me, I will bet that there is a large discrepancy. The estimate is based on the index statistics. The query plan is based on the estimate. I am sure that you can see where I am going with this.

After you do this little test, update the statistics for that index. The command is UPDATE STATISTICS, and it would be good for you to read the details about the command in Books Online. Once you have updated the statistics, run the query again. (Make sure that you are getting the Actual Execution Plan again.) Compare the plan, the estimated vs. actual rows and the performance with your first test run. I believe you will be quite surprised.

Oh, and come back to tell us about it (especially if I happen to be wrong).

Lexa Hall of Fame Points: 3314 More actions · Answer 5

David Moutray (10/27/2011)

Lexa (10/27/2011)

Evil Kraig F (10/27/2011)

Estimated data volume and cost of traversing just the leaf level vs. running the tree repeatedly.

One example cause: http://sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx

I don't think this is the case in my situation. I've a proc that takes way too long to exectute, an average 3 seconds and all it is doing:

select f1 from t1 where f2 = @val

The index in question is on f2 with an include on a couple other fields. So it basically returns only 1 value. And this is where I see some scans but no fragmentation...

I would bet that the statistics for this index are out of date. The statistics are only updated when 20% of the records in the index (plus another 500 records) have been added or updated. That's really not often enough.

The query optimizer uses these statistics to calculate how efficient a given query plan might be. An index seek is not always faster than an index scan. Let's say, for example that you are querying a table of sales orders for Amazon.com. If you want to find a list of all sales shipped to Zimbabwe, that is going to be a relatively short list. The query optimizer will use an index seek for that query. If on the other hand, you want a list of all sales shipped in the United States, it just might be more efficient to do an index scan.

Here is why that is so. An index is a B-Tree (or Balanced-Tree) structure. It is like a big triangle. (Right now I wish I could draw a picture for you.) When you do an index seek, you start at the top of the triangle. The tip of the triangle tells you which direction to go (left or right) to find the record you want. The next level down will tell you the same thing (left or right) until you get to the bottom level (also called the Leaf-Level) of the index. The leaf-level contains a link to the record you want.

I have heard (but I have not confirmed) that the typical index has five to seven levels in it. So, retrieving a single record via an index seek actually requires five to seven reads (depending). If your index has millions of records, and you only want a few of them, then this is a very efficient operation. However, if you are retrieving a large percentange of the records in the index, it is more efficient to simply scan the leaf-level of the index without navigating your way through the rest of the tree. If you scan the leaf level, then one record requires one read.

So, back to your example. Do this on your system. Run your query and get the Actual Execution plan. After your query finishes, check the properties for the SELECT operator. (That's the one on the far left.) Compare the Estimated Rows in the SELECT operator with the actual number of records that the query retrieved. From what you tell me, I will bet that there is a large discrepancy. The estimate is based on the index statistics. The query plan is based on the estimate. I am sure that you can see where I am going with this.

After you do this little test, update the statistics for that index. The command is UPDATE STATISTICS, and it would be good for you to read the details about the command in Books Online. Once you have updated the statistics, run the query again. (Make sure that you are getting the Actual Execution Plan again.) Compare the plan, the estimated vs. actual rows and the performance with your first test run. I believe you will be quite surprised.

Oh, and come back to tell us about it (especially if I happen to be wrong).

Ok, thanks for such a detailed reply.

okbangas SSChampion Points: 11773 More actions · Answer 6

JonFox (10/27/2011)

Index fragmentation shouldn't have an impact on whether the query optimizer chooses a scan over a seek; the optimizer tries to minimize reads when choosing a query plan, but index fragmentation is outside the scope of what it looks at. A highly fragmented index can cause a scan against it to perform worse than expected, but it won't cause the optimizer to scan if it would normally have used a seek.

Sounds like a bit of contradictions here. As far as I know, the query optimizer does use the estimated number of pages to read as a part of the optimization. If the index is (heavily) fragmented, the number of pages in the index will increase, and that can in turn affect the execution plan chosen. However, this would make it more likely to see index seeks, not scans.

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 7

Can you post the index definition please?

Can you post the execution plan please?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 8

David Moutray (10/27/2011)

I would bet that the statistics for this index are out of date. The statistics are only updated when 20% of the records in the index (plus another 500 records) have been added or updated. That's really not often enough.

Possible, but usually stale stats has the opposite effect, seeks and key lookups where a scan would be appropriate.

The query optimizer uses these statistics to calculate how efficient a given query plan might be. An index seek is not always faster than an index scan. Let's say, for example that you are querying a table of sales orders for Amazon.com. If you want to find a list of all sales shipped to Zimbabwe, that is going to be a relatively short list. The query optimizer will use an index seek for that query. If on the other hand, you want a list of all sales shipped in the United States, it just might be more efficient to do an index scan.

Hmmm, that example sounds rather familiar.... 😀

I have heard (but I have not confirmed) that the typical index has five to seven levels in it.

5-7 is pretty deep, especially for a nonclustered index. 3-4's more the usual for NC indexes, unless the index key is huge and there are millions of row

I have a table with 40 million rows in (avg 600 bytes/row). The clustered index is 4 levels deep. The fan-out for b-trees is large, that's what makes them efficient.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

JonFox SSCrazy Points: 2157 More actions · Answer 9

okbangas (10/28/2011)

JonFox (10/27/2011)

Index fragmentation shouldn't have an impact on whether the query optimizer chooses a scan over a seek; the optimizer tries to minimize reads when choosing a query plan, but index fragmentation is outside the scope of what it looks at. A highly fragmented index can cause a scan against it to perform worse than expected, but it won't cause the optimizer to scan if it would normally have used a seek.

Sounds like a bit of contradictions here. As far as I know, the query optimizer does use the estimated number of pages to read as a part of the optimization. If the index is (heavily) fragmented, the number of pages in the index will increase, and that can in turn affect the execution plan chosen. However, this would make it more likely to see index seeks, not scans.

That's a great point, I hadn't thought about it that way. Thanks!

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 10

JonFox (10/28/2011)

okbangas (10/28/2011)

JonFox (10/27/2011)

Index fragmentation shouldn't have an impact on whether the query optimizer chooses a scan over a seek; the optimizer tries to minimize reads when choosing a query plan, but index fragmentation is outside the scope of what it looks at. A highly fragmented index can cause a scan against it to perform worse than expected, but it won't cause the optimizer to scan if it would normally have used a seek.

Sounds like a bit of contradictions here. As far as I know, the query optimizer does use the estimated number of pages to read as a part of the optimization. If the index is (heavily) fragmented, the number of pages in the index will increase, and that can in turn affect the execution plan chosen. However, this would make it more likely to see index seeks, not scans.

That's a great point, I hadn't thought about it that way. Thanks!

Basically the optimiser doesn't care about fragmentation, but it does note page density.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

David Moutray SSCertifiable Points: 5704 More actions · Answer 11

David Moutray

SSCertifiable

Points: 5704

October 28, 2011 at 7:09 am

#1400614

Hmmm, that example sounds rather familiar.... 😀

Yes, I attended both of your presentations at PASS. They were both very enlightening. 🙂

Lexa Hall of Fame Points: 3314 More actions · Answer 12

GilaMonster (10/28/2011)

Can you post the index definition please?

Can you post the execution plan please?

CREATE UNIQUE NONCLUSTERED INDEX my_index ON table1

(

[f1] ASC,

[f2] ASC

)

INCLUDE ( [f3],

[f4]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80)

GO

Execution plans is 100% index seek on my_index

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 13

Lexa (10/28/2011)

Execution plans is 100% index seek on my_index

In your initial post you said you had a scan.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

David Moutray SSCertifiable Points: 5704 More actions · Answer 14

Execution plans is 100% index seek on my_index

We would like to see the actual execution plan, if you wouldn't mind. (The ACTUAL plan, not the estimated plan.)

Right click anywhere on your execution plan, and select show execution plan xml. The XML will open in a new query window. Then just copy and paste the xml into a post here. (You can put xml tags around it. See the IFCode Shortcuts to the left. <--)

We can then save the xml you post as a file with a .sqlplan extension. When we double-click on the file the actual execution plan will open in management studio for us. We'll be able to see what you see.

There is a LOT of information in the execution plan. We'd like to see it all! 🙂