Certain number of rows before SQL will use an index?

  • Am I losing the plot or...

    Is there a row count threshold that has to be passed before SQL Server will start using indexes (clustered or non-clustered) rather than a table scan?

    e.g. if the table has, say, 100 rows, will it still use table scans despite there being indexes present?

    thanks simon

  • You have to ask the Query Optimizer 😛

    For small tables (like 100 rows), it may decide it's quicker and cheaper to scan the whole table.

    For big tables, depending on the query, it could use indexes or a table scan.

    The answer is: it depends.

    SQLServerNewbie MCITP: Database Administrator SQL Server 2005
  • Depends on the table, depends on the query, depends on the index.

    e.g. if the table has, say, 100 rows, will it still use table scans despite there being indexes present?

    If all 100 rows are in one data page, which is faster, to scan the entire table (1 page) or to use the index to seek (minimum 2 pages)?
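
    To make that concrete, one way to check how many pages a table actually occupies (a minimal sketch; the table name is a placeholder) is the physical-stats DMV:

    -- Hypothetical table name; available on SQL Server 2005 and later.
    -- index_id 0 = heap, 1 = clustered index; page_count is what a scan would read.
    SELECT index_id, index_level, page_count
    FROM sys.dm_db_index_physical_stats(
        DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED');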

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Just to add to what Jerry already said...

    If you are using columns in your SELECT statement that are not part of your index in one way or another, the optimizer needs to do a bookmark lookup, and with small tables like that one a table scan would be faster.

    I.e. say you've got the following:

    Create Table #myUsers (
        username varchar(25) Primary Key,
        FName Varchar(30),
        LName Varchar(50)
    )

    And the only index you have is the PK on username, but your query is something like:

    Select FName, LName
    FROM #MyUsers
    WHERE username = 'bgates'

    A table scan would be cheaper than seeking the index and then using the bookmark to fetch the rest of the row's data, depending on the number of rows and 20 other factors.
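
    One common way around that lookup (a sketch, not part of the original post; the index name is hypothetical) is to make a nonclustered index covering with INCLUDE, so the query can be answered from the index alone:

    -- Hypothetical covering index; assumes the #myUsers table above with a
    -- nonclustered PK, since a clustered PK would already carry every column.
    CREATE NONCLUSTERED INDEX IX_myUsers_covering
        ON #myUsers (username)
        INCLUDE (FName, LName);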

    There is no magic number at which the optimizer will prefer one way vs. another; it takes into account all manner of things, including your stats and even the available memory and/or CPU cycles. If it can do the operation two ways, one taking more memory and one taking more CPU cycles, and there is less contention for CPU than for memory, it will use the plan that takes more CPU and less memory. The next time it runs, the opposite may happen.

    As Jerry said, It Depends and as always YMMV.

    -Luke.


  • thank you all very much

    food for thought ...

    best wishes

    ~simon

  • I thought if a table was less than 8 pages, it would never use an index...

  • Look up the INDEX= hint for the FROM clause in BOL. You can force a query to use a particular index. Then you can compare the query cost (show actual execution plan) and the I/Os required (set statistics IO on) for each flavor of the query:

    select *
    from mytable WITH (INDEX=0) -- force table scan, which is probably unnecessary for small tables

    select *
    from mytable WITH (INDEX=myNCindex) -- force index usage

    Ballpark, the optimizer will switch to a scan when it estimates you are asking for somewhere between 8 and 15% of the total number of rows in the table.
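
    For instance, a minimal harness for that comparison (reusing Kevin's placeholder table and index names) might look like this:

    -- Compare the 'logical reads' figure in the Messages tab for each run.
    SET STATISTICS IO ON;

    select * from mytable WITH (INDEX=0);         -- forced table scan
    select * from mytable WITH (INDEX=myNCindex); -- forced index usage

    SET STATISTICS IO OFF;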

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Mike Durnwald (12/9/2008)


    I thought if a table was less than 8 pages, it would never use an index...

    I had heard 1,000 rows, but it would appear from the knowledgeable answers in this thread that's wrong! 😀

  • The optimizer is cost-based. An index seek/bookmark lookup of a given depth takes X cost per row. A table (or clustered-index range) scan takes Y cost per page. Based on estimated rows retrieved it is pretty simple math to determine in this case which cost is less. Note that X does NOT equal Y here. For those inquisitive readers, Joe Chang has some amazing details available online from previous work he has done.
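
    As a worked illustration only (the constants below are invented for the example, not the optimizer's actual cost figures), the break-even point falls where rows * X equals pages * Y:

    -- Invented costs: @X = cost per row via seek + bookmark lookup,
    -- @Y = cost per page scanned. A scan costs @pages * @Y no matter how
    -- many rows qualify; the seek plan costs estimated_rows * @X.
    DECLARE @X float, @Y float, @pages int;
    SELECT @X = 0.003, @Y = 0.001, @pages = 10000;

    SELECT break_even_rows = (@pages * @Y) / @X; -- ~3333 rows with these made-up numbers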

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (12/9/2008)


    The optimizer is cost-based. An index seek/bookmark lookup of a given depth takes X cost per row. A table (or clustered-index range) scan takes Y cost per page. Based on estimated rows retrieved it is pretty simple math to determine in this case which cost is less. Note that X does NOT equal Y here. For those inquisitive readers, Joe Chang has some amazing details available online from previous work he has done.

    thank you very much 🙂

    simon

  • TheSQLGuru (12/9/2008)


    Ballpark, the optimizer will switch to a scan when it estimates you are asking for somewhere between 8 and 15% of the total number of rows in the table.

    Depends on the table. I've had it switch to a cluster/table scan with under 1% of the rows queried. With a non-covering index, of course.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (12/9/2008)


    TheSQLGuru (12/9/2008)


    Ballpark, the optimizer will switch to a scan when it estimates you are asking for somewhere between 8 and 15% of the total number of rows in the table.

    Depends on the table. I've had it switch to a cluster/table scan with under 1% of the rows queried. With a non-covering index, of course.

    1) I wonder if that was actual rows returned or estimated?

    2) I also wonder about the size of the table, like 100 rows or 100M rows and did it have bloated data size?

    3) I should have specifically excluded covering-index situations. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (12/9/2008)


    1) I wonder if that was actual rows returned or estimated?

    Estimated. The optimiser works with estimated row counts and that's what it uses for its costings and plan. It doesn't know, at optimisation time, what the actual row count is going to be and hence can't cost and optimise based on the actual row count.

    That said, in test cases that I've investigated with the stats up to date, the two are the same.
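
    One way to see both numbers side by side (a sketch; it uses the SeekOrScan test table defined in the repro further down) is STATISTICS PROFILE, which returns EstimateRows next to the actual Rows for each plan operator:

    SET STATISTICS PROFILE ON;
    SELECT * FROM dbo.SeekOrScan WHERE SomeNumber BETWEEN 0 AND 100;
    SET STATISTICS PROFILE OFF;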

    2) I also wonder about the size of the table, like 100 rows or 100M rows and did it have bloated data size?

    Small, large, doesn't seem to matter. Though the larger the table, the smaller the break point seems to be. Not a large shift in the % though.

    Some setup first:

    CREATE TABLE [dbo].[SeekOrScan](
        [ID] [int] IDENTITY(1,1) NOT NULL primary key,
        SomeNumber int,
        padding char(100)
    ) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [idx_SeekOrScan] ON [dbo].[SeekOrScan] (SomeNumber)

    First test - 1000 rows

    insert into [SeekOrScan] (SomeNumber)
    select top 1000 number
    from master..spt_values
    where name is null

    dbcc freeproccache
    select * from [SeekOrScan]
    where somenumber between 0 and 100 -- 10% of table
    -- Clustered index scan

    dbcc freeproccache
    select * from [SeekOrScan]
    where somenumber between 0 and 50 -- 5% of table
    -- Clustered index scan

    dbcc freeproccache
    select * from [SeekOrScan]
    where somenumber between 0 and 5 -- 0.6% of table
    -- Clustered index scan

    dbcc freeproccache
    select * from [SeekOrScan]
    where somenumber between 0 and 4 -- 0.5% of table
    -- nonclustered index seek

    Break point was around 0.5% of the total rows.

    Second test - 1,000,000 rows

    truncate table SeekOrScan

    insert into [SeekOrScan] (SomeNumber)
    select top 1000000 0
    from master..spt_values a cross join master..spt_values b
    where a.name is null and b.name is null

    update [SeekOrScan] set SomeNumber = ID -- just so we've got sequential numbers for the between
    GO

    dbcc freeproccache
    select * from [SeekOrScan]
    where somenumber between 1 and 100000 -- 10% of table
    -- Clustered index scan

    dbcc freeproccache
    select * from [SeekOrScan]
    where somenumber between 1 and 50000 -- 5% of table
    -- Clustered index scan

    dbcc freeproccache
    select * from [SeekOrScan]
    where somenumber between 1 and 10000 -- 1%
    -- Clustered index scan

    dbcc freeproccache
    select * from [SeekOrScan]
    where somenumber between 1 and 5000 -- 0.5% of table
    -- clustered index scan

    dbcc freeproccache
    select * from [SeekOrScan]
    where somenumber between 1 and 3000 -- 0.3% of table
    -- nonclustered index seek

    Breakpoint here was between 0.3 and 0.4% of the table.

    For what it's worth, I've never seen the breakpoint at 15%. I don't think I've ever seen it as high as 10%.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the repro, Gail! Clearly either my memory is faulty (wouldn't be the first time, and it seems to be happening more and more since the birth of my daughter 3+ years ago :w00t:) or I am remembering metrics from a no-longer-current version of SQL Server.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
