large query shows 54M read on 200M table with only 568 results returned

  • Realise this is probably an impossible question to answer without knowing the system but any feedback welcome.

    4 Queries commited to SQL server 2005 TABLEA which contains 200M rows.

    The running of these queries slowed my insert times to TABLEA hence looking at the issue.

    (1) Oct 7/Oct 13 / READS=54,166,223 / 755981ms / + VALUE 7 11:41:26 11:54:02 = 568 results

    (2) Oct 7/Oct 13 / READS=54,166,503 / 581339ms /+ VALUE 7 11:43:17 11:52:58 = 568 results

    (3) Oct 12/Oct 13 / READS=1,072,410 / 17154ms / + VALUE 8 11:50:40 11:50:57 = 50results

    (4) Oct 12/Oct 13 / READS=18,171,411 / 214978ms / + VALUE 7 11:52:30 11:56:05 = 100 results

    Query (1) 7th to 13th shows 54166223 reads on the 200M row table yet the results retuned were approx 568.

    Query (2) 7th to 13th shows 54166503 reads on the 200M row table yet the results retuned were approx 568.

    Query (3) 12th to 13th shows 1072410 reads on the 200M row table yet the results retuned were approx 50.

    Query (4) 12th to 13th shows 18171411 reads on the 200M row table yet the results retuned were approx 100.

    The fact they were overlapping was causing the increase in write times. The query uses NOLOCK on the large 200M table.

    We do use indexes but judging by the number of reads its not looking that effcencet (query results above captuered via Profiler)

    We have indexes on TABLEA two of which are:

    index 1: PARENT_FK_ID + VALUE (i.e value used in query above)

    index 2: PARENT_FK_ID + DATE

    The users enter date range to search which we find to be very slow compared to numerical searches.

    Thanks for any info.

    Scott

  • scott_lotus (10/13/2010)


    Realise this is probably an impossible question to answer without knowing the system but any feedback welcome.

    4 Queries commited to SQL server 2005 TABLEA which contains 200M rows.

    The running of these queries slowed my insert times to TABLEA hence looking at the issue.

    (1) Oct 7/Oct 13 / READS=54,166,223 / 755981ms / + VALUE 7 11:41:26 11:54:02 = 568 results

    (2) Oct 7/Oct 13 / READS=54,166,503 / 581339ms /+ VALUE 7 11:43:17 11:52:58 = 568 results

    (3) Oct 12/Oct 13 / READS=1,072,410 / 17154ms / + VALUE 8 11:50:40 11:50:57 = 50results

    (4) Oct 12/Oct 13 / READS=18,171,411 / 214978ms / + VALUE 7 11:52:30 11:56:05 = 100 results

    Query (1) 7th to 13th shows 54166223 reads on the 200M row table yet the results retuned were approx 568.

    Query (2) 7th to 13th shows 54166503 reads on the 200M row table yet the results retuned were approx 568.

    Query (3) 12th to 13th shows 1072410 reads on the 200M row table yet the results retuned were approx 50.

    Query (4) 12th to 13th shows 18171411 reads on the 200M row table yet the results retuned were approx 100.

    The fact they were overlapping was causing the increase in write times. The query uses NOLOCK on the large 200M table.

    We do use indexes but judging by the number of reads its not looking that effcencet (query results above captuered via Profiler)

    We have indexes on TABLEA two of which are:

    index 1: PARENT_FK_ID + VALUE (i.e value used in query above)

    index 2: PARENT_FK_ID + DATE

    The users enter date range to search which we find to be very slow compared to numerical searches.

    Thanks for any info.

    Scott

    A query looking for a specifik value isnt going to be helped by those indexes. You would need something like

    index 3: VALUE + PARENT_FK_ID

    index 4: DATE + PARENT_FK_ID

    Lets say you have

    ID DATE

    1 2010-01-02

    2 2010-02-03

    3 2010-03-04

    4 2010-01-02

    Now with your 2 orginal indexes the index 2 would look exactly as above

    Now if you tell SQL too look for 2010-01-02 it will have to check EVERY row to see if it matches the date. It cant know that the rows you want are 1 and 4 any other way.

    With the 2 new indexes you get an index 4 that looks like

    2010-01-02 1

    2010-01-02 4

    2010-02-03 2

    2010-03-04 3

    A search for 2010-01-02 and SQL know when it gets to 2010-02-03 that there are no more values and can stop searching.

    This is a crude description on how SQL uses indexes. Its in reality a bit more complex then that. But this works i think to get a rough idea on how it works.

    /T

  • Can you post query, table definitions, index definitions and execution plan as per http://qa.sqlservercentral.com/articles/SQLServerCentral/66909/

    It's hard to give anything other than guesses without.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • thanks very much for the reply, sorry for all the spawned Qs below:

    1) so you are saying the first column in the index is the important one ? (that makes sense).

    2) i was half thinking an index with :

    PARENT_FK_ID / VALUE / DATE

    i.e all three columns in any order would have helped , but judging by your response its the first column that's important. based on the initial search criteria the USP needs. Obvious really , just had not twigged sorry.

    3) How important are "INCLUDED COLUMNS" are they the same as "GENERAL COLUMNS" ?

    4) When i have many index , how does SQL decide what index to use first ? (i understand i can use index hits on a query)

    5) judging by the amount of reads i guess my usp's where table scanning ?

    Thanks

    Scott

  • GilaMonster (10/13/2010)


    Can you post query, table definitions, index definitions and execution plan as per http://qa.sqlservercentral.com/articles/SQLServerCentral/66909/

    It's hard to give anything other than guesses without.

    Yip, sorry will do.

  • scott_lotus (10/13/2010)


    1) so you are saying the first column in the index is the important one ? (that makes sense).

    No, however the order of columns is critically important.

    http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

    3) How important are "INCLUDED COLUMNS" are they the same as "GENERAL COLUMNS" ?

    Depends what you're trying to do. Include columns cannot be used for index seeks. They're good for creating covering indexes

    4) When i have many index , how does SQL decide what index to use first ? (i understand i can use index hits on a query)

    It uses the one that the optimiser judges cheapest. Combination of several things including size of index, usefulness for seek operations, whether it's selective or not, whether it's covering or not, etc.

    5) judging by the amount of reads i guess my usp's where table scanning ?

    Very likely.

    This series (3 parts) may be of interest - http://qa.sqlservercentral.com/articles/Indexing/68439/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • thank you again for the reply , very helpful stuff.

  • If you haven't looked at the execution plan, you really need to. When you don't know what's happening with a query, that should be your first stop.

    Also, be very, very careful about introducing hints into your queries. Most of the time, the optimizer is pretty accurate in what it does. Trying to take control from it can frequently lead to issues worse than what you had before.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • Oh, and on that nolock....

    http://sqlblog.com/blogs/andrew_kelly/archive/2009/04/10/how-dirty-are-your-reads.aspx

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply