Clustered / NonClustered

  • I was asked to tune a database that is suffering from poor performance and I found something that I'm unsure on how to treat.

    One of the main tables (13 million rows) is defined this way:

    CREATE TABLE Receivings (

    Receiving_Number int NOT NULL,

    Package_Number int NOT NULL,

    ...

    Job_Type char(4) NOT NULL,

    Job_Year int NOT NULL,

    Job_Number int NOT NULL,

    Job_Index char(1) NOT NULL,

    Job_Package int NOT NULL,

    ...

    (other 40 columns)

    ...

    PRIMARY KEY CLUSTERED (Receiving_Number, Package_Number)

    )

    Looking at both the app code and index usage, the primary key is almost never used to access the table.

    Instead, they're using the composite key (Job_Type, Job_Year, Job_Number, Job_Index, Job_Package).

    Index usage stats for 24 hrs of activity contain something like this:

    index_name user_seeks user_scans user_lookups

    ------------- ---------- ---------- ------------

    PK_Receivings 24493 457 239399

    IX_Job 1831779 171 0

    I would be tempted to change the primary key to a nonclustered unique index and create the clustered index on the job fields to avoid those lookups.

    Does it make sense to you?

    -- Gianluca Sartori

  • Personal preference, probably not. Looks a little too wide for my tastes for a clustered index.

    Check some of the queries doing the lookups, are they selecting majority of the columns of the table?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Based on the little information provided I can't see that changing the indexes would help. You really need to see what queries are performing badly and then ID why those are a problem. In modern databases 13 million rows isn't relly that big, and the index you mention is being used for seeks. I'd start to worry about it if there were a lot of scans.

    You also need to consider that changing the cluster may result in a lot of page splitting, where the current cluster may give you nice full pages.

    Also ID where the resource contention is, CPU, RAM, Disk IOs.

    Also are there regular index rebuilds and stats updates?

    If you give a bit more info you will probably get some more detailed responses.

    Leo

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • GilaMonster (10/18/2010)


    Personal preference, probably not. Looks a little too wide for my tastes for a clustered index.

    Check some of the queries doing the lookups, are they selecting majority of the columns of the table?

    Thanks for the suggestion, Gail.

    I forgot to mention that the job fields are nearly as selective as the primary key (12'665'254 distinct values over 13'207'302 rows) and that alone made me think it could be a good clustered index.

    As per the clustering key width, it would be 17 bytes instead of 8, it's not that bad (is it?).

    I got through some of the stored procedures (I couldn't check all of them, they're literally hundreds) and generally speaking the queries on this table select 5 to 10 columns with a filter / join predicate based on the job fields.

    Also, the table is partitioned (6 partitions) on Receiving_Number, but just to avoid fragmentation. There are three special receivings (9999, 10000 and 20000) that get updated with new rows regularly, while all other receivings (with higher numbers) are inserted just once. To avoid fragmentation, the table is partitioned this way:

    Partition 1: receivings 0 to 9998

    Partition 2: receiving 9999

    Partition 3: receiving 10000

    Partition 4: receivings 10001 to 19999

    Partition 5: receiving 20000

    Partition 6: receivings 20001 +

    Indexes are rebuilt/defragmented every night.

    -- Gianluca Sartori

  • Leo.Miller (10/18/2010)


    In modern databases 13 million rows isn't relly that big, and the index you mention is being used for seeks. I'd start to worry about it if there were a lot of scans.

    Thanks for your answer, Leo.

    I agree with you that 13 million rows is not that big.

    I find in the index usage stats lots of seeks, some scans, but nothing really worth worrying about. What I find inappropriate is the number of lookups, that made me think I need to change the clustered index.

    Also ID where the resource contention is, CPU, RAM, Disk IOs.

    Also are there regular index rebuilds and stats updates?

    None of these resources is really critical. It just doesn't run as fast as the users would like to.

    Indexes and stats are rebuilt/updated regularly.

    Thanks for your help.

    -- Gianluca Sartori

  • 17 Bytes is not that bad as long as you don't have many other indexes.

    If most of the queries return the same columns you can also give a shot at including more columns in your index to cover most cases and reduce the lookups.

  • Oliiii (10/19/2010)


    ... you can also give a shot at including more columns in your index to cover most cases and reduce the lookups.

    This is a good point. I've tried to go that way, but it means analyzing the whole app code.

    Missing index suggestions don't help much, as I can't build an index with 40 included columns. 🙂

    -- Gianluca Sartori

  • Gianluca Sartori (10/19/2010)


    Oliiii (10/19/2010)


    Missing index suggestions don't help much, as I can't build an index with 40 included columns. 🙂

    You can. It may not be a good idea, but you can...

    17 bytes vs 8 = 9 extra bytes per nonclustered index per row. Bad? Only you can really tell that. Try it in a dev environment and see.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • As per the clustering key width, it would be 17 bytes instead of 8, it's not that bad (is it?).

    ..

    Also, the table is partitioned (6 partitions) on Receiving_Number, but just to avoid fragmentation. There are three special receivings (9999, 10000 and 20000) that get updated with new rows regularly, while all other receivings (with higher numbers) are inserted just once. To avoid fragmentation, the table is partitioned this way:

    Partition 1: receivings 0 to 9998

    Partition 2: receiving 9999

    Partition 3: receiving 10000

    Partition 4: receivings 10001 to 19999

    Partition 5: receiving 20000

    Partition 6: receivings 20001 +

    Indexes are rebuilt/defragmented every night.

    From what you've said here it looks like the design and eventual use of the table don't match. I suspect the original design didn't expected this concentrated use of the Receiving_Number.

    It would definitly be worth while testing to see if your idea works, but as Gail says you are adding a lot of extra index space here.

    I still have one concern and that is: you haven't posted any execution plans or IO stats etc. showing where the time is being spent in the queries. The ratio of lookups on PK_Receiving / Seeks on IX_Job is only 1/7 which doesn't look too bad. Are you sure that that much time is being spent doing lookups?

    Leo

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • Leo.Miller (10/19/2010)


    I suspect the original design didn't expected this concentrated use of the Receiving_Number.

    Leo, the original design didn't expect lots of things. 🙁 This is one of the worst databases I've ever seen.

    Are you sure that that much time is being spent doing lookups?

    This is what I see from the exec plans on the heaviest and most executed queries on this table.

    I decided to include some columns to the Job index and things got better. The nonclustered seek/clustered lookup ratio got from 7.6 to 6.46.

    Obviously I can't include all the fields required by these queries.

    Thank you for your time and suggestions.

    -- Gianluca Sartori

  • Gianluca Sartori (10/20/2010)


    Are you sure that that much time is being spent doing lookups?

    This is what I see from the exec plans on the heaviest and most executed queries on this table.

    Any chance you can post a couple of those?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I have to admit that including more columns in the job index worked very well.

    Most of the queries I saw yesterday performing badly got really better today.

    An example of high lookup cost is in the attached plan. Unfortunately, this query is part of a big stored procedure and attaching the whole sqlplan would be a mess. I could only extract the picture for the single query plan. If you know how to extract a single query's plan from a procedure plan I'll be glad to post it.

    Anyway the picture should be enough to understand what I mean:

    Key Lookup plan

    As you can see, there's a nonclustered seek (cost: 1%) + clustered key lookup (cost: 70%).

    You will notice that table and column names are different from the ones I posted. I thought that translating those names in English would have been better for those trying to help...

    Queries like this apart, I think I can be satisfied of the results for the moment.

    I think I'll be monitoring performance in the next few days and, if users stop complaining, I won't tune it any further.

    -- Gianluca Sartori

  • Gianluca Sartori (10/20/2010)


    If you know how to extract a single query's plan from a procedure plan I'll be glad to post it.

    Hmmm... Let me investigate when I get home

    Anyway the picture should be enough to understand what I mean:

    Key Lookup plan

    What I want to see most is the properties of those two operators.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • OK, I managed to extract a single query plan from the procedure plan, tweaking the xml in a text editor.

    I can't believe there's no easier way to do it.

    Here's the plan.

    Ah, BTW, SSC attachments don't work. Grrrr!!!

    -- Gianluca Sartori

  • Will try and remember to check this evening.

    Testing attachments... looks like they're working now. Maybe intermittent glitch.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply