Clustered vs Non-clustered indexes

  • Let's suppose we have the following tables:

    CREATE TABLE Customers (

     CustomerID int IDENTITY PRIMARY KEY,

     CustomerName varchar(255) NOT NULL UNIQUE

    )

    CREATE TABLE Orders (

     OrderID int IDENTITY PRIMARY KEY,

     OrderDate smalldatetime NOT NULL,

     OrderNumber varchar(10) NOT NULL UNIQUE,

     CustomerID int NOT NULL REFERENCES Customers (CustomerID)

    )

    CREATE TABLE Products (

     ProductID int IDENTITY PRIMARY KEY,

     ProductName varchar(50) NOT NULL UNIQUE

    CREATE TABLE OrderDetails (

     OrderDetailID int IDENTITY PRIMARY KEY,

     OrderID int NOT NULL REFERENCES Orders (OrderID),

     ProductID int NOT NULL REFERENCES Products (ProductID),

     Quantity numeric(18,3) NOT NULL,

     UNIQUE (OrderID, ProductID)

    )

    Which indexing strategy would better for the Orders table:

    a) make OrderID clustered and OrderDate, OrderNumber and CustomerID non-clustered,

    or

    b) make OrderDate clustered and OrderID, OrderNumber and CustomerID non-clustered,

    or

    c) other strategies ?

    As you may guess, the Orders table will be frequently used in joins with the other tables: sometimes Orders and Customers, sometimes Orders and OrderDetails, sometimes all 4 tables. Many queries (but not all) will be for a date range of one month (i.e. WHERE OrderDate BETWEEN '20040101' AND '20040131').

    Please take into consideration:

    1) the efficiency of clustered vs non-clustered indexes for the types of queries that will be used;

    2) the fact that OrderID is an identity column (therefore it will have unique values, the values will always be greater than the existing values and the values cannot be modified)

    3) the fact that OrderDate is not unique (and because of this, it will have a lower selectivity than OrderID)

    4) any other aspects that you consider to be relevant.

    After giving a response to this scenario, please consider another scenario: the same tables using uniqueidentifier primary keys (instead of int identity). The same question applies: which index should be clustered?

    Razvan Socol

  • This is a hard one to call.

    If I read your post correctly, you are saying that the bulk of the queries you are running are for date ranges. This being the case, I would definitely put a clustered index on the date field rather than the unique ID. Books on line substantiates this; quote 'if an application frequently executes a query to retrieve records between a range of dates, a clustered index can quickly locate the row containing the beginning date, and then retrieve all adjacent rows in the table until the last date is reached. This can help increase the performance of this type of query'.

    However, this will impact upon joins to other tables when updating, inserting and deleting. How much so? I would not know until I tested it.

    A possibility that may not have occurred to you is this.

    My scenario;

    I have a database that records all hits to external web sites per each user in my organisation. I want to bring back a report of the number of hits grouped by url for each whole month. I get something like 30,000 lines per day (10m per year).

    I found that whatever I tried, the query was very, very slow.

    So! I put a trigger on the database to insert year and month (e.g 200401 fro jan 2004) into a column on the table. After indexing this column, the query ran in seconds.

    I think the best thing you can do is try out some scenarios on a test system with a good chunk of data and watch the execution plan to make sure it is following the routes you are intending it to take.

    In regards to your second question, I would always stick to an identity column as the primary key on a table unless you have a particular reason not to (Perhaps if you wanted the primary key to be built up from a formulae rather than incremental value).

    As they say 'If it ain't broke, don't fix it'.


    ------------------------------
    The Users are always right - when I'm not wrong!

  • This is a hard one to call.

    If I read your post correctly, you are saying that the bulk of the queries you are running are for date ranges. This being the case, I would definitely put a clustered index on the date field rather than the unique ID. Books on line substantiates this; quote 'if an application frequently executes a query to retrieve records between a range of dates, a clustered index can quickly locate the row containing the beginning date, and then retrieve all adjacent rows in the table until the last date is reached. This can help increase the performance of this type of query'.

    However, this will impact upon joins to other tables when updating, inserting and deleting. How much so? I would not know until I tested it.

    A possibility that may not have occurred to you is this.

    My scenario;

    I have a database that records all hits to external web sites per each user in my organisation. I want to bring back a report of the number of hits grouped by url for each whole month. I get something like 30,000 lines per day (10m per year).

    I found that whatever I tried, the query was very, very slow.

    So! I put a trigger on the database to insert year and month (e.g 200401 fro jan 2004) into a column on the table. After indexing this column, the query ran in seconds.

    I think the best thing you can do is try out some scenarios on a test system with a good chunk of data and watch the execution plan to make sure it is following the routes you are intending it to take.

    In regards to your second question, I would always stick to an identity column as the primary key on a table unless you have a particular reason not to (Perhaps if you wanted the primary key to be built up from a formulae rather than incremental value).

    As they say 'If it ain't broke, don't fix it'.


    ------------------------------
    The Users are always right - when I'm not wrong!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply