Ordering tables

  • Here's a question I've been mulling over in my head.  Let's say you have a table like an orders table.  It has customer IDs in it, order dates, and lots of details about the order.  And you have got lots and lots of them. 

    I could choose to put this table in order (I'm talking about physical order, primary key, clustered) by customerID.  If I did that, my queries to retrieve all of the orders for a customer would execute very fast.  But inserting new rows would result in the new rows needing to be inserted into the middle of the table.  I'm guessing this would cause a lot of fragmentation so maybe the retrieval would not be as fast as I think.

    I could choose to put this table in order by orderDate.  Then all newly inserted rows would go at the bottom for faster inserts, but retrieving all of a customer's orders, even with an index on customerID, could result in a table scan (since the customer's orders are spread throughout the table) which could be slow.

    I could choose an IDENTITY column for the primary key which, it seems to me, would respond similarly to putting it in order by orderDate.

    So, if I want to balance getting the data into the table quickly with getting it out of the table quickly (assuming that in general I am extracting for a single customerID), what do you folks think is the best order?  And, how does the lazy writer factor into all this?  Does the lazy writer mean that I don't need to worry too much about the speed of the inserts when I put it into order by customerID?

    Looking forward to hearing your thoughts.

    Mark

  • Primary key : OrderID

    Clustered index : OrderDate

    Indexed column : CustumerID

    The select will still be very fast because an index seek will be performed (followed by bookmark lookup which will cost very little.)

  • To describe it a little further, your statement "retrieving all of a customer's orders, even with an index on customerID, could result in a table scan " is not likely.  The query with a customer id will use the index to find the row ID (the data page and offset), and the data for the row would be retrieved directly. 

    Unless you have only 2-3 customers, or one customer comprises 60-80% of your orders, SQL will use the CustomerID index to avoid doing a table scan.  (Query Optimizer will ignore a index and do a table scan if it thinks from the table and index statistics that the random access through the table will be slower than just sequentially reading the entire table and throwing away unmatched rows: the proverbial table scan.)

    Hope this helps



    Mark

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply