How do I decide which column(s), if any, I should place my clustered index on?

  • I have seen a lot of discussion inadvertently pertaining to clustered indexes and which columns they should be on. Some people also claim that it may not even be needed. Can anyone direct me to an article or reference that discusses this? Of course, by default I am always placing the clustered index on an id field (either automatically generated or explicitly created). Thoughts?

    Thanks,

    Jared

    Jared
    CE - Microsoft

  • Creating any kind of index depends on your queries. Your CL index can be your PK or can be something else. It all depends on how you are using the table and on what column criteria you query the most.

    Just my .5 cents

    -Roy

  • Suggest you start your search by reading this:

    http://msdn.microsoft.com/en-us/library/ms190639(v=SQL.100).aspx

    and then read: (Written by Brad McGehee)

    http://www.sql-server-performance.com/2007/clustered-indexes/

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • And maybe this: http://qa.sqlservercentral.com/articles/Indexing/68563/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks everyone! Its funny how I have missed some of the real fundamentals on my journey to DBA... I guess no time is better than the present to catch up!

    Jared

    Jared
    CE - Microsoft

  • There's an article on MSDN on that: http://msdn.microsoft.com/en-us/library/ms190639.aspx

    It also contains links to a number of related articles.

    There's some data on heaps here: http://msdn.microsoft.com/en-us/library/ms188270.aspx

    Since heaps are tables without a clustered index, that should get you started on answering that question.

    There's some slightly out-of-date information on performance with heaps here: http://msdn.microsoft.com/en-us/library/aa964133(v=SQL.90).aspx

    It's written for SQL 2005, but it looks applicable to 2008 as well.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Aside from the original question... In a search to understand the way that queries work against our data I pose the following:

    If rows in the table are stored logically in the order of the clustering key, why is it that you are not "guaranteed" an order when none is specified in the SELECT statement?

    My assumption is that although the data is stored in this order, the returned results do not necessarily reflect the order in which the data is stored. The results can be returned however the engine deems the fastest way to retrieve them. i.e. Most times data is returned in the order of the clustered index key when no ORDER BY is specified, but this is certainly not guaranteed.

    Is the above a correct statement? I am trying to convince our programmers that the guarantee of the order of results is not determined by clustered index when no ORDER BY is specified. Although it is documented, they are fighting that it is true or has always existed. In which case, can anyone provide a technical example why the engine would choose to return the results out of the order of the clustered index when no ORDER BY is present. I would like to put the nails in this coffin with an indisputable example.

    Thanks,

    Jared

    Jared
    CE - Microsoft

  • jared-709193 (10/24/2011)


    If rows in the table are stored logically in the order of the clustering key, why is it that you are not "guaranteed" an order when none is specified in the SELECT statement?

    Because without an Order By the optimiser is under no obligation to do an ordered read of the index or to use order-preserving operators in the plan. If it calculates that a particular operator is best for this plan, and it does not preserve order, then the optimiser will use it if there's no Order By.

    From a senior member of the QO team: http://blogs.msdn.com/b/conor_cunningham_msft/archive/2008/08/27/no-seatbelt-expecting-order-without-order-by.aspx

    Do your devs want to argue with someone who helped write the SQL engine?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Do your devs want to argue with someone who helped write the SQL engine?

    They want to argue with anyone who blames their original code. They are caching a result set in the order in which it is retrieved. This is a SELECT ... from a view in which they used a TOP and ORDER BY. They believe that since "this has worked with no issues for 3 years" that WE changed something. The fact that they HAD to use a TOP clause in order to even get the view to be created should have been their first clue that something was not right (this was before the company had a DBA team and it was strictly developers creating the databases). I told them that just because you flip a penny 1000 times and it lands on heads every time does not mean that the 1001st time it will not land on tails. They don't like my analogies 🙂 Thanks for the article, I will use that if needed!

    Jared

    Jared
    CE - Microsoft

  • In 2005, the TOP (100) PERCENT ... ORDER BY is completely and totally ignored. The optimiser knows that the top won't limit rows and it will no honour any order by that's not in the outer-most query.

    I'm sure this is documented, somewhere.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/24/2011)


    In 2005, the TOP (100) PERCENT ... ORDER BY is completely and totally ignored. The optimiser knows that the top won't limit rows and it will no honour any order by that's not in the outer-most query.

    I'm sure this is documented, somewhere.

    They didn't even try that! They did SELECT TOP 10000 ... ORDER BY knowing that the data would never exceed 5000 rows. It was a work around to use the order by, but since it is a view and they have to select from that view, it really makes no difference. Silly devs...

    Jared

    Jared
    CE - Microsoft

  • jared-709193 (10/24/2011)


    This is a SELECT ... from a view in which they used a TOP and ORDER BY.

    are you saying the view definition is created upon a select statement that uses TOP and ORDER BY?

    jared-709193 (10/24/2011)


    They believe that since "this has worked with no issues for 3 years" that WE changed something.

    a sql server upgrade maybe?

    One documented example here

    This from Microsoft

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (10/24/2011)


    jared-709193 (10/24/2011)


    This is a SELECT ... from a view in which they used a TOP and ORDER BY.

    are you saying the view definition is created upon a select statement that uses TOP and ORDER BY?

    jared-709193 (10/24/2011)


    They believe that since "this has worked with no issues for 3 years" that WE changed something.

    a sql server upgrade maybe?

    One documented example here

    Yes, their view definition included the TOP and ORDER BY, but since they were always grabbing all of the data, it ordered the entire set. They did this because in the view definition you cannot have an ORDER BY without a TOP clause. The problem was that they assumed that selecting from this would always return in the order of the view definition. However, it is clear to me now that this is not the case.

    In fact the data and view are replicated to another server. The same query returned a different order from each server when the query was run. I will have to look into any version differences. However, I told them that they must add the ORDER BY to their query, not have it in the view definition. We have since created a stored proc for them to run instead of their hard-coded query. The ignorant still blame us for the different results instead of their poor understanding of views and how SQL Server decides how to return results.

    Jared

    Jared
    CE - Microsoft

  • I've run into this before.

    Explaining that Top and Order By in views doesn't order the data, it just restricts which rows are returned, has clarified it for a large number of people.

    Relying on the "order" in the view, or in a table, is relying on luck. They've just been lucky so far, that's all it is.

    So, explain to them that, if they want to continue to rely on luck, they need to stop bathing and changing their clothing, just like some baseball players. Oh, and they also have to eat the exact same meals every day for the rest of their lives, and never, ever, never spend any of the money in their pockets. Avoiding ladders is good, and salt and mirrors are both potentially problematic, so they should avoid them too.

    But the clothing and bathing thing is the key to successful coding. Lucky in datasets, unlucky in love, you know! 😛

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • jared-709193 (10/24/2011)


    Yes, their view definition included the TOP and ORDER BY

    don't know how many times it's been said but "Don't order by in a view" 🙂

    A view is essentially a table, all ordering should be done outside the view

    This is straight from Microsoft

    http://support.microsoft.com/kb/926292

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply