QOD 11/21

  • Kelvin:

    quote:


    Having a non clustered index on the date column will NOT HELP with this query. SQL probably wouldn't even use the index if you created it!


    How can you say that? How do you know? Have you ever analyzed the internals of the Query Plan Optmizer? Have you ever heard about it's statistics creation? What about selectivity? What happens if you have 1.000.000 rows in your table and WHERE LAST_ORDER_DT > '12/1/2003' returns 3 of them? The selectivity would be so good that even a NON-Clustered index would speed up your query so much that you would not think twice about adding it.

    N.B It will use the NON-Clustered index if the Query Optimizer finds that it would be beneficial. Ok. Sometimes it can be wrong but those cases are rather rare. Exercise: Do your testcases based on the selectivity of your queris in Query Analyzer!

    Happy Hunting!

    /Hans

  • Having a non clustered index on the date column WILL HELP with this query and it would probably be used due to the Where clause.

  • quote:


    Solution IMHO is wrong becouse we loose unique constraint on customer ID.

    What do you think about it?


    Actually you can create a unique constraint with a Non-Clustered Index.

    I got this one wrong because I did not pay attention to the key bit of info

    quote:


    The only query that's executing (every 1 second) is performing poorly


    With that info we can assume that for best performance we need a clustered index for this table.

    Howerver I do question the sanity of the fuy who runs this every 1 sec.

  • lansley:

    What I wanted to say is that you cannot assume that the LAST_ORDER_DT column actually is going to contain the last date a customer made an order. If you ever worked with third-party software or developers with full DB access or even DBAs with less clues then they should have you know what I mean. The field might never be updated if you don't know the full story behind it. Would your opinion change if it was called ORDER_DT and WAS updated with the Latest Order DateTime? You have to analyze the situation more to be able to draw drastic conclusions, do not make the mistake of just judging the column based on it's name. To a relevant discussion, as Brian Knight pointed out

    quote:


    Regardless, unless you're periodically ordering on the primary key, which most don't in an identity type environment, the clustered index there doesn't do you a whole lot of good


    a clustered index on your IDENTITY column is abit wasted since you (probably) don't do so many range queries on it...

    /Hans

  • Oops! Kelvin said "non-clustered" index. You are right. A non-clustered index isn't always used, but this one may be selective enough to be used for a result set that is a small percentage of the table. For a larger percentage (greater than 5% to 10% or so) SQL Server will probably scan the clustered index, instead of seeking on a non-clustered index.

    Another point. Order date is not logically an attribute of the customer. This information should probably be obtained through a join on the order table.

    Larry Ansley

    Atlanta, GA


    Larry Ansley
    Atlanta, GA

  • quote:


    Howerver I do question the sanity of the fuy who runs this every 1 sec.


    Actually when he is able to query 1,000,000 rows every second (and get a result), I wouldn't consider this poor performance.

    But just to be sure he should hire an independent performance consultant.

    I like these questions!

    Frank

    http://www.insidesql.de

    http://www.familienzirkus.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Geez. I got it wrong. However, I feel justified because I was in the majority -- misery loves company and all that.

    The date column threw me as I would never (say again NEVER 😉 would have created a clustered index on what I thought was a column that might change.

    Good question! Good debate! Is there anything else (of course good beer comes to mind, but...)?

    Joe Johnson

    NETDIO,LLC.


    Joe Johnson
    NETDIO,LLC.

  • There are so many things wrong with the scenario that make this question and its "so-called" correct answer worthless. First off, who can issue a query against a 1 million row table every second and get results back in time to display them, index or not???

    Secondly, if you currently have a table that has a clustered index (primary key) on the customer id column, then it is safe to assume that you have 1 million unique customers in that table. So, if you have a last_order_dt column for a customer, then it's going to contain the last order date for that customer (regardless if it is updated or not ... also it has to have a value otherwise the clustered index would fail since it can't contain nulls).

    Now, your telling me that you want to change the clustered index (primary key) from customer_id to last_order_dt to improve performance??? That means that you now have to have 1 million unique last order dates in your table. So, that also means that only one customer could place an order from you per day without violating the constraints on the table. Are you kidding me?? Who in the hell does this?

    Come on people! Stop reading so much into the question, and provide answers that everyday DBAs have to deal with.

    Jason

    BTW - 1 million unique days from today back into history is 2739 years (with a little left over), which would cause an overflow based on SQL Server 2000's current datetime data type.

  • quote:


    What I wanted to say is that you cannot assume that the LAST_ORDER_DT column actually is going to contain the last date a customer made an order. If you ever worked with third-party software or developers with full DB access or even DBAs with less clues then they should have you know what I mean. The field might never be updated if you don't know the full story behind it. Would your opinion change if it was called ORDER_DT and WAS updated with the Latest Order DateTime? You have to analyze the situation more to be able to draw drastic conclusions, do not make the mistake of just judging the column based on it's name.


    OK. That fair. And you are right. Most, or all, customers in this hypothetical example could be one time only, so the date might not change anyway, despite the suggestive name. If this were the case, a clustered key on this column is a correct answer. So is a non-clustered key.

    In any case, I think this question successfully served as another learning opportunity about range queries best suited to clustered indexes.

    Larry Ansley

    Atlanta, GA


    Larry Ansley
    Atlanta, GA

  • quote:


    BTW - 1 million unique days from today back into history is 2739 years (with a little left over), which would cause an overflow based on SQL Server 2000's current datetime data type.


    What a shame, this could have been the next QOD

    Frank

    http://www.insidesql.de

    http://www.familienzirkus.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • quote:


    That means that you now have to have 1 million unique last order dates in your table.


    Since when did clustered indices have to be unique?

  • quote:


    Get real!

    You never (say again NEVER) want to create the clustered index on a volatile column. Every time the customer had a new order, SQL Server would have to move the record within the clustered index. Plus all non-clustered indexes would have to be updated for this change as well.

    Your system administrator would kick you a-- for burning up his hard drives. Your users would hang you for the lousy performance. After this, your boss would fire you. All this would happen before lunch.

    Larry Ansley

    Atlanta, GA


    He is right. Whoever made up this question was on crack. If you place a clustered index on a column that is going to change you are quite simply asking for trouble.

    Also whoever took it upon themselves to make the "correct" answer different from any answer on the email should apologize in the next email.

  • This question's wording was obviously quite misleading judging by the debate. I also followed the line of thought that LAST_ORDER_DT was a column that would be frequently updated making it a poor choice for a clustered index. I understand Hans's argument about assuming the function of the column, but we do all of our database design so in any of our systems, LAST_ORDER_DT would mean the last time that particular customer placed an order. Oh well, better luck next time!

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • ejdelaune:

    quote:


    So, that also means that only one customer could place an order from you per day without violating the constraints on the table. BTW - 1 million unique days from today back into history is 2739 years (with a little left over), which would cause an overflow based on SQL Server 2000's current datetime data type.


    Here you go assuming things 😉 Why does the LAST_ORDER_DT have to contain DATES WITHOUT TIME? Is there some physical law in your country that makes columns with the postfix _DT ignore the time part in a DateTime field?

    Regards, Hans!

    P.S Profanities excluded from Quote...

    Edited by - hanslindgren on 11/21/2003 08:00:50 AM

  • quote:


    He is right. Whoever made up this question was on crack. If you place a clustered index on a column that is going to change you are quite simply asking for trouble.

    Also whoever took it upon themselves to make the "correct" answer different from any answer on the email should apologize in the next email.


    I ensure you I wasn't doing lines of crack while I wrote this question I just didn't expect people to read as much into the question. For example, I didn't think people would make an assumption that the LAST_ORDER_DT meant that it's constantly updated. That's one reason we do change the question on the fly on the web page to clarify points or make things less vague. In the LAST_ORDER_DT column name instance, I just changed the column name to START_DT hoping to make it a little clearer that this column isn't on a regular basis. We don't apologize for this though since we have a warning in red on the newsletter and on the webpage to double-check it before clicking submit. The clarification always works in favor of the reader though.

    Brian Knight

    bknight@sqlservercentral.com

    http://qa.sqlservercentral.com/columnists/bknight

Viewing 15 posts - 16 through 30 (of 54 total)

You must be logged in to reply to this topic. Login to reply