Reduce Aggravating Aggregation

  • I've just read the article from my latest sqlservercentral newsletter called "Reduce Aggravating Aggregation: Improve the Performance of History or Status Tables" by Merrill Aldrich. I use the kind of historical status tables he mentions in the article, so was interested to read how I might improve on them. To demonstrate his method, he explains a setup containing orders, customers, status and statushistory tables. He explains that with this typical setup we usually want to just find the latest status from the statushistory table. He suggests that the way to do this is with the following query, which he explains is not the best performer due to two trips to the table and two joins:

    SELECT    o.orderID,

        o.customerID,

        o.orderDate,

        o.description,

        os.statusDate,

        os.status

    FROM orders o

    INNER JOIN (

        SELECT orderid, MAX(statusdate) maxdate FROM orderStatus GROUP BY orderid

    ) lastStatusDates ON o.orderID = lastStatusDates.orderid

    INNER JOIN orderStatus os

        ON o.orderID = os.orderID AND lastStatusDates.maxdate = os.statusDate

     
    My question is, why can't you simply use the following query?
     
    SELECT TOP 1 o.orderID,

        o.customerID,

        o.orderDate,

        o.description,

        os.statusDate,

        os.status

    FROM orders o

    INNER JOIN orderStatus os

        ON o.orderID = os.orderID

    ORDER BY os.statusDate DESC
     
    Is the "TOP 1" bit inefficient? Surely this would produce the result you want with only one trip to the table, one join and the added advantage that it is also easier to get the full history in a separate query, unlike when using his suggested schema?
     
  • HI Edmund,

    Your method does not guarantee that you will get the status with the maxdate.

    Thanks

    Chris

     

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Hi Chris

    Can you explain? (sorry: newbie)

  • 🙂

     

    Lets say for example the data in our tables was stored in an order on disk that was not what we expected it to .

    for example the 1st records was the lowest date and teh 2nd record was the highest date.

    Then using the top 1 would return the lowest date and not the highest date.

    The order in which the data is stored on disk can be affect by the way they are inserted as well as by clustered indexes on the tables in questions.

     

    does this make sense?

    Or would you like an example?

     

    Thanks

    Chris

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Hi Edmund,

     

    Sorry that explination was incorrect please ignore my last post!

     

    Thanks

    Chris

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Are you saying that TOP will only return data based on the order in which it is stored on the disk and *not* based on the order in my ORDER BY statement? I'm scared if so. Or am I completely missing the point here?

  • Hi,

     

    Sorry please ignore my first explination as I didn't see your order by clause!

     

    I am also now asking myself why your way isn't the correct way to do it.

    I have run multiple tests and you statement always returns the correct result.

     

    Does anyone else have an anwser to Edmunds question?

     

     

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Thanks Christopher. I can see that after a long time, this statushistory table might produce a large and ever increasing number of rows for a particular order. OK so with the order example this might be unlikely, since orders tend to have a finite history, but you might be using a statushistory table with an entity that can change over an indefinite time period, and therefore have a large number of historical statuses. In this case getting TOP 1 each time might start to get inefficient. That is why I'm asking the question, because I don't know if this is actually the case. Could using TOP 1 become less and less efficient, or will it win over the article's suggested method anyway?

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply