Reduce Aggravating Aggregation

Question

Post reply

Reduce Aggravating Aggregation

Edmund Ward

SSChasing Mays

Points: 620
More actions
September 13, 2007 at 2:15 am

#180833

I've just read the article from my latest sqlservercentral newsletter called "Reduce Aggravating Aggregation: Improve the Performance of History or Status Tables" by Merrill Aldrich. I use the kind of historical status tables he mentions in the article, so was interested to read how I might improve on them. To demonstrate his method, he explains a setup containing orders, customers, status and statushistory tables. He explains that with this typical setup we usually want to just find the latest status from the statushistory table. He suggests that the way to do this is with the following query, which he explains is not the best performer due to two trips to the table and two joins:

SELECT    o.orderID,

    o.customerID,

    o.orderDate,

    o.description,

    os.statusDate,

    os.status

FROM orders o

INNER JOIN (

    SELECT orderid, MAX(statusdate) maxdate FROM orderStatus GROUP BY orderid

) lastStatusDates ON o.orderID = lastStatusDates.orderid

INNER JOIN orderStatus os

    ON o.orderID = os.orderID AND lastStatusDates.maxdate = os.statusDate

My question is, why can't you simply use the following query?

SELECT TOP 1 o.orderID,

    o.customerID,

    o.orderDate,

    o.description,

    os.statusDate,

    os.status

FROM orders o

INNER JOIN orderStatus os

    ON o.orderID = os.orderID

ORDER BY os.statusDate DESC

Is the "TOP 1" bit inefficient? Surely this would produce the result you want with only one trip to the table, one join and the added advantage that it is also easier to get the full history in a separate query, unlike when using his suggested schema?

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 1

Christopher Stobbs

SSC-Insane

Points: 21098

September 13, 2007 at 4:32 am

#733414

HI Edmund,

Your method does not guarantee that you will get the status with the maxdate.

Thanks

Chris

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

Edmund Ward SSChasing Mays Points: 620 More actions · Answer 2

Edmund Ward

SSChasing Mays

Points: 620

September 13, 2007 at 4:38 am

#733417

Hi Chris

Can you explain? (sorry: newbie)

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 3

🙂

Lets say for example the data in our tables was stored in an order on disk that was not what we expected it to .

for example the 1st records was the lowest date and teh 2nd record was the highest date.

Then using the top 1 would return the lowest date and not the highest date.

The order in which the data is stored on disk can be affect by the way they are inserted as well as by clustered indexes on the tables in questions.

does this make sense?

Or would you like an example?

Thanks

Chris

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 4

Christopher Stobbs

SSC-Insane

Points: 21098

September 13, 2007 at 4:49 am

#733423

Hi Edmund,

Sorry that explination was incorrect please ignore my last post!

Thanks

Chris

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

Edmund Ward SSChasing Mays Points: 620 More actions · Answer 5

Edmund Ward

SSChasing Mays

Points: 620

September 13, 2007 at 4:49 am

#733422

Are you saying that TOP will only return data based on the order in which it is stored on the disk and *not* based on the order in my ORDER BY statement? I'm scared if so. Or am I completely missing the point here?

Christopher Stobbs SSC-Insane Points: 21098 More actions · Answer 6

Hi,

Sorry please ignore my first explination as I didn't see your order by clause!

I am also now asking myself why your way isn't the correct way to do it.

I have run multiple tests and you statement always returns the correct result.

Does anyone else have an anwser to Edmunds question?

----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley

:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]

SQL-4-Life

Edmund Ward SSChasing Mays Points: 620 More actions · Answer 7

Thanks Christopher. I can see that after a long time, this statushistory table might produce a large and ever increasing number of rows for a particular order. OK so with the order example this might be unlikely, since orders tend to have a finite history, but you might be using a statushistory table with an entity that can change over an indefinite time period, and therefore have a large number of historical statuses. In this case getting TOP 1 each time might start to get inefficient. That is why I'm asking the question, because I don't know if this is actually the case. Could using TOP 1 become less and less efficient, or will it win over the article's suggested method anyway?