RE: 4 hour plus query – SQLServerCentral

Antares686

SSC Guru

Points: 125444

June 25, 2003 at 4:05 am

#462961

First off based on

quote:

UserDate (userId, latestDate)

should this line

quote:

join UserDate D on D.userId = A.userId

and D.date = A.date

actually be

join UserDate D on D.userId = A.userId

and D.latestDate = A.date

and if so then why not just do



SELECT 

	I.*

FROM 

	UserActivity A

INNER JOIN 

	UserImpression I 

ON 

	A.userId= I.userId AND 

	A.[date] = I.[date]

Even so based on this

quote:

UserImpression (userId, date, cola, colb, colc)

70 million rows may or may not fit in 2GB as I don't know what userid, date, cola, colb, colc are as far as width so I can't tell you if your running into any memory bottlenecks here.

But you come back and say

quote:

I would return the 70 million rows into a new table

To get maximum insert performance make sure all constraints, triggers and indexes are disabled or removed from this table then add or enable after insert is done.

Also I assume your indexes would be

UserDate (userId, latestDate) CLUSTERED

UserImpression (userId, date) CLUSTERED

UserActivity (userId, date) CLUSTERED

To maximize join performance, if you always use userid and latestdate/date to join these tables and work with them I would consider also making them

datefield, userid as the order of the columns in the index. It is best to have you most unique value first in a composite (multicolumn) index because only the first column is sampled to build the statistics value.

Another thing which I assume you have already done is the columns used in the join work fastest as long as they are the same datatype. Otherwise SQL will auto-type or you will need to perform an explicit CAST (which is better than SQL auto-typing) if they are not.

Other than that it is a matter of execution plan and tweaking to see what can be done.