First off based on
quote:
UserDate (userId, latestDate)
should this line
quote:
join UserDate D on D.userId = A.userIdand D.date = A.date
actually be
join UserDate D on D.userId = A.userId
and D.latestDate = A.date
and if so then why not just do
SELECT
I.*
FROM
UserActivity A
INNER JOIN
UserImpression I
ON
A.userId= I.userId AND
A.[date] = I.[date]
Even so based on this
quote:
UserImpression (userId, date, cola, colb, colc)
70 million rows may or may not fit in 2GB as I don't know what userid, date, cola, colb, colc are as far as width so I can't tell you if your running into any memory bottlenecks here.
But you come back and say
quote:
I would return the 70 million rows into a new table
To get maximum insert performance make sure all constraints, triggers and indexes are disabled or removed from this table then add or enable after insert is done.
Also I assume your indexes would be
UserDate (userId, latestDate) CLUSTERED
UserImpression (userId, date) CLUSTERED
UserActivity (userId, date) CLUSTERED
To maximize join performance, if you always use userid and latestdate/date to join these tables and work with them I would consider also making them
datefield, userid as the order of the columns in the index. It is best to have you most unique value first in a composite (multicolumn) index because only the first column is sampled to build the statistics value.
Another thing which I assume you have already done is the columns used in the join work fastest as long as they are the same datatype. Otherwise SQL will auto-type or you will need to perform an explicit CAST (which is better than SQL auto-typing) if they are not.
Other than that it is a matter of execution plan and tweaking to see what can be done.