• First off based on

    quote:


    UserDate (userId, latestDate)


    should this line

    quote:


    join UserDate D on D.userId = A.userId

    and D.date = A.date


    actually be

    join UserDate D on D.userId = A.userId

    and D.latestDate = A.date

    and if so then why not just do

    
    
    SELECT
    I.*
    FROM
    UserActivity A
    INNER JOIN
    UserImpression I
    ON
    A.userId= I.userId AND
    A.[date] = I.[date]

    Even so based on this

    quote:


    UserImpression (userId, date, cola, colb, colc)


    70 million rows may or may not fit in 2GB as I don't know what userid, date, cola, colb, colc are as far as width so I can't tell you if your running into any memory bottlenecks here.

    But you come back and say

    quote:


    I would return the 70 million rows into a new table


    To get maximum insert performance make sure all constraints, triggers and indexes are disabled or removed from this table then add or enable after insert is done.

    Also I assume your indexes would be

    UserDate (userId, latestDate) CLUSTERED

    UserImpression (userId, date) CLUSTERED

    UserActivity (userId, date) CLUSTERED

    To maximize join performance, if you always use userid and latestdate/date to join these tables and work with them I would consider also making them

    datefield, userid as the order of the columns in the index. It is best to have you most unique value first in a composite (multicolumn) index because only the first column is sampled to build the statistics value.

    Another thing which I assume you have already done is the columns used in the join work fastest as long as they are the same datatype. Otherwise SQL will auto-type or you will need to perform an explicit CAST (which is better than SQL auto-typing) if they are not.

    Other than that it is a matter of execution plan and tweaking to see what can be done.