Distinct Joined Insert

  • I am using an inner join between two source tables (Album and Tracks) to insert records into a destination table (NewTracks). The statement goes something like this:

    INSERT NewTracks

    (AlbumID, TrackTitle, TrackPosition, TrackDuration)

    SELECT

    distinct Album.ProductID, Tracks.Title, Tracks.Position, Tracks.duration

    FROM

    Album

    INNER JOIN

    Tracks on Album.ProductID = Tracks.ProductID

    This kind of works, except that in some cases there is more than one row in the Album table per album. For example, for a given album, the Album table would have a separate row for the main artist, the producer and the remixer, but each of these rows would have the same Album.ProductID. So in this case, the Insert command ends up having multiple sets of tracks in the NewTracks table, one for each of these album contributors.

    Is there a way of doing a kind of 'distinct join' so that the join inserts just one set of tracks per album, regardless of how many rows there are in the Album table?

  • thomas.lemesurier (10/13/2008)


    This kind of works, except that in some cases there is more than one row in the Album table per album. For example, for a given album, the Album table would have a separate row for the main artist, the producer and the remixer, but each of these rows would have the same Album.ProductID.

    That sounds like a flawed database design. Without seeing the structure, I can't say for sure, but is sounds like that needs to be normalised a bit. From the sound of things, it's a completely unnormalised design, which leads to problems like this.

    You can work around it like this, but it would definitely be better if you could fix the design, as problems like this will keep cropping up.

    SELECT UniqueProducts.ProductID , Tracks.Title, Tracks.Position, Tracks.duration FROM

    (SELECT distinct ProductID FROM Album) UniqueProducts

    INNER JOIN

    Tracks on UniqueProducts.ProductID = Tracks.ProductID

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • INSERT NewTracks

    (

    AlbumID,

    TrackTitle,

    TrackPosition,

    TrackDuration

    )

    SELECT ProductID,

    Title,

    Position,

    Duration

    FROM (

    SELECT Album.ProductID,

    Tracks.Title,

    Tracks.Position,

    Tracks.duration,

    ROW_NUMBER() OVER (PARTITION BY Album.ProductID ORDER BY ...) AS RecID

    FROM Album

    INNER JOIN Tracks ON Tracks.ProductID = Album.ProductID

    ) AS d

    WHERE RecID = 1


    N 56°04'39.16"
    E 12°55'05.25"

  • Thanks Gail. I know what you're saying about data normalisation. The issue here is that these are raw data dumps (from another database with a completely different structure) that are being imported from huge CSV files. A SQL DTS package is bringing the data into intermediate tables and then I'm running scripts to get the data from there into the destination tables. It's a one-off import and so I kind of need to make the data fit this one time.

    Thanks for your suggestion, looks like it'll do the trick. Thanks,

    Tom

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply