Copying only rows with unique values from one table to another

  • Hi all,

    I am trying to copy rows from table Weekly into table Current. Table Weekly has 600 rows with a unique id column which contains 50 duplicates. I therefore only want 550 rows to be copied into table Current, which has the same structure.

    The query I have composed is as follows:-

    INSERT INTO current
    SELECT DISTINCT [Unique ID] AS UniqueID, Country AS Country, [Company Name] AS CompanyName, [Deal Name] AS DealName, [Line Amount] AS LineAmount, [SR Name] AS SalesRep, [Fiscal Quarter] AS FiscalQuarter, [WW Sales Stage] AS WWSalesStage

    FROM Weekly

    WHERE Country<>"Total";

    All the rows are being copied into Current. Where am I going wrong?

     

     

  • This should work with an aggregate query:

    INSERT INTO current
    SELECT [Unique ID] AS UniqueID, Country AS Country, [Company Name] AS CompanyName, [Deal Name] AS DealName, [Line Amount] AS LineAmount, [SR Name] AS SalesRep, [Fiscal Quarter] AS FiscalQuarter, [WW Sales Stage] AS WWSalesStage

    FROM Weekly

    WHERE Country<>"Total"

    GROUP BY [Unique ID], Country, [Company Name], [Deal Name], [Line Amount], [SR Name], [Fiscal Quarter], [WW Sales Stage];

     

     

    Aunt Kathi Data Platform MVP
    Author of Expert T-SQL Window Functions
    Simple-Talk Editor

  • Surely, though, if it truly is a row Unique ID (I am assuming here that this is your PK), you will never be able to remove anything with SELECT DISTINCT, since the Unique ID will always be different. This will still be the case with the aggregate query, since grouping by a true Unique ID will leave you with one entry for each group. If this is the case, and you did not require the original PK from Weekly in the Current table, the first statement would be fine if you omitted the [Unique ID] column, as long as you had a PK in the Current table which allowed autogeneration of values.
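
    For example, something along these lines (just a rough sketch - the column names in the INSERT list are guesses based on the aliases in the original query, and it assumes Current's PK is an AutoNumber field):

    INSERT INTO current (Country, CompanyName, DealName, LineAmount, SalesRep, FiscalQuarter, WWSalesStage)
    SELECT DISTINCT Country, [Company Name], [Deal Name], [Line Amount], [SR Name], [Fiscal Quarter], [WW Sales Stage]
    FROM Weekly
    WHERE Country <> "Total";

    If the duplicate rows really are identical in every column other than the Unique ID, DISTINCT over the remaining columns will then collapse each set of duplicates down to a single row.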

  • Thanks very much for replies.

    The Weekly table does not have a PK, as it is linked to an Excel spreadsheet which is copied in on a weekly basis, which is why there are duplicates. There is a monetary column in each row of both tables - the aim is to sum the values of this column for each "Unique ID" in the Weekly table and then transfer just one occurrence, with the aggregated value, into the Current table, which does have a PK.

    Am I also right in thinking that in Access (unlike in Enterprise Manager) you can only include one SQL statement per query? It would be nice to have several queries execute at one time (to make life easier for the user).

  • >> ... from the Weekly table and then transfer just one occurrence with the aggregated value to the Current table which does have a PK. <<

    What is the PK in there? Can you post the structure of your Weekly and Current tables?

    >>Am I also right in thinking that in Access (unlike in Enterprise Manager) you can only include one SQL statement per query.<<

     

    Yes, you are correct! Access can nest the queries for you to implement the "subquery" feature. You do get things built in, like CrossTab queries, that SQL Server won't have until SQL 2005 goes live (with PIVOT and UNPIVOT). You also get VBA function flexibility, which you won't be able to use in SQL Server until 2005 comes out with CLR integration, etc. I am not saying that Access is superior; I am just implying that each has some good and some bad points, and you have to weigh the features you use in each one of them.
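
    For instance, a crosstab of the Line Amount by Country for each Unique ID would look something like this in Access SQL (just a sketch using the column names from earlier in this thread):

    TRANSFORM Sum([Line Amount]) AS TotalAmount
    SELECT [Unique ID]
    FROM Weekly
    WHERE Country <> "Total"
    GROUP BY [Unique ID]
    PIVOT Country;

    Each Unique ID becomes a row and each Country becomes its own column - the sort of thing that takes a lot more work in SQL Server 2000.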

     

    hth

     

     

     


    * Noel

  • Louise,

     

    I think Kathi was on the right track by using an aggregate query. You do have to keep in mind what it is that is duplicated in your Excel table and what it is that you would like to add up.

     

    It looks like you need the sum of the "Line Amount" column for each "Unique ID". If this is the case, then you only want to group by the UniqueID and not all of the other columns. For example, if you group by all the columns in the table, any rows that differ in even one of those columns will still come through as duplicates of the same Unique ID, as I think James was alluding to. So, if you have data that looks like this:

     

    UniqueID        Country           LineAmount

    1                      USA                100

    1                      France            200

    2                      USA                100

    2                      USA                300

     

    Then when you group by UniqueID and Country and Sum the LineAmount, you get this result:

     

    UniqueID        Country           LineAmount

    1                      USA                100

    1                      France            200

    2                      USA                400

     

    If what you want is to know that UniqueID 1 has a total of 300 and 2 has a total of 400, then you have to Group By just the UniqueID.

     

    So, with your data, that would look like this:

     

    SELECT [Unique ID], SUM([Line Amount]) AS TotalForUniqueID
    FROM Weekly
    WHERE Country <> 'Total'
    GROUP BY [Unique ID]

     

    Now, if, when you insert, you want to include all the other columns, you have to join the above query to a query of all your other data, like so:

     

    INSERT INTO current
    SELECT W.[Unique ID], Country, [Company Name], [Deal Name], WSums.TotalForUniqueID, [SR Name], [Fiscal Quarter], [WW Sales Stage]
    FROM Weekly AS W
    INNER JOIN (
        SELECT [Unique ID], SUM([Line Amount]) AS TotalForUniqueID
        FROM Weekly
        WHERE Country <> 'Total'
        GROUP BY [Unique ID]
    ) AS WSums
    ON W.[Unique ID] = WSums.[Unique ID]

     

    You really have to know how many of the columns are involved in the duplication in order to know what to group by to get the correct sum. If the only thing that changes with each duplicate line is the LineAmount, then you can use a query like Kathi's and just add SUM([Line Amount]); otherwise, you may have to post which columns are involved in the duplication, or more info about what your data looks like.
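
    For example, if the Line Amount really is the only column that varies between the duplicated rows, the whole thing can be collapsed into a single aggregate insert, something like this (just a sketch, assuming Current has the same columns in the same order as in the first post):

    INSERT INTO current
    SELECT [Unique ID], Country, [Company Name], [Deal Name], SUM([Line Amount]) AS LineAmount, [SR Name], [Fiscal Quarter], [WW Sales Stage]
    FROM Weekly
    WHERE Country <> 'Total'
    GROUP BY [Unique ID], Country, [Company Name], [Deal Name], [SR Name], [Fiscal Quarter], [WW Sales Stage]

    This only collapses to one row per Unique ID if none of the grouped columns differ between rows that share an ID; if any of them do differ, you are back to the join version above.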

     

  • I may have misunderstood the question, but I think she is saying that completely duplicate rows were accidentally added. If that is the case, my solution would work.

    Aunt Kathi Data Platform MVP
    Author of Expert T-SQL Window Functions
    Simple-Talk Editor

  • Thanks so much for the great response. Our broadband has been down (good old BT) but we're back up now.

    The query runs OK, but I am still getting duplicates in the output table. I have isolated the rows where duplicates occur as follows:-

    Unique ID      Line Amount

    10                 200

    10                 400

    11                 300

    12                 200

    12                 200

    12                 400

    By studying the duplicate rows I have narrowed it down to only Unique ID 12 having a true duplicate, because the line amount is the same in 2 of its rows; for the other IDs the line amounts differ, so those rows are not exact duplicates.
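
    (If I instead group on just the Unique ID and sum the line amounts, as suggested above, I assume these rows would come through as 10 = 600, 11 = 300 and 12 = 800, which is the result I am after.)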
