Identifying duplicate sets

  • Guys,

    I am dealing with following Duplicate sets (I have 40000 of them) for each value of BP we have different sets of seq, type. To distinguish the duplicate set I want to populate a column called subid with no of times the duplicate set occurs for each value of BP. Note that table has rows ordered by BP

    BP SEQ TYPE SUBID

    _____________________________

    100 1 1 1

    100 2 1 1

    100 3 3 1

    100 1 1 2

    100 2 1 2

    100 3 3 2

    --------- SET ONE ------------

    200 1 1 1

    200 2 3 1

    200 1 1 2

    200 2 3 2

    --------- SET TWO ------------

    any suggestions/inputs would help

    Thanks

  • This code might help you:

     

    -- Simulate data

    SET NOCOUNT ON

    CREATE TABLE #DUP (ID INT IDENTITY(1,1), BP INT, SEQ INT, TYPE INT)

    INSERT #DUP VALUES (100 ,1 ,1 )

    INSERT #DUP VALUES (100, 2, 1 )

    INSERT #DUP VALUES (100, 3, 3 )

    INSERT #DUP VALUES (100, 1, 1 )

    INSERT #DUP VALUES (100, 2, 1 )

    INSERT #DUP VALUES (100, 3, 3 )

    INSERT #DUP VALUES (200, 1, 1 )

    INSERT #DUP VALUES (200, 2, 3 )

    INSERT #DUP VALUES (200, 1, 1 )

    INSERT #DUP VALUES (200, 2, 3 )

    -- Query

    SELECT A.BP, A.SEQ, A.TYPE, COUNT(A.ID) AS SUBID

    FROM #DUP A INNER JOIN #DUP B ON A.BP = B.BP AND A.SEQ = B.SEQ AND A.TYPE = B.TYPE AND A.ID >= B.ID

    GROUP BY A.ID, A.BP, A.SEQ, A.TYPE

    ORDER BY A.ID, A.BP, A.SEQ, A.TYPE

     

    DROP TABLE #DUP


    Kindest Regards,

    Roi Assa

  • Thanks for the reply Roi, but I need an update statement to populate the column.

    Thanks

  • You can use the results as derived table / join table within update statement


    Kindest Regards,

    Roi Assa

  • to update the sbuid by the number of times bp, seq and type has occured in the table you may use the following statement.

    update #tmp

    set #tmp.sbuid = #tmp2.sbuid

    from #tmp

    inner join (select bp, seq, type, count(*) as sbuid

    from #tmp

    group by bp, seq, type) as #tmp2 on #tmp.bp = #tmp2.bp and #tmp.seq = #tmp2.seq and #tmp.type = #tmp2.type

    #tmp is assumed to be your table name, #tmp2 is an alias to the select statement derived from grouping your table #tmp

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply