Deduplication Help

Question

Post reply

Deduplication Help

MostInterestingMan

SSCertifiable

Points: 6291
More actions
January 30, 2012 at 10:00 am

#252124

Hi,

My goal is to dedupe (with a twist) the sample table below with the following criteria. For each duplicate record I want to:

1) If [item_ordered?] column is 'Yes' for all the deplicate records keep any one of the records - doesn't matter which record.

2) If [item_ordered?] column is 'No' for all the duplicate records keep any one of the records - doesn't matter with record.

3) If [item_ordered?] column is 'Yes' for some duplicate records and 'No' for others, keep one of the 'Yes' records - doesn't matter which record.

/* sample dataset */

CREATE TABLE #TestTable (id INT, order_status VARCHAR(10), [item_ordered?] VARCHAR(3))

INSERT #TestTable (id, order_status, [item_ordered?])

SELECT 100, 'Ordered', 'Yes' UNION ALL

SELECT 100, 'Staged', 'Yes' UNION ALL

SELECT 100, 'Shipped', 'Yes' UNION ALL

SELECT 200, 'Hold', 'No' UNION ALL

SELECT 200, 'Cancelled', 'No' UNION ALL

SELECT 200, 'Hold', 'No' UNION ALL

SELECT 200, 'Returned', 'No' UNION ALL

SELECT 300, 'Ordered', 'Yes' UNION ALL

SELECT 300, 'Ordered', 'Yes' UNION ALL

SELECT 300, 'Staged', 'Yes' UNION ALL

SELECT 300, 'Shipped', 'Yes' UNION ALL

SELECT 300, 'Shipped', 'Yes' UNION ALL

SELECT 400, 'Cancelled', 'No' UNION ALL

SELECT 400, 'Review', 'No' UNION ALL

SELECT 400, 'Hold', 'No' UNION ALL

SELECT 400, 'Returned', 'No' UNION ALL

SELECT 500, 'Ordered', 'Yes' UNION ALL

SELECT 500, 'Pending', 'No' UNION ALL

SELECT 500, 'Staged', 'Yes' UNION ALL

SELECT 500, 'Shipped', 'Yes' UNION ALL

SELECT 600, 'Hold', 'No' UNION ALL

SELECT 600, 'Review', 'No' UNION ALL

SELECT 600, 'Pending', 'Yes' UNION ALL

SELECT 700, 'Ordered', 'Yes' UNION ALL

SELECT 700, 'Hold', 'No' UNION ALL

SELECT 700, 'Cancelled', 'No' UNION ALL

SELECT 700, 'Returned', 'No' UNION ALL

SELECT 800, 'Hold', 'No' UNION ALL

SELECT 800, 'Returned', 'No' UNION ALL

SELECT 900, 'Staged', 'Yes' UNION ALL

SELECT 900, 'Shipped', 'Yes'

GO

Thanks for your help.

MCTS: BI 2008, MCITP: BI 2008
Stay Thirsty My Friends

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26865 More actions · Answer 1



WITH CTE AS (

 SELECT *,

        ROW_NUMBER() OVER(PARTITION BY id, order_status

                          ORDER BY CASE WHEN [item_ordered?]='Yes' THEN 0 ELSE 1 END) AS rn

FROM #TestTable)

DELETE FROM CTE

WHERE rn>1;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

spaghettidba SSC Guru Points: 105712 More actions · Answer 2

First of all you have to define what "duplicate" means to you. Duplicate records for the same Id? Duplicate records for the same id+status?

If the former, this should do:



SELECT *

FROM (

	SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY id ORDER BY [item_ordered?] DESC, id ASC)

	FROM #TestTable

) AS src

WHERE RN = 1

-- Gianluca Sartori

MostInterestingMan SSCertifiable Points: 6291 More actions · Answer 3

Gianluca Sartori (1/30/2012)

First of all you have to define what "duplicate" means to you. Duplicate records for the same Id? Duplicate records for the same id+status?

If the former, this should do:

SELECT * FROM ( SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY id ORDER BY [item_ordered?] DESC, id ASC) FROM #TestTable ) AS src WHERE RN = 1

Sorry I wasn't clear - Deplicate for the same ID not ID+status.

MCTS: BI 2008, MCITP: BI 2008
Stay Thirsty My Friends

spaghettidba SSC Guru Points: 105712 More actions · Answer 4

spaghettidba

SSC Guru

Points: 105712

January 30, 2012 at 11:45 am

#1440178

I think you should be ok, then.

-- Gianluca Sartori

MostInterestingMan SSCertifiable Points: 6291 More actions · Answer 5

MostInterestingMan

SSCertifiable

Points: 6291

January 30, 2012 at 2:38 pm

#1440264

This is exactly the solution I was looking for..works like a charm. Thanks for all your help.

MCTS: BI 2008, MCITP: BI 2008
Stay Thirsty My Friends

Siva Ramasamy Hall of Fame Points: 3866 More actions · Answer 6

Siva Ramasamy

Hall of Fame

Points: 3866

January 30, 2012 at 2:55 pm

#1440269

Hi,

I was just following this thread and understood the requirement and I also executed your solution and found that it satisfies the requirement...but not able to understand how you achieved it.

Can you please explain..?

Thanks!

Siva.

spaghettidba SSC Guru Points: 105712 More actions · Answer 7

ROW_NUMBER() OVER (PARTITION BY id ORDER BY [item_ordered?] DESC, id ASC)

means "give each row in the table a row number (rank) sorting rows by [item_ordered?] in reverse order (so that "yes" ranks higher than "no") and then by id, partitioning by id (means that the rank will be "reset" for each different id)".

Applied to the test data, it gives this output:



id          order_status item_ordered? RN

----------- ------------ ------------- --------------------

100         Ordered      Yes           1

100         Staged       Yes           2

100         Shipped      Yes           3

200         Hold         No            1

200         Cancelled    No            2

200         Hold         No            3

200         Returned     No            4

300         Ordered      Yes           1

300         Ordered      Yes           2

300         Staged       Yes           3

300         Shipped      Yes           4

300         Shipped      Yes           5

400         Cancelled    No            1

400         Review       No            2

400         Hold         No            3

400         Returned     No            4

500         Ordered      Yes           1

500         Staged       Yes           2

500         Shipped      Yes           3

500         Pending      No            4

600         Pending      Yes           1

600         Hold         No            2

600         Review       No            3

700         Ordered      Yes           1

700         Hold         No            2

700         Cancelled    No            3

700         Returned     No            4

800         Hold         No            1

800         Returned     No            2

900         Staged       Yes           1

900         Shipped      Yes           2

Hope this answers your question.

-- Gianluca Sartori

Ron McCullough SSC Guru Points: 63877 More actions · Answer 8

Siva Ramasamy (1/30/2012)

Hi,

I was just following this thread and understood the requirement and I also executed your solution and found that it satisfies the requirement...but not able to understand how you achieved it.

Can you please explain..?

Thanks!

Siva.

The following may answer your question;

http://www.simple-talk.com/sql/t-sql-programming/sql-server-cte-basics/

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Siva Ramasamy Hall of Fame Points: 3866 More actions · Answer 9

Siva Ramasamy

Hall of Fame

Points: 3866

January 30, 2012 at 4:09 pm

#1440294

Thanks Gianluca Sartori and Thanks Ron.

I understood the logic.