duplicate records count

Question

Post reply

duplicate records count

T2512

Mr or Mrs. 500

Points: 508
More actions
August 20, 2010 at 6:12 pm

#226100

Hello,

Any help or advice is really appreciated

I have the following table

Table A

vendorname

walmart

walmartinc

walmartinc

cubfoods

cubfoods

cubfood

asper

apser

I need the view the count of the duplicated vendors. if I perform the following

select vendorname,count(*) as duplicates from vendors

group by vendorname having count(*) >1

the following o/p is displayed

Vendorname duplicates

walmartinc 2

cubfoods 2

As we can see that (walmart, walmart inc),(asper,apser) are the same vendors; but because of the extra character or number its been considered unique; and because of which I am unable to specify the correct number of unique and duplicate vendors in the vendor table of nearly 15000 records.

please advice me how can i avoid this problem; is there a way to compare two columns based on the first 7 characters.

Thank you

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

jude.pieries Default port Points: 1438 More actions · Answer 1

Hi T2512

I have done a small script that will help you get started. This script deletes the duplicate records in one go. Just a word of caution, please make sure that there are no transactions referring these venders who are deleted

- If you have not placed the appropriate FK’s, data will be deleted and you will have orphaned detail transactions

- If your application is spanned over many databases and the vender tables PK is referred in different DB’s please make sure that you do the required verifications before deleting the vender data.

-

- CREATE TABLE Vendar ( IID int identity(1,1) , Name varchar(100) )

- CREATE TABLE #Vendartemp ( IID int ,Name Varchar(20))

-

- INSERT INTO Vendar(NAME) VALUES ( 'WALMART')

- INSERT INTO Vendar(NAME) VALUES ( 'WALMART INC')

- INSERT INTO Vendar(NAME) VALUES ( 'apser INC')

- INSERT INTO Vendar(NAME) VALUES ( 'apser')

- INSERT INTO Vendar(NAME) VALUES ( 'cubfoods')

- INSERT INTO Vendar(NAME) VALUES ( 'apser')

- INSERT INTO Vendar(NAME) VALUES ( 'WALMART')

-

- Select * from Vendar

-

--- identifies the duplicate records

- INSERT INTO #Vendartemp ( Name ,IID)

- Select A.Name , A.IID

- from Vendar A

- join Vendar B ON A.name = B.name

- where A.Name in

- (

- Select Name Name

- From Vendar

- Group by Name

- having COUNT(1) > 1

- )

- and A.IID > B.IID

-

- SELECT * FROM #Vendartemp

-

- DELETE FROM VENDAR

- where VENDAR.IID IN

- (

- Select VENDAR.IID

- FROM #Vendartemp B

- Where VENDAR.Name = B.Name AND VENDAR.IID < B.IID

- )

- Select * from Vendar

You can use the substring function to identify the first 7 characters in a column

Hope this helps

steve-893342 SSCarpal Tunnel Points: 4439 More actions · Answer 2

Using the first 7 characters will not identify asper as the same vendor as apser. Building a CASE statement in a CTE allows you to pre-map the variant spellings of vendor names to a single name eg vendor names appearing as asper or apser are mapped to asper. You can then run your query from the CTE to give you the desired output.

IF NOT OBJECT_ID('tempdb.dbo.#Vendors', 'U') IS NULL DROP TABLE #Vendors



SELECT 'walmart' AS vendorname INTO #Vendors UNION ALL

SELECT 'walmartinc' UNION ALL

SELECT 'walmartinc' UNION ALL

SELECT 'cubfoods' UNION ALL

SELECT 'cubfoods' UNION ALL

SELECT 'cubfood' UNION ALL

SELECT 'asper' UNION ALL

SELECT 'apser' 

;WITH cte AS

(

SELECT CASE

WHEN vendorname IN ('walmart', 'walmartinc') THEN 'walmart'

WHEN vendorname IN ('cubfoods', 'cubfood') THEN 'cubfoods'

WHEN vendorname IN ('asper', 'apser') THEN 'asper'

ELSE vendorname END AS vendorname

FROM #Vendors

)

SELECT vendorname, COUNT(*) AS duplicates

FROM cte

GROUP BY vendorname

HAVING COUNT(*) > 1