duplicate records count

  • Hello,

    Any help or advice is really appreciated

    I have the following table

    Table A

    vendorname

    walmart

    walmartinc

    walmartinc

    cubfoods

    cubfoods

    cubfood

    asper

    apser

    I need the view the count of the duplicated vendors. if I perform the following

    select vendorname,count(*) as duplicates from vendors

    group by vendorname having count(*) >1

    the following o/p is displayed

    Vendorname duplicates

    walmartinc 2

    cubfoods 2

    As we can see that (walmart, walmart inc),(asper,apser) are the same vendors; but because of the extra character or number its been considered unique; and because of which I am unable to specify the correct number of unique and duplicate vendors in the vendor table of nearly 15000 records.

    please advice me how can i avoid this problem; is there a way to compare two columns based on the first 7 characters.

    Thank you

  • Hi T2512

    I have done a small script that will help you get started. This script deletes the duplicate records in one go. Just a word of caution, please make sure that there are no transactions referring these venders who are deleted

    - If you have not placed the appropriate FK’s, data will be deleted and you will have orphaned detail transactions

    - If your application is spanned over many databases and the vender tables PK is referred in different DB’s please make sure that you do the required verifications before deleting the vender data.

    -

    -

    - CREATE TABLE Vendar ( IID int identity(1,1) , Name varchar(100) )

    - CREATE TABLE #Vendartemp ( IID int ,Name Varchar(20))

    -

    -

    - INSERT INTO Vendar(NAME) VALUES ( 'WALMART')

    - INSERT INTO Vendar(NAME) VALUES ( 'WALMART INC')

    - INSERT INTO Vendar(NAME) VALUES ( 'WALMART INC')

    - INSERT INTO Vendar(NAME) VALUES ( 'apser INC')

    - INSERT INTO Vendar(NAME) VALUES ( 'apser')

    - INSERT INTO Vendar(NAME) VALUES ( 'cubfoods')

    - INSERT INTO Vendar(NAME) VALUES ( 'apser')

    - INSERT INTO Vendar(NAME) VALUES ( 'WALMART')

    -

    -

    - Select * from Vendar

    -

    -

    --- identifies the duplicate records

    - INSERT INTO #Vendartemp ( Name ,IID)

    - Select A.Name , A.IID

    - from Vendar A

    - join Vendar B ON A.name = B.name

    - where A.Name in

    - (

    - Select Name Name

    - From Vendar

    - Group by Name

    - having COUNT(1) > 1

    - )

    - and A.IID > B.IID

    -

    - SELECT * FROM #Vendartemp

    -

    - DELETE FROM VENDAR

    - where VENDAR.IID IN

    - (

    - Select VENDAR.IID

    - FROM #Vendartemp B

    - Where VENDAR.Name = B.Name AND VENDAR.IID < B.IID

    - )

    - Select * from Vendar

    You can use the substring function to identify the first 7 characters in a column

    Hope this helps

  • Using the first 7 characters will not identify asper as the same vendor as apser. Building a CASE statement in a CTE allows you to pre-map the variant spellings of vendor names to a single name eg vendor names appearing as asper or apser are mapped to asper. You can then run your query from the CTE to give you the desired output.

    IF NOT OBJECT_ID('tempdb.dbo.#Vendors', 'U') IS NULL DROP TABLE #Vendors

    SELECT 'walmart' AS vendorname INTO #Vendors UNION ALL

    SELECT 'walmartinc' UNION ALL

    SELECT 'walmartinc' UNION ALL

    SELECT 'cubfoods' UNION ALL

    SELECT 'cubfoods' UNION ALL

    SELECT 'cubfood' UNION ALL

    SELECT 'asper' UNION ALL

    SELECT 'apser'

    ;WITH cte AS

    (

    SELECT CASE

    WHEN vendorname IN ('walmart', 'walmartinc') THEN 'walmart'

    WHEN vendorname IN ('cubfoods', 'cubfood') THEN 'cubfoods'

    WHEN vendorname IN ('asper', 'apser') THEN 'asper'

    ELSE vendorname END AS vendorname

    FROM #Vendors

    )

    SELECT vendorname, COUNT(*) AS duplicates

    FROM cte

    GROUP BY vendorname

    HAVING COUNT(*) > 1

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply