Which one is the efficient Way from the below two

  • Hi All,

    Which one is the efficient Way from the below two for 50M database

    having 30+ fields

    with EmailRecords as (

    select

    row_number() over (partition by Email order by rowid desc) as RowNumber ,

    [Company],[webAddress] ,[Prefix] ,[Contactname] ,[FirstName] ,[MiddleName] ,

    [LastName] ,[Title] ,[Address] ,[Address1] ,[Address2] ,[Address3] ,[City] ,

    [State] ,[Pincode] ,[STDcode] ,[Phone] ,[Phone1] ,[Phone2] ,[Phone3] ,

    [FaxNumber] ,[Mobile] ,[Email] ,[Industry] ,[Product Code] ,[Revenue] ,

    [Experience] ,[Dateofbirth] ,[dob] ,[age] ,[martialstatus] ,[Keyskills] ,

    [education] ,[category] ,[Dealer]

    from dbo.table2

    )

    insert into dbo.[NewTable]

    select [Company],[webAddress] , [Prefix] , [Contactname] , [FirstName] ,[MiddleName] ,

    [LastName] ,[Title] ,[Address] ,[Address1] ,[Address2] ,[Address3] ,[City] ,

    [State] ,[Pincode] ,[STDcode] ,[Phone] ,[Phone1] ,[Phone2] ,[Phone3] ,

    [FaxNumber] ,[Mobile] ,[Email] ,[Industry] ,[Product Code] ,[Revenue] ,

    [Experience] ,[Dateofbirth] ,[dob] ,[age] ,[martialstatus] ,[Keyskills] ,

    [education] ,[category] ,[Dealer]

    from EmailRecords

    where RowNumber = 1;

    ---------------------------------------------------------

    DELETE FROM table2

    WHERE rowid IN

    (

    SELECT a.rowid

    FROM table2 a,table2 b

    WHERE a.rowid!= b.rowid

    and a.rowid< b.rowid

    and a.[Email]= b.[Email]

    )

  • The two queries do different things, but I guess you are trying to detect duplicate records.

    Putting "distinct" records into a new table involves copying a lot of data, so I don't think it will be very efficient. Detecting duplicates with a ROW_NUMBER() function requires a scan, a segment and a sort. What are you doing then with the non-duplicates in the new table? At some point you will have to put them back into the original table.

    I think deleting duplicates is the way to go, but I would not re-invent the wheel:

    Choose your favourite technique

    Edited: a piece of the post was deleted. Strange: it never happened before.

    -- Gianluca Sartori

  • It also depends on how many duplicate records vs uniqie records you have in the table. How many unique rows (based on email) are there in the table?

  • 5 Million records are duplicates

  • Just to be sure, is rowid unique?

  • Ya row ID is unique

  • Okay, here is one way to delete five million rows of data from a table with thirty seven million rows of data.

    declare @Batch int;

    set @Batch = 10000; -- Batch size to delete. Set this to what ever size you feel works best.

    while @Batch > 0

    begin

    with EmailRecords as (

    select

    row_number() over (partition by Email order by rowid desc) as RowNumber ,

    RowID

    from

    dbo.table2

    )

    delete top (@Batch)

    from

    dbo.table2

    from

    dbo.table2 t2

    inner join EmailRecords er

    on (t2.RowID = er.RowID)

    where

    er.RowNumber > 1;

    set @Batch = @@ROWCOUNT; -- Capture how many rows were deleted

    -- backup log [yourDBName] ...

    -- Here would be a good place to backup your transaction log

    -- should your database be using the FULL or BULK_LOGGED recovery model.

    -- This will keep the log from growing excessively andd keep your log chain

    -- intact.

    end

    Here is an article I wrote that discusses this process:

    http://qa.sqlservercentral.com/articles/T-SQL/67898/

    In addition to the article, you should also read the discussion thread as well.

    Edit: Fix broken link. (Thank you Jeff.)

  • Lynn... the click link you posted is broken...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks Jeff. I fixed the link. 😉

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply