What is the most efficient way to delete records?

  • SQL Server Books Online (http://msdn.microsoft.com/en-us/library/ms188774.aspx) says:

    Important:

    Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Do not use SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax. For more information, see DELETE (Transact-SQL), INSERT (Transact-SQL), or UPDATE (Transact-SQL).

    So I guess in future versions this trick won't work...
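    The TOP-based rewrite the documentation recommends usually takes the form of a batched delete loop. A minimal sketch, assuming a hypothetical Big_Table with a [DATE] column:

    ```sql
    -- Batched delete using DELETE TOP instead of SET ROWCOUNT.
    -- Big_Table and [DATE] are placeholder names for illustration.
    DECLARE @BatchSize INT = 100000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize)
          FROM dbo.Big_Table
         WHERE [DATE] < '1 Jan 2010';

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete

        CHECKPOINT;   -- in SIMPLE recovery, lets the log truncate between batches
    END
    ```

    Unlike SET ROWCOUNT, the TOP limit here applies only to this one statement, so there is no session setting to remember to reset afterwards.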

  • Leo.Miller (5/6/2010)


    How big is the table you are testing against?

    Also are the table layouts the same?

    I'm testing against the real table.

  • These are the stats from the first run of the script (yesterday):

    24 24 2010-05-07 09:50:04.993
    23 23 2010-05-07 07:59:53.770
    22 22 2010-05-07 06:51:33.177
    21 21 2010-05-07 06:01:14.253
    20 20 2010-05-07 05:15:18.837
    19 19 2010-05-07 04:32:22.620
    18 18 2010-05-07 03:54:23.123
    17 17 2010-05-07 03:11:45.967
    16 16 2010-05-07 02:25:01.717
    15 15 2010-05-07 01:47:50.197
    14 14 2010-05-07 01:10:34.580
    13 13 2010-05-07 00:31:06.697
    12 12 2010-05-06 23:51:33.583
    11 11 2010-05-06 23:15:31.950
    10 10 2010-05-06 22:40:30.957
    9 9 2010-05-06 22:05:49.300
    8 8 2010-05-06 21:30:16.277
    7 7 2010-05-06 20:57:32.897
    6 6 2010-05-06 19:30:56.320
    5 5 2010-05-06 18:49:25.710
    4 4 2010-05-06 18:11:45.787
    3 3 2010-05-06 17:29:32.040
    2 2 2010-05-06 16:44:40.407
    1 1 2010-05-06 16:08:33.557

  • And here is the first million with the new script:

    131 2 2010-05-07 12:32:35.697
    130 1 2010-05-07 12:01:39.113

    If I have 1 million deleted every half an hour, I'll need >70 more hours to finish the delete process =( Damn!

  • Leo.Miller (5/6/2010)


    You asked: Why are you saying that to find the TOP (1000000) sql must check all the records?

    The only way to find the TOP record is to look at them all, then sort them in the required order, then take the top one off the top. It helps if you have a clustered index, but SQL is still doing a lot of extra reads.

    Try running the queries below with SET SHOWPLAN_ALL ON and after setting

    SET STATISTICS IO ON

    SELECT TOP (1) * FROM Big_Table WHERE [DATE] < '1 Jan 2010'

    vs

    SET ROWCOUNT 1

    SELECT * FROM Big_Table where [DATE] < '1 Jan 2010'

    I'm just not sure why you think that... TOP stops looking as soon as its sufficiency has been suffonsified. But don't take my word for it... take your own code's word for it...

    DROP TABLE JBMTest
    GO
    --===== Create and populate a 1,000,000 row test table.
         -- Column "SomeID" has a range of 1 to 1,000,000 unique numbers
         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
         -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
         -- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
         -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
         -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
         -- for all rows.
         -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
     SELECT TOP 1000000
            SomeID       = IDENTITY(INT,1,1),
            SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
            SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                         + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
            SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
            SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
            SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
            SomeHex12    = RIGHT(NEWID(),12)
       INTO dbo.JBMTest
       FROM Master.dbo.SysColumns t1,
            Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned
         -- Takes about 1 second to execute.
     ALTER TABLE dbo.JBMTest
       ADD PRIMARY KEY CLUSTERED (SomeID)

    --===== Demo that TOP does NOT have to read through all the rows
         -- any more than ROWCOUNT does
    SET STATISTICS IO, TIME ON

    SELECT TOP 10000 * FROM dbo.JBMTest WHERE SomeDate < '1 Jan 2010'

    SET ROWCOUNT 10000
    SELECT * FROM dbo.JBMTest WHERE SomeDate < '1 Jan 2010'
    SET ROWCOUNT 0

    SET STATISTICS IO, TIME OFF

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • brutsoft (5/6/2010)


    And here is the first million with the new script:

    131 2 2010-05-07 12:32:35.697
    130 1 2010-05-07 12:01:39.113

    If I have 1 million deleted every half an hour, I'll need >70 more hours to finish the delete process =( Damn!

    How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?

    --Jeff Moden



  • OK, I think I solved my problem, as happens frequently, in a way that could not have been suggested on a forum, because it's not possible to present complete information on a forum.

    I do have a "master" copy of my database sitting on the same server. So what I did was truncate the table completely and run a loop inserting records million by million from the "master" copy. It runs at a rate of one million rows every two minutes, which means the copy will be completed in an hour. After this I will have to shrink my database, which will take several more hours, so by tomorrow morning I should be all set.
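    A sketch of the refill loop described above, with placeholder names since the real schema isn't shown in the thread (an IDENTITY key column would additionally need SET IDENTITY_INSERT ON around each insert):

    ```sql
    -- Refill the emptied table from the "master" copy, one million keys at a time.
    -- Big_Table, MasterDB, Id, Col1, Col2 are all placeholder names.
    TRUNCATE TABLE dbo.Big_Table;

    DECLARE @FromId INT = 0,
            @MaxId  INT;
    SELECT @MaxId = MAX(Id) FROM MasterDB.dbo.Big_Table;

    WHILE @FromId < @MaxId
    BEGIN
        -- A range predicate on the clustered key keeps each batch a single seek + scan.
        INSERT INTO dbo.Big_Table WITH (TABLOCK) (Id, Col1, Col2)
        SELECT Id, Col1, Col2
          FROM MasterDB.dbo.Big_Table
         WHERE Id >  @FromId
           AND Id <= @FromId + 1000000;

        SET @FromId = @FromId + 1000000;
    END
    ```

    When you are keeping far fewer rows than you are deleting, copying the keepers into a fresh table like this is usually much cheaper than deleting in place, because no per-row index maintenance or FK checking happens for the discarded rows.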

  • Jeff Moden (5/6/2010)


    How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?

    Out of the 180GB occupied by the table, 60GB is taken up by indexes. There are MASSIVE indexes on the table, as you can see. The number of indexes, including the clustered PK identity index, is 5.

    There is no foreign key INTO this table, but there are several foreign keys from this table to other tables.

  • brutsoft (5/6/2010)


    Jeff Moden (5/6/2010)


    How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?

    Out of the 180GB occupied by the table, 60GB is taken up by indexes. There are MASSIVE indexes on the table, as you can see. The number of indexes, including the clustered PK identity index, is 5.

    There is no foreign key INTO this table, but there are several foreign keys from this table to other tables.

    That's a very big part of why deletes are taking so long. They affect every index and are affected by every FK.
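    One common way to soften that per-index cost (a general technique, not something proposed above) is to disable the nonclustered indexes for the duration of a mass delete and rebuild them afterwards; the table and index names below are hypothetical:

    ```sql
    -- Disable a nonclustered index so the delete only maintains the clustered index.
    ALTER INDEX IX_BigTable_SomeDate ON dbo.Big_Table DISABLE;

    -- ... run the batched delete here ...

    -- Rebuilding re-enables the index and removes its fragmentation in one step.
    ALTER INDEX IX_BigTable_SomeDate ON dbo.Big_Table REBUILD;
    ```

    Note that disabling only applies to nonclustered indexes here; disabling the clustered index would make the table itself inaccessible, and FK constraints referencing the table still fire on every deleted row either way.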

    --Jeff Moden



  • I would like to confirm that I got my problem resolved. Copying data from the master database took around 2 hours. Another 8 hours - shrinking the database, and then I was up and running. Thank you all.

  • brutsoft (5/9/2010)


    I would like to confirm that I got my problem resolved. Copying data from the master database took around 2 hours. Another 8 hours - shrinking the database, and then I was up and running. Thank you all.

    You are aware that shrinking the database causes massive fragmentation of all of your indexes? You should schedule a rebuild of all indexes as soon as possible - and make sure you have enough space available in the data file for the rebuilding to be done.
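    A rebuild along the lines Jeffrey describes might look like this, per table (the table name is a placeholder):

    ```sql
    -- Rebuild every index on the table after the shrink to undo the fragmentation.
    -- SORT_IN_TEMPDB offloads the sort work from the data file being rebuilt into.
    ALTER INDEX ALL ON dbo.Big_Table REBUILD WITH (SORT_IN_TEMPDB = ON);
    ```

    Keep in mind the catch Jeffrey mentions: a rebuild needs free space roughly the size of the index being rebuilt, so rebuilding right after a shrink can grow the file again.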

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Very cool and very thoughtful... thanks for letting us know how it went.

    --Jeff Moden



  • Please refer to this:

    http://tsqltips.blogspot.com/

  • adhiman (6/7/2012)


    Please refer to this:

    http://tsqltips.blogspot.com/

    First of all, I don't see what your article has to do with deletes. You don't actually need any keys or indexes to do deletes.

    Second, your blog article suggests using the SSN as a possible "unique" column. Really, really bad idea for many reasons, but you also need to be aware that people can and have changed their SSNs thanks to identity theft and the like. Of course, there's also the warning on every SSN card that says it "shall not be used for identification purposes".

    Now that you don't have SSNs to rely on for a unique key, what would you use for a unique key? If it's ANYTHING having to do with the person's name, then you fail, because people can easily change their names and they are, of course, not unique. Please don't say you'd add a sequence number to that.

    For customer tables, you might just as well get used to the idea that the only proper way to identify a customer is by a customer number and that there's absolutely no reason why that customer number shouldn't be an IDENTITY column in T-SQL.

    --Jeff Moden



  • Hi Jeff:

    I appreciate your response to my blog.

    My first article talks about performing deletes in an efficient manner. Suppose you have a table that has millions of rows in it and you want to perform a delete on it using some condition. If you use this syntax:

    begin tran;
    delete from table where condition
    commit;

    what it will do is lock the table until the delete happens. Now, you should know that before doing deletes, the SQL engine has to read the data pages for the rows it has to delete. This way, the time it spends reading the rows to be deleted becomes part of the table locking, which can block other simultaneous transactions that want to do any other DML on the table (maybe an insert or update). To avoid this blocking, I am suggesting a way to read the row IDs into a temp variable first. Then you can use the index to join those IDs to the table from which the data needs to be deleted. The benefit of doing this is that the query optimizer already knows where the data exists (because of the index), and it may not block other simultaneous inserts happening at the same time. This way you can prevent blocking to some extent; I have used this methodology on a table of 50 million records and it works like a champ. The main aim is to suggest a better way of doing the same thing.
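    The two-step approach described above might be sketched like this (table, column, and temp-table names are placeholders):

    ```sql
    -- Step 1: capture the keys to delete; this read takes only shared locks.
    SELECT Id
      INTO #ToDelete
      FROM dbo.Big_Table
     WHERE [DATE] < '1 Jan 2010';

    CREATE CLUSTERED INDEX IX_ToDelete ON #ToDelete (Id);

    -- Step 2: delete by joining on the captured keys; the optimizer can seek
    -- straight to the doomed rows instead of scanning during the delete itself.
    DELETE t
      FROM dbo.Big_Table t
      JOIN #ToDelete d
        ON d.Id = t.Id;
    ```

    One caveat with this pattern: rows that qualify between step 1 and step 2 are not deleted, and rows captured in step 1 could be modified before step 2 runs, so it suits conditions (like an old date cutoff) where the qualifying set is stable.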

    Secondly, my other blog post talks about PKs and UKs. The example of adding an SSN was an idea, not a perfect suggestion. The main aim of that post is to clear up the basic misconceptions about PKs and UKs that people have and that I have read on the internet. You are absolutely correct that in the example I gave, the best way to keep track of uniqueness is by adding an ID column. Probably I will come up with a better example. The idea is to make readers aware that they should think about the uniqueness of the rows, not just add an ID. It does help the query optimizer come up with an optimal plan for query execution.

    I really appreciate your thoughts.

    Please let me know if you have questions.
