What is the most efficient way to delete records?

  • SQL Server Books Online (http://msdn.microsoft.com/en-us/library/ms188774.aspx) says:

    Important:

    Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Do not use SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax. For more information, see DELETE (Transact-SQL), INSERT (Transact-SQL), or UPDATE (Transact-SQL).

    So I guess in future versions this trick won't work...
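    The TOP-based rewrite the documentation recommends usually takes the form of a batched delete loop. A minimal sketch, assuming a hypothetical Big_Table with a [DATE] column:

    ```sql
    -- Batched delete using DELETE TOP instead of SET ROWCOUNT.
    -- Big_Table and [DATE] are placeholder names for illustration.
    DECLARE @BatchSize INT = 100000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize)
          FROM dbo.Big_Table
         WHERE [DATE] < '1 Jan 2010';

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete

        CHECKPOINT;   -- in SIMPLE recovery, lets the log truncate between batches
    END
    ```

    Unlike SET ROWCOUNT, the TOP limit here applies only to this one statement, so there is no session setting to remember to reset afterwards.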

  • Leo.Miller (5/6/2010)


    How big is the table you are testing against?

    Also are the table layouts the same?

    I'm testing against the real table.

  • These are the stats from the first run of the script (yesterday):

    24 24 2010-05-07 09:50:04.993
    23 23 2010-05-07 07:59:53.770
    22 22 2010-05-07 06:51:33.177
    21 21 2010-05-07 06:01:14.253
    20 20 2010-05-07 05:15:18.837
    19 19 2010-05-07 04:32:22.620
    18 18 2010-05-07 03:54:23.123
    17 17 2010-05-07 03:11:45.967
    16 16 2010-05-07 02:25:01.717
    15 15 2010-05-07 01:47:50.197
    14 14 2010-05-07 01:10:34.580
    13 13 2010-05-07 00:31:06.697
    12 12 2010-05-06 23:51:33.583
    11 11 2010-05-06 23:15:31.950
    10 10 2010-05-06 22:40:30.957
    9 9 2010-05-06 22:05:49.300
    8 8 2010-05-06 21:30:16.277
    7 7 2010-05-06 20:57:32.897
    6 6 2010-05-06 19:30:56.320
    5 5 2010-05-06 18:49:25.710
    4 4 2010-05-06 18:11:45.787
    3 3 2010-05-06 17:29:32.040
    2 2 2010-05-06 16:44:40.407
    1 1 2010-05-06 16:08:33.557

  • And here is the first million with the new script:

    131 2 2010-05-07 12:32:35.697
    130 1 2010-05-07 12:01:39.113

    If I have 1 million deleted every half an hour, I'll need >70 more hours to finish the delete process =( Damn!

  • Leo.Miller (5/6/2010)


    You asked: Why are you saying that to find the TOP (1000000) sql must check all the records?

    The only way to find the TOP record is to look at them all, then sort them in the required order, then take the top one off the top. It helps if you have a clustered index, but SQL is still doing a lot of extra reads.

    Try running the queries below with SET SHOWPLAN_ALL ON and after setting

    SET STATISTICS IO ON

    SELECT TOP (1) * FROM Big_Table WHERE [DATE] < '1 Jan 2010'

    vs

    SET ROWCOUNT 1

    SELECT * FROM Big_Table where [DATE] < '1 Jan 2010'

    I'm just not sure why you think that... TOP stops looking as soon as its sufficiency has been suffonsified. But don't take my word for it... take your own code's word for it...

    DROP TABLE JBMTest
    GO
    --===== Create and populate a 1,000,000 row test table.
         -- Column "SomeID" has a range of 1 to 1,000,000 unique numbers
         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
         -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
         -- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
         -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
         -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
         -- for all rows.
         -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
     SELECT TOP 1000000
            SomeID       = IDENTITY(INT,1,1),
            SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
            SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                         + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
            SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
            SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
            SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
            SomeHex12    = RIGHT(NEWID(),12)
       INTO dbo.JBMTest
       FROM Master.dbo.SysColumns t1,
            Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned
         -- Takes about 1 second to execute.
     ALTER TABLE dbo.JBMTest
       ADD PRIMARY KEY CLUSTERED (SomeID)

    --===== Demo that TOP does NOT have to read through all the rows
         -- any more than ROWCOUNT does
    SET STATISTICS IO, TIME ON

    SELECT TOP 10000 * FROM dbo.JBMTest WHERE SomeDate < '1 Jan 2010'

    SET ROWCOUNT 10000
    SELECT * FROM dbo.JBMTest WHERE SomeDate < '1 Jan 2010'
    SET ROWCOUNT 0

    SET STATISTICS IO, TIME OFF

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • brutsoft (5/6/2010)


    And here is the first million with the new script:

    131 2 2010-05-07 12:32:35.697
    130 1 2010-05-07 12:01:39.113

    If I have 1 million deleted every half an hour, I'll need >70 more hours to finish the delete process =( Damn!

    How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?

    --Jeff Moden



  • OK, I think I solved my problem, as happens frequently, in a way that could not have been suggested on a forum, because it's not possible to present complete information on a forum.

    I do have a "master" copy of my database sitting on the same server. So what I did was truncate the table completely and run a loop inserting records million by million from the "master" copy. It runs at a rate of one million rows every two minutes, which means the copy will be completed in an hour. After this I will have to shrink my database, which will take several more hours, so by tomorrow morning I should be all set.
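    A sketch of the refill loop described above, with placeholder names since the real schema isn't shown in the thread (an IDENTITY key column would additionally need SET IDENTITY_INSERT ON around each insert):

    ```sql
    -- Refill the emptied table from the "master" copy, one million keys at a time.
    -- Big_Table, MasterDB, Id, Col1, Col2 are all placeholder names.
    TRUNCATE TABLE dbo.Big_Table;

    DECLARE @FromId INT = 0,
            @MaxId  INT;
    SELECT @MaxId = MAX(Id) FROM MasterDB.dbo.Big_Table;

    WHILE @FromId < @MaxId
    BEGIN
        -- A range predicate on the clustered key keeps each batch a single seek + scan.
        INSERT INTO dbo.Big_Table WITH (TABLOCK) (Id, Col1, Col2)
        SELECT Id, Col1, Col2
          FROM MasterDB.dbo.Big_Table
         WHERE Id >  @FromId
           AND Id <= @FromId + 1000000;

        SET @FromId = @FromId + 1000000;
    END
    ```

    When you are keeping far fewer rows than you are deleting, copying the keepers into a fresh table like this is usually much cheaper than deleting in place, because no per-row index maintenance or FK checking happens for the discarded rows.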

  • Jeff Moden (5/6/2010)


    How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?

    Out of the 180GB occupied by the table, 60GB is taken up by indexes. There are MASSIVE indexes on the table, as you can see. The number of indexes, including the clustered PK identity index, is 5.

    There is no foreign key INTO this table, but there are several foreign keys from this table to other tables.

  • brutsoft (5/6/2010)


    Jeff Moden (5/6/2010)


    How many indexes do you have on this bad boy? Also, do you have any Foreign Keys or triggers associated with this table?

    Out of the 180GB occupied by the table, 60GB is taken up by indexes. There are MASSIVE indexes on the table, as you can see. The number of indexes, including the clustered PK identity index, is 5.

    There is no foreign key INTO this table, but there are several foreign keys from this table to other tables.

    That's a very big part of why deletes are taking so long. They affect every index and are affected by every FK.
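    One common way to soften that per-index cost (a general technique, not something proposed above) is to disable the nonclustered indexes for the duration of a mass delete and rebuild them afterwards; the table and index names below are hypothetical:

    ```sql
    -- Disable a nonclustered index so the delete only maintains the clustered index.
    ALTER INDEX IX_BigTable_SomeDate ON dbo.Big_Table DISABLE;

    -- ... run the batched delete here ...

    -- Rebuilding re-enables the index and removes its fragmentation in one step.
    ALTER INDEX IX_BigTable_SomeDate ON dbo.Big_Table REBUILD;
    ```

    Note that disabling only applies to nonclustered indexes here; disabling the clustered index would make the table itself inaccessible, and FK constraints referencing the table still fire on every deleted row either way.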

    --Jeff Moden



  • I would like to confirm that I got my problem resolved. Copying data from the master database took around 2 hours. Another 8 hours - shrinking the database, and then I was up and running. Thank you all.

  • brutsoft (5/9/2010)


    I would like to confirm that I got my problem resolved. Copying data from the master database took around 2 hours. Another 8 hours - shrinking the database, and then I was up and running. Thank you all.

    You are aware that shrinking the database causes massive fragmentation of all of your indexes? You should schedule a rebuild of all indexes as soon as possible - and make sure you have enough space available in the data file for the rebuilding to be done.
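    A rebuild along the lines Jeffrey describes might look like this, per table (the table name is a placeholder):

    ```sql
    -- Rebuild every index on the table after the shrink to undo the fragmentation.
    -- SORT_IN_TEMPDB offloads the sort work from the data file being rebuilt into.
    ALTER INDEX ALL ON dbo.Big_Table REBUILD WITH (SORT_IN_TEMPDB = ON);
    ```

    Keep in mind the catch Jeffrey mentions: a rebuild needs free space roughly the size of the index being rebuilt, so rebuilding right after a shrink can grow the file again.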

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Very cool and very thoughtful... thanks for letting us know how it went.

    --Jeff Moden



  • Please refer to this:

    http://tsqltips.blogspot.com/

  • adhiman (6/7/2012)


    Please refer to this:

    http://tsqltips.blogspot.com/

    First of all, I don't see what your article has to do with deletes. You don't actually need any keys or indexes to do deletes.

    Second, your blog article suggests using the SSN as a possible "unique" column. Really, really bad idea for many reasons, but you also need to be aware that people can and have changed their SSNs thanks to identity theft and the like. Of course, there's also the warning on every SSN card that says it "shall not be used for identification purposes".

    Now that you don't have SSNs to rely on for a unique key, what would you use for a unique key? If it's ANYTHING having to do with the person's name, then you fail, because people can easily change their names and they are, of course, not unique. Please don't say you'd add a sequence number to that.

    For customer tables, you might just as well get used to the idea that the only proper way to identify a customer is by a customer number and that there's absolutely no reason why that customer number shouldn't be an IDENTITY column in T-SQL.

    --Jeff Moden



  • Hi Jeff:

    I appreciate your response to my blog.

    My first article talks about performing deletes in an efficient manner. Suppose you have a table that has millions of rows in it and you want to perform a delete on it using some condition. If you use this syntax:

    begin tran;
    delete from table where condition
    commit;

    what it will do is lock the table until the delete happens. Now, you should know that before doing deletes, the SQL engine has to read the data pages for the rows it has to delete. This way, the time it spends reading the rows to be deleted becomes part of the table locking, which can block other simultaneous transactions that want to do any other DML on the table (maybe an insert or update). To avoid this blocking, I am suggesting a way to read the row IDs into a temp variable first. Then you can use the index to join those IDs to the table from which the data needs to be deleted. The benefit of doing this is that the query optimizer already knows where the data exists (because of the index), and it may not block other simultaneous inserts happening at the same time. This way you can prevent blocking to some extent; I have used this methodology on a table of 50 million records and it works like a champ. The main aim is to suggest a better way of doing the same thing.
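    The two-step approach described above might be sketched like this (table, column, and temp-table names are placeholders):

    ```sql
    -- Step 1: capture the keys to delete; this read takes only shared locks.
    SELECT Id
      INTO #ToDelete
      FROM dbo.Big_Table
     WHERE [DATE] < '1 Jan 2010';

    CREATE CLUSTERED INDEX IX_ToDelete ON #ToDelete (Id);

    -- Step 2: delete by joining on the captured keys; the optimizer can seek
    -- straight to the doomed rows instead of scanning during the delete itself.
    DELETE t
      FROM dbo.Big_Table t
      JOIN #ToDelete d
        ON d.Id = t.Id;
    ```

    One caveat with this pattern: rows that qualify between step 1 and step 2 are not deleted, and rows captured in step 1 could be modified before step 2 runs, so it suits conditions (like an old date cutoff) where the qualifying set is stable.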

    Secondly, my other blog post talks about PKs and UKs. The example of adding an SSN was an idea, not a perfect suggestion. The main aim of that post is to clear up the basic misconceptions about PKs and UKs that people have and that I have read on the internet. You are absolutely correct that in the example I gave, the best way to keep track of uniqueness is by adding an ID column. Probably I will come up with a better example. The idea is to make readers aware that they should think about the uniqueness of the rows, not just add an ID. It does help the query optimizer come up with an optimal plan for query execution.

    I really appreciate your thoughts.

    Please let me know if you have questions.
