Archiving...INSERT & DELETE

Question

Post reply

Archiving...INSERT & DELETE

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 1

You could try using something like this:



declare @BatchSize int = 10000;

while @BatchSize <> 0

begin

    begin try

        begin transaction

        delete top (@BatchSize)

        from dbo.SPTLog

        output

            deleted.LogID, deleted.LogTime, getdate()

        into

            dbo.SPTLogArchive(LogID,LogTime,ArchiveTime)

        where

            LogTime < @dCutOffDate;

        set @BatchSize = @@ROWCOUNT;

        commit transaction;

    end try

    begin catch

        rollback transaction;

        set @BatchSize = 0; -- if you want to abort the delete, or set to 10000 if to continue

    end catch

end

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 2

Lynn Pettis

SSC Guru

Points: 442447

September 11, 2012 at 9:39 am

#1536343

You might want to put a DELAY in the TRY part of the code after the commit to allow other activities to access the table from which you are deleting rows.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

oscarooko (9/11/2012)

It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

SELECT [LogID]



 ,[LogTime]

 ,GetDate()

FROM [dbo].[SPTLog]

WHERE [LogTime] < @dCutOffDate;

How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 4

ChrisM@Work (9/11/2012)

oscarooko (9/11/2012)

It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

SELECT [LogID]
,[LogTime] ,GetDate() FROM [dbo].[SPTLog] WHERE [LogTime] < @dCutOffDate;

How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

In fact what indexes are defined on the table dbo.SPTLog?

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

Lynn Pettis (9/11/2012)

ChrisM@Work (9/11/2012)

oscarooko (9/11/2012)

It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

SELECT [LogID]
,[LogTime] ,GetDate() FROM [dbo].[SPTLog] WHERE [LogTime] < @dCutOffDate;

How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

In fact what indexes are defined on the table dbo.SPTLog?

Is LogID a foreign key in another table?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 6

ChrisM@Work (9/11/2012)

Lynn Pettis (9/11/2012)

ChrisM@Work (9/11/2012)

oscarooko (9/11/2012)

It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

SELECT [LogID]
,[LogTime] ,GetDate() FROM [dbo].[SPTLog] WHERE [LogTime] < @dCutOffDate;

How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

In fact what indexes are defined on the table dbo.SPTLog?

Is LogID a foreign key in another table?

Here is one for you, if LogTime is a nonclustered index, SQL Server could be doing a table scan selecting records to delete.

Too bad it looks like the OP has left for now, it would be nice to know what indexes are defined.

oscarooko Old Hand Points: 395 More actions · Answer 7

Lynn Pettis (9/11/2012)

ChrisM@Work (9/11/2012)

Lynn Pettis (9/11/2012)

ChrisM@Work (9/11/2012)

oscarooko (9/11/2012)

It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

SELECT [LogID]
,[LogTime] ,GetDate() FROM [dbo].[SPTLog] WHERE [LogTime] < @dCutOffDate;

How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

In fact what indexes are defined on the table dbo.SPTLog?

Is LogID a foreign key in another table?

Here is one for you, if LogTime is a nonclustered index, SQL Server could be doing a table scan selecting records to delete.

Too bad it looks like the OP has left for now, it would be nice to know what indexes are defined.

Good People, I really do appreciate your help with this. The comments thus far have provided me with lots of insight.

LogID is Primary key in SPTLog table. It is the foreign key in SPTLogArchive

In the code I provided, I did NOT include the SPTLogArchiveID column and a two other columns...I was just trying not to clutter my request(may be dumb)

In the table SPTLog, I have indexes on LogTime and LogID(PrimaryKey)

In the table SPTLogArchive I have indexes on LogTime- nonClustered, and on SPTLogArchiveID-Primary key

Lynn, I am testing the code you gave me. Looks like I might have some luck with it. I will keep you posted with the results.

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 8

Lynn Pettis

SSC Guru

Points: 442447

September 11, 2012 at 12:04 pm

#1536424

Appreciate you telling us what indexes are defined, but it would be nice to see the actual definitions.

oscarooko Old Hand Points: 395 More actions · Answer 9

/*Indexes for SPTlogArchive*/

/****** Object: Index [IX_SPTLogArchive_LogTime] Script Date: 09/11/2012 14:46:56 ******/

CREATE NONCLUSTERED INDEX [IX_SPTLogArchive_LogTime] ON [dbo].[SPTLogArchive]

(

[LogTime] ASC

)

INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

----------------------

/****** Object: Index [PK_SPTLogArchiveID] Script Date: 09/11/2012 14:53:06 ******/

ALTER TABLE [dbo].[SPTLogArchive] ADD CONSTRAINT [PK_SPTLogArchive] PRIMARY KEY CLUSTERED

(

[SPTLogArchiveID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

----------------------------------

/*Indexes for SPTlog*/

----------------------------------

/****** Object: Index [IX_index_SPTLog_LogTime] Script Date: 09/11/2012 15:00:25 ******/

CREATE NONCLUSTERED INDEX [IX_index_SPTLog_LogTime] ON [dbo].[SPTLog]

(

[LogTime] ASC

)

INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

-------------------

/****** Object: Index [PK_SPTLog] Script Date: 09/11/2012 15:00:55 ******/

ALTER TABLE [dbo].[SPTLog] ADD CONSTRAINT [PK_SPTLog] PRIMARY KEY CLUSTERED

(

[LogID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 10

Lynn Pettis

SSC Guru

Points: 442447

September 11, 2012 at 1:12 pm

#1536445

The DELETE will be doing a CLUSTERED INDEX (i.e. table) scan. It won't use the index on LogTime.

Just so you know.

oscarooko Old Hand Points: 395 More actions · Answer 11

oscarooko

Old Hand

Points: 395

September 11, 2012 at 1:25 pm

#1536456

So you saying even though I have a the criteria LogTime < @dCutOffDate and an index is defined on LogTime, it still will NOT be used?

Any ideas on what to change to optimize the delete?

oscarooko Old Hand Points: 395 More actions · Answer 12

oscarooko

Old Hand

Points: 395

September 11, 2012 at 1:32 pm

#1536461

Lynn,

Just got done running the query. I deleted 47,360 records in 1hr 27 mins using your method. Wow, still too long!

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 13

oscarooko (9/11/2012)

So you saying even though I have a the criteria LogTime < @dCutOffDate and an index is defined on LogTime, it still will NOT be used?

Any ideas on what to change to optimize the delete?

Basically, yes, that is what I am saying. Using the index and then bookmark loops to delete records will be more expensive than just scanning the clustered index (table). If you were deleting individual rows using the index, then SQL Server would most likely use the index.

One way would be to have the LogTime defined as your clustered index. If LogID is an identity column, you could have it as the second column of the clustered index and make it a multi-column clustered index. You could still have LogID as the PK, just create it as a nonclustered index.

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 14

oscarooko (9/11/2012)

Lynn Pettis (9/11/2012)

ChrisM@Work (9/11/2012)

Lynn Pettis (9/11/2012)

ChrisM@Work (9/11/2012)

oscarooko (9/11/2012)

It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

SELECT [LogID]
,[LogTime] ,GetDate() FROM [dbo].[SPTLog] WHERE [LogTime] < @dCutOffDate;

How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

In fact what indexes are defined on the table dbo.SPTLog?

Is LogID a foreign key in another table?

Here is one for you, if LogTime is a nonclustered index, SQL Server could be doing a table scan selecting records to delete.

Too bad it looks like the OP has left for now, it would be nice to know what indexes are defined.

Good People, I really do appreciate your help with this. The comments thus far have provided me with lots of insight.

LogID is Primary key in SPTLog table. It is the foreign key in SPTLogArchive

In the code I provided, I did NOT include the SPTLogArchiveID column and a two other columns...I was just trying not to clutter my request(may be dumb)

In the table SPTLog, I have indexes on LogTime and LogID(PrimaryKey)

In the table SPTLogArchive I have indexes on LogTime- nonClustered, and on SPTLogArchiveID-Primary key

Lynn, I am testing the code you gave me. Looks like I might have some luck with it. I will keep you posted with the results.

If there's no index on it, then that's your problem. Each delete in SPTLog will cause a table scan of the column in SPTLogArchive to ensure that the key doesn't exist there - if it does, SQL Server will throw an error. They won't exist because you are moving rows from one table to the other - but you will still get table scans for every delete. The circular reference will ensure that keys only exist in one table at any one time but it's an expensive process especially without indexes on the FK columns.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 15

ChrisM@home (9/11/2012)

...

If there's no index on it, then that's your problem. Each delete in SPTLog will cause a table scan of the column in SPTLogArchive to ensure that the key doesn't exist there - if it does, SQL Server will throw an error. They won't exist because you are moving rows from one table to the other - but you will still get table scans for every delete. The circular reference will ensure that keys only exist in one table at any one time but it's an expensive process especially without indexes on the FK columns.

This doesn't quite stack up, and in any case if there were a FK relationship between the tables, then Lynn's code would have raised the following error:

"Msg 332, Level 16, State 1, Line 41

The target table 'dbo.SPTLogArchive' of the OUTPUT INTO clause cannot be on either side of a (primary key, foreign key) relationship. Found reference constraint 'FK_SPTLogArchive_SPTLog'."

However, you've stated that there is a FK relationship somewhere, so here's a code block to work around it, modified from your original code:

DECLARE @dCutOffDate DATE;



SET @dCutOffDate = '2008-02-20'; -- or whatever you want your value to be

ALTER TABLE [dbo].[SPTLog] NOCHECK CONSTRAINT ALL;

ALTER TABLE [dbo].[SPTLogArchive] NOCHECK CONSTRAINT ALL;

begin try;

    begin transaction;

	INSERT INTO [dbo].[SPTLogArchive]

		([LogID]

		,[LogTime]

		,[ArchiveTime])

	SELECT

		[LogID]

		,[LogTime]

		,GetDate()

	FROM [dbo].[SPTLog]

	WHERE [LogTime] < @dCutOffDate;

	DELETE FROM [dbo].[SPTLog]

	WHERE LogTime < @dCutOffDate; 

    commit transaction;

end try

begin catch;

    rollback transaction;

end catch

ALTER TABLE [dbo].[SPTLog] WITH CHECK CHECK CONSTRAINT ALL;

ALTER TABLE [dbo].[SPTLogArchive] WITH NOCHECK CHECK CONSTRAINT ALL;

The code disables the FK constraints before the transaction and enables them again afterwards. Notice that the two enable statements

ALTER TABLE [dbo].[SPTLog] WITH CHECK CHECK CONSTRAINT ALL;

ALTER TABLE [dbo].[SPTLogArchive] WITH NOCHECK CHECK CONSTRAINT ALL;

differ. The second statement (for SPTLogArchive)uses the clause “WITH NOCHECK” which instructs SQL Server NOT to check the new values – if it does, the statement will fail.

This batch works just fine, moving about 40 thousand of 3 million rows from SPTLog to SPTLogArchive in about a second on my local instance. I've not incorporated "chunking" into this, where subsets of the data are moved (INSERT and DELETE of the same set of keys) within separate transactions to mitigate resource waits.

Here’s the sample setup I used for the test:

USE Sandbox



GO

DROP TABLE SPTLogArchive

DROP TABLE SPTLog

GO

----------------------------------

/* SPTlog */

----------------------------------

CREATE TABLE [dbo].[SPTLog] (

	LogID INT IDENTITY (1,1) NOT NULL CONSTRAINT PK_SPTLog PRIMARY KEY CLUSTERED,

	LogTime DATE NOT NULL)

-- LogID is redundant - it's already part of the index as the cluster key

CREATE NONCLUSTERED INDEX [IX_index_SPTLog_LogTime] ON [dbo].[SPTLog]

([LogTime] ASC) INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

----------------------------------

/* SPTLogArchive */

----------------------------------

CREATE TABLE [dbo].[SPTLogArchive] (

	SPTLogArchiveID INT IDENTITY (1,1) NOT NULL CONSTRAINT PK_SPTLogArchive PRIMARY KEY CLUSTERED,

	LogID INT NULL CONSTRAINT FK_SPTLogArchive_SPTLog REFERENCES SPTLog (LogID),

	LogTime DATE NOT NULL,

	ArchiveTime DATE NOT NULL) 

CREATE NONCLUSTERED INDEX [IX_SPTLogArchive_LogTime] ON [dbo].[SPTLogArchive]

([LogTime] ASC) INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

----------------------------------

/* Load sample data into SPTLog */

----------------------------------

SET IDENTITY_INSERT [dbo].[SPTLog] ON

INSERT INTO [dbo].[SPTLog] (LogID,LogTime)

SELECT

	NewLogid = ROW_NUMBER() OVER(ORDER BY rn DESC),

	NewDate = DATEADD(minute,0-rn,GETDATE())

FROM (

	SELECT TOP 3000000 rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

	FROM SYS.columns a, sys.columns b, sys.columns c

) d

SET IDENTITY_INSERT [dbo].[SPTLog] OFF

ALTER INDEX PK_SPTLog ON SPTLog REBUILD

GO

DON’T use this in a production environment. What we need to see now is the full DDL for both tables.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden