Archiving...INSERT & DELETE

  • You could try using something like this:

    declare @BatchSize int = 10000;

    while @BatchSize <> 0

    begin

    begin try

    begin transaction

    delete top (@BatchSize)

    from dbo.SPTLog

    output

    deleted.LogID, deleted.LogTime, getdate()

    into

    dbo.SPTLogArchive(LogID,LogTime,ArchiveTime)

    where

    LogTime < @dCutOffDate;

    set @BatchSize = @@ROWCOUNT;

    commit transaction;

    end try

    begin catch

    rollback transaction;

    set @BatchSize = 0; -- if you want to abort the delete, or set to 10000 if to continue

    end catch

    end

  • You might want to put a DELAY in the TRY part of the code after the commit to allow other activities to access the table from which you are deleting rows.

  • oscarooko (9/11/2012)


    It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

    SELECT [LogID]

    ,[LogTime]

    ,GetDate()

    FROM [dbo].[SPTLog]

    WHERE [LogTime] < @dCutOffDate;

    How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (9/11/2012)


    oscarooko (9/11/2012)


    It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

    SELECT [LogID]

    ,[LogTime]

    ,GetDate()

    FROM [dbo].[SPTLog]

    WHERE [LogTime] < @dCutOffDate;

    How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

    In fact what indexes are defined on the table dbo.SPTLog?

  • Lynn Pettis (9/11/2012)


    ChrisM@Work (9/11/2012)


    oscarooko (9/11/2012)


    It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

    SELECT [LogID]

    ,[LogTime]

    ,GetDate()

    FROM [dbo].[SPTLog]

    WHERE [LogTime] < @dCutOffDate;

    How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

    In fact what indexes are defined on the table dbo.SPTLog?

    Is LogID a foreign key in another table?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (9/11/2012)


    Lynn Pettis (9/11/2012)


    ChrisM@Work (9/11/2012)


    oscarooko (9/11/2012)


    It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

    SELECT [LogID]

    ,[LogTime]

    ,GetDate()

    FROM [dbo].[SPTLog]

    WHERE [LogTime] < @dCutOffDate;

    How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

    In fact what indexes are defined on the table dbo.SPTLog?

    Is LogID a foreign key in another table?

    Here is one for you, if LogTime is a nonclustered index, SQL Server could be doing a table scan selecting records to delete.

    Too bad it looks like the OP has left for now, it would be nice to know what indexes are defined.

  • Lynn Pettis (9/11/2012)


    ChrisM@Work (9/11/2012)


    Lynn Pettis (9/11/2012)


    ChrisM@Work (9/11/2012)


    oscarooko (9/11/2012)


    It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

    SELECT [LogID]

    ,[LogTime]

    ,GetDate()

    FROM [dbo].[SPTLog]

    WHERE [LogTime] < @dCutOffDate;

    How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

    In fact what indexes are defined on the table dbo.SPTLog?

    Is LogID a foreign key in another table?

    Here is one for you, if LogTime is a nonclustered index, SQL Server could be doing a table scan selecting records to delete.

    Too bad it looks like the OP has left for now, it would be nice to know what indexes are defined.

    Good People, I really do appreciate your help with this. The comments thus far have provided me with lots of insight.

    LogID is Primary key in SPTLog table. It is the foreign key in SPTLogArchive

    In the code I provided, I did NOT include the SPTLogArchiveID column and a two other columns...I was just trying not to clutter my request(may be dumb)

    In the table SPTLog, I have indexes on LogTime and LogID(PrimaryKey)

    In the table SPTLogArchive I have indexes on LogTime- nonClustered, and on SPTLogArchiveID-Primary key

    Lynn, I am testing the code you gave me. Looks like I might have some luck with it. I will keep you posted with the results.

  • Appreciate you telling us what indexes are defined, but it would be nice to see the actual definitions.

  • /*Indexes for SPTlogArchive*/

    /****** Object: Index [IX_SPTLogArchive_LogTime] Script Date: 09/11/2012 14:46:56 ******/

    CREATE NONCLUSTERED INDEX [IX_SPTLogArchive_LogTime] ON [dbo].[SPTLogArchive]

    (

    [LogTime] ASC

    )

    INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    ----------------------

    /****** Object: Index [PK_SPTLogArchiveID] Script Date: 09/11/2012 14:53:06 ******/

    ALTER TABLE [dbo].[SPTLogArchive] ADD CONSTRAINT [PK_SPTLogArchive] PRIMARY KEY CLUSTERED

    (

    [SPTLogArchiveID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    ----------------------------------

    /*Indexes for SPTlog*/

    ----------------------------------

    /****** Object: Index [IX_index_SPTLog_LogTime] Script Date: 09/11/2012 15:00:25 ******/

    CREATE NONCLUSTERED INDEX [IX_index_SPTLog_LogTime] ON [dbo].[SPTLog]

    (

    [LogTime] ASC

    )

    INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    -------------------

    /****** Object: Index [PK_SPTLog] Script Date: 09/11/2012 15:00:55 ******/

    ALTER TABLE [dbo].[SPTLog] ADD CONSTRAINT [PK_SPTLog] PRIMARY KEY CLUSTERED

    (

    [LogID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

  • The DELETE will be doing a CLUSTERED INDEX (i.e. table) scan. It won't use the index on LogTime.

    Just so you know.

  • So you saying even though I have a the criteria LogTime < @dCutOffDate and an index is defined on LogTime, it still will NOT be used?

    Any ideas on what to change to optimize the delete?

  • Lynn,

    Just got done running the query. I deleted 47,360 records in 1hr 27 mins using your method. Wow, still too long!

  • oscarooko (9/11/2012)


    So you saying even though I have a the criteria LogTime < @dCutOffDate and an index is defined on LogTime, it still will NOT be used?

    Any ideas on what to change to optimize the delete?

    Basically, yes, that is what I am saying. Using the index and then bookmark loops to delete records will be more expensive than just scanning the clustered index (table). If you were deleting individual rows using the index, then SQL Server would most likely use the index.

    One way would be to have the LogTime defined as your clustered index. If LogID is an identity column, you could have it as the second column of the clustered index and make it a multi-column clustered index. You could still have LogID as the PK, just create it as a nonclustered index.

  • oscarooko (9/11/2012)


    Lynn Pettis (9/11/2012)


    ChrisM@Work (9/11/2012)


    Lynn Pettis (9/11/2012)


    ChrisM@Work (9/11/2012)


    oscarooko (9/11/2012)


    It actually took me 47 minute to delete 13 684 records in batches of 100. I am there's got to be a faster way, and one of you good folks know it!!

    SELECT [LogID]

    ,[LogTime]

    ,GetDate()

    FROM [dbo].[SPTLog]

    WHERE [LogTime] < @dCutOffDate;

    How long does it take for this query to run with a suitable value for @dCutOffDate? Are you sure there's an index on [LogTime], table [dbo].[SPTLog]?

    In fact what indexes are defined on the table dbo.SPTLog?

    Is LogID a foreign key in another table?

    Here is one for you, if LogTime is a nonclustered index, SQL Server could be doing a table scan selecting records to delete.

    Too bad it looks like the OP has left for now, it would be nice to know what indexes are defined.

    Good People, I really do appreciate your help with this. The comments thus far have provided me with lots of insight.

    LogID is Primary key in SPTLog table. It is the foreign key in SPTLogArchive

    In the code I provided, I did NOT include the SPTLogArchiveID column and a two other columns...I was just trying not to clutter my request(may be dumb)

    In the table SPTLog, I have indexes on LogTime and LogID(PrimaryKey)

    In the table SPTLogArchive I have indexes on LogTime- nonClustered, and on SPTLogArchiveID-Primary key

    Lynn, I am testing the code you gave me. Looks like I might have some luck with it. I will keep you posted with the results.

    If there's no index on it, then that's your problem. Each delete in SPTLog will cause a table scan of the column in SPTLogArchive to ensure that the key doesn't exist there - if it does, SQL Server will throw an error. They won't exist because you are moving rows from one table to the other - but you will still get table scans for every delete. The circular reference will ensure that keys only exist in one table at any one time but it's an expensive process especially without indexes on the FK columns.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • ChrisM@home (9/11/2012)


    ...

    If there's no index on it, then that's your problem. Each delete in SPTLog will cause a table scan of the column in SPTLogArchive to ensure that the key doesn't exist there - if it does, SQL Server will throw an error. They won't exist because you are moving rows from one table to the other - but you will still get table scans for every delete. The circular reference will ensure that keys only exist in one table at any one time but it's an expensive process especially without indexes on the FK columns.

    This doesn't quite stack up, and in any case if there were a FK relationship between the tables, then Lynn's code would have raised the following error:

    "Msg 332, Level 16, State 1, Line 41

    The target table 'dbo.SPTLogArchive' of the OUTPUT INTO clause cannot be on either side of a (primary key, foreign key) relationship. Found reference constraint 'FK_SPTLogArchive_SPTLog'."

    However, you've stated that there is a FK relationship somewhere, so here's a code block to work around it, modified from your original code:

    DECLARE @dCutOffDate DATE;

    SET @dCutOffDate = '2008-02-20'; -- or whatever you want your value to be

    ALTER TABLE [dbo].[SPTLog] NOCHECK CONSTRAINT ALL;

    ALTER TABLE [dbo].[SPTLogArchive] NOCHECK CONSTRAINT ALL;

    begin try;

    begin transaction;

    INSERT INTO [dbo].[SPTLogArchive]

    ([LogID]

    ,[LogTime]

    ,[ArchiveTime])

    SELECT

    [LogID]

    ,[LogTime]

    ,GetDate()

    FROM [dbo].[SPTLog]

    WHERE [LogTime] < @dCutOffDate;

    DELETE FROM [dbo].[SPTLog]

    WHERE LogTime < @dCutOffDate;

    commit transaction;

    end try

    begin catch;

    rollback transaction;

    end catch

    ALTER TABLE [dbo].[SPTLog] WITH CHECK CHECK CONSTRAINT ALL;

    ALTER TABLE [dbo].[SPTLogArchive] WITH NOCHECK CHECK CONSTRAINT ALL;

    The code disables the FK constraints before the transaction and enables them again afterwards. Notice that the two enable statements

    ALTER TABLE [dbo].[SPTLog] WITH CHECK CHECK CONSTRAINT ALL;

    ALTER TABLE [dbo].[SPTLogArchive] WITH NOCHECK CHECK CONSTRAINT ALL;

    differ. The second statement (for SPTLogArchive)uses the clause “WITH NOCHECK” which instructs SQL Server NOT to check the new values – if it does, the statement will fail.

    This batch works just fine, moving about 40 thousand of 3 million rows from SPTLog to SPTLogArchive in about a second on my local instance. I've not incorporated "chunking" into this, where subsets of the data are moved (INSERT and DELETE of the same set of keys) within separate transactions to mitigate resource waits.

    Here’s the sample setup I used for the test:

    USE Sandbox

    GO

    DROP TABLE SPTLogArchive

    DROP TABLE SPTLog

    GO

    ----------------------------------

    /* SPTlog */

    ----------------------------------

    CREATE TABLE [dbo].[SPTLog] (

    LogID INT IDENTITY (1,1) NOT NULL CONSTRAINT PK_SPTLog PRIMARY KEY CLUSTERED,

    LogTime DATE NOT NULL)

    -- LogID is redundant - it's already part of the index as the cluster key

    CREATE NONCLUSTERED INDEX [IX_index_SPTLog_LogTime] ON [dbo].[SPTLog]

    ([LogTime] ASC) INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ----------------------------------

    /* SPTLogArchive */

    ----------------------------------

    CREATE TABLE [dbo].[SPTLogArchive] (

    SPTLogArchiveID INT IDENTITY (1,1) NOT NULL CONSTRAINT PK_SPTLogArchive PRIMARY KEY CLUSTERED,

    LogID INT NULL CONSTRAINT FK_SPTLogArchive_SPTLog REFERENCES SPTLog (LogID),

    LogTime DATE NOT NULL,

    ArchiveTime DATE NOT NULL)

    CREATE NONCLUSTERED INDEX [IX_SPTLogArchive_LogTime] ON [dbo].[SPTLogArchive]

    ([LogTime] ASC) INCLUDE ( [LogID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ----------------------------------

    /* Load sample data into SPTLog */

    ----------------------------------

    SET IDENTITY_INSERT [dbo].[SPTLog] ON

    INSERT INTO [dbo].[SPTLog] (LogID,LogTime)

    SELECT

    NewLogid = ROW_NUMBER() OVER(ORDER BY rn DESC),

    NewDate = DATEADD(minute,0-rn,GETDATE())

    FROM (

    SELECT TOP 3000000 rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM SYS.columns a, sys.columns b, sys.columns c

    ) d

    SET IDENTITY_INSERT [dbo].[SPTLog] OFF

    ALTER INDEX PK_SPTLog ON SPTLog REBUILD

    GO

    DON’T use this in a production environment. What we need to see now is the full DDL for both tables.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply