Speeding Up Delete's on Large Databases

  • Eirikur Eiriksson (10/31/2016)


    Quick suggestion, reading the data to be deleted into a "bucket" variables can drastically speed up the delete, here is an example based on the DDL previously posted.

    /quote]

    Seems like an extreme case of RBAR, Eirikur. What do you do once you have the variables loaded?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (10/31/2016)


    Eirikur Eiriksson (10/31/2016)


    Quick suggestion, reading the data to be deleted into a "bucket" variables can drastically speed up the delete, here is an example based on the DDL previously posted.

    Seems like an extreme case of RBAR, Eirikur. What do you do once you have the variables loaded?

    After running the pre-loading, then I do the delete, this can speed it up quite significantly. It's in the essence the same principal as I normally use when running a "Dry run" in my normal test harness, makes certain that the affected pages are in memory before any operation is applied.

    ๐Ÿ˜Ž

    Obviously one has to do this for all indices, the easiest way is to target each index with a force'ing hint.

  • Eirikur Eiriksson (10/31/2016)


    Jeff Moden (10/31/2016)


    Eirikur Eiriksson (10/31/2016)


    Quick suggestion, reading the data to be deleted into a "bucket" variables can drastically speed up the delete, here is an example based on the DDL previously posted.

    Seems like an extreme case of RBAR, Eirikur. What do you do once you have the variables loaded?

    After running the pre-loading, then I do the delete, this can speed it up quite significantly. It's in the essence the same principal as I normally use when running a "Dry run" in my normal test harness, makes certain that the affected pages are in memory before any operation is applied.

    ๐Ÿ˜Ž

    Obviously one has to do this for all indices, the easiest way is to target each index with a force'ing hint.

    I guess I don't understand. I don't get how loading a set of variables for each row to be deleted and then deleting one row at a time would improve performance. Either that or I've totally misread what you mean.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (10/31/2016)


    Eirikur Eiriksson (10/31/2016)


    Jeff Moden (10/31/2016)


    Eirikur Eiriksson (10/31/2016)


    Quick suggestion, reading the data to be deleted into a "bucket" variables can drastically speed up the delete, here is an example based on the DDL previously posted.

    Seems like an extreme case of RBAR, Eirikur. What do you do once you have the variables loaded?

    After running the pre-loading, then I do the delete, this can speed it up quite significantly. It's in the essence the same principal as I normally use when running a "Dry run" in my normal test harness, makes certain that the affected pages are in memory before any operation is applied.

    ๐Ÿ˜Ž

    Obviously one has to do this for all indices, the easiest way is to target each index with a force'ing hint.

    I guess I don't understand. I don't get how loading a set of variables for each row to be deleted and then deleting one row at a time would improve performance. Either that or I've totally misread what you mean.

    Guess I haven't been clear enough, this applies for each batch, not each row.

    ๐Ÿ˜Ž

  • Jeff, the method Eirikur's describing works really well. We implemented such a method here for cleaning up a substantial table earlier in the year. The existing method was causing some blocking and was estimated to take around six weeks to complete.

    We looked at locking & blocking, logical reads and physical reads for a few cycles of the old method, and noticed that the first few cycles completed in seconds, extending later to minutes. This coincided with a change from logical reads to physical reads. So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

    INSERT INTO #Temp (ID)

    SELECT TOP(100000) iom.ID

    FROM dbo.CoffeeClubMember iom WITH (NOLOCK)

    WHERE <<funky WHERE clause here>>

    ORDER BY <<funky ORDER BY here>>

    SELECT @id = iom1.ID

    FROM #Temp s

    INNER JOIN dbo.CoffeeClubMember iom1 WITH (NOLOCK, INDEX (IX_CoffeeClubMember_MemberIDEndDateStartDate_InclCoffeeClubID))

    ON s.MemberID = iom1.MemberID

    AND (s.EndDate IS NULL OR s.EndDate = iom1.EndDate)

    AND (s.StartDate IS NULL OR s.StartDate = iom1.StartDate)

    AND s.ID = iom1.ID

    INNER JOIN dbo.CoffeeClubMember iom2 WITH (NOLOCK, INDEX (UIX_CoffeeClubMember_CoffeeClubIDMemberIDStartEndDate))

    ON s.CoffeeClubID = iom2.CoffeeClubID AND s.MemberID = iom2.MemberID AND s.ID = iom2.ID

    INNER JOIN dbo.CoffeeClubMember iom3 WITH (NOLOCK, INDEX (PK_CoffeeClubMember))

    ON s.ID = iom3.ID

    DELETE iom

    FROM dbo.CoffeeClubMember iom

    INNER JOIN #Temp s

    ON s.ID = iom.ID

    and it completed the same day.

    The priming step takes a variable amount of time depending on what's already in cache (we have a lot of RAM). Each delete takes about half a second, every time, every pass, and there's no blocking. Ever.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (11/1/2016)


    Jeff, the method Eirikur's describing works really well. We implemented such a method here for cleaning up a substantial table earlier in the year. The existing method was causing some blocking and was estimated to take around six weeks to complete.

    We looked at locking & blocking, logical reads and physical reads for a few cycles of the old method, and noticed that the first few cycles completed in seconds, extending later to minutes. This coincided with a change from logical reads to physical reads. So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

    INSERT INTO #Temp (ID)

    SELECT TOP(100000) iom.ID

    FROM dbo.CoffeeClubMember iom WITH (NOLOCK)

    WHERE <<funky WHERE clause here>>

    ORDER BY <<funky ORDER BY here>>

    SELECT @id = iom1.ID

    FROM #Temp s

    INNER JOIN dbo.CoffeeClubMember iom1 WITH (NOLOCK, INDEX (IX_CoffeeClubMember_MemberIDEndDateStartDate_InclCoffeeClubID))

    ON s.MemberID = iom1.MemberID

    AND (s.EndDate IS NULL OR s.EndDate = iom1.EndDate)

    AND (s.StartDate IS NULL OR s.StartDate = iom1.StartDate)

    AND s.ID = iom1.ID

    INNER JOIN dbo.CoffeeClubMember iom2 WITH (NOLOCK, INDEX (UIX_CoffeeClubMember_CoffeeClubIDMemberIDStartEndDate))

    ON s.CoffeeClubID = iom2.CoffeeClubID AND s.MemberID = iom2.MemberID AND s.ID = iom2.ID

    INNER JOIN dbo.CoffeeClubMember iom3 WITH (NOLOCK, INDEX (PK_CoffeeClubMember))

    ON s.ID = iom3.ID

    DELETE iom

    FROM dbo.CoffeeClubMember iom

    INNER JOIN #Temp s

    ON s.ID = iom.ID

    and it completed the same day.

    The priming step takes a variable amount of time depending on what's already in cache (we have a lot of RAM). Each delete takes about half a second, every time, every pass, and there's no blocking. Ever.

    Interesting, I need to keep this in the back of the brain. Actually, I may want to code a sample myself or I may forget considering my short term memory sucks.

  • Lynn Pettis (11/1/2016)


    ChrisM@Work (11/1/2016)


    Jeff, the method Eirikur's describing works really well. We implemented such a method here for cleaning up a substantial table earlier in the year. The existing method was causing some blocking and was estimated to take around six weeks to complete.

    We looked at locking & blocking, logical reads and physical reads for a few cycles of the old method, and noticed that the first few cycles completed in seconds, extending later to minutes. This coincided with a change from logical reads to physical reads. So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

    INSERT INTO #Temp (ID)

    SELECT TOP(100000) iom.ID

    FROM dbo.CoffeeClubMember iom WITH (NOLOCK)

    WHERE <<funky WHERE clause here>>

    ORDER BY <<funky ORDER BY here>>

    SELECT @id = iom1.ID

    FROM #Temp s

    INNER JOIN dbo.CoffeeClubMember iom1 WITH (NOLOCK, INDEX (IX_CoffeeClubMember_MemberIDEndDateStartDate_InclCoffeeClubID))

    ON s.MemberID = iom1.MemberID

    AND (s.EndDate IS NULL OR s.EndDate = iom1.EndDate)

    AND (s.StartDate IS NULL OR s.StartDate = iom1.StartDate)

    AND s.ID = iom1.ID

    INNER JOIN dbo.CoffeeClubMember iom2 WITH (NOLOCK, INDEX (UIX_CoffeeClubMember_CoffeeClubIDMemberIDStartEndDate))

    ON s.CoffeeClubID = iom2.CoffeeClubID AND s.MemberID = iom2.MemberID AND s.ID = iom2.ID

    INNER JOIN dbo.CoffeeClubMember iom3 WITH (NOLOCK, INDEX (PK_CoffeeClubMember))

    ON s.ID = iom3.ID

    DELETE iom

    FROM dbo.CoffeeClubMember iom

    INNER JOIN #Temp s

    ON s.ID = iom.ID

    and it completed the same day.

    The priming step takes a variable amount of time depending on what's already in cache (we have a lot of RAM). Each delete takes about half a second, every time, every pass, and there's no blocking. Ever.

    Interesting, I need to keep this in the back of the brain. Actually, I may want to code a sample myself or I may forget considering my short term memory sucks.

    Heh - I have the same problem Lynn - in spades.

    I've been writing it up in bits and bobs for what seems like forever, because for us it was a much more significant issue than the tran log, which is what most folks pounce on in this scenario. Showing folks an article with testable code is always going to be better than "trust me".

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (11/1/2016)


    So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

    Seems horribly complex for what it does. Take a look at the code that I posted. Does that not "warm the cache" that you're talking about? For temporal (mostly WORM) tables, it also deletes in contiguous page blocks at the "beginning" of the table far away from the "active end" of the table according to the CI, which makes the deletes nasty fast.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (11/1/2016)


    ChrisM@Work (11/1/2016)


    So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

    Seems horribly complex for what it does. Take a look at the code that I posted. Does that not "warm the cache" that you're talking about? For temporal (mostly WORM) tables, it also deletes in contiguous page blocks at the "beginning" of the table far away from the "active end" of the table according to the CI, which makes the deletes nasty fast.

    If uod.ID is the cluster key then you're going to buffer the relevant pages of the clustered index. Ordinary indexes will not be affected by the read - so they won't be in cache.

    If you've got an index on uod.ID and uod.Created, then the relevant pages of that index alone is all that will be buffered - and the clustered index and any other indexes will still have to be read from disk - during the delete.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • In most cases, we will see up to 2 times better performance when applying this method, more commonly the difference is greater. Here is an example, results speak for themselves

    ๐Ÿ˜Ž

    Simple test harness

    USE TEEST;

    GO

    SET NOCOUNT ON;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    --/* -- UNCOMMENT THIS LINE TO SKIP THE RECREATION OF THE TEST DATA SET

    --DECLARE @SAMPLE_SIZE INT = 10;

    IF OBJECT_ID('dbo.TBL_TEST_DELETE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DELETE;

    CREATE TABLE dbo.TBL_TEST_DELETE

    (

    TD_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DELETE_TD_ID PRIMARY KEY CLUSTERED

    ,TD_TS DATETIME NOT NULL CONSTRAINT DFLT_DBO_TBL_TEST_DELETE_TD_TS DEFAULT (GETDATE())

    ,TD_C001 VARCHAR(4000) NOT NULL

    ,TD_C002 VARCHAR(4000) NOT NULL

    ,TD_C003 VARCHAR(4000) NOT NULL

    ,TD_C004 VARCHAR(4000) NOT NULL

    ,TD_C005 VARCHAR(4000) NOT NULL

    ,TD_C006 VARCHAR(4000) NOT NULL

    ,TD_C007 VARCHAR(4000) NOT NULL

    ,TD_C008 VARCHAR(4000) NOT NULL

    ,TD_C009 VARCHAR(4000) NOT NULL

    ,TD_C010 VARCHAR(4000) NOT NULL

    );

    DECLARE @SAMPLE_SIZE INT = 100000;

    ;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))

    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N

    FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

    INSERT INTO dbo.TBL_TEST_DELETE(TD_TS,TD_C001,TD_C002,TD_C003,TD_C004,TD_C005,TD_C006,TD_C007,TD_C008,TD_C009,TD_C010)

    SELECT

    DATEADD(SECOND,NM.N,GETDATE())

    ,REPLICATE('S',4000)

    ,REPLICATE('A',4000)

    ,REPLICATE('M',4000)

    ,REPLICATE('P',4000)

    ,REPLICATE('L',4000)

    ,REPLICATE('E',4000)

    ,REPLICATE('D',4000)

    ,REPLICATE('A',4000)

    ,REPLICATE('T',4000)

    ,REPLICATE('A',4000)

    FROM NUMS NM;

    --*/

    CHECKPOINT 1;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    DBCC FREESYSTEMCACHE ( 'ALL' ) WITH NO_INFOMSGS;

    RAISERROR('COLD DELETE 1

    -----------------------------------------------------------------------------------------------------',0,0) WITH NOWAIT;

    SET STATISTICS IO,TIME ON;

    DELETE X

    FROM dbo.TBL_TEST_DELETE X

    WHERE X.TD_ID < 10001;

    SET STATISTICS IO,TIME OFF;

    CHECKPOINT 1;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    DBCC FREESYSTEMCACHE ( 'ALL' ) WITH NO_INFOMSGS;

    RAISERROR('HOT DELETE

    -----------------------------------------------------------------------------------------------------',0,0) WITH NOWAIT;

    SET STATISTICS IO,TIME ON;

    DECLARE @TD_ID INT ;

    DECLARE @TD_TS DATETIME ;

    DECLARE @TD_C001 VARCHAR(4000);

    DECLARE @TD_C002 VARCHAR(4000);

    DECLARE @TD_C003 VARCHAR(4000);

    DECLARE @TD_C004 VARCHAR(4000);

    DECLARE @TD_C005 VARCHAR(4000);

    DECLARE @TD_C006 VARCHAR(4000);

    DECLARE @TD_C007 VARCHAR(4000);

    DECLARE @TD_C008 VARCHAR(4000);

    DECLARE @TD_C009 VARCHAR(4000);

    DECLARE @TD_C010 VARCHAR(4000);

    SELECT

    @TD_ID = TD.TD_ID

    ,@TD_TS = TD.TD_TS

    ,@TD_C001 = TD.TD_C001

    ,@TD_C002 = TD.TD_C002

    ,@TD_C003 = TD.TD_C003

    ,@TD_C004 = TD.TD_C004

    ,@TD_C005 = TD.TD_C005

    ,@TD_C006 = TD.TD_C006

    ,@TD_C007 = TD.TD_C007

    ,@TD_C008 = TD.TD_C008

    ,@TD_C009 = TD.TD_C009

    ,@TD_C010 = TD.TD_C010

    FROM dbo.TBL_TEST_DELETE TD

    WHERE TD.TD_ID < 20001;

    DELETE X

    FROM dbo.TBL_TEST_DELETE X

    WHERE X.TD_ID < 20001;

    SET STATISTICS IO,TIME OFF;

    CHECKPOINT 1;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    DBCC FREESYSTEMCACHE ( 'ALL' ) WITH NO_INFOMSGS;

    RAISERROR('COLD DELETE 2

    -----------------------------------------------------------------------------------------------------',0,0) WITH NOWAIT;

    SET STATISTICS IO,TIME ON;

    DELETE X

    FROM dbo.TBL_TEST_DELETE X

    WHERE X.TD_ID < 30001;

    SET STATISTICS IO,TIME OFF;

    Output

    COLD DELETE 1

    -----------------------------------------------------------------------------------------------------

    SQL Server parse and compile time:

    CPU time = 155 ms, elapsed time = 306 ms.

    Table 'TBL_TEST_DELETE'. Scan count 1, logical reads 90367, physical reads 462, read-ahead reads 6223, lob logical reads 90000, lob physical reads 6756, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1125 ms, elapsed time = 7132 ms.

    HOT DELETE

    -----------------------------------------------------------------------------------------------------

    SQL Server parse and compile time:

    CPU time = 345 ms, elapsed time = 1219 ms.

    Table 'TBL_TEST_DELETE'. Scan count 1, logical reads 12620, physical reads 1, read-ahead reads 2666, lob logical reads 90000, lob physical reads 5620, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 297 ms, elapsed time = 3144 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table 'TBL_TEST_DELETE'. Scan count 1, logical reads 107747, physical reads 0, read-ahead reads 0, lob logical reads 90000, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 906 ms, elapsed time = 1810 ms.

    COLD DELETE 2

    -----------------------------------------------------------------------------------------------------

    SQL Server parse and compile time:

    CPU time = 391 ms, elapsed time = 1272 ms.

    Table 'TBL_TEST_DELETE'. Scan count 1, logical reads 108950, physical reads 123, read-ahead reads 1248, lob logical reads 90000, lob physical reads 5620, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1250 ms, elapsed time = 7753 ms.

    ....to be continued ๐Ÿ˜‰

  • Wow... That's just silly how well that works! And I've tried a couple of different methods to beat it and haven't been able to even in the SIMPLE Recovery Model. Well done!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply