Deleting rows and table size is growing?

  • I have a table that had 770 million rows in it. I have already copied the historical data over to another archive database, and now I am in the process of deleting the rows that meet the historical criteria. The thing is that the table is growing, at least according to sp_spaceused. I did change the recovery model from full to simple; my thinking was that I wouldn't accumulate such a large transaction log that I would need to truncate it so often.

    At one point the data size of the table was 62,388.328 MB, but now it's 68,480.672 MB, even though I have deleted around 130 million rows. Why would it be growing? Furthermore, how should I go about reducing the size? I don't think a shrink would necessarily work, since the unused space is only 274,880 KB.

    What am I missing?

    Also, here is the script I am using:

    SET ROWCOUNT 100000

    WHILE 1 = 1
    BEGIN
        DELETE TOP (100000)
        FROM rates_history
        WHERE rates_history.last_update <= '2011-05-24 00:00:00.000'

        IF @@ROWCOUNT = 0
            BREAK
    END

  • I wouldn't necessarily trust sp_spaceused, especially over a short period of time.

    If you are deleting rows, the size is going down; garbage collection, and perhaps lots of free space in the middle of the indexes, might make it appear larger.

    Finish deletes, clean up your clustered index, and then check.
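
    Once the deletes are done, that cleanup is essentially one rebuild plus a re-check; a minimal sketch, assuming the clustered index is named something like PK_rates_history:

    -- Rebuild the clustered index to compact the space the deletes freed
    -- (PK_rates_history is a guessed name; substitute the real clustered index)
    ALTER INDEX PK_rates_history ON rates_history REBUILD;

    -- Then look at the reported size again
    EXEC sp_spaceused 'rates_history';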

  • It seems likely the table is a heap (i.e. there is no clustered index). Empty pages are not automatically deallocated when deleting from a heap unless a WITH (TABLOCK) hint is used. In SQL Server 2008 you can also use the ALTER TABLE ... REBUILD syntax after all the deletes are complete to reclaim space. This is not available in 2005, so there you would need to create a clustered index (and then drop it, if it is important that the table retain a heap structure).

    There is no value in setting ROWCOUNT and using TOP in your delete. Using TOP alone is the preferred way to specify how many rows to delete (at most).
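
    If the table had turned out to be a heap, a sketch of that approach, using the table and column from the posted script (the REBUILD line is 2008-and-later syntax):

    -- The TABLOCK hint lets the emptied heap pages be deallocated as the deletes run
    WHILE 1 = 1
    BEGIN
        DELETE TOP (100000)
        FROM rates_history WITH (TABLOCK)
        WHERE last_update <= '2011-05-24 00:00:00.000';

        IF @@ROWCOUNT = 0
            BREAK;
    END

    -- SQL Server 2008 and later only: reclaim the remaining space in the heap
    ALTER TABLE rates_history REBUILD;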

  • The table in question does have a primary key/clustered index.

    As for the ROWCOUNT, noted. There are various resources that say how one should limit the amount of work done by a query, and there isn't a clear consensus on the appropriate method. However, BOL does state that ROWCOUNT is ignored if TOP is used. Regardless, that was my ignorance showing.

  • dclark 19140 (5/25/2012)

    The table in question does have a primary key/clustered index.

    Thanks. Perhaps the problem is ghost cleanup instead? The following link contains information about this, as well as some links to bugs that have been fixed (you didn't say what build of SQL Server you are running):

    http://www.sqlskills.com/blogs/paul/post/inside-the-storage-engine-ghost-cleanup-in-depth.aspx
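
    If you want to see whether ghost records are actually accumulating on this table, something along these lines will show the counts (DETAILED mode reads every page, so it will not be quick at this size):

    SELECT index_id,
           ghost_record_count,
           version_ghost_record_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('rates_history'), NULL, NULL, 'DETAILED');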

    Depending on the edition of SQL Server you have, you might find partitioning this table makes sense. The following document also includes some great information on performing bulk changes efficiently (it is targeted at 2008, but much of the information is applicable to 2005 too):

    http://msdn.microsoft.com/en-us/library/dd425070.aspx

    However, BOL does state that ROWCOUNT is ignored if TOP is used.

    Yes. The page also says:

    Important:

    Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Avoid using SET ROWCOUNT together with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax.

    The current version of this entry (the above is from 2005) has this instead:

    Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax.

  • I am running Microsoft SQL Server 2005 - 9.00.5000.00 Workgroup

    Regardless, I took a look for the ghost process, and one is running. The ghost records are a little over my head at the moment. As for partitioning, eventually I want to get to something that is structured better than what I have. It seems that this database has never really been in the hands of someone with the appropriate knowledge to administer it. I am attempting to nurse it back to health, and I thought the first step was dealing with this one monstrous table. It alone is about 3/4 of the total size of the database, and it is making backups impractical by generating incredibly large full backups that are filling up the server. That is why I opted to move most of this history table into an archive, especially since none of the users even knew that this data existed. Once the old data is off and smaller backups are possible, I can focus on other things, such as partitioning the table or figuring out whether it really should be growing as large as it is.

    And yes, I have changed my script to just use TOP now.
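
    For reference, the revised loop is just the same batch without SET ROWCOUNT; the CHECKPOINT at the end of each batch is an optional extra, but under the simple recovery model it lets the log space be reused between batches:

    WHILE 1 = 1
    BEGIN
        DELETE TOP (100000)
        FROM rates_history
        WHERE last_update <= '2011-05-24 00:00:00.000';

        IF @@ROWCOUNT = 0
            BREAK;

        CHECKPOINT;  -- optional: lets simple recovery reuse the log between batches
    END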

    One other odd thing, and maybe it has to do with sp_spaceused: I noticed that the row count wasn't dropping. There was no progress at all. I tried to stop and restart the script with no results. I then decided to reorganize the (non-clustered) index on the date column. It took about 40 minutes, but once it was done I could see the deletes going again, and at a faster pace. I don't doubt that this helped, but I am curious as to why the deletes were getting gummed up in the first place, to the point where sp_spaceused was showing no progress.
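
    (The reorganize itself was just a one-liner along these lines; the index name below is a stand-in for the real one.)

    ALTER INDEX IX_rates_history_last_update ON rates_history REORGANIZE;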

    At this rate I am tempted to stop the deletion, rebuild the clustered PK index to see what happens, and then restart the deletes. Or should I just let it go?

    If sp_spaceused works the same as in 2008, then it queries the allocation units system DMVs. That is not necessarily going to be 100% accurate at a point in time with the ghost cleanup running and the deletes going. It's asking for a lot of overhead to keep this up to date in real time.
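
    As a cross-check that bypasses sp_spaceused, a rough size query straight against the partition stats DMV looks something like this (pages are 8 KB):

    SELECT SUM(reserved_page_count) * 8 / 1024.0 AS reserved_mb,
           SUM(used_page_count)     * 8 / 1024.0 AS used_mb,
           SUM(CASE WHEN index_id IN (0, 1) THEN row_count ELSE 0 END) AS total_rows
    FROM sys.dm_db_partition_stats
    WHERE object_id = OBJECT_ID('rates_history');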

    If the deletes are running, let them run. After a large delete I would think about rebuilding the clustered index, depending on fragmentation. You might check that, but check sp_spaceused after things are done and the system gets to a steady state.
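
    Checking fragmentation is a single query against the physical stats DMF; LIMITED mode is the cheap option (table name assumed from the thread):

    SELECT i.name AS index_name,
           ips.index_type_desc,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('rates_history'), NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id  = ips.index_id;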

    You shouldn't be worried about this. Millions of people have run millions of deletes of this scale in SQL Server. It works, and the storage cleans up over time. Sometimes sp_spaceused gets out of whack, but that isn't a reflection on the core operation of the database engine. 770 million rows is far from a strain on the engine, so I would caution you not to try to micromanage the instance.

  • Still running at this point, but there has been an interesting development. It looks like the database outgrew its current allocation and took another 10 GB from the file system. This is also confirmed by checking the free space on the drive where the database lives. Regardless, I am going to wait it out and then rebuild the clustered index at the end.

    I have just never deleted this much data from a table, let alone deleted this much to reduce the total size of the database.

  • So incredibly happy right now. The rebuild of the two indexes on the table (one clustered and one non-clustered) finished, and the total size of the table is now just around 7 GB. At its peak it was around 92 GB, just after the deletes finished. The rebuild took two and a half hours, but now all is good. I just need to shrink the database to get back all of that excess space, do some more work on three other tables, shrink again, and then rebuild the indexes.

    I need to shrink twice because I only have 17 GB available on the drive right now, and I would feel more comfortable if I got back the extra 80 GB or so before doing the work on the three other tables. The second shrink is to recover the space from those tables. The final rebuild will put my database back where it should be.

    Regardless, the "big boy" has been slain, and now I just need to pick off the next largest tables, which weigh in at 8 GB, 2 GB, and 2 GB. So happy right now.

  • A couple things.

    First, glad things cleaned up for you.

    Second, if you shrink the database, you may undo all the work you did in rebuilding the clustered index to remove fragmentation. Shrink throws extents back to the beginning of the file, without any ordering.

    Third, you always need some free space for maintenance operations.

    Fourth, you should monitor space and add it in chunks when needed. Do not allow autogrow to manage space for you.

    With all that, plan your shrink. Know how much extra space you need for maintenance, usually around 2x your largest table, plus an allowance for a few months of data growth. That pad is needed in SQL Server; do not shrink below it.
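
    To make that concrete with the numbers from this thread: the largest remaining table is about 8 GB, so the pad works out to roughly 2 x 8 GB = 16 GB, plus whatever a few months of growth looks like. A planned shrink then targets a specific size rather than the minimum; a sketch, with a placeholder logical file name and target:

    -- Shrink the data file to a chosen target size in MB, leaving the pad in place
    -- ('MyDB_Data' and the 40000 MB target are placeholders, not values from this thread)
    DBCC SHRINKFILE (N'MyDB_Data', 40000);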

    Thanks for the advice. Believe me, I never want to shrink again, and hopefully with proper management I never have to. I needed to shrink because I had an outrageous amount of free space in the database. The shrink freed up about 90-100 GB. Then I cleared out the next three tables, then ran another shrink to reclaim that space. (I couldn't do it in one swoop because by the end of the first delete I only had 17 GB left.) Then, after all of that, I used the dba_indexDefrag_sp by Michelle Ufford at sqlfool.com. <edit> I believe I also rebuilt the huge database in question as well </edit>

    Regardless, I am now at an 18 GB full backup (it used to be 85 GB), and I am creating the maintenance plans to do regular full, differential, and transaction log backups. I have a long way to go until I am comfortable with the database, but at least I feel relieved that it isn't going to blow up on me.
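
    The maintenance plans boil down to statements along these lines (database name and paths are placeholders, and the log backups only make sense once the database is back in the full recovery model):

    BACKUP DATABASE MyDB
        TO DISK = N'D:\Backups\MyDB_full.bak'
        WITH INIT, CHECKSUM;

    BACKUP DATABASE MyDB
        TO DISK = N'D:\Backups\MyDB_diff.bak'
        WITH DIFFERENTIAL, INIT, CHECKSUM;

    -- Requires the full (or bulk-logged) recovery model
    BACKUP LOG MyDB
        TO DISK = N'D:\Backups\MyDB_log.trn'
        WITH INIT, CHECKSUM;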
