mysterious table growth

  • the ghost_record_count query has been running for over 5 hours now and has returned over 1100 rows. I stupidly ran it to grid results, which isn't likely to give me the FULL results till it's done, but...

    Is the value in ghost_record_count the number of rows that have not yet been removed from the system by ghost cleanup?

    If so, I think we're onto something. Some of the records returned so far have values for ghost_record_count in the tens of millions.
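
    For reference, a minimal sketch of the kind of query that produces these per-index ghost counts, assuming it is built on sys.dm_db_index_physical_stats in DETAILED mode (DatabaseName and dbo.TableName are placeholders):

    -- Ghost record counts per index / partition / level; DETAILED scans are slow on big tables.
    SELECT ips.object_id,
           ips.index_id,
           ips.partition_number,
           ips.index_level,
           ips.ghost_record_count,
           ips.version_ghost_record_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(N'DatabaseName'),          -- placeholder
             OBJECT_ID(N'dbo.TableName'),     -- placeholder
             NULL, NULL, 'DETAILED') AS ips
    WHERE ips.ghost_record_count > 0;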

  • also of note, and I didn't make this connection until now:

    the OTHER table that is exhibiting this behavior AND the original table are the two with the highest volume of data turnover.

    They get tons of new data and have a fixed number of days of data retention... purges run on both every day, removing a huge volume of data. It looks like data is being marked for deletion, but ghost cleanup isn't doing its thing, which, if this theory is correct, would neatly explain what's really going on here.

  • How are you doing your purges? Maybe a different method would help. For example, we use partitioning: we switch the partition to a staging table, bcp it out, then drop the staging table (rough sketch below).

    Thanks,

    Jared
    CE - Microsoft
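
    A rough sketch of the switch-out purge described above, assuming a partitioned table; every object name here is hypothetical, and the staging table must match the source structure and sit on the same filegroup:

    -- Hypothetical names throughout.
    ALTER TABLE dbo.BigTable
        SWITCH PARTITION 1 TO dbo.BigTable_Staging;   -- metadata-only operation

    -- Archive the switched-out rows if required (from a command prompt):
    -- bcp "YourDatabase.dbo.BigTable_Staging" out "C:\archive\oldest_partition.dat" -n -T -S YourServer

    DROP TABLE dbo.BigTable_Staging;                   -- no row-by-row deletes, so no ghost records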

  • It's all in a single partition/filegroup, and all in one big fat nasty table.

    Purges are done via a cursor or WHERE clause, etc., so we can control how long the purge runs and the number of records to delete per run. It loops through the data and issues a bunch of deletes against a smallish number of rows at a time, a few thousand per execution, until its time is up or no more qualifying rows exist. It's designed to follow the columns in the clustered index.
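
    A minimal sketch of that style of batched purge; the table, column, batch size, retention window, and time budget are all assumptions, with the WHERE clause meant to line up with the leading clustered index column:

    -- Hypothetical purge loop: delete a few thousand rows per pass until the
    -- time budget runs out or no qualifying rows remain.
    DECLARE @BatchSize int      = 4000;
    DECLARE @StopAt    datetime = DATEADD(MINUTE, 60, GETDATE());
    DECLARE @Cutoff    datetime = DATEADD(DAY, -90, GETDATE());   -- retention assumed at 90 days
    DECLARE @Rows      int      = 1;

    WHILE @Rows > 0 AND GETDATE() < @StopAt
    BEGIN
        DELETE TOP (@BatchSize) FROM dbo.BigTable
        WHERE EventDate < @Cutoff;      -- EventDate assumed to lead the clustered index

        SET @Rows = @@ROWCOUNT;
    END;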

  • Well, you either have to change your purge to fix this, or enhance it. Look at this, and be sure to test it first:

    http://serverfault.com/questions/23446/ghost-cleanup

    Thanks,

    Jared
    CE - Microsoft

  • LAW1143 (9/28/2011)


    the ghost_record_count query has been going over 5 hours now and has over 1100 rows.

    1100 rows seems way too high for 1 table (unless you have literally hundreds of indexes). Did you mistype or neglect to replace "DatabaseName" and "dbo.TableName" with appropriate values?

  • I totally agree with Jared about changing the purge method. You don't have many choices when dealing with such big volumes. The good news is that your clustered index is on date (leading) and is a perfect candidate to be partitioned on a monthly scheme. As Jared explained, you can switch the old partition (3 months old in your case) out to a staging table, which is a metadata operation, then bcp it out to a flat file if required and simply drop that staging table. No headache with deletes 🙂
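
    A hedged sketch of that monthly scheme; boundary dates, the filegroup, and the column name are illustrative only:

    -- Illustrative monthly partition function/scheme on the leading date column.
    CREATE PARTITION FUNCTION pf_MonthlyByDate (datetime)
    AS RANGE RIGHT FOR VALUES ('2011-07-01', '2011-08-01', '2011-09-01', '2011-10-01');

    CREATE PARTITION SCHEME ps_MonthlyByDate
    AS PARTITION pf_MonthlyByDate ALL TO ([PRIMARY]);

    -- The clustered index would then be rebuilt ON ps_MonthlyByDate(EventDate), and each month
    -- the oldest partition can be switched out and dropped as described above.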

  • For the sake of argument/discussion, regarding the purge method and ghost cleanup not being logically capable of keeping up:

    Why has this purge worked for a couple of years without issue or incident on SQL 2005? The problem began roughly 3 weeks after a migration to SQL 2008 SP2.

    We're using this same purge method with a slew of other tables in other databases/environments, but admittedly just not on a system of this magnitude.

    If it's a factor of sheer data volume, why didn't we see a GRADUAL rise in size as it started to get further and further behind? What we've observed is a very SUDDEN increase in the reserved size of data and indexes. Before whatever happened happened, size and row count were holding pretty even. One day, it began growing at an exponential rate.

    Thanks for the link to the ghost cleanup articles. Great reading for later when I have some time.

    And I did qualify the SELECT with the correct database/table name, so the results so far ARE all associated with this single object.

    ALSO,

    the query completed in 7 1/2 hours: 1200 rows, 3.3 billion rows total in the ghost_record_count column.

    AND AGAIN: thanks a ton to everyone taking a moment to consider and address my problem! It's VERY much appreciated!

  • And using the query against dm_exec_requests from Paul Randal's blog, I see only one active ghost cleanup task, with a start_time of almost 48 hours ago.

    So this isn't an issue of ghost cleanup not being adequate in its 10-pages-every-5-seconds logic; it's just not running as quickly as we need it to.

    Edit: however, on successive runs the SPID is always 12, 13, or 14, with a different login time.
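
    For reference, the check described here amounts to filtering sys.dm_exec_requests on the GHOST CLEANUP command; a minimal version:

    -- Shows any background ghost cleanup task currently running.
    SELECT session_id, command, status, start_time, cpu_time, logical_reads
    FROM sys.dm_exec_requests
    WHERE command = N'GHOST CLEANUP';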

  • My guess is that you are seeing a combination of things.

    1. Ghost records growing (plus, if your data is growing at a rate of 2%, that means the ghost cleanup is getting behind exponentially)

    2. I still think you may have some data overflow with your growing primary key. As soon as a row grows past 8 KB, data gets pushed to a row-overflow page, which takes more space. Plus, that key gets added to all non-clustered indexes (a quick check for row-overflow allocation units is sketched below).

    Maybe other things....

    Jared
    CE - Microsoft
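
    If the row-overflow theory needs verifying, one possible check (a sketch; dbo.BigTable is a placeholder) is to look for ROW_OVERFLOW_DATA allocation units on the table:

    -- Pages held in row-overflow allocation units for the table in question.
    SELECT au.type_desc, au.total_pages, au.used_pages, p.index_id, p.rows
    FROM sys.allocation_units AS au
    JOIN sys.partitions AS p
        ON au.container_id = p.partition_id   -- valid for IN_ROW_DATA and ROW_OVERFLOW_DATA units
    WHERE p.object_id = OBJECT_ID(N'dbo.BigTable')
      AND au.type_desc = N'ROW_OVERFLOW_DATA';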

  • found this:

    http://www.sqlskills.com/BLOGS/PAUL/post/Turning-off-the-ghost-cleanup-task-for-a-performance-gain.aspx

    I'm not recommending that you turn it off, but it mentioned that index reorgs and rebuilds will remove ghost records.

    So, if you take the output from the previous query and group by object_id, index_id, you can identify which indexes have the most ghost rows and reorganize them. This should get space back fairly quickly. And because reorgs are online operations and can be stopped without losing work, I'd kick one off, let it run for 10-15 minutes, stop it, and see how much space you free up.

    good luck.
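
    A sketch of that grouping plus a reorg, assuming the earlier ghost-count output is re-queried inline (in practice, reuse the captured results rather than re-running the slow DMF); index and table names are placeholders:

    -- Rank indexes by total ghost records.
    SELECT ips.object_id, ips.index_id,
           SUM(ips.ghost_record_count) AS total_ghost_records
    FROM sys.dm_db_index_physical_stats(
             DB_ID(N'DatabaseName'), OBJECT_ID(N'dbo.TableName'), NULL, NULL, 'DETAILED') AS ips
    GROUP BY ips.object_id, ips.index_id
    ORDER BY total_ghost_records DESC;

    -- Then reorganize the worst offender; REORGANIZE runs online and can be stopped without losing progress.
    ALTER INDEX IX_SomeIndex ON dbo.TableName REORGANIZE;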

  • You may find this an off-track suggestion, but have you considered creating a new table split into multiple files (which may or may not be partitioned), with a clustered index (partitioned or not), and then populating it with your old data?

    Sometimes a version upgrade results in such mysterious behavior, and we don't find a justification for it.

  • SpringTownDBA (9/29/2011)


    found this:

    http://www.sqlskills.com/BLOGS/PAUL/post/Turning-off-the-ghost-cleanup-task-for-a-performance-gain.aspx

    I'm not recommending that you turn it off, but it mentioned that index reorgs and rebuilds will remove ghost records.

    So, if you take the output from the previous query and group by object_id, index_id, you can identify which indexes have the most ghost rows and reorganize them. This should get space back fairly quickly. And because reorgs are online operations and can be stopped without losing work, I'd kick one off, let it run for 10-15 minutes, stop it, and see how much space you free up.

    good luck.

    That's a great link, thanks for that!

    Another interesting note that might explain some of this... in parentheses he mentions that it's every 10 seconds for SQL 2008, vs. 5 seconds for 2005. I wonder what the reasoning behind that change is or was?

    The MS rep and I noticed pretty high reads using Paul Randal's other query to capture from dm_exec_requests, and we also discussed a reorg yesterday.

  • The reorg has been running for ~30 minutes, and table size went from 730 GB to 729 GB (as returned by sp_spaceused, which has its margin of error, of course).

    Considering the quantity of data coming into this table, that's not unreasonably bad.

    Edit: and after 1 hour, over 50 GB gone!

    So I know how to clean it up; I still don't fully understand why it got behind. If it were a sheer matter of volume, I'd expect a gradual curve of space increase, not a sudden, steady jump.
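
    For the before/after size check, a minimal snippet (dbo.BigTable is a placeholder) is simply:

    -- Reserved, data, and index sizes for the table; run before and after the reorg to compare.
    EXEC sp_spaceused N'dbo.BigTable';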

  • Final follow-up... I forgot to update this thread with my findings...

    We were able to get ghost cleanup to make a great deal of progress by way of index reorganizations.

    But it didn't recover ALL of the space that would bring us back to quite where I needed it to be.

    We hit our monthly maintenance cycle and, for a few unrelated reasons, restarted SQL Server.

    Two days later, table sizes are right back in line with what we saw before this crazy scenario started.

    So this is NOT a factor of us exceeding the capabilities of the ghost cleanup routine. It just wasn't working optimally and needed a SQL bounce.

    Finally, thanks again for the feedback from all the folks who helped me out with this and contributed ideas and suggestions. I learned a lot through this fire drill. Unfortunately, it also reinforced the notion that sometimes the best solution is "reboot".
