When is shrinking a data file acceptable?

  • Not to beat the proverbial dead horse here, but I thought I would point out one situation where we do a regular shrink of database files. I am sure there will be more than a few disagreements with this, but it works for us all the same.

    We have a database that normally runs in the range of about 100 GB, and it stays that way most of the month. Once a month, however, they run a process that increases the size to nearly a TB. Now, I know what you are thinking: it must be a poorly designed process. But this process is for a power substation, and it pulls in data from something like 90 different servers and runs calculations that, when I tried to understand them, frankly made me feel slightly insecure about myself.

    Now, my belief in this situation is that the DB is used 24/7/365 and I want to minimize impact. So after the process I reduce the size, thus reducing the amount of time it takes to run a full backup and other tasks. I am sure many will disagree, but I do what works for the situation.
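
    Roughly, the monthly shrink step is something like the following (a minimal sketch only; the database and logical file names are placeholders, and ~100 GB is the assumed target size):

        -- Shrink the data file back toward its normal working size after the
        -- monthly load. The target is given in MB (102400 MB ~= 100 GB).
        USE MyDB;
        GO
        DBCC SHRINKFILE (N'MyDB_Data', 102400);
        GO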

    Dan

    If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.

  • How does that reduce backup time?

    And by how much?

    The full DB backup runs about an hour when it is at a TB, but less than 10 minutes when it is reduced to the 100GB size.

    Dan

    If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.

  • Dan.Humphries (6/16/2011)


    The full DB backup runs about an hour when it is at a TB, but less than 10 minutes when it is reduced to the 100GB size.

    I don't see how - backup only backs up what is used. It doesn't matter how large the file actually is.

    So, if it only takes 10 minutes to back up 100GB, and an hour to back up 1TB, it is probably backing up almost 1TB of data.
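
    One way to check how much of the file a full backup actually has to read is to compare allocated vs. used space per data file, along these lines (a sketch; run it in the database in question):

        -- Allocated vs. used space per data file; size and SpaceUsed are in
        -- 8 KB pages, so divide by 128 to get MB.
        SELECT name,
               size / 128                                      AS allocated_mb,
               FILEPROPERTY(name, 'SpaceUsed') / 128           AS used_mb,
               (size - FILEPROPERTY(name, 'SpaceUsed')) / 128  AS free_mb
        FROM sys.database_files
        WHERE type_desc = 'ROWS';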

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Ya, but to pick only the used pages you must scan all pages (even if only the header). So that's an extra 200-900GB??? to extract from the SAN.

    That's the only way I see to have such a huge difference in the times... for the same backup file size.

  • Ninja's_RGR'us (6/16/2011)


    Ya, but to pick only the used pages you must scan all pages (even if only the header). So that's an extra 200-900GB??? to extract from the SAN.

    No you don't. You just have to read the GAM and SGAM pages every 2 or 4 GB (I forget which) of the file. They're allocation bitmaps that say what state the extents in their interval are in.

    One possible reason is lots of partially used extents. Shrink moves data page by page. Backup backs up extents where one or more pages have data. So if there's lots of places where an extent has only a couple of used pages, the backup has to back the full extent up.

    If that is the case, index rebuilds will have the same effect as the shrink in 'compacting' pages and reducing backup time and size.
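
    A rough way to check for that is to look at average page fullness; low values on big objects suggest lots of half-empty pages and extents that a rebuild would compact (a sketch only; SAMPLED mode reads a fair portion of the database, and the table name in the commented rebuild is a placeholder):

        -- Page density per index/heap; low avg_page_space_used_in_percent on
        -- large objects points at compaction candidates.
        SELECT OBJECT_NAME(ips.object_id)        AS table_name,
               ips.index_id,
               ips.avg_page_space_used_in_percent,
               ips.page_count
        FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
        WHERE ips.page_count > 1000
        ORDER BY ips.avg_page_space_used_in_percent;

        -- Then rebuild the worst offenders, e.g.:
        -- ALTER INDEX ALL ON dbo.SomeBigTable REBUILD;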

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (6/16/2011)


    Ninja's_RGR'us (6/16/2011)


    Ya, but to pick only the used pages you must scan all pages (even if only the header). So that's an extra 200-900GB??? to extract from the SAN.

    No you don't. You just have to read the GAM and SGAM pages every 2 or 4 GB (I forget which) of the file. They're allocation bitmaps that say what state the extents in their interval are in.

    One possible reason is lots of partially used extents. Shrink moves data page by page. Backup backs up extents where one or more pages have data. So if there's lots of places where an extent has only a couple of used pages, the backup has to back the full extent up.

    If that is the case, index rebuilds will have the same effect as the shrink in 'compacting' pages and reducing backup time and size.

    Awesome... that's why we need MCMs around here. So you can't stop now :hehe:.

  • By any chance, are the tables that are being hit really hard HEAPS?
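
    If you want to check, something like this lists the user tables that are heaps (no clustered index):

        -- User tables with index_id = 0, i.e. heaps.
        SELECT OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
               OBJECT_NAME(i.object_id)        AS table_name
        FROM sys.indexes AS i
        WHERE i.index_id = 0
          AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1;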

  • Dan.Humphries (6/16/2011)


    Not to beat the proverbial dead horse here, but I thought I would point out one situation where we do a regular shrink of database files. I am sure there will be more than a few disagreements with this, but it works for us all the same.

    We have a database that normally runs in the range of about 100 GB, and it stays that way most of the month. Once a month, however, they run a process that increases the size to nearly a TB. Now, I know what you are thinking: it must be a poorly designed process. But this process is for a power substation, and it pulls in data from something like 90 different servers and runs calculations that, when I tried to understand them, frankly made me feel slightly insecure about myself.

    Now, my belief in this situation is that the DB is used 24/7/365 and I want to minimize impact. So after the process I reduce the size, thus reducing the amount of time it takes to run a full backup and other tasks. I am sure many will disagree, but I do what works for the situation.

    1) What happens to this 900GB of added data after the calculations? It MUST be deleted, because your database goes back to 100GB after the fact (and you shrink it). Thus the deleted data that 'chunks' up space could be 'recompacted' by doing index rebuilds.

    2) Even if data is spread around extents throughout the 1TB file, you could apply the SP/CU that adds backup compression for SQL 2008, and your backups will drop back to 100GB worth of size (actually less) even without the shrink or index rebuild (a sketch follows after this list).

    3) It seems to me that (without many details here) the best idea BY FAR is to refactor this data import into a staging database, do the processing there back into the production database, and then truncate the data in staging. You could leave that staging database at 1TB so it never has to grow.
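
    For point 2, a compressed full backup is just one option on the BACKUP command, along these lines (a sketch; the database name and path are placeholders, and the instance's edition/build must support backup compression):

        -- Full backup with compression; the backup file shrinks even if the
        -- data file does not.
        BACKUP DATABASE MyDB
        TO DISK = N'X:\Backups\MyDB_full.bak'
        WITH COMPRESSION, INIT, STATS = 10;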

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Here is a scenario where it makes sense.

    We have multiple Silos, with many facilities assigned to different Silos. When we created the Silo3 databases, we basically just copied the Silo2 database to the Silo3 server, then activated the facilities for Silo3. On the prior Silo2 we deactivated the facilities that would now be maintained on Silo3.

    We had to do this because a data delete process did not exist yet. Now that we have a good data delete process, we can go back to Silo 2 and Silo 3 and remove the facility data that doesn't need to be on those systems (see the batched-delete sketch at the end of this post).

    Delete Silo 3 data from Silo 2.

    Delete Silo 2 data from Silo 3.

    This will likely reduce the overall size requirements for both systems. After that, shrinking the .MDFs will be done.

    We are using NetApp storage, which allows us to also shrink the drives and give that space back to the SAN to be used elsewhere.

    We are talking about hundreds of GB worth of space that can be returned to the SAN.

    If we left all of that white space in the .MDF, it would be a year or two before we filled it up again.
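
    A rough sketch of what the batched cross-silo delete could look like, so the transaction log stays manageable (the table, column, and batch size are placeholders; the real delete process differs):

        -- Delete the other silo's facility rows in small batches.
        DECLARE @rows int = 1;
        WHILE @rows > 0
        BEGIN
            DELETE TOP (50000) FROM dbo.FacilityData
            WHERE FacilityID IN (SELECT FacilityID
                                 FROM dbo.Facilities
                                 WHERE SiloID = 3);
            SET @rows = @@ROWCOUNT;
        END;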

    You know, I don't know that I really gave it much thought. My mind just said that made sense and I went with it. But now that I have read some of the replies, I think I need to look into why it is taking so much longer.

    Dan

    If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.

  • Dan.Humphries (6/17/2011)


    You know, I don't know that I really gave it much thought. My mind just said that made sense and I went with it. But now that I have read some of the replies, I think I need to look into why it is taking so much longer.

    My first instinct still stands. Heaps can have pages ALL OVER the place and cause this. I forget who posted it, but he had a table with a few hundred rows that used 15 GB of space (no blobs). Creating a clustered index fixed the problem.
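
    If you want to hunt for that kind of heap bloat, something along these lines works (a sketch; the table and column names in the commented fix are placeholders):

        -- Heaps with few rows but many reserved pages are the usual suspects.
        SELECT OBJECT_NAME(ps.object_id)           AS table_name,
               ps.row_count,
               ps.reserved_page_count * 8 / 1024   AS reserved_mb
        FROM sys.dm_db_partition_stats AS ps
        JOIN sys.indexes AS i
          ON i.object_id = ps.object_id AND i.index_id = ps.index_id
        WHERE i.index_id = 0                       -- heaps only
          AND OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
        ORDER BY ps.reserved_page_count DESC;

        -- Fix: give the table a clustered index (preferred), or rebuild the heap:
        -- CREATE CLUSTERED INDEX CIX_BigHeap ON dbo.BigHeap (SomeKeyColumn);
        -- ALTER TABLE dbo.BigHeap REBUILD;   -- SQL 2008+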

  • We have an MSSQL data warehouse database of about 1.5TB, and recently, with purging and archiving, we were able to clean up some data; we now have 900GB of free space in the data files. We have two data files sitting on different LUNs, which are made from the same disk array on the EMC storage.

    Our goal is to shrink/resize the database in the most efficient manner, without hitting any performance snags. Should we shrink the database or the data files, or create new, smaller data files, move objects to the new files, and drop the bigger files once they are empty?

    If we rebuild clustered indexes to move objects between data files, what is the impact when there is a 100GB clustered index?
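
    What we have in mind for the move is something like the following (a sketch only; names are placeholders, and the index name must match the existing clustered index when DROP_EXISTING is used):

        -- Rebuilding the clustered index onto another filegroup moves the whole
        -- table there. For ~100GB expect a long, space-hungry operation: room is
        -- needed in the target filegroup, in tempdb (SORT_IN_TEMPDB) and in the log.
        -- ONLINE = ON is Enterprise-only, and not allowed with LOB columns before SQL 2012.
        CREATE CLUSTERED INDEX CIX_BigTable
        ON dbo.BigTable (BigTableID)
        WITH (DROP_EXISTING = ON, ONLINE = ON, SORT_IN_TEMPDB = ON)
        ON [FG_New];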

  • Ninja's_RGR'us (6/17/2011)


    Dan.Humphries (6/17/2011)


    You know, I don't know that I really gave it much thought. My mind just said that made sense and I went with it. But now that I have read some of the replies, I think I need to look into why it is taking so much longer.

    My first instinct still stands. Heaps can have pages ALL OVER the place and cause this. I forget who posted it, but he had a table with a few hundred rows that used 15 GB of space (no blobs). Creating a clustered index fixed the problem.

    I too have seen multi-GB heap tables with a single row. :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • deep.tkt (1/24/2013)


    We have an MSSQL data warehouse database of about 1.5TB, and recently, with purging and archiving, we were able to clean up some data; we now have 900GB of free space in the data files. We have two data files sitting on different LUNs, which are made from the same disk array on the EMC storage.

    Our goal is to shrink/resize the database in the most efficient manner, without hitting any performance snags. Should we shrink the database or the data files, or create new, smaller data files, move objects to the new files, and drop the bigger files once they are empty?

    If we rebuild clustered indexes to move objects between data files, what is the impact when there is a 100GB clustered index?

    Personally, I would build a new database (starting it with the necessary size for all the data to be migrated, its indexes, AND 12-18 months of growth's worth of EMPTY SPACE) and migrate the data over. You can do that with minimally-logged operations, so it should be VERY fast. Build out your non-clustered indexes and you have everything just right: no internal OR external OR OS file fragmentation (assuming contiguous file space is available on the LUNs) AND 100% accurate/fresh index statistics too. Oh, don't forget to size the tlog properly from the get-go as well. You may also want to manually create some column stats (or get SQL to create them automatically by stressing the system with expected reports) before turning it live.
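
    A sketch of the minimally-logged load into the new database (all names are placeholders; minimal logging assumes SIMPLE or BULK_LOGGED recovery on the target and, on SQL 2008+, a TABLOCK hint when inserting into an empty heap):

        -- Switch the target to bulk-logged for the load, copy the data with
        -- TABLOCK, then build nonclustered indexes and switch recovery back.
        ALTER DATABASE NewDW SET RECOVERY BULK_LOGGED;
        GO
        INSERT INTO NewDW.dbo.FactSales WITH (TABLOCK)
        SELECT *
        FROM OldDW.dbo.FactSales;
        GO
        ALTER DATABASE NewDW SET RECOVERY FULL;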

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
