Table Partitioning - Existing Tables (V. Large)

  • I was wondering what the best strategy would be for partitioning existing tables. The main table we are interested in is a table called FileUploads, which is 2.2TB (7.1 million rows) and growing. Most of this data is BLOB data; the row size is about 37 bytes excluding the BLOBs. We are planning on using filegroups and switching to a filegroup backup strategy, which would reduce the time it takes for us to back up/restore.

    I think we would look to partition the table based on FileID (an IDENTITY column).

    We are looking to keep any downtime to a minimum.

    The table itself only ever experiences INSERT/SELECT activity - no updates. I was thinking we could create a partitioned copy of the table and insert data into the new table in batches. This should allow users to continue accessing the table during this time (although this method is likely to be VERY slow). Once the new table is ready we can prevent end users from accessing the database, copy any remaining rows and rename the new table to the old table's name. Does this sound like a reasonable plan? We should have enough space to duplicate the table on our SAN. Any better ideas?
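
    In case it's useful, here's a rough sketch of the kind of batch loop I have in mind (SQL 2005 syntax; the table and column names below are just placeholders, and it assumes the partitioned copy - here called FileUploads_New - has already been created on the partition scheme):

    -- Placeholder names; batch-copy FileUploads into a pre-created partitioned copy.
    DECLARE @BatchSize INT, @LastFileID INT, @MaxFileID INT

    SET @BatchSize = 10000
    SELECT @LastFileID = ISNULL(MAX(FileID), 0) FROM dbo.FileUploads_New   -- restartable
    SELECT @MaxFileID  = MAX(FileID) FROM dbo.FileUploads

    SET IDENTITY_INSERT dbo.FileUploads_New ON   -- only needed if FileID stays an identity column

    WHILE @LastFileID < @MaxFileID
    BEGIN
        INSERT INTO dbo.FileUploads_New (FileID, FileName, FileData, CreatedDate)
        SELECT FileID, FileName, FileData, CreatedDate
        FROM   dbo.FileUploads
        WHERE  FileID >  @LastFileID
          AND  FileID <= @LastFileID + @BatchSize

        SET @LastFileID = @LastFileID + @BatchSize
        -- back up the log (or CHECKPOINT in simple recovery) here to stop it growing
    END

    SET IDENTITY_INSERT dbo.FileUploads_New OFF

    At switch-over we'd run the same loop once more to pick up the stragglers before the rename.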

    We also have some other tables we are interested in partitioning. We have a journal table that has over 60 million rows (4.6GB excluding indexes) and another sessionlog table that has over 236 million rows (25.8GB excluding indexes). I think partitioning these tables will be more for performance reasons than anything else. The journal table has a JournalType column that is used in the WHERE clause of most queries, so I thought this might be a good candidate for the partitioning column (27 distinct entries). There is also a date column that is a potential candidate. The sessionlog table also has a date column that we will probably look to partition the table by. Both the journal and sessionlog tables also experience only INSERT/SELECT activity (no updates), so we could probably use the strategy mentioned above for these tables as well.
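
    For the date option, I'm picturing something along these lines (boundary dates and filegroup names are just placeholders; whether yearly or monthly ranges make sense depends on the queries and retention):

    -- Placeholder example: yearly RANGE RIGHT partitioning on a datetime column.
    CREATE PARTITION FUNCTION pf_SessionLogDate (DATETIME)
    AS RANGE RIGHT FOR VALUES ('20050101', '20060101', '20070101', '20080101')

    CREATE PARTITION SCHEME ps_SessionLogDate
    AS PARTITION pf_SessionLogDate
    TO (FG_Old, FG_2005, FG_2006, FG_2007, FG_Current)   -- 4 boundaries = 5 partitions

    -- The table (and ideally its indexes) would then be created ON ps_SessionLogDate (LogDate).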

    Your input is much appreciated.

    Many Thanks,

    David

    DBA Dash - Free, open source monitoring for SQL Server

  • Very interesting, and I have ideas, but they're probably ones you've already thought of, and since I don't have any direct experience with something of this scale in 2005, I hope someone else chimes in.

    I might contact MS consulting services and put the question to them. They're likely to have more data on how things perform during a move. It might be worth a few hours of consulting time to be sure this doesn't impact your system too much.

  • Partitioning tables is pretty easy and fairly painless; however, I would say that you need to plan it carefully and work out exactly what your partitioning criteria will be, then create a test environment and test it. I'd also suggest you need to be able to place the database into simple recovery whilst you move data into your new partitioned table(s).

    I'd make the usual point about not reading from and writing to the same disk array, so create your new partitioned tables on new arrays (which don't share spindles with the old data). How long it will take is dependent upon many factors. I did some testing of putting data into partitioned tables: there were 7 partitions and I generated some 70 million rows, which included 2 text columns holding just over 8k each, and I managed just under 2 thousand rows per second. What you need to make sure is that all your filegroups and t-log are pre-allocated with space so there are no autogrow events. I also had 8 files per filegroup (4 x dual-core box); the storage was an external DAS array.
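
    For example, pre-allocating a filegroup and the log might look something like this (database, filegroup, path and size values are all placeholders):

    -- Placeholder names and sizes: one pre-sized filegroup with multiple files, growth disabled.
    ALTER DATABASE MyDB ADD FILEGROUP FG_2007

    ALTER DATABASE MyDB ADD FILE
        (NAME = FG_2007_1, FILENAME = 'E:\Data\FG_2007_1.ndf', SIZE = 100GB, FILEGROWTH = 0),
        (NAME = FG_2007_2, FILENAME = 'F:\Data\FG_2007_2.ndf', SIZE = 100GB, FILEGROWTH = 0)
    TO FILEGROUP FG_2007

    -- Pre-size the log as well so the load never triggers autogrow.
    ALTER DATABASE MyDB MODIFY FILE (NAME = MyDB_log, SIZE = 50GB)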

    You need to make sure your indexes are carefully planned too: just as with partitioned views, if you run selects which scan, you get really bad performance.

    As with all things: test, test and test again. I don't think anyone can give you a precise idea of how long it might take.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • The tables you're talking about are fairly small too - create your secondary indexes on your partitioned tables after population; the size of your current indexes has no real bearing. I would do this offline, although you could pre-populate with the older data prior to switch-over. The only worry I would have is object names - I don't think renaming a partitioned table is something I'd want to try (if you even could) or advise.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Maybe even a view can help out, pointing to a staging table where your active inserts happen during migration time.

    Selecting from both the "old" table and the staging table.

    Copy data from the old table to the new partitioned table, and only during a minimal switch window would you prevent inserts into the staging table, so you can copy that remaining data to the new partitioned table and then point the view at your partitioned table - or even drop the view and rename your partitioned table for use.
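
    Purely as an illustration (object names are invented, and this assumes the existing table has first been renamed to FileUploads_Old so the view can take its name; the application inserts into the staging table directly, or via an INSTEAD OF trigger on the view):

    -- Readers see old + staging data through one name while the copy runs.
    CREATE VIEW dbo.FileUploads
    AS
    SELECT FileID, FileName, FileData FROM dbo.FileUploads_Old
    UNION ALL
    SELECT FileID, FileName, FileData FROM dbo.FileUploads_Staging
    GO

    -- At switch-over: block inserts, copy the staging rows into the partitioned table,
    -- then repoint the view (or drop the view and rename the partitioned table).
    ALTER VIEW dbo.FileUploads
    AS
    SELECT FileID, FileName, FileData FROM dbo.FileUploads_Partitioned
    GO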

    Also test restores on partitioned objects (a sketch of the file(group) / piecemeal sequence follows this list):

    - file(group) restore

    - online restore

    - piecemeal restore

    - point in time restores for partitioned objects !
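
    For the file(group) / piecemeal side, the test boils down to something like this (full recovery model assumed; all names and paths are placeholders, and online restore needs Enterprise Edition):

    -- Bring PRIMARY and the current filegroup online first ...
    RESTORE DATABASE MyDB
        FILEGROUP = 'PRIMARY', FILEGROUP = 'FG_Current'
        FROM DISK = 'X:\Backups\MyDB_FG.bak'
        WITH PARTIAL, NORECOVERY

    RESTORE LOG MyDB FROM DISK = 'X:\Backups\MyDB_log.trn' WITH RECOVERY

    -- ... the database is now usable; older, read-only filegroups can come back while users are on it.
    RESTORE DATABASE MyDB
        FILEGROUP = 'FG_2006'
        FROM DISK = 'X:\Backups\MyDB_FG2006.bak'
        WITH RECOVERY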

    As Steve suggested, maybe it's worth investing into some consulting if this system is critical for your company.

    Be sure your partitioned table has aligned objects!

    Read these articles !!

    Planning Guidelines for Partitioned Tables and Indexes @ msdn2.microsoft.com/en-us/library/ms180767.aspx

    Partitioned Tables and Indexes in SQL Server 2005 @ http://msdn2.microsoft.com/en-us/library/ms345146.aspx

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Without wanting to start a whole new thread within a thread - online restores, filegroup restores, backups etc. are a whole different area to partitioned tables. I don't find the partial restore / filegroup restore quite as functional as I'd actually like it to be; and most critically, if you want to do online filegroup restores you mustn't put any user data in the primary filegroup.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Thanks for the responses so far.

    I'm not planning on doing anything to our production server without testing it first on our QA environment. 🙂

    I've already started playing about with table partitioning on our QA server. Unfortunately we don't have enough disk space to do a full test of the fileuploads table in our QA system - we need to purchase some additional disks.

    I think I've already read the articles mentioned - will read them again though!

    Just wanted some feedback on my proposed strategy really, and was wondering if anyone had any alternative suggestions. We are more or less a 24/7 system (we get the odd maintenance window) so downtime needs to be kept to a minimum. Any solution will be tested before modifying our production database. The main worry is the fileuploads table - at 2.2TB it's going to take some time to partition...

    DBA Dash - Free, open source monitoring for SQL Server

  • I had a similar task in SQL 2000 with partitioned views. We'd trended our server, so we knew exactly when it was quiet - it was an international application, so it was effectively 6 x 24. We scheduled the movement of the data into partitioned tables (via the views) over a week until we could use Sunday, the quietest day, to make the switch. We managed without any real downtime and had less than a day's data to do at switch-over - we generated about 1 million rows per day in one table and about 100k per day in the other, to give an idea of scale. Getting the secondary indexes right is most crucial - I used articles by Kimberly Tripp to help me. I will say I did lots of testing of both 2000 and 2005, and I'm fortunate to have 5 servers at home to run tests against, so I'm usually well up on anything like this I want to do. BTW, I'm not totally sad - as a consultant/contractor I need to be able to do this sort of stuff before I get to a client site!

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • colin Leversuch-Roberts (12/18/2007)


    ... I don't find the partial restore / filegroup restore quite as functional as I'd actually like it to be....

    Indeed, for the moment we're doing some tests to get our DRP for partitioned objects in place. Maybe an idea for a future SSC article 😉

    Perhaps Allen can give you some tips from the field, as he's just performed this kind of operation on their system.

    http://sqlblog.com/blogs/allen_white/archive/2007/11/26/very-large-database-maintenance.aspx

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I might write something up - I have the partitioning documented, as it used lots of filegroups with multiple files and so was quite complex; it also used the sliding window scenario to move data out for archiving. The database could grow by 50GB in a day (or more) and generate hundreds of millions of rows each day. It was the bringing back of data from read-only filegroups by partial restores which didn't quite work as well as I wanted. I also found difficulties changing the partition functions whilst data was being input into the partitioned table - I will revisit it, as I've built a couple of new 64-bit servers with lots of disks.
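
    The sliding window part is essentially a SWITCH out of the oldest partition followed by a MERGE, something like this (the names reuse the made-up ones from earlier in the thread and are purely illustrative):

    -- Age out the oldest partition of a date-partitioned table.
    -- The archive table must match the structure and sit on the same filegroup as partition 1.
    ALTER TABLE dbo.SessionLog SWITCH PARTITION 1 TO dbo.SessionLog_Archive

    -- Remove the now-empty boundary so the partition count stays constant.
    ALTER PARTITION FUNCTION pf_SessionLogDate() MERGE RANGE ('20050101')

    -- At the other end, make room for new data (FG_2009 being a new, pre-sized filegroup).
    ALTER PARTITION SCHEME ps_SessionLogDate NEXT USED FG_2009
    ALTER PARTITION FUNCTION pf_SessionLogDate() SPLIT RANGE ('20090101')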

    I totally agree about using "proper" data sizes; I've seen so many problems where limited test data sets are used.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Here's another approach; the main difference should be faster execution time at the cost of more code to write. Up front, you've already decided what your partition criteria will be and how many partitions you will use given the existing data. Suppose you will partition on FileID and will start off with 8 partitions: 1 to 1MM, >1MM to 2MM, ... >6MM to 7MM, and >7MM.

    Run 8 SELECT INTO statements with a WHERE clause to create 8 tables, each containing the rows from the source table whose FileID values match one partition. For each of the 8 tables, ALTER TABLE to add a check constraint on FileID that matches the rule the partition function will enforce.

    Then you can start building the partitioned table. The initial step is to create the empty table with a single empty partition (FileID < 1): create the partition function, create the partition scheme, and create the empty table on the scheme (this will be the production table). Then it's just constant-time operations for each of the 8 tables:

    a) create a new partition function, scheme and table;
    b) switch the SELECT INTO table into that new partitioned table;
    c) alter the production partition scheme with NEXT USED;
    d) alter the production partition function with SPLIT RANGE; and
    e) alter the table from b) with SWITCH PARTITION to move its data into the new partition of the production table.

    Hopefully this gives you the general idea. I'm pretty sure it will be much faster than running inserts, since SELECT INTO is way faster and the only other execution time is building the check constraints. The last step is to index the table if you need to; depending on what you need, you might have to manage the indexes on each of the staging tables as part of the build process.
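
    A bare-bones sketch of one slice of the above, simplified so each SELECT INTO table is switched straight into the target partition, with all boundaries created up front (so the NEXT USED / SPLIT RANGE steps aren't shown). Every object name, the column list and the boundary values are invented, and it assumes PRIMARY is the default filegroup:

    -- Invented names throughout; one slice (FileID 1 to 1,000,000) of the SELECT INTO + SWITCH idea.
    CREATE PARTITION FUNCTION pf_FileUploads (INT)
    AS RANGE LEFT FOR VALUES (1000000, 2000000, 3000000, 4000000, 5000000, 6000000, 7000000)

    CREATE PARTITION SCHEME ps_FileUploads
    AS PARTITION pf_FileUploads ALL TO ([PRIMARY])     -- map each partition to its own filegroup in practice

    CREATE TABLE dbo.FileUploads_Part
    (
        FileID   INT IDENTITY(1,1) NOT NULL,           -- mirrors the source identity column
        FileData VARBINARY(MAX)    NULL
    ) ON ps_FileUploads (FileID)

    -- Bulk copy one slice; SELECT INTO creates the table on the default filegroup,
    -- which must be the same filegroup as the target partition for the SWITCH to work.
    SELECT FileID, FileData
    INTO   dbo.FileUploads_P1
    FROM   dbo.FileUploads
    WHERE  FileID >= 1 AND FileID <= 1000000

    -- Trusted check constraint matching partition 1 of the function above.
    ALTER TABLE dbo.FileUploads_P1 WITH CHECK
        ADD CONSTRAINT CK_FileUploads_P1 CHECK (FileID >= 1 AND FileID <= 1000000)

    -- Metadata-only move; the structures of the two tables must match.
    ALTER TABLE dbo.FileUploads_P1 SWITCH TO dbo.FileUploads_Part PARTITION 1

    Repeat the SELECT INTO / constraint / SWITCH for the remaining slices (or use the SPLIT RANGE variant described above), then build the secondary indexes last.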
