Implementing Partitioning on 100mil+ row table with >30 cols

  • Ever been in the interesting position of having a 3rd party software solution that's poorly written to start with and was never meant to scale to the point your organization has? Surely!

    My scenario uses 2 tables per frequently used object: name1 and name1archive. A process runs monthly to move flagged items (using "flagged" loosely) from name1 to name1archive. Due to the sheer volume of was-transactional data and the number of columns involved, this archive table has grown to 100 million rows and is 32 columns wide. The vendor's solution, instead of seriously reapproaching the design of the product, throws index after index at the problem. We now have 23 indexes to compensate for what are essentially the 23 most common queries produced by the application. Nice, huh? I've got 17 GB of data and nearly 100 GB of indexes as a result, and dismal performance to boot. The problem with this design is that I grow 2-3 million rows per month. The data doesn't grow too badly, but the indexes grow 9:1 with the data. Yuck!

    So I'm working on ways to 'DBA around' their pukey design. Compression is one element and makes a 40-45% difference in the space allocated. Great. Now let's try to mix that with partitioning to attempt disk and memory isolation of the data ranges stored in this mammoth table. I'm going to go out on a limb and suggest that most admins aren't intimately familiar with partitioning - I'm certainly not. However, I've read all the entry guides and such and definitely want to experiment with it and attempt to harvest some value.
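    For reference, the compression savings described above come from page compression. A minimal sketch of how that's enabled - table name taken from the thread, everything else generic; the estimate procedure lets you preview the savings before committing:

    ```sql
    -- Preview the savings first; this reads a sample, it changes nothing.
    EXEC sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'name1Archive',
        @index_id         = NULL,   -- all indexes
        @partition_number = NULL,   -- all partitions
        @data_compression = 'PAGE';

    -- Then rebuild the table with PAGE compression applied.
    ALTER TABLE dbo.name1Archive
        REBUILD WITH (DATA_COMPRESSION = PAGE);
    ```

    Note that compression (and ONLINE rebuilds) require Enterprise Edition on the SQL Server versions contemporary to this thread.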

    I used SSMS to produce a single partition creation script that did this (truncated to reduce volume and not expose the vendor's code in any way). I added date stamps between each command to track how long each step took. The gaps with dates only are a series of nonclustered index CREATE statements.

    Aug 10 2011 3:12PM

    CREATE PARTITION FUNCTION [ufn_name1store]

    Aug 10 2011 3:12PM

    CREATE PARTITION SCHEME [usch_name1] AS PARTITION [ufn_name1store]

    Aug 10 2011 3:12PM

    ALTER TABLE [dbo].[name1Archive] DROP CONSTRAINT

    Aug 10 2011 4:39PM

    ALTER TABLE [dbo].[name1Archive] ADD CONSTRAINT

    Aug 10 2011 4:39PM

    CREATE CLUSTERED INDEX [ClusteredIndex_on_usch_name1_634485239841214519]

    Aug 10 2011 6:37PM

    DROP INDEX [ClusteredIndex_on_usch_name1_634485239841214519]

    Aug 10 2011 8:26PM

    CREATE NONCLUSTERED INDEX [someindex1]

    Aug 10 2011 8:30PM

    Aug 10 2011 8:32PM

    Aug 10 2011 8:39PM

    Aug 10 2011 8:42PM

    Aug 10 2011 8:46PM

    Aug 10 2011 8:52PM

    Aug 10 2011 8:58PM

    Aug 10 2011 9:04PM

    Aug 10 2011 9:09PM

    Aug 10 2011 9:15PM

    Aug 10 2011 9:21PM

    Aug 10 2011 9:26PM

    Aug 10 2011 9:31PM

    Aug 10 2011 9:37PM

    Aug 10 2011 9:40PM

    Aug 10 2011 9:43PM

    Aug 10 2011 9:48PM

    Aug 10 2011 9:53PM

    Aug 10 2011 9:58PM

    Aug 10 2011 10:03PM

    Aug 10 2011 10:07PM

    Aug 10 2011 10:11PM

    CREATE NONCLUSTERED INDEX [lastNCindex]

    Aug 10 2011 10:15PM

    COMMIT TRANSACTION

    What I'm curious about here is: is all of this necessary? The drop/add constraint, the create-then-drop clustered index with some random name. Interestingly, the PK/clustered index is NOT listed by SSMS. The transaction took 7 hours and encountered loads of CXPACKET waits while attempting to do that constraint work. The system is a PE R710 w/ 5570s, 96 GB, and using moderately configured SAN disk.

    Also - what do I need to do to 'storage align' my indexes? I'm a newb to partitioning - but I feel I've gotten off to the best start I can. First post!
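    For anyone landing here with the same question: 'storage aligned' simply means the nonclustered indexes are created on the same partition scheme as the table, with the partitioning column in the index, so each index partition sits alongside its data partition. A hedged sketch using the function/scheme names from the script above - the column names (EntryDate, AccountId) are placeholders, not the vendor's real columns:

    ```sql
    -- Hypothetical monthly boundaries; RANGE RIGHT puts each boundary
    -- value into the partition to its right.
    CREATE PARTITION FUNCTION ufn_name1store (datetime)
        AS RANGE RIGHT FOR VALUES ('2011-06-01', '2011-07-01', '2011-08-01');

    CREATE PARTITION SCHEME usch_name1
        AS PARTITION ufn_name1store ALL TO ([PRIMARY]);

    -- An aligned nonclustered index: built ON the scheme and partitioned
    -- by the same column, so partition elimination and SWITCH both work.
    CREATE NONCLUSTERED INDEX someindex1
        ON dbo.name1Archive (AccountId, EntryDate)
        ON usch_name1 (EntryDate);
    ```

    Indexes created ON a named filegroup instead of the scheme are non-aligned, and non-aligned indexes block partition SWITCH operations.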

  • The only thing of value I can add here is that CXPACKET is not a problem and not a real wait. It only means that this is running in parallel, and thank goodness for that.

    Since you love to self-educate, here's an excellent article on the topic: http://www.simple-talk.com/sql/learn-sql-server/understanding-and-using-parallelism-in-sql-server

  • Ninja's_RGR'us (8/11/2011)


    The only thing of value I can add here is that CXPACKET is not a problem and not a real wait.

    Can you elaborate on this? In my experience, high CXPACKET waits can be an indication of a number of issues: poorly written/optimized code (not relevant in this case), not enough RAM, slow I/O, etc.

    chandleya:

    When the process is running there should be one thread with the same SPID as the CXPACKET waits that doesn't have a CXPACKET wait type. Can you tell us what that is, as it may be why the CXPACKET wait type exists? I've got a feeling, given what it's doing (an index create), that it'll be something along the lines of a PAGEIOLATCH_XX. This may help us determine whether the CXPACKET waits are actually an issue or whether it's something like the "moderately configured SAN".
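    The check described above can be done against the waiting-tasks DMV. A sketch - the session id (53) is a placeholder for whatever SPID is running the partitioning script:

    ```sql
    -- One row per worker of the session: parallel threads show CXPACKET,
    -- and the thread doing real work shows the underlying wait.
    SELECT wt.session_id,
           wt.exec_context_id,
           wt.wait_type,
           wt.wait_duration_ms,
           wt.resource_description
    FROM sys.dm_os_waiting_tasks AS wt
    WHERE wt.session_id = 53
    ORDER BY wt.exec_context_id;
    ```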

    _______________________________________________________
    SQLGeordie
    Web: Jarrin Consultancy
    Blog: www.chrisjarrintaylor.co.uk
    Twitter: @SQLGeordie

  • I'm not actually as worried about the time it takes - the CXPACKET waits are occurring during index generation. I want to better understand the correct process for partitioning an existing table and hefty dataset. Once I'm sure that I'm -accurately- creating my partition scheme and process, then I'll look for ways to optimize the performance. I'm concerned about dropping and re-adding the same constraint on the same filegroup (wasted time?), then creating a clustered index on the scheme, then immediately dropping it. It seems to me that SSMS has created a lot of unnecessary work.
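    For the record, the intermediate heap and throwaway clustered index aren't always required - SSMS just scripts the most generic path. When the clustered index is a plain index (not one enforcing the PK), it can be rebuilt straight onto the scheme in one step. A sketch with hypothetical index/column names:

    ```sql
    -- DROP_EXISTING rebuilds the named index in place onto the partition
    -- scheme, skipping the drop-to-heap / re-cluster round trip.
    CREATE CLUSTERED INDEX CIX_name1Archive
        ON dbo.name1Archive (EntryDate, Id)
        WITH (DROP_EXISTING = ON, SORT_IN_TEMPDB = ON)
        ON usch_name1 (EntryDate);
    ```

    If the clustered index backs the PRIMARY KEY and its key columns have to change (e.g. to add the partitioning column to a unique key), the constraint genuinely does have to be dropped and re-created - which is part of why SSMS scripts it that way.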

  • @chris-2, please read the whole article I posted. All your questions will be answered there.

  • chandleya (8/15/2011)


    I want to better understand the correct process of partitioning an existing table and hefty dataset. Once I'm sure that I'm -accurately- creating my partition scheme and process, then I'll look for ways to optimize the performance.

    The largest issue you're going to have with partitioning is that the code will need to be rewritten to use the partitions. If you partition on dates but never have any WHERE clauses that filter by date, you've wasted your time. Monkeying around with the storage mechanics for a vendor app also means that you'll need to review any updated code they may send you.
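    To illustrate the point: partition elimination only kicks in when the predicate touches the partitioning column. A hypothetical pair of queries, assuming the table is partitioned on an EntryDate column (a placeholder name):

    ```sql
    -- Touches only the relevant partition(s): the predicate is on the
    -- partitioning column, so the optimizer can eliminate the rest.
    SELECT COUNT(*)
    FROM dbo.name1Archive
    WHERE EntryDate >= '2011-07-01' AND EntryDate < '2011-08-01';

    -- Scans every partition: nothing in the query ties it to the
    -- partition key, so partitioning buys this query nothing.
    SELECT COUNT(*)
    FROM dbo.name1Archive
    WHERE AccountId = 12345;
    ```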

    I'm concerned about dropping and readding the same constraint on the same filegroup (wasted time?), then creating a clustered index on the scheme, then immediately dropping it. It seems to me that SSMS has created all this unnecessary work.

    Nothing new there - SSMS-generated scripts are rarely the leanest way to do anything.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Fortunately, the purpose of this table is a date-based archive. Entry dates for accounts receivable transaction data are key to identifying the data outside of the primary key (an identity column). Does every single query exclusively filter by date? No, some use an open-ended date range (01/01/1980 onward). However, doesn't the partitioning in effect produce more opportunity for multi-threaded processing across the partitions?
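    Once the table is partitioned, a quick sanity check that the boundaries actually split the data as intended can be run against the catalog views - a sketch, using the table name from this thread:

    ```sql
    -- Row count per partition for the archive table's heap or
    -- clustered index (index_id 0 or 1).
    SELECT p.partition_number,
           p.rows
    FROM sys.partitions AS p
    WHERE p.object_id = OBJECT_ID('dbo.name1Archive')
      AND p.index_id IN (0, 1)
    ORDER BY p.partition_number;
    ```

    Skewed counts here (one giant partition, many empty ones) usually mean the boundary values were chosen badly for the actual data distribution.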

  • I make extensive use of partitioning; here's a presentation I did about one aspect of it: www.grumpyolddba.co.uk/gonein60ns/GoneIN60nsWeb.pdf. I have another document on partitioning but haven't put it onto my website yet.

    It all depends on what benefits you expect to gain from partitioning. Just because a table appears large doesn't mean you should automatically partition it.

    Queries can perform poorly if the partitioning column is not included in the query, so choose it carefully.

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply