Table Partitioning - Better Performance

  • Hello,

    I've read many articles based on partitioning tables in SQL Server 2008 and I'm a little confused. Some articles suggest partitioning does not help performance and others say it does, some say stay away completly.

    We were first of all working on creating 2 databases one holding current data thats reported on constantly and another database holding more historical data, reported on occasionaly. However after consideration, it was deamed better if all the data could be held within 1 database. So initially, we grouped tables together and seperated them with filegroups on different drives. Some of the tables are big, over 50million records and we are having issues with select, insert, updates etc. I've looked at performance improvements (indexing etc), but its seams the amount of data is causing issues.

    The next option was to partition the tables by a flag (0 = historical 1 = current). The partition function would be range right 1 datatype bit.

    Our idea is again to split the partitions into seperate filegroups and spread them across the drives we have available.

    Next would be to place views over the specfic tables with a predicate on the partitioned column, so all queries would only hit the partition where current = 1. is that correct? Partitioned tables would be joined to each other via the views.

    My other question is about the partition column, should that be a primary key? I understand that it doesnt need to be (either primary and or clustered index), but is that best practice? Also, trying to storage align the index only seams to work if the partition column is a primary key. Is this correct or am I doing something wrong?

    Any advise is greatly apprieciated.

    thanks

  • This is yet another example of, 'it depends'. I wouldn't say to stay away completely as partitioning definately has it's place and advantages.

    Here's[/url] a great article on partitioning.

    I trypically use partitioning when I have a date column that I can use for the partitioning key. Planning out your partitioning key is important. You need to make sure that it is a column that will be used in most all of your queries. If not, you're looking at a table scan.

    As far as aligning your index with the partition. Your partitioning key does not have to be a PK, but I would recommend clustering your table on the partitioning key if possible. In your case, if you really do choose to use a bit colulmn, clustering on this would not make sense. What criteria would you use to determine if it is current or historical, do you have a date column available?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Many Thanks for the response. I'll have a look at the link you've suggested.

    Data coming into the tables (from another system) will be thought as current so this flag is set by default. Each year, a process will run to update the relevant records as historical using a date column. Maybe we could use that instead of the bit column. I did think that having a bit column as a clustered index by itself is waisting it, so we are considering adding another column to it which is primerly used within our queries. I was concerned about table scan and so a view will be place over the tables so a particular partition will only be hit. Hope that made sense.

    Just to add to this, (thinking about it), not all the queries will use date ranges and so would not be in the predicate. This is why the flag is used. Do you think this would be better as a PK rather than a clustered index? I can the utilise the clustered index for other columns.

    This is my first attempt at partitioning tables, so any guidence is apprieciated.

  • Keep in mind that with SQL Server 2005/2008 table partitioning, you do not need to define views to limit your partitions. The partitioning functions/schemes do that for you as long as you inculde the partitioning key in the WHERE clause.

    You can validate this by creating a partitioned table, running queries against it, and check the execution plan. The execution plans will show the partitions that the query optimizer has identified as valid for that query and only those partitions will be touched.

    How much data are we talking about here? How many historical and current rows are you expecting?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • The historical data would be much bigger, the largest table containing around 60million records with the smallest around 10million.

    The current data would consist of the largest table containing around 15million and the smallest around 5million.

    Thanks

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply