partitioning

  • Hello, I have two large related tables - parent and child

    The user is mostly interested in having the last 30 days. But also wants the last 90 days and the whole history. It is my intention to have three partitions (a) last 30 days, (b) previous 60 days and (c) all the rest for each table. Each day I will shunt the last day's data down a level.

    Without being provided a tomb what are the overheads/advantages of having the partitions in separate files?

    TIA Bark

  • That's not a good design for partitioning, it involves a lot of data movement. It's generally advised that once rows are in a partition, that they not be moved (because doing so is the equivalent of a delete/insert)

    You can't partition on age, you can on date, so you could partition by month and each month add a new partition and optionally switch the oldest out to another table.

    What's the goal here? Why are you looking at partitioning?

    Fast data load/remove?

    Index maintenance per partition?

    Different compression settings per partition?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail, Thanks for the response. I understand that partitioning does not occur on age but date. I also understand having the partitions in separate files has the overhead of moving data from one file to the other when swapping, splitting and merging occurs when the data resides on different disks.

    There is also no guarantee that physical disk blocks are contiguous so there is no physical gain to be made by having partitions in separate files, which tells me that having partitions in separate files is an administrative gain - able to utilise different disks for read and write.

    Each day approximately 20million records are being inserted into a table partitioned by a constant attribute, however general use is of the last 30 days so not having it partitioned by date is the first thing to change.

    My original question was what is the gain of having a file for each partition over having all the partitions in one file?

    TIA

  • It sounds like you're partitioning with the expectation of performance gains. If you are, don't waste your time. Partitioning is not, in general, done for performance and you don't automatically get a performance benefit by partitioning a table.

    One reason to have partitions in different filegroups is to be able to, in the case of a disaster, restore the newest stuff fast and restore the older filegroups later. It requires Enterprise edition and is pretty complicated.

    Having partitions on different physical drive arrays can help if you're accessing all of them and the system is IO-bottlenecked, and the drives are separate arrays, not partitions on the same one

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the info but I beg to differ on not getting any performance gains by partitioning. By partitioning I will not have to reindex the entire table but only the partition. It also provides the ability to swap in the new data further reducing downtime. My question was what is the benefit of files over one file and you confirmed my thoughts on it being convenient for disaster management.

  • That's not what I meant by performance. I asked earlier about index maintenance on part of the table as a reason for you to be doing partitioning, same as data loads and removal.

    If you're partitioning to improve the performance of queries on a table, don't. It doesn't give you that. It gives you index maintenance on part of a table, if gives you fast data load/removal. It gives you compression on part of a table, ability to restore part of a table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You need to read questions properly before answering. Three times I stated the question and three times you have gone off on a tangent.

  • Let's keep this pleasant here. You have asked about overhead and advantages of partitioning. In each of the posts, Gail has provided an answer, and also warned about not expecting query performance gains.

    - overhead: It's generally advised that once rows are in a partition, that they not be moved (because doing so is the equivalent of a delete/insert)

    - advantage: One reason to have partitions in different filegroups is to be able to, ...

    - advantage: It gives you index maintenance on part of a table...

    I'd say this is on topic. There are administrative advantages, and disadvantages. There are few, if any, gains in query performance.

    If you are looking for a specific answer, or specific evidence of a hypothesis you have, please ask.

  • You're not actually referencing partitions. Here's a good article on setting [/url]up sliding window partitions, which is almost what you're describing. The thing about partitions and partitioning is, that you're managing the partition itself, not the data within it. In your initial requirement, you want 30 days, 90 days, and the rest. But, in order to change the partitions in this manner, you wouldn't be managing a partition, instead, you'd have to move the data into each define group. Now, this could be a valid approach, but it's not using partitioning and partitions. Instead, it's just defining data storage and then moving the data between the different storage models.

    Really, to a degree, what you're describing is similar to the new SQL Server 2016 functionality Stretch Database. That keeps a certain amount of data local, let's say your 30 day window, and then moves the rest to Azure storage. This is a new, not thoroughly vetted process, and has a bunch of somewhat problematic issues, so I wouldn't suggest it. However, if you look through the concept, I think it's more in line with what you're trying to do.

    Instead of partitions, you should look to SSIS to perform data movement if you want to age your data across multiple file groups. This will entail some significant hits on your transaction log. You'll have to work out how best to do the data movement.

    If you really do want to manage the data through partitions, then you need to better understand how partitioning works and how you'll be able to move those partitions around, especially since you won't be able to have a "and everything else" partition. Instead, you'd have, minimum, 12 partitions at the end of a year, 24 at the end of two. This won't be sustainable. So, it's either data movement, or a rethink on the approach.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • bark stuff (2/24/2016)


    You need to read questions properly before answering. Three times I stated the question and three times you have gone off on a tangent.

    Heh... you should read your own question more carefully and then reread the answers and find that they're spot on.

    Since you're not bagging what we're raking, I'll provide a direct answer so as not to confuse. Here's your original post.

    bark stuff (2/24/2016)


    Hello, I have two large related tables - parent and child

    The user is mostly interested in having the last 30 days. But also wants the last 90 days and the whole history. It is my intention to have three partitions (a) last 30 days, (b) previous 60 days and (c) all the rest for each table. Each day I will shunt the last day's data down a level.

    Without being provided a tomb what are the overheads/advantages of having the partitions in separate files?

    TIA Bark

    To answer your final question above, there are huge advantages to having the partitions not only in separate files but in separate file groups. Each old file group could be reindexed and set to read only so you wouldn't have to backup any of the old static data, provided that it's truly a history table which also means that all but the current day is 100% static...

    ... if you did things right...

    ... and you're proposing not doing them right... not even close...

    If you only use 3 partitions and each day "shunt the last day's data down a level", then none of the 3 partitions are static. That means that you can't even take advantage of not reindexing old data. So, trying to avoid writing a tomb on the subject, you need to pay attention to the answers... especially the ones coming up. 😉

    Your proposal of shunting each day down a level is contrary to all that is holy for partitions. You'll have to move 60 million rows a day (20 million for each of the 3 partitions), do proper reindexing on all 3 partitions with the possible exception of the clustered index if it is temporally based, rebuild all the non-clustered indexes, and possibly rebuild stats after every daily shunt. Since your partitions consist of a month a piece, you can't even take advantage of SWITCH IN/OUT, which is partition based.

    As Gail has implied/stated, except for not having to reindex static partitions, partitioning has no performance advantage and, because there's a separate B-TREE for each partition, frequently provides a performance disadvantage.

    The only way that your suggestion would work would be to do daily partitions and use the sliding window option. If you also make all but the current and next day partitions read-only (and you MUST have a "next day" partition at the ready or risk severe data movement) , you'll also cut down on backup times (no sense backing up that which has not changed) and enable the possibility of online piecemeal restores.

    And I haven't even touched on the nightmare your suggestion will have on the partition function and partition schema.:-)

    Now, understand that your original question dictates that you know little to nothing about partitioning and go back and read Gail's and Grant's answers again. They're trying to save you from a very deep and dark world of hurt.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply