Best way to implement partitioning

  • SQLKnowItAll (7/9/2012)


    ScottPletcher (7/9/2012)

    I know why the date was left out of the Sales table -- it can be derived.

    What am I missing... How can it be derived?

    Join back to the Account table using AccountId.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher (7/9/2012)


    SQLKnowItAll (7/9/2012)


    ScottPletcher (7/9/2012)

    I know why the date was left out of the Sales table -- it can be derived.

    What am I missing... How can it be derived?

    Join back to the Account table using AccountId.

    Maybe I am missing something... Or just lack of information... However, I believe that rows are inserted into the accountsales table regularly, while only 1 row for each account is inserted into the accounts table. So I could have a createddate for account 10000 today, and then rows added to accountsales with this account today, 3 days from now, 3 years from now... So how do you know when the actual sale happened versus when the account was created. Mind you, I am thinking that these tables are representative of a Customer and Orders table.

    Jared
    CE - Microsoft

  • SQLKnowItAll (7/9/2012)


    ScottPletcher (7/9/2012)


    SQLKnowItAll (7/9/2012)


    ScottPletcher (7/9/2012)

    I know why the date was left out of the Sales table -- it can be derived.

    What am I missing... How can it be derived?

    Join back to the Account table using AccountId.

    Maybe I am missing something... Or just lack of information... However, I believe that rows are inserted into the accountsales table regularly, while only 1 row for each account is inserted into the accounts table. So I could have a createddate for account 10000 today, and then rows added to accountsales with this account today, 3 days from now, 3 years from now... So how do you know when the actual sale happened versus when the account was created. Mind you, I am thinking that these tables are representative of a Customer and Orders table.

    Maybe it's just me. I took Account to be more of a header and Sales the details for a sale.

    Based on the names, you're probably right though.

    Even then, I would expect there to be a "order header" table that stores the date/time of the sale. And that the datetime of any individual item on that order could be derived from the header table's sale date.

    Typically, I would cluster order header and detail tables by OrderId.

    IF, though, it has been decided to cluster the "header" table by date, I would consider (not necessary do, but consider) denormalizing and putting the date in the sales table to facilitate joins.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher (7/9/2012)


    SQLKnowItAll (7/9/2012)


    ScottPletcher (7/9/2012)


    SQLKnowItAll (7/9/2012)


    ScottPletcher (7/9/2012)

    I know why the date was left out of the Sales table -- it can be derived.

    What am I missing... How can it be derived?

    Join back to the Account table using AccountId.

    Maybe I am missing something... Or just lack of information... However, I believe that rows are inserted into the accountsales table regularly, while only 1 row for each account is inserted into the accounts table. So I could have a createddate for account 10000 today, and then rows added to accountsales with this account today, 3 days from now, 3 years from now... So how do you know when the actual sale happened versus when the account was created. Mind you, I am thinking that these tables are representative of a Customer and Orders table.

    Maybe it's just me. I took Account to be more of a header and Sales the details for a sale.

    Based on the names, you're probably right though.

    Even then, I would expect there to be a "order header" table that stores the date/time of the sale. And that the datetime of any individual item on that order could be derived from the header table's sale date.

    Typically, I would cluster order header and detail tables by OrderId.

    IF, though, it has been decided to cluster the "header" table by date, I would consider (not necessary do, but consider) denormalizing and putting the date in the sales table to facilitate joins.

    I see where you are coming from now. I guess until we get more info, it will be hard to tell what is really going on...

    Jared
    CE - Microsoft

  • Couple steps back... Why are you partitioning? What's the purpose and expected gain?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/9/2012)


    Couple steps back... Why are you partitioning? What's the purpose and expected gain?

    Spot-on Gail, as always.

    See this post: http://qa.sqlservercentral.com/Forums/Topic1301588-1550-1.aspx#bm1302471

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (7/10/2012)


    GilaMonster (7/9/2012)


    Couple steps back... Why are you partitioning? What's the purpose and expected gain?

    Spot-on Gail, as always.

    See this post: http://qa.sqlservercentral.com/Forums/Topic1301588-1550-1.aspx#bm1302471%5B/quote%5DWell, I might have made an @$$ out of myself by assuming, but I gathered the OP wants to partition for ease of maintenance on a large database since he mentioned size and not performance. Although, we all know I have made some pretty bad assumptions 🙂

    Jared
    CE - Microsoft

  • SQLKnowItAll (7/10/2012)


    TheSQLGuru (7/10/2012)


    GilaMonster (7/9/2012)


    Couple steps back... Why are you partitioning? What's the purpose and expected gain?

    Spot-on Gail, as always.

    See this post: http://qa.sqlservercentral.com/Forums/Topic1301588-1550-1.aspx#bm1302471%5B/quote%5DWell, I might have made an @$$ out of myself by assuming, but I gathered the OP wants to partition for ease of maintenance on a large database since he mentioned size and not performance. Although, we all know I have made some pretty bad assumptions 🙂

    The OP said this: " I would like to partition these tables for obvious reasons."

    There is nothing obvious to me about the request or the reasons for it. In any case, he/she seems to have disappeared. But from what little I know I say trouble is lurking - just my gut feel here.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (7/10/2012)


    SQLKnowItAll (7/10/2012)


    TheSQLGuru (7/10/2012)


    GilaMonster (7/9/2012)


    Couple steps back... Why are you partitioning? What's the purpose and expected gain?

    Spot-on Gail, as always.

    See this post: http://qa.sqlservercentral.com/Forums/Topic1301588-1550-1.aspx#bm1302471%5B/quote%5DWell, I might have made an @$$ out of myself by assuming, but I gathered the OP wants to partition for ease of maintenance on a large database since he mentioned size and not performance. Although, we all know I have made some pretty bad assumptions 🙂

    The OP said this: " I would like to partition these tables for obvious reasons."

    There is nothing obvious to me about the request or the reasons for it. In any case, he/she seems to have disappeared. But from what little I know I say trouble is lurking - just my gut feel here.

    Well... My assumption was that if 2 tables make up 1.5TB of storage, I could certainly find it hard to manage backups. If it made sense, I would partition it off and make a significant amount of the data read-only (again, if that was logical) so that my read_write backups were much smaller. This way I could have my read only filegroups backed up once instead of more frequently. Among other things...

    Jared
    CE - Microsoft

  • SQLKnowItAll (7/10/2012)


    TheSQLGuru (7/10/2012)


    SQLKnowItAll (7/10/2012)


    TheSQLGuru (7/10/2012)


    GilaMonster (7/9/2012)


    Couple steps back... Why are you partitioning? What's the purpose and expected gain?

    Spot-on Gail, as always.

    See this post: http://qa.sqlservercentral.com/Forums/Topic1301588-1550-1.aspx#bm1302471%5B/quote%5DWell, I might have made an @$$ out of myself by assuming, but I gathered the OP wants to partition for ease of maintenance on a large database since he mentioned size and not performance. Although, we all know I have made some pretty bad assumptions 🙂

    The OP said this: " I would like to partition these tables for obvious reasons."

    There is nothing obvious to me about the request or the reasons for it. In any case, he/she seems to have disappeared. But from what little I know I say trouble is lurking - just my gut feel here.

    Well... My assumption was that if 2 tables make up 1.5TB of storage, I could certainly find it hard to manage backups. If it made sense, I would partition it off and make a significant amount of the data read-only (again, if that was logical) so that my read_write backups were much smaller. This way I could have my read only filegroups backed up once instead of more frequently. Among other things...

    Your assumptions place an AWFUL lot of knowledge with the OP. Way too much I think. But in any case I will sign off until and unless the OP comes back...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I do apologize for leaving this by the wayside. Previouly I didn't have much experiance on partitioning so after the responses came back in I realized I should learn further before going into this topic. Now that I've done that I can go further on what I'm looking for 🙂

    The main reason I'm doing this is because archiving data is such a huge I/O impact on the DB. Basically right now we have three filegroups. The first is normally to hold the last week of data (SSD's), the second is to handle the last three months of data (FC 15k) and the third is to handle data older than that (SATA 7200k).

    Right now we insert data from a table on the first filegroup to a table on the second filegroup and then go back and do deletes on the table in first filegroup. Because we are talking about millions of records a day, this is extremely intensive. Instead we are looking to switch data out on the first partition, merge the empty partition back in, rebuild indexes on the staging table to move it to the next filegroup, split a partition on the next filegroup that is for the same day we switched out and then switch the staging table back in to the main table.

    So back to my op - we can add the CreationDate to the other child tables and then partition on that, but then because we need to do switches we would need to include the date in all the indexes - which includes the pk. Right now the date is not part of the PK on any of the tables. We join all child tables like AccountSales to Account and then use the date in the account table. Even if the CreationDate column was added to all the child tables we will still need to join to the Account table. I suppose then we would need to re-write all the queries to join on both AccountId and CreationDate if we change the PK? I have concerns about that in addition to just the fact that an int CIX vs an int + datetime CIX (byte length on CIX and leafs) and the I/O required.

    I had also thought about just forgoing the whole date archiving thing and just going with partitioning on an interval of AccountId i.e, every 25 million rows. Then the PK wouldn't need the Date although we wouldn't get partition elimination increases in performance from any queries really but smaller index sizes might be a better tradeoff?

    Anyways - hope that helps define the issue. Please let me know if you have any questions. Thanks in advance for any help!

    * Also the suggestions on the filegroup @ 80 --> 100% or 98% - you are right, I had just scripted those tables up real quick for an example and left the default FF 🙂

    * The talk about making the other FG read only are good too and is something we are planning in the future but it's not something I can currently do for various reasons which we have been in the process of resolving.

  • bump

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply