Mimicking partitions in SQL 2000

  • Hi everyone,

    We have an archive table that captures sales and is over 340 million rows on our SQL 2000 box. It has NO clustered indexes, but has two nonclustered indexes, one on a critical upload date column. (No, I didn't design it like this). Long story short, the rebuild of the index requires we have 1.6x the largest index available as an unallocated cushion which is filling up the drive space.

    As SQL 2000 doesn't support partitioning like SQL 2005, we are planning to create a view with the same name as this table, and separate data into smaller tables corresponding to months, clustered this time on the upload date column, that are called by the view. The total space used on these smaller tables should be less than the massive index of the original table.

    Are there any issues to be aware of when creating this pseudo-partitioning, such as should the view be partitioned, despite the fact the underlying tables are as well, and other issues you may think of?

    Thanks.

    Gaby

    Gaby
    ________________________________________________________________
    "In theory, theory and practice are the same. In practice, they are not."
    - Albert Einstein

  • I've done something very similar on 2000, and the data in Create View in BOL has all the rules on partitioned views. It's pretty easy to set up, and works quite well, if you follow the rules in there.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (6/17/2009)


    I've done something very similar on 2000, and the data in Create View in BOL has all the rules on partitioned views. It's pretty easy to set up, and works quite well, if you follow the rules in there.

    Thanks GSquared. Thought that would be okay, but was curious as to the real life experience of implementing this. Also found this article that's more for data warehousing but has some practical approaches.

    Wish me luck then. 😀

    Gaby
    ________________________________________________________________
    "In theory, theory and practice are the same. In practice, they are not."
    - Albert Einstein

  • I had a three-table solution. One was currently active data, which was only a few dozen rows, but was under constant updates, in terms of up to a hundred or more transactions per second. One was recently active data, which was a few hundred rows, up to a couple of thousand, and had lots of selects but only a few dozen transactions per minute. Last was older data, with lots of selects, but only a few updates per day, and about 10-million rows.

    Current table had a clustered index on an Identity PK, and no other indexing. At a few dozen rows, they weren't really needed anyway.

    Recent table had clustered, and a few key indexes for the most common selects.

    Archive table had a dozen or more covering indexes, pretty much whatever was needed to make the selects work well.

    Had jobs that moved data between those tables based on a set of very complex business rules. Ocassionally, something would move from Archive to Recent or Current, or from Recent to Current, but mostly it was moves from Current to Recent or Recent to Archive.

    Had quite a few selects that needed to pull data from all three, so did those on partitioned views.

    The whole thing worked great. When I started out, I had it all in one table, and locks and such were causing major performance issues. Didn't want to use NoLock or some such, because dirty reads would have caused huge problems in this particular application. Splitting the data into three sections improved performance like you wouldn't believe (average query time went from 10 seconds to under half a second, and updates and inserts were much faster too). Partioned views as the basis for all the CRUD procs made the code very easy to build and maintain too. Since a major part of this was for a call center, really fast selects, inserts and updates were completely mission critical. Can't have the operators sitting there, waiting for their screen to display data, while they're on the phone.

    Used similar solutions in a couple of other situations, but that was the most extreme one.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Fortunately, this is an archive table I'll be working on. The eventual goal is to have the current view encapsulate the most recent X months, including current month (we haven't decided how many yet). We'll be designing monthly jobs that 1) create a new table for the month past, 2) add that table to the view, and 3) drop the oldest table from the view.

    E.G.

    create view ArchiveTable

    as

    select * from Archive_current_table -- for the current month not yet complete

    union all

    select * from Archive_2009_05

    union all

    select * from Archive_2009_04

    union all

    select * from Archive_2009_03

    When this month is finished, I'll rename Archive_current_table to Archive_2009_06, move the new table to the view, remove Archive_2009_03 from the view, and create a new Archive_current_table. Assuming all the tables are indexed on the entrydate in ascending, would it be more prudent to reverse the order of the selects with the oldest tables first?

    Gaby
    ________________________________________________________________
    "In theory, theory and practice are the same. In practice, they are not."
    - Albert Einstein

  • You could simplify the system tremendously by having tables named "CurrentMonth", "Prior1Month", "Prior2Month", ..., "ReallyOld", or something like that.

    Dynamically building tables and rebuilding the view is going to be more error prone.

    Another option would be tables named "DataJanuary", "DataFebruary", ..., "DataPriorYears". Makes it a little simpler than dynamic tables, and leaves the view with a static definition.

    Having what year and month the data is in the table name kind of violates the idea that data should be in rows and columns, not in the meta-data for the table. You can't query, "Where TableName > dateadd(year, -1, getdate())". Better to have that data in a column, so you can query it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • The order of the SELECTs shouldn't matter, AFAIK. The query optimizer is supposed to know what is where in a partitioned view and not query those sets that don't apply.

    Not sure how well this works in large implementations or in what you're proposing, but I've seen demos with 3 views and it appeared to be efficient and work as advertised.

  • Steve Jones - Editor (6/17/2009)


    The order of the SELECTs shouldn't matter, AFAIK. The query optimizer is supposed to know what is where in a partitioned view and not query those sets that don't apply.

    Not sure how well this works in large implementations or in what you're proposing, but I've seen demos with 3 views and it appeared to be efficient and work as advertised.

    Thanks, I have a lot to go with now. Only thing now is to find a test server with enough space left to restore the full backup from production. 😀

    Gaby
    ________________________________________________________________
    "In theory, theory and practice are the same. In practice, they are not."
    - Albert Einstein

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply