Designing Indexing for Partitioned Tables, how would you do it?

  • Hello All,

    I am currently in the process of creating a new table, which I know will extend far beyond the identity range of 'int' and I'm seeking to store this data in tables separated by month.

    Each row of data is assigned a date (of effectiveness). Ideally, the check constraint would include this date to partition across, however, as we all know, partitioning updatable views requires the partitioning column to be part of the primary key. This means that the date field, must become part of the primary key.

    As a date alone is not accurate enough the become it's own primary key, obviously there would be a need to record a 'sub id' with the date field and create the two columns as a primary key:

    [ItemDate] [datetime],

    [ItemKey] [int]

    The issue with this is that the dates or date ranges are known, the 'Keys' are not. When we execute the query call and search for items on a date basis, this would then perform an index scan on the primay key.

    I'm currently thinking that I will create the primary key as below:

    [ItemKey] [bigint] IDENTITY(-9223372036854775807)

    [ItemDate] [datetime]

    /* -9223372036854775808 reserved as a known non-existant key */

    Then, ontop of this, create a standard index on the [ItemDate] alone. Then when running the query analyzer, we see that only an index seek was performed on [ItemDate].

    My question is this... If you had a table that is eventually going to hold more than 4 billion rows, each row with up to 150bytes of data, how would you split it up?

    Interested to see your suggestions...

    Neil.

  • Hi,

    If you're planning to segregate the tables by month, I'm not sure why you think there will be a problem with your original primary key of [ItemDate],[ItemKey] ?

    If you pass a date as part of the query, this shouldn't SCAN the index, but should do an index SEEK on the date.

    What you'll see in the showplan is that each table will be touched, but only minimally as the optimizer establishes whether the data in that table potentially matches the required date range or not. When it finds the table which satisfies the required date range, it should then use a clustered index seek to find the records matching that date.

    We currently use exactly this approach on monthly tables of some 650 million rows each, referenced via a partitioned view, and have no problem at all.

  • Hello Philip,

    Thank you for your reply. It wasn't that I thought there was a problem, I was just interested to throw the issue to the internet community and see what the results were. Obviously (by your response), we're on the right track.

    Can I ask though, you mentioned that you are using the same technique. With your secondary key item, similar to [ItemKey] how are you generating these numbers? Are these numbers unique to the month table or the tables overall?

    Thanks

    Neil.

  • Hi Neil and Phillip,

    If the table is large. Is segragation only solution? Can't we use something like bit fields to store primary keys??

    thanks

    Samir

  • Hello Samir,

    It depends on your situation. Tables can potentially store many millions, in fact, billions of rows, however consider this:

    If you had a table with a single 'int' column and populated it with every possible 'int' number, from -2147483648 through to 2147483647 (4 billion rows) with each row at 4 bytes of data, this would take approximately 17.1GB to store.

    If you needed to logically scan or query this table, it would take considerable time above having the table split into properly organized segments. Sure, if you needed to search the entire results, then there's no real way (apart from indexing the searching columns) to speed up the process, but in most queries we all write some data exclusions to narrow our search field.

    As for using bit columns as primary keys, this is not possible. Ideally, the best case scenario for a primary key would be to have a sequential number; numbers being faster to read than characters and sequential to avoid page splitting in indexes.

    I hope this helps.

    Neil.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply