What columns should be chosen when partitioning a table?

  • Hello all,

    I am reviewing a table which gets poor performance and has approximately 20GB of data. I want to try partitioning the table, but I would like some opinions on which column should be chosen when setting up the partition scheme.

    The two candidate columns I see are one called Document_no and one called timestamp.

    The document_no column is a varchar datatype with a wide range of values that increment from SQ-000001 to SQ-899035.

    The timestamp column is a timestamp datatype, but its values display as hexadecimal. Can this datatype be used to set up table partitioning?

    Which of the two would be the best column to partition the table on?

    Thanks for your reply.

  • Check the link below. Hope it helps.

    http://technet.microsoft.com/en-us/library/dd578580(SQL.100).aspx

    Regards,

    Shaiju CK

    _____________________________________________
    One ounce of practice is more important than tonnes of dreams

  • Shaiju pointed to a good link for this. And, no, a TIMESTAMP datatype shouldn't be used for partitioning (I'm not even sure you can).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

    Thanks to both of you for the feedback. I'll give that document a read-through today. I think I'll be able to use the document no_ column I described. It's unfortunate this table does not have a normal date field that is set when the record is created, as this would have made my decision easier.

  • kwoznica (4/1/2011)


    Thanks to both of you for the feedback. I'll give that document a read-through today. I think I'll be able to use the document no_ column I described. It's unfortunate this table does not have a normal date field that is set when the record is created, as this would have made my decision easier.

    Is it possible to add one? That would be one of the better things to do.

    --Jeff Moden



  • Jeff,

    What would be the best way to do that though?

    We already have millions of records. Also would it need to be part of the front end application?

  • kwoznica (3/31/2011)


    Hello all,

    I am reviewing a table which gets poor performance and has approximately 20GB of data. I want to try partitioning the table, but I would like some opinions on which column should be chosen when setting up the partition scheme.

    Have you already looked at the indexes you have on the table? That's where I'd start if I were trying to address performance problems.

    The two candidate columns I see are one called Document_no and one called timestamp.

    The document_no column is a varchar datatype with a wide range of values that increment from SQ-000001 to SQ-899035.

    The timestamp column is a timestamp datatype, but its values display as hexadecimal. Can this datatype be used to set up table partitioning?

    Which of the two would be the best column to partition the table on?

    Thanks for your reply.

    It depends on how you're querying the data. If you tend to go by date range, you'd want a date column. If you primarily query by document_no, then you might want to partition by that column.

    As for the timestamp, no, you don't want to partition on that kind of column. If you want to partition by date, add a column of type DATE and call it something like DocumentDate (or Document_Date if you like underscores), then populate that column with a translation of the timestamp column.

    What's the primary key on this table? One thing to remember is that when you partition a table the column you're using as the partition has to be part of the primary key.

    One other thing, are you planning on creating an archive table that matches the current table to move some of the data out of the table you're partitioning? This will get you better performance as it will reduce the number of rows in the table.
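    If you do go the date route, the mechanics in SQL Server 2008 are a partition function plus a partition scheme. A minimal sketch, assuming a hypothetical DocumentDate column and made-up boundary dates (none of these names come from your table):

    ```sql
    -- Hypothetical yearly boundaries; RANGE RIGHT places each boundary date
    -- in the partition to its right
    CREATE PARTITION FUNCTION pfDocumentDate (date)
    AS RANGE RIGHT FOR VALUES ('2009-01-01', '2010-01-01', '2011-01-01');

    -- Map all partitions to one filegroup just to keep the sketch simple
    CREATE PARTITION SCHEME psDocumentDate
    AS PARTITION pfDocumentDate ALL TO ([PRIMARY]);

    -- A table created ON the scheme is partitioned by its DocumentDate values
    CREATE TABLE dbo.DocumentExample
    (
        Document_No  varchar(20) NOT NULL,
        DocumentDate date        NOT NULL
    ) ON psDocumentDate (DocumentDate);
    ```

    In practice you'd pick boundaries and filegroups to match your retention and archiving rules.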

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • kwoznica (4/1/2011)


    Jeff,

    What would be the best way to do that though?

    We already have millions of records. Also would it need to be part of the front end application?

    First, no, it would not need to be a part of the front end. It could be set up to default to whatever the current date and time is at the time new rows are inserted.
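    For instance, a new column with a default can be added without touching the application at all. A sketch with hypothetical table and constraint names:

    ```sql
    -- Hypothetical: every future INSERT stamps the row automatically.
    -- Caveat: when a NOT NULL column is added with a default, all EXISTING
    -- rows receive the default too, so old rows will all carry today's date.
    ALTER TABLE dbo.MyTable
    ADD RecordDate datetime NOT NULL
        CONSTRAINT DF_MyTable_RecordDate DEFAULT (GETDATE());
    ```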

    The real key here is, is there any way of telling approximately how old each row is?

    Also, how many millions of rows are we talking about?

    --Jeff Moden



  • Jeff Moden (4/7/2011)


    kwoznica (4/1/2011)


    Jeff,

    First, no, it would not need to be a part of the front end. It could be set up to default to whatever the current date and time is at the time new rows are inserted.

    The real key here is, is there any way of telling approximately how old each row is?

    Also, how many millions of rows are we talking about?

    Jeff,

    I believe there is a way to tell how old each row is, because there is a column with a datatype of timestamp and each record is populated with data.

    If I add the column, how can I set it to default to the date on which the row was created?

    When I look at the data, I see a timestamp column; however, it displays as hexadecimal and the datatype is timestamp. I am guessing I can use that, but, as a previous poster mentions, I need to translate it to a datetime datatype.

    The table has approximately 9 million records.

    I appreciate your reply.

  • Stefan Krzywicki (4/6/2011)


    Have you already looked at the indexes you have on the table? That's where I'd start if I were trying to address performance problems.

    It depends on how you're querying the data. If you tend to go by date range, you'd want a date column. If you primarily query by document_no, then you might want to partition by that column.

    As for the timestamp, no, you don't want to partition on that kind of column. If you want to partition by date, add a column of type DATE and call it something like DocumentDate (or Document_Date if you like underscores), then populate that column with a translation of the timestamp column.

    What's the primary key on this table? One thing to remember is that when you partition a table the column you're using as the partition has to be part of the primary key.

    One other thing, are you planning on creating an archive table that matches the current table to move some of the data out of the table you're partitioning? This will get you better performance as it will reduce the number of rows in the table.

    Stefan,

    I have looked at the indexes on the table, and here is what I can say of my findings. I have the indexes rebuilt on a weekly basis. There are 4 indexes, with the primary key clustered. I'm not sure what other tuning the indexes could use, as the table is not overabundant with indexed columns.

    The table is primarily queried by document no_ column.

    How do I translate the timestamp datatype to datetime?

    The primary key, based on sp_help, is actually an index. Do you recommend changing this? If the partition needs a primary key column, then it looks like this presents a problem.

    There is an archive table; however, certain business rules apply, i.e., we need to keep data for x number of years, and certain managers could choose to archive the data through the front end but don't. Deaf ears, if you know what I mean.

    Thank you for your insight, though, and check out my print screen, which I have attached, if you get a minute.

  • kwoznica (4/7/2011)


    Jeff Moden (4/7/2011)


    kwoznica (4/1/2011)


    Jeff,

    First, no, it would not need to be a part of the front end. It could be set up to default to whatever the current date and time is at the time new rows are inserted.

    The real key here is, is there any way of telling approximately how old each row is?

    Also, how many millions of rows are we talking about?

    Jeff,

    I believe there is a way to tell how old each row is, because there is a column with a datatype of timestamp and each record is populated with data.

    If I add the column, how can I set it to default to the date on which the row was created?

    For new rows, you can either give the column a DEFAULT value of GETDATE() when you add it to the table, or add the current date with the GETDATE() function in the INSERT statements.

    For existing rows, you'd use

    UPDATE MyTable
    SET NewDateColumn = CAST(TimestampColumn AS DATETIME)

    When I look at the data, I see a timestamp column; however, it displays as hexadecimal and the datatype is timestamp. I am guessing I can use that, but, as a previous poster mentions, I need to translate it to a datetime datatype.

    The table has approximately 9 million records.

    I appreciate your reply.

    You only have 9 million rows and it takes up 20 gig? How many columns are there? You might get a good reduction in size by normalizing the table, by using SPARSE columns, or by checking your datatypes and making sure you're using the smallest appropriate datatype for each column.

    For comparison, one of the partitioned tables I use regularly is just under 30 gig and has 162 million rows.
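    As a made-up illustration of those two size levers (names and types here are hypothetical, not from your table):

    ```sql
    -- SPARSE stores a value only when it is non-NULL, which pays off for
    -- columns that are NULL in most rows; right-sized types shave bytes per row
    CREATE TABLE dbo.SlimExample
    (
        Document_No varchar(20)  NOT NULL,      -- no wider than the data needs
        Qty         smallint     NOT NULL,      -- instead of int, if the range allows
        Comment     varchar(500) SPARSE NULL    -- mostly NULL in practice
    );
    ```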


  • kwoznica (4/7/2011)


    Stefan Krzywicki (4/6/2011)


    Have you already looked at the indexes you have on the table? That's where I'd start if I were trying to address performance problems.

    It depends on how you're querying the data. If you tend to go by date range, you'd want a date column. If you primarily query by document_no, then you might want to partition by that column.

    As for the timestamp, no, you don't want to partition on that kind of column. If you want to partition by date, add a column of type DATE and call it something like DocumentDate (or Document_Date if you like underscores), then populate that column with a translation of the timestamp column.

    What's the primary key on this table? One thing to remember is that when you partition a table the column you're using as the partition has to be part of the primary key.

    One other thing, are you planning on creating an archive table that matches the current table to move some of the data out of the table you're partitioning? This will get you better performance as it will reduce the number of rows in the table.

    Stefan,

    I have looked at the indexes on the table, and here is what I can say of my findings. I have the indexes rebuilt on a weekly basis. There are 4 indexes, with the primary key clustered. I'm not sure what other tuning the indexes could use, as the table is not overabundant with indexed columns.

    From looking at the abbreviated index definitions in your jpeg, you want to re-examine your indexes. In SQL Server 2008 you want an index to be based on the columns that limit your queries, i.e., the columns in the WHERE clauses. If you want to include the other columns being returned by the queries, which can frequently be a good idea, you want them as INCLUDED columns. Try to rework your indexes so they only contain the columns you really need.
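    As a hypothetical shape for that (none of these index or column names are from your table):

    ```sql
    -- Key on the column the WHERE clause filters by; carry the columns the
    -- SELECT list needs as INCLUDEd (leaf-level-only) columns
    CREATE NONCLUSTERED INDEX IX_SalesLine_DocumentNo
    ON dbo.SalesLine ([Document No_])
    INCLUDE (Quantity, [Unit Price]);
    ```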

    The table is primarily queried by document no_ column.

    I'd suggest considering this column for your partition then. You can still do it by date, but if you can think of a good way to archive documents based on document_no, then that's the way to go. If you can't, date is the way to go. Remember that whatever column you partition by has to be part of your Primary Key.

    How do I translate timestamp data types to Date Time?

    I believe that's just CAST(TimestampColumn AS DATETIME)

    However, you should determine how specific you need this date and time to be. Look at the datetime data types and choose the one that takes the least space that still meets your requirements. A good description of the various types can be found at the bottom of this page. http://msdn.microsoft.com/en-us/library/ms182418.aspx

    The primary key, based on sp_help, is actually an index. Do you recommend changing this? If the partition needs a primary key column, then it looks like this presents a problem.

    A Primary Key is a type of index, so that's fine. You will have to change it so it uses both the column you want as your primary key and the column you're partitioning on.
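    A sketch of what that change looks like, assuming hypothetical names and an already-created partition scheme called psDocumentDate:

    ```sql
    -- Drop and re-create the primary key so the partitioning column is in it
    ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;

    ALTER TABLE dbo.MyTable
    ADD CONSTRAINT PK_MyTable
        PRIMARY KEY CLUSTERED (DocumentID, RecordDate)
        ON psDocumentDate (RecordDate);
    ```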

    There is an archive table; however, certain business rules apply, i.e., we need to keep data for x number of years, and certain managers could choose to archive the data through the front end but don't. Deaf ears, if you know what I mean.

    If you have criteria that states you have to keep data for a certain amount of time, I'd suggest making RecordDate your partitioning column. It will make managing the data over long periods much easier for you.

    Thank you for your insight, though, and check out my print screen, which I have attached, if you get a minute.

    I really think the best way for you to improve performance on this table is to learn more about indexes and primary keys and improve the ones you currently have on this table. If that doesn't help enough, work on making the table smaller by normalizing and changing to smaller data types. Only if those steps don't help should you consider partitioning. Partitioning is not a trivial task and won't help your speed much if you don't already have good indexes and a properly defined table.

    Can you post the full definitions for your indexes? From the little I can see, you need to redo your Primary Key as well.

    If you need help getting to those definitions, let me know and I can walk you through it.


  • Stefan Krzywicki (4/7/2011)

    I really think the best way for you to improve performance on this table is to learn more about indexes and primary keys and improve the ones you currently have on this table. If that doesn't help enough, work on making the table smaller by normalizing and changing to smaller data types. Only if those steps don't help should you consider partitioning. Partitioning is not a trivial task and won't help your speed much if you don't already have good indexes and a properly defined table.

    Can you post the full definitions for your indexes? From the little I can see, you need to redo your Primary Key as well.

    If you need help getting to those definitions, let me know and I can walk you through it.

    If I do add a column as you suggest, can I put a constraint on the table to update new records? I'm basically trying to avoid having to convince the application developers to add more code.

    Yes, we have 9 million rows which take up 20 gig. The table has 280 columns. I would love to normalize the table, but the application developers are consultants who keep telling me it is a hardware problem. Furthermore, there are political issues because a high-level manager used to work for them. Any recommendations I have made get shut down due to either price or lack of faith in my word versus theirs.

    I agree about the datatypes as well; plus, there are many columns with default values that could easily be NULL or omitted from the table completely.

    Below are the index create statements. You asked for the definitions, so let me know if this is what you were referring to.

    CREATE UNIQUE NONCLUSTERED INDEX [$1] ON [dbo].[Live$SalesLine_37]
    (
        [Sequence No_ (History)] ASC,
        [Document Type] ASC,
        [Document No_] ASC,
        [Line No_] ASC,
        [Version No_ (History)] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [Data Filegroup 1]
    GO

    CREATE UNIQUE NONCLUSTERED INDEX [$2] ON [dbo].[Live$SalesLine_37]
    (
        [Changed By (History)] ASC,
        [Document Type] ASC,
        [Document No_] ASC,
        [Line No_] ASC,
        [Version No_ (History)] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [Data Filegroup 1]
    GO

    CREATE UNIQUE NONCLUSTERED INDEX [$3] ON [dbo].[Live$SalesLine_37]
    (
        [Changed Date (History)] ASC,
        [Document Type] ASC,
        [Document No_] ASC,
        [Line No_] ASC,
        [Version No_ (History)] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [Data Filegroup 1]
    GO

    ALTER TABLE [dbo].[Live$SalesLine_37] ADD CONSTRAINT [Live$SalesLine_37$0] PRIMARY KEY CLUSTERED
    (
        [Document Type] ASC,
        [Document No_] ASC,
        [Line No_] ASC,
        [Version No_ (History)] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DataFilegroup3]
    GO

  • Stefan Krzywicki (4/7/2011)


    Stefan Krzywicki (4/6/2011)


    Have you already looked at the indexes you have on the table?

    If you primarily query by document_no, then you might want to partition by that column.

    From looking at the abbreviated index definitions in your jpeg, you want to re-examine your indexes. In SQL Server 2008 you want an index to be based on the columns that limit your queries, i.e., the columns in the WHERE clauses. If you want to include the other columns being returned by the queries, which can frequently be a good idea, you want them as INCLUDED columns. Try to rework your indexes so they only contain the columns you really need.

    I believe the document_no column may not have consistent data, based on some samples I have been looking at, so I need to get more information on how the data is populated in that column. If I make that column the primary key, will I break any dependencies that may be in the application?

    I'd suggest considering this column for your partition then. You can still do it by date, but if you can think of a good way to archive documents based on document_no, then that's the way to go. If you can't, date is the way to go. Remember that whatever column you partition by has to be part of your Primary Key.

    I believe that's just CAST(TimestampColumn AS DATETIME)

    However, you should determine how specific you need this date and time to be. Look at the datetime data types and choose the one that takes the least space that still meets your requirements. A good description of the various types can be found at the bottom of this page. http://msdn.microsoft.com/en-us/library/ms182418.aspx

    Thanks for this. I didn't think it could be done with CAST.

    A Primary Key is a type of index, so that's fine. You will have to change it so it uses both the column you want as your primary key and the column you're partitioning on.

    If you have criteria that states you have to keep data for a certain amount of time, I'd suggest making RecordDate your partitioning column. It will make managing the data over long periods much easier for you.

    That RecordDate column you're referring to would be the new column I would add. I think this would be the best candidate, but can I implement it and have it updated without the front end?

    I really think the best way for you to improve performance on this table is to learn more about indexes and primary keys and improve the ones you currently have on this table.

    I still have a lot to learn about indexing. I'm working on it. I really appreciate your help and suggestions.

    I would love to make changes, as I have a good feel for how to do that, but I am concerned about the ramifications it will have on the front end. If I change the indexes on columns, can the front end tell?

  • kwoznica (4/13/2011)


    A Primary Key is a type of index, so that's fine. You will have to change it so it uses both the column you want as your primary key and the column you're partitioning on.

    If you have criteria that states you have to keep data for a certain amount of time, I'd suggest making RecordDate your partitioning column. It will make managing the data over long periods much easier for you.

    That RecordDate column you're referring to would be the new column I would add. I think this would be the best candidate, but can I implement it and have it updated without the front end?

    Sure, if RecordDate is the date of entry, all you have to do is set up the column with a DEFAULT value. For example:

    CREATE TABLE DefaultDateExample (RecordDate date default GetDate())

    I really think the best way for you to improve performance on this table is to learn more about indexes and primary keys and improve the ones you currently have on this table.

    I still have a lot to learn about indexing. I'm working on it. I really appreciate your help and suggestions.

    I would love to make changes, as I have a good feel for how to do that, but I am concerned about the ramifications it will have on the front end. If I change the indexes on columns, can the front end tell?

    If you change the indexes, unless the queries from the front end use index hints they should never know. They'll just see that they're getting their data faster. I'll reply with more about the indexes in a response to the post you made with the index definitions.

