Best method to add non-nullable column(s) to a very large table

  • Hi guys,

I was wondering what is the best way of adding new non-nullable column(s) to a wide, large table (we're talking about 90M records and more than 10 columns).

I know some of you have experience doing something like this; please share.

I'm expecting to run this solution in both SQL 2000 and SQL 2005 environments.

    Thanking you in advance,

    Ignas

  • Just off the top of my head..

    1. You could add each column as nullable

    2. Populate the columns in reasonable batches

    3. Alter each column one-by-one to NOT NULL

    Rationale behind these steps

1. Adding a nullable field should take less time than adding 10 non-nullable fields with a default. You could even do them one by one; it would take more time overall, but with less impact at any single point.

    2. You don't want to generate too much log data or excessive locking.

3. Each row is checked for each field to make sure it does not contain a NULL; this takes time, so spread it out.
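A minimal T-SQL sketch of the three steps above (the column name NewCol, its type, the default of 0, and the batch size are all assumptions for illustration; SET ROWCOUNT is used for the batching so the same script works on SQL 2000):

```sql
-- Step 1: add the column as NULLable (a quick, metadata-only change)
ALTER TABLE dbo.tableName ADD NewCol int NULL
GO

-- Step 2: populate in reasonable batches to limit log growth and locking
SET ROWCOUNT 10000
WHILE 1 = 1
BEGIN
    UPDATE dbo.tableName SET NewCol = 0 WHERE NewCol IS NULL
    IF @@ROWCOUNT = 0 BREAK
END
SET ROWCOUNT 0
GO

-- Step 3: once every row is populated, switch the column to NOT NULL
ALTER TABLE dbo.tableName ALTER COLUMN NewCol int NOT NULL
GO
```

Backing up or truncating the transaction log between batches (as the recovery model allows) keeps step 2 from ballooning the log.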

    CEWII

  • CEWII,

I will try this approach and let you know how it goes.

    Thanks,

    Ignas

  • I look forward to hearing how it went. I would love to see if others have any other or better approaches..

    CEWII

  • CEWII,

Exactly!... I was expecting to see three or more different options from the forum, and would pick the best three to test.

    Ignas

  • Ignas Chilewa (8/27/2009)


    Hi guys,

I was wondering what is the best way of adding new non-nullable column(s) to a wide, large table (we're talking about 90M records and more than 10 columns).

    Ignas,

I didn't reply originally because you did not say what defines 'best' for you, nor did you give a table schema with the intended changes. That makes it very difficult to give good advice. Nevertheless:

    Adding a new NULLable column with NULL entries will certainly be fast - until you start adding data. Once the server has to start physically moving data around on the data pages (and most likely splitting existing data pages) things will get very slow. The final state of the data pages may well be a bit of a mess too.

    Unless there are very good reasons not to, I would always advise copying the data out, dropping the table, re-creating it with the new structure, and re-loading the data using minimally logged bulk load. Copying the data out in clustered-index order, and re-loading it into the new empty table (with just the clustered index) and specifying the ORDER hint on load is usually optimal. Rebuild non-clustered indexes after the load.
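A hedged sketch of that export/reload approach (the database name, file paths, and key columns key1/key2 are placeholders; bcp runs from a command prompt, not from a query window):

```sql
-- Sketch only: names and paths below are placeholders, not real objects.
-- 1. Export in clustered-index order (bcp, from a command prompt):
--      bcp "SELECT * FROM MyDB.dbo.BigTable ORDER BY key1, key2"
--          queryout C:\export\BigTable.dat -n -S MyServer -T

-- 2. Drop the table and re-create it with the new structure
--    (new NOT NULL columns included) and ONLY the clustered index.

-- 3. Re-load with minimal logging; the ORDER hint tells the server the
--    file already matches the clustered key, so no sort is needed:
BULK INSERT dbo.BigTable
FROM 'C:\export\BigTable.dat'
WITH (
    DATAFILETYPE = 'native',
    TABLOCK,                    -- table lock: needed for minimal logging
    ORDER (key1 ASC, key2 ASC)  -- must match the clustered index key
)

-- 4. Re-create the non-clustered indexes after the load completes.
```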

    See the 'Prerequisites for Minimal Logging in Bulk Import' topic in Books Online.
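For reference, the recovery-model part of those prerequisites might look like this sketch (MyDB and the backup path are placeholders; a log backup afterwards keeps the log chain intact):

```sql
-- MyDB is a placeholder database name.
ALTER DATABASE MyDB SET RECOVERY BULK_LOGGED  -- allow minimal logging
GO
-- ... perform the minimally logged bulk load here ...
ALTER DATABASE MyDB SET RECOVERY FULL          -- back to full logging
GO
-- Back up the log afterwards so point-in-time recovery is preserved.
BACKUP LOG MyDB TO DISK = 'C:\backup\MyDB_log.bak'
GO
```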

    If you provide details of your requirements, I can probably give you some more targeted advice.

    Paul

  • Paul,

    Below is the table definition:

    CREATE TABLE [dbo].tableName (
        column1  [int] NOT NULL DEFAULT (0),
        column2  [varchar](8) NOT NULL,
        column3  [numeric](10, 2) NOT NULL DEFAULT (0),
        column4  [bit] NOT NULL DEFAULT (0),
        column5  [numeric](8, 0) NOT NULL DEFAULT (1),
        column6  [bit] NOT NULL DEFAULT (0),
        column7  [tinyint] NOT NULL DEFAULT (1),
        column8  [tinyint] NOT NULL DEFAULT (1),
        column9  [int] NOT NULL DEFAULT (0),
        column10 [bit] NOT NULL DEFAULT (0),
        column11 [bit] NOT NULL DEFAULT (0),
        column12 [char](3) NOT NULL DEFAULT (999),
        column13 [bit] NOT NULL DEFAULT (1),
        column14 [varchar](128) NOT NULL DEFAULT (suser_sname()),
        column15 [datetime] NOT NULL DEFAULT (getdate())
    )
    GO

    CREATE UNIQUE CLUSTERED INDEX [IX_CUSTOMER_CATALOG_1] ON [dbo].tableName
        (column2 ASC, column1 ASC)
        WITH FILLFACTOR = 90 ON [DataC]
    GO

    CREATE NONCLUSTERED INDEX [IX_CUSTOMER_CATALOG_2] ON [dbo].tableName
        (column1 ASC, column2 ASC, column7 ASC)
        WITH FILLFACTOR = 90 ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX [IX_CUSTOMER_CATALOG_3] ON [dbo].tableName
        (column14 ASC, column15 ASC)
        WITH FILLFACTOR = 90 ON [PRIMARY]
    GO

    The data is about 3.2 GB and the indexes about 4.5 GB. The table runs on SAN-based storage under SQL 2000, on MS Windows 2003 Advanced Server with SP2.

    Not sure if those are the kind of requirements you expected... if not, just let me know.

    Again appreciate your time,

    Ignas

  • Thanks for that. Other things of interest include:

    * Are you able to schedule some database off-line time for this work?

    * If so, how long?

    * What would determine the 'best' method for you?

For next time, would you be able to post in the SQL 2000 forums? You are much more likely to find help quickly there. (I no longer have 2000 installed.)

    A good summary of the main advice I would offer can be found here on MSDN

    If you read the links on that page, you won't go far wrong.

    Upgrade to 2005/2008 soon :w00t:

    Paul
