Add/Modify Index on a Large Table

  • I have a very large table (60 GB) used for logging purposes (hundreds of records logged every second). The developers have added a new update procedure that uses a different WHERE condition, for which I don't have an index.


    · How can I create a new index on the new columns without causing blocking on the database?

    · What's the impact if I add an INCLUDE column to an existing index?

  • If you have SQL Server Enterprise you can use ONLINE = ON on the CREATE INDEX statement. This is supposed to allow transactions to continue against the table while the index is being created. I'm not sure how well it works, though; I would still want an outage to create that index.

    If you want to add an INCLUDE column to an existing index, you will need to drop and recreate the index.
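    A minimal sketch of both operations (the table, index, and column names here are hypothetical). CREATE INDEX ... WITH (DROP_EXISTING = ON) collapses the drop-and-recreate into a single statement, and on Enterprise it can also run online:

        -- Online build (Enterprise edition): readers and writers can keep
        -- working against the table while the index is created.
        CREATE NONCLUSTERED INDEX IX_EventLog_StatusCode
            ON dbo.EventLog (StatusCode, LoggedAt)
            WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);

        -- Adding an INCLUDE column to that index later: DROP_EXISTING = ON
        -- rebuilds it in one step instead of a separate DROP and CREATE.
        CREATE NONCLUSTERED INDEX IX_EventLog_StatusCode
            ON dbo.EventLog (StatusCode, LoggedAt)
            INCLUDE (CorrelationId)
            WITH (DROP_EXISTING = ON, ONLINE = ON);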

  • Rechana Rajan wrote:

    I have a very large table (60 GB) used for logging purposes (hundreds of records logged every second). The developers have added a new update procedure that uses a different WHERE condition, for which I don't have an index. · How can I create a new index on the new columns without causing blocking on the database? · What's the impact if I add an INCLUDE column to an existing index?


    There's something seriously wrong with the design of the system if it's allowed to update a log table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Taking a step back, if the table gets routinely updated, then it's probably more like a queue table or a run-log table (i.e., a row gets inserted with StartTime populated and then EndTime is updated when a process completes). If so, then only a tiny fraction of the rows in this 60 GB table are really needed.

    You could refactor this as a collection of tables plus a partitioned view. For example, the data could be split between two tables: a current table containing the most recent rows (i.e., where EndTime is NULL), with the remaining rows in a history table. A nightly job can insert completed rows from the current table into the history table and then delete them from the current table. If needed, a view can UNION ALL the two tables for reporting purposes (what's called a partitioned view).

    What's great about this solution is flexibility: the smaller current table can be indexed as needed for the application, and the much larger history table can have minimal indexing, so no more nightmares about query tuning and index maintenance.
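    Here's a minimal sketch of the idea (schema and names are made up; a true partitioned view also wants CHECK constraints on the partitioning column so the optimizer can skip the table it doesn't need):

        -- Hot table: rows still being worked on.
        CREATE TABLE dbo.LogCurrent (
            LogId     bigint    NOT NULL CONSTRAINT PK_LogCurrent PRIMARY KEY,
            StartTime datetime2 NOT NULL,
            EndTime   datetime2 NULL
        );

        -- Cold table: completed rows only.
        CREATE TABLE dbo.LogHistory (
            LogId     bigint    NOT NULL CONSTRAINT PK_LogHistory PRIMARY KEY,
            StartTime datetime2 NOT NULL,
            EndTime   datetime2 NOT NULL
        );
        GO

        -- Reporting view over both tables.
        CREATE VIEW dbo.LogAll
        AS
        SELECT LogId, StartTime, EndTime FROM dbo.LogCurrent
        UNION ALL
        SELECT LogId, StartTime, EndTime FROM dbo.LogHistory;
        GO

        -- Nightly sweep: move completed rows from hot to cold.
        BEGIN TRANSACTION;
            INSERT INTO dbo.LogHistory (LogId, StartTime, EndTime)
            SELECT LogId, StartTime, EndTime
            FROM dbo.LogCurrent
            WHERE EndTime IS NOT NULL;

            DELETE FROM dbo.LogCurrent
            WHERE EndTime IS NOT NULL;
        COMMIT;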

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • There's a difference (to me, anyway) between a table that suffers a whole lot of INSERTs and a table that suffers UPDATEs.  Run logs (with only one exception that's not likely to be true here), audit tables, and similar should never suffer "UPDATEs".  They should only suffer INSERTs.


    So, I guess the question goes back to the OP... what is the actual purpose of this table and what is its actual usage? Knowing that will help us answer your question.

    --Jeff Moden



  • Jonathan AC Roberts wrote:

    If you have SQL Server Enterprise you can use ONLINE = ON on the CREATE INDEX statement. This is supposed to allow transactions to continue against the table while the index is being created. I'm not sure how well it works, though; I would still want an outage to create that index. If you want to add an INCLUDE column to an existing index, you will need to drop and recreate the index.


    Ours is Enterprise, but ONLINE = ON won't help, as it initially takes a lock to copy the table.

  • Eric M Russell wrote:

    Taking a step back, if the table gets routinely updated, then it's probably more like a queue table or a run-log table (i.e., a row gets inserted with StartTime populated and then EndTime is updated when a process completes). If so, then only a tiny fraction of the rows in this 60 GB table are really needed. You could refactor this as a collection of tables plus a partitioned view. For example, the data could be split between two tables: a current table containing the most recent rows (i.e., where EndTime is NULL), with the remaining rows in a history table. A nightly job can insert completed rows from the current table into the history table and then delete them from the current table. If needed, a view can UNION ALL the two tables for reporting purposes (what's called a partitioned view). What's great about this solution is flexibility: the smaller current table can be indexed as needed for the application, and the much larger history table can have minimal indexing, so no more nightmares about query tuning and index maintenance.

    Thanks Eric. This is exactly what it is. The table initially gets data inserted from multiple applications, and when the process is completed by the different systems\third-party integrations, the data gets updated.

    We will try the option suggested.

    Thanks Again

  • Jeff Moden wrote:

    There's a difference (to me, anyway) between a table that suffers a whole lot of INSERTs and a table that suffers UPDATEs. Run logs (with only one exception that's not likely to be true here), audit tables, and similar should never suffer "UPDATEs". They should only suffer INSERTs. So, I guess the question goes back to the OP... what is the actual purpose of this table and what is its actual usage? Knowing that will help us answer your question.

    Thanks Jeff .

    The table, as I mentioned, gets its initial data from an application that is integrated with several other applications. The data is processed and shared from our side through BizTalk, and then the response updates the initial record to success\failure. The filter condition changes with respect to the application\process. As DBAs we have created indexes on the initial filter columns, but since the table is huge, adding an index at this stage seems quite impossible without downtime.

    Is there any way to handle this? Can the index creation be done in stages? SQL Server 2019 has an option to pause index creation; is there anything like that for SQL 2014?
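    (For reference, the SQL Server 2019 feature being described is a resumable online index build; the names below are hypothetical. SQL Server 2014 has no equivalent, so ONLINE = ON without pause/resume is as good as it gets there.)

        -- SQL Server 2019+: the build can be paused and resumed, and a
        -- paused operation survives restarts and failovers.
        CREATE NONCLUSTERED INDEX IX_EventLog_NewFilter
            ON dbo.EventLog (NewFilterColumn)
            WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60 MINUTES);

        -- Pause and resume from another session:
        ALTER INDEX IX_EventLog_NewFilter ON dbo.EventLog PAUSE;
        ALTER INDEX IX_EventLog_NewFilter ON dbo.EventLog RESUME;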

    The approach suggested by Eric seems doable.

  • Rechana Rajan wrote:

    Jonathan AC Roberts wrote:

    If you have SQL Server Enterprise you can use ONLINE = ON on the create index statement. This is supposed to allow transactions to continue on the table while the index is being created. I'm not sure how well it works though. I would want an outage to create that index. If you want to add an include column to an existing index you will need to drop and recreate the index.

      Ours is Enterprise but ONLINE=ON wont help as it locks initially to copy the table.

    I'm not sure what you mean?

  • Rechana Rajan wrote:

    Jeff Moden wrote:

    There's a difference (to me, anyway) between a table that suffers a whole lot of INSERTs and a table that suffers UPDATEs. Run logs (with only one exception that's not likely to be true here), audit tables, and similar should never suffer "UPDATEs". They should only suffer INSERTs. So, I guess the question goes back to the OP... what is the actual purpose of this table and what is its actual usage? Knowing that will help us answer your question.

    Thanks Jeff. The table, as I mentioned, gets its initial data from an application that is integrated with several other applications. The data is processed and shared from our side through BizTalk, and then the response updates the initial record to success\failure. The filter condition changes with respect to the application\process. As DBAs we have created indexes on the initial filter columns, but since the table is huge, adding an index at this stage seems quite impossible without downtime. Is there any way to handle this? Can the index creation be done in stages? SQL Server 2019 has an option to pause index creation; is there anything like that for SQL 2014?

    The approach suggested by Eric seems doable.


    Thanks for the explanation. I agree with Eric... moving historical data that will NEVER be updated EVER AGAIN would be the way to go here. If done correctly (separate tables for, say, each year or less), the tables could be individually indexed to support SELECTs with no regard for UPDATEs, because those won't happen again on the older rows. In fact, if you put each of those tables into its own separate file group and file, then you can save a huge amount on index maintenance and backups, and even on restores (should that unfortunate opportunity rear its ugly head).

    The way all of that gets saved is by doing a final rebuild on each archive table to compact out any free space (which you will no longer need, because the table will never be updated again) and then setting the file group for each table, except the latest tables (the ones still being updated), to READ ONLY. Once in READ ONLY mode, you can back them up one final time and never have to back them up ever again; you can also set your backups to skip over READ ONLY files. This will also seriously shorten any stats rebuilds you do and, of course, since there's no chance of additional fragmentation in a READ ONLY file group, there's zero need for any index maintenance there.
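    A rough sketch of that final step for one archive table (all names here are hypothetical; note that switching a file group to READ_ONLY needs exclusive access to the database for a moment):

        -- Compact out free space one last time; the table will never be
        -- updated again, so a 100% fill factor is safe.
        ALTER INDEX ALL ON dbo.EventLog_2019 REBUILD
            WITH (FILLFACTOR = 100);

        -- Freeze the file group: no more index maintenance, no more
        -- fragmentation, and routine backups can skip it.
        ALTER DATABASE MyLogDb MODIFY FILEGROUP FG_2019 READ_ONLY;

        -- One final backup of just that file group.
        BACKUP DATABASE MyLogDb FILEGROUP = 'FG_2019'
            TO DISK = N'X:\Backups\MyLogDb_FG_2019_final.bak';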


    And... all but the latest month or two (the months that usually suffer UPDATEs) can stay totally online, and adding indexes to those latest file groups will be nasty fast because they simply won't be that large. Index maintenance will realize the same benefit.


    There ARE some tricks involved in doing all of this with the least amount of wasted space in each file group, and in transferring the data from the current monster table to the smaller tables in an online fashion, but it will be totally worth it.


    And, depending on the structure of your table, I might even be able to show you how to make it so that the clustered index never becomes substantially fragmented.


    And, just to assure you this type of thing does work exactly as I've stated (and I'm sure that Eric would agree), I have a 2005 database that just won't go away... the big table in it has 9.5 years of data (a total of 1.2 TERAbytes!!) stored in 114 monthly file groups, and 112 of them are READ ONLY. It takes less than 6 minutes to back it up (there are other things in the PRIMARY file group), and I never have to rebuild the clustered index on that table. My only regret is that I partitioned it as a "Partitioned TABLE" instead of a "Partitioned VIEW", and I'm getting ready to change that!


    If that's of interest, post back with the CREATE TABLE statement, the indexes, and any constraints, and we can help you work up a plan. I will tell you that if you want this done without a shedload of down time or nuclear explosions of the log file, it WILL take some planning. It's only a bit complex, but there's nothing difficult about it... unless you don't actually have a plan for it. You'll also (temporarily, at least) need around 70 or 80 GB of extra disk space to pull this off with as little down time as possible (I'm thinking "just a minute or two" if done right). When I partitioned my big table, it had about 7.5 years of data in it (about 940 GB, almost 16 times larger than your table) and, although it took a really long time to move the legacy data, my total down time was measured in seconds.


    After that, life was pretty easy all the way around for this table.


    The key is that we need the things I asked for and more information about the data itself. There's also a seriously great chance that there will be no need for any front-end changes.


    --Jeff Moden



  • Also, since the application is only filtering on a tiny subset of rows, consider simply using filtered indexes as an alternative to my previous suggestion of a partitioned view, which involves some initial downtime to move data around. Even on a 60+ GB table, a filtered index requires a fraction of the time to create, uses less storage, and gives faster lookups and scans.
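    Something along these lines (hypothetical names; the catch is that a query only uses a filtered index when its predicates provably match the filter):

        -- Index only the "in flight" rows, a tiny slice of the 60 GB table.
        CREATE NONCLUSTERED INDEX IX_EventLog_InFlight
            ON dbo.EventLog (ApplicationId, ProcessCode)
            INCLUDE (StatusCode)
            WHERE EndTime IS NULL
            WITH (ONLINE = ON);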

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

