How does the target table size impact insert performance?

  • I have a target table that has been growing by 2 million records every day for the past 30 days. This job used to complete in 15 minutes, but nowadays it takes far too long (10 hours). I have tried reducing the commit size by inserting the data in chunks, and I can see some improvement, but I still need to optimize these writes.
    My target table has only one index (a clustered index on an identity(1,1) primary key).
    I am using INSERT INTO ... SELECT FROM ..., as it will reduce the logging.
    Currently, the target table holds 500 million records.

    Any suggestions would be greatly appreciated.
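
    For context, a minimal sketch of this kind of chunked load, assuming hypothetical tables dbo.SourceTable and dbo.TargetTable and an integer key Id to batch on (all names are placeholders, not the actual schema):

    DECLARE @BatchSize int = 100000;
    DECLARE @LastId    int = 0;
    DECLARE @MaxId     int;

    SELECT @MaxId = MAX(Id) FROM dbo.SourceTable;

    WHILE @LastId < @MaxId
    BEGIN
        -- In autocommit mode each INSERT commits on its own,
        -- which is what keeps the per-chunk log usage small
        INSERT INTO dbo.TargetTable (Col1, Col2)
        SELECT s.Col1, s.Col2
        FROM dbo.SourceTable AS s
        WHERE s.Id >  @LastId
          AND s.Id <= @LastId + @BatchSize;

        SET @LastId = @LastId + @BatchSize;
    END;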

  • Is the target database having to grow during your insert times? What are your autogrow settings?

    Joie Andrew
    "Since 1982"

  • Joie Andrew - Sunday, March 3, 2019 4:25 AM

    Is the target database having to grow during your insert times? What are your autogrow settings?

    If the autogrowth is on a percentage setting, this is to be expected.
    😎
    What are the sizes of the database files?
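
    A quick way to check both, as a sketch (run in the target database; sizes are stored in 8 KB pages):

    SELECT name,
           size / 128 AS size_mb,
           CASE WHEN is_percent_growth = 1
                THEN CAST(growth AS varchar(20)) + ' %'
                ELSE CAST(growth / 128 AS varchar(20)) + ' MB'
           END AS growth_setting
    FROM sys.database_files;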

  • MDF size: 700 GB
    LDF size: 70 GB
    MDF autogrowth: 512 MB, unlimited
    LDF autogrowth: 512 MB, limited to 1,900 GB

  • I think there may be more to this than just a simple insert.
    Would it be possible to post an actual execution plan of that insert here? (If it is part of a proc, you would need to get it from a trace.)

    Could you also post the DDL of that table and the specific code you are using to do the insert (including your "batch chunk" code)?

    Regarding your autogrowth: it is possibly a bit small for a database of that size. Depending on how much it grows per week, I would change it to cover 4 to 8 weeks of growth.
    I would also make sure that IFI (instant file initialization) is enabled for the instance user - see https://www.brentozar.com/blitz/instant-file-initialization/
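
    As a sketch of both suggestions, assuming a hypothetical database MyDb with logical file name MyDb_Data (the IFI column requires SQL Server 2012 SP4 / 2016 SP1 or later):

    -- Raise the data file growth increment to cover several weeks of load
    ALTER DATABASE MyDb
    MODIFY FILE (NAME = MyDb_Data, FILEGROWTH = 8192MB);

    -- Confirm instant file initialization for the instance service account
    SELECT servicename, instant_file_initialization_enabled
    FROM sys.dm_server_services;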

  • Are you sure there is only one index on the table?
    Does the query you use to insert the data reference the target table itself, or join to any other tables?
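
    One way to verify the index count, as a sketch against a hypothetical dbo.TargetTable:

    SELECT i.name, i.type_desc, i.is_unique
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID(N'dbo.TargetTable')
      AND i.type > 0;   -- type 0 is the heap entry, not a real index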

  • These may seem like random questions, but:
    1) Is the database set for Read Committed Snapshot Isolation?
    2) Are there any deletes on the table?
    3) Are there any old open connections to the database (over SPID 50)? 

    If the answers to 1) and 3) are yes, kill the old/inactive SPIDs and see if that resolves the problem.
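
    Quick checks for 1) and 3), as a sketch (the database name is a placeholder):

    -- 1) Is Read Committed Snapshot Isolation on?
    SELECT name, is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = N'MyDb';

    -- 3) User sessions that have been idle a long time
    SELECT session_id, login_time, last_request_end_time, status
    FROM sys.dm_exec_sessions
    WHERE is_user_process = 1
    ORDER BY last_request_end_time;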

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • @frederico_fonseca - Thanks for your response. Unfortunately, I can't post the query here.
    @Jonathan AC Roberts - Thanks for your response. Yes, there is only one clustered index on the table.
    @Leo.Miller - My answer is no to all of your questions.
    Today I tried to load the same data in a non-prod environment, and it completed in 10 minutes. After seeing this, I suspected the PROD server's performance, but the DBA confirmed that the server's health is good.
    Digging all the way around...

  • Rocky's - Wednesday, March 6, 2019 5:22 PM

    @frederico_fonseca - Thanks for your response. Unfortunately, I can't post the query here.
    @Jonathan AC Roberts - Thanks for your response. Yes, there is only one clustered index on the table.
    @Leo.Miller - My answer is no to all of your questions.
    Today I tried to load the same data in a non-prod environment, and it completed in 10 minutes. After seeing this, I suspected the PROD server's performance, but the DBA confirmed that the server's health is good.
    Digging all the way around...

    Testing on non-prod is not a valid test unless the target DB is exactly the same size (or near enough) and the volume of data being loaded is the same.

    What does your DBA see when it's taking 10 hours? Any experienced DBA would hardly allow a query to run for that long without doing a full investigation, finding the root cause of the issue, and looking at alternative ways of improving it.

    So unless your query is doing an INSERT INTO ... SELECT FROM tbl -- WITHOUT any joins to any other table -- the issue is most likely your query, and/or the way you are doing the loop, and/or other load on the server. There are a few more options, but without the query/plan it is nearly impossible to help.

    A simple loop/insert is hardly ever privileged information that can't be shared.
    As for the execution plan, you can use SentryOne Plan Explorer (which is a must-have for any developer/DBA, and is free); it will anonymize the plan, making it suitable for sharing.
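
    If pulling the plan out of SSMS is awkward, a sketch of fishing the cached plan out by text instead, assuming the statement contains a recognizable fragment (note this returns the cached, estimated plan; an actual plan still needs SSMS or a trace):

    SELECT TOP (5)
           qs.execution_count,
           qs.total_elapsed_time / qs.execution_count AS avg_elapsed_us,
           st.text,
           qp.query_plan
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    WHERE st.text LIKE N'%INSERT INTO dbo.TargetTable%'   -- placeholder fragment
    ORDER BY qs.total_elapsed_time DESC;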

  • Rocky's - Wednesday, March 6, 2019 5:22 PM

    @frederico_fonseca - Thanks for your response. Unfortunately, I can't post the query here.
    @Jonathan AC Roberts - Thanks for your response. Yes, there is only one clustered index on the table.
    @Leo.Miller - My answer is no to all of your questions.
    Today I tried to load the same data in a non-prod environment, and it completed in 10 minutes. After seeing this, I suspected the PROD server's performance, but the DBA confirmed that the server's health is good.
    Digging all the way around...

    An obfuscated (preferably actual) plan would be ideal; failing that, an obfuscated query. Neither is difficult or time-consuming to prepare.
    Without one of these, everybody is guessing.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Rocky's - Wednesday, March 6, 2019 5:22 PM

    @frederico_fonseca - Thanks for your response. Unfortunately, I can't post the query here.

    You could at least give us an idea of what the insert is doing.
    Is it just inserting from another table?

    INSERT INTO myTable1 (Col1, ...)
    SELECT Col1, ...
    FROM myTable2

    Or is it inserting from a set of tables with joins?

    INSERT INTO myTable1 (Col1, ...)
    SELECT Col1, ...
    FROM myTable2
    INNER JOIN myTable3 ON ...
    INNER JOIN myTable4 ON ...

    It could be the query that's selecting the data, rather than the table the data is being inserted into, that's causing the problem.
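
    A quick way to test that, as a sketch: run the SELECT half on its own into a temp table and compare timings (column and table names are placeholders):

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT Col1, Col2        -- same select list, joins and WHERE as the real insert
    INTO #probe
    FROM myTable2;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;
    DROP TABLE #probe;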

  • First, let's clear up the notion about reducing logging... the only way you can reduce logging during an INSERT/SELECT is if a bunch of things happen...

    1.  The database must be in either the BULK LOGGED or SIMPLE Recovery Model. If the database is normally in the FULL Recovery Model, DO NOT USE THE SIMPLE RECOVERY MODEL, because it will break the transaction log chain, which will then need to be restarted with either a FULL or DIFF backup after changing back to the FULL Recovery Model.
    2.  If the table has non-clustered indexes on it, you might be in deep kimchi when it comes to minimal logging, even with the use of Trace Flag 610 or the other mumbo-jumbo that MS claims will allow you to do such a thing.
    3.  You CAN get minimal logging for a table with a clustered index, but a bunch of things need to happen.
    3.1 You MUST use WITH (TABLOCK) on the INSERT statement.
    3.2 You MUST do the insert in the same order as the clustered index (CI). With the CI being on an IDENTITY column, that will be easy.
    3.3 This part is undocumented, but I've found it to be very true... if the INSERT/SELECT query you're using to populate the table has any variables in it, you almost always have to use OPTION (RECOMPILE) so that the values of the variables are actually embedded in the query plan as run-time constants.

    The size of the table doesn't matter here. If you're inserting the same amount of data at the end of the CI (thanks to the identity column), then the same amount of data should always take about the same amount of time, especially if you have "Instant File Initialization" enabled. Log file growth will cause problems, as it always has and always will, because of the VLF formatting it has to do, but you seem to have that handled, because you used growth in MB rather than percent for both the MDF and the LDF.
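
    Putting 3.1 through 3.3 together, a minimal sketch under those assumptions (BULK LOGGED or SIMPLE Recovery Model; all names and the @FromDate variable are placeholders):

    DECLARE @FromDate date = '20190201';          -- stands in for any variable in the real query

    INSERT INTO dbo.TargetTable WITH (TABLOCK)    -- 3.1: TABLOCK is required
           (Col1, Col2)
    SELECT s.Col1, s.Col2
    FROM dbo.SourceTable AS s
    WHERE s.LoadDate >= @FromDate
    ORDER BY s.Id                                 -- 3.2: matches the target's clustered index order
    OPTION (RECOMPILE);                           -- 3.3: variable values become run-time constants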

    All that being said, none of it matters because those are just requirements.  What we need you to do is...
    1.  Post the full DDL for both the source and destination tables.  Make sure that you include the definitions for all indexes, constraints, any related indexed views (and their indexes), triggers, and the definitions of any functions that may appear in computed columns.
    2.  Post the exact code that you're using for the INSERT/SELECT.
    3.  Identify what method you use to update stats (especially on the source table) and how often you do it.
    4.  As the others have stated, attaching an "Actual" execution plan would be incredibly helpful.  The guessing on our part will only continue if you can't provide it.
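
    For point 3, a sketch of checking how stale the source table's statistics are (the table name is a placeholder):

    SELECT s.name AS stats_name,
           sp.last_updated,
           sp.rows,
           sp.modification_counter   -- rows changed since the stats were last updated
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID(N'dbo.SourceTable');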

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • One more thing to add... IF you're using SET IDENTITY_INSERT tablename ON, then THAT's the problem with performance. It's a well-known fact (and there is a posted plea with MS to fix it) that this setting causes the data to be fully materialized in TempDB, even when all the other requirements for minimal logging have been correctly implemented. Until they fix it, there is basically nothing that will help when appending rows to a larger table.
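
    If the source identity values don't actually need to be preserved, the workaround is to let the target generate its own, as a sketch (names are placeholders):

    -- Id is omitted from the column list, so no SET IDENTITY_INSERT is needed
    INSERT INTO dbo.TargetTable WITH (TABLOCK) (Col1, Col2)
    SELECT s.Col1, s.Col2
    FROM dbo.SourceTable AS s;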

    --Jeff Moden


