Faster way to INSERT INTO large table

  • I'm looking for some ideas on how to speed up the process of inserting new records into data tables. These data tables are very large and have anywhere from 6 to 10 primary keys (so if I do it wrong, then end results of a VERY long executery query will be an error).

    The process I typically use is this:

    INSERT INTO MainTable (All variables)

    SELECT All variables

    FROM AuxilliaryTable

    WHERE PrimaryKey1+PrimaryKey2 NOT IN

    (SELECT PrimaryKey1+PrimaryKey2

    FROM MainTable)

    Sometimes this is just fine. But today the table is just so big (349,186,932 records), and the clock on this query just hit 02:56:18 and I'm sitting here thinking, "there HAS to be a better way." I'm already going one quarter at a time, so limiting the insert is not the issue.

    Any ideas? Or at least something I could be doing while sql is executing?

  • amy lets see the actual execution plan; that may tell us if an index would help, or statistics, or much more.

    i'm not sure if that is real code or pseudocode; why would you add two numbers (that appear to be primary keys?) together?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell, If you want the actual execution plan, you'll have to wait quite a while. There are two indexes on the insert into table (PK clustereed and non-unique, non-clustered) and a PK clustered index on the select from table.

    The use of the '+' sign is something my predecessor did, and I guess I just kept it up. But it's basically a way to save typing (maybe). Instead of typing a WHERE statement that said

    MainTable.PK1 = AuxTable.PK1 AND MainTable.PK2 = AuxTablePK2, etc (remember, we're talking about 10 PKs here) the PKs are all added together to create a column that looks like this: 270100502011040450103371100

    I am not defending this practice in any way. The whole point of my post is b/c I admit I need a better/ faster way. (I sound like Handy Smurf!)

  • Amy.G (11/21/2011)


    Lowell, If you want the actual execution plan, you'll have to wait quite a while. There are two indexes on the insert into table (PK clustereed and non-unique, non-clustered) and a PK clustered index on the select from table.

    The use of the '+' sign is something my predecessor did, and I guess I just kept it up. But it's basically a way to save typing (maybe). Instead of typing a WHERE statement that said

    MainTable.PK1 = AuxTable.PK1 AND MainTable.PK2 = AuxTablePK2, etc (remember, we're talking about 10 PKs here) the PKs are all added together to create a column that looks like this: 270100502011040450103371100

    I am not defending this practice in any way. The whole point of my post is b/c I admit I need a better/ faster way. (I sound like Handy Smurf!)

    that might be the performance bottleneck.

    if you have to add 10 keys together for the comparison, you have to do it to all 300 million rows. that probably generates a big temp table just to stage the 300 million rows for the comparison...do you have a whopping big temp table, by chance?

    joining on the 10 actual columns would be faster, since it could use the clustered index/primary key index, or if you created a calculated, persisted column that appended those 10 keys together, and put an index on it, that would probably help to.

    so i think that is probably the issue right there.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • OK, yes, you have a point that I should be paying much more attention to the indexes on a table and utilizing them. With that in mind, I want to get back to the logic of the insert and subsequent SELECT statement. It's basically, insert into MainTable everything in the AuxilliaryTable that's not already in the main table.

    There are many ways to write that SELECT statement. Any ideas which is best? Would a JOIN be better than EXCEPT?

    PS, a lot of articles about indexes are how to create them. Know of any good ones about how to utilize them?

  • Amy.G (11/21/2011)


    OK, yes, you have a point that I should be paying much more attention to the indexes on a table and utilizing them. With that in mind, I want to get back to the logic of the insert and subsequent SELECT statement. It's basically, insert into MainTable everything in the AuxilliaryTable that's not already in the main table.

    There are many ways to write that SELECT statement. Any ideas which is best? Would a JOIN be better than EXCEPT?

    PS, a lot of articles about indexes are how to create them. Know of any good ones about how to utilize them?

    well, the details will help here Amy;

    you mentioned ( i think) that your table has a 10 column primary key...is that right..10 columns are featured int eh PK of the table?

    I have one very similar, so I'm pretty familiar with this.

    can we see the DDL of the table, adn the real insert command you were using? that might provide enough to at least offer some suggestions; an estimated execution plan might help, since it's a lot of work for the actual.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • OK, the actual code:

    This is the table the records need to go into

    CREATE TABLE [dbo].[BED_MN](

    [stfips] [char](2) NOT NULL,

    [areatype] [char](2) NOT NULL,

    [char](6) NOT NULL,

    [periodyear] [char](4) NOT NULL,

    [periodtype] [char](2) NOT NULL,

    [period] [char](2) NOT NULL,

    [ownership] [char](2) NOT NULL,

    [indlevel] [varchar](1) NOT NULL,

    [indcodetype] [char](2) NOT NULL,

    [indcode] [varchar](6) NOT NULL,

    [adjusted] [char](1) NULL,

    [bedtypcode] [varchar](2) NOT NULL,

    [emp_level] [float] NULL,

    [emp_rate] [numeric](5, 1) NULL,

    [estbl_level] [float] NULL,

    [estbl_rate] [numeric](5, 1) NULL,

    [suppress] [varchar](2) NULL,

    CONSTRAINT [PK_BED_MN] PRIMARY KEY CLUSTERED

    (

    [stfips] ASC,

    [areatype] ASC,

    ASC,

    [periodyear] ASC,

    [periodtype] ASC,

    [period] ASC,

    [ownership] ASC,

    [indcodetype] ASC,

    [indcode] ASC,

    [bedtypcode] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    I just clicked on the "Display Estimated Execution Plan" and I'm a little embarrassed at how awful it is. "Missing Index" is a recurring theme. Must I make it public? But here is the original query.

    INSERT INTO [ALMIS_22].[dbo].[BED_MN]

    ([stfips],[areatype],

    ,[periodyear],[periodtype],[period],[ownership]

    ,[indlevel],[indcodetype],[indcode],[adjusted],[bedtypcode],[emp_level]

    ,[emp_rate],[estbl_level],[estbl_rate],[suppress])

    SELECT [stfips],[areatype],

    ,[periodyear],[periodtype],[period],[ownership]

    ,[indlevel],[indcodetype],[indcode],[adjusted],[bedtypcode],[emp_level]

    ,[emp_rate],[estbl_level],[estbl_rate],[suppress]

    FROM [Transfer].[dbo].[BED_MN_test]

    and

    [stfips]+[areatype]+

    +[periodyear]+[periodtype]+[period]

    +[ownership]+[indcodetype]+[indcode]+[bedtypcode] NOT IN

    (

    SELECT [stfips]+[areatype]+

    +[periodyear]+[periodtype]+[period]+[ownership]+[indcodetype]+[indcode]+[bedtypcode]

    FROM [ALMIS_22].[dbo].[BED_MN] )

  • Amy.G (11/21/2011)


    Lowell, If you want the actual execution plan, you'll have to wait quite a while. There are two indexes on the insert into table (PK clustereed and non-unique, non-clustered) and a PK clustered index on the select from table.

    The use of the '+' sign is something my predecessor did, and I guess I just kept it up. But it's basically a way to save typing (maybe). Instead of typing a WHERE statement that said

    MainTable.PK1 = AuxTable.PK1 AND MainTable.PK2 = AuxTablePK2, etc (remember, we're talking about 10 PKs here) the PKs are all added together to create a column that looks like this: 270100502011040450103371100

    I am not defending this practice in any way. The whole point of my post is b/c I admit I need a better/ faster way. (I sound like Handy Smurf!)

    I agree with Lowell here. The indexes have no chance to help with that methodology. That is certainly going to cause some issues.

    The other problem I would be concerned about here is duplicate connections, either as Varchar or numeric composite comparisons. Example:

    Numeric:

    Key1: 30 Key2: 15 = 45

    Key1: 25 Key2: 20 = 45

    Same connection.

    Varchar:

    Key1: abc Key2: def = abcdef

    Key1: abcd Key2: ef = abcdef

    Same connection. Because some of your key fields are varchar you can't ensure that spaces will end up in the characters.

    Except and an anti-semi null join (Left join where IS NULL) are basically the same thing. Except actually can run slower, as I showed a while ago in some testing. If it's important I'll find the topic I fired up on it and got a few local gurus to help me understand the 'under the hood'.

    I would avoid the not in mechanic you're using because of the multi-field connection and do a massive LEFT JOIN testing for NULLs on Almis_22..Bed_MN.

    However, there's really no good way out of this. These kinds of delta updates are just painful to do. One thing I WOULD do however is add a column to Transfer schema that's a Bit for 'existing', and then only do the insert after the comparison on those that are non-existing. That's mostly to keep my lock contention as low as possible, however.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Amy.G (11/21/2011)


    I just clicked on the "Display Estimated Execution Plan" and I'm a little embarrassed at how awful it is. "Missing Index" is a recurring theme. Must I make it public? But here is the original query.

    We've all been there. I would definately make sure both tables have an index on these fields, in the exact same order and sorting:

    [stfips] ASC,

    [areatype] ASC,

    ASC,

    [periodyear] ASC,

    [periodtype] ASC,

    [period] ASC,

    [ownership] ASC,

    [indcodetype] ASC,

    [indcode] ASC,

    [bedtypcode] ASC


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • for a quickie compare Amy, peek at the estimated planfor this, and also go ahead and run it;

    what kind of performance differnece do you get?

    i think it will be several order of magnitudes faster, as wella s being more accurate,as it avoids the issues Craig identified.

    note this is only the SELECT...not the whole insert.

    SELECT tx.[stfips],

    tx.[areatype],

    tx.

    ,

    tx.[periodyear],

    tx.[periodtype],

    tx.[period],

    tx.[ownership],

    tx.[indlevel],

    tx.[indcodetype],

    tx.[indcode],

    tx.[adjusted],

    tx.[bedtypcode],

    tx.[emp_level],

    tx.[emp_rate],

    tx.[estbl_level],

    tx.[estbl_rate],

    tx.[suppress]

    FROM [Transfer].[dbo].[BED_MN_test] tx

    LEFT OUTER JOIN [ALMIS_22].[dbo].[BED_MN] mn

    ON tx.[stfips] = mn.[stfips]

    AND tx.[areatype] = mn.[areatype]

    AND tx.

    = mn.

    AND tx.[periodyear] = mn.[periodyear]

    AND tx.[periodtype] = mn.[periodtype]

    AND tx.[period] = mn.[period]

    AND tx.[ownership] = mn.[ownership]

    AND tx.[indcodetype] = mn.[indcodetype]

    AND tx.[indcode] = mn.[indcode]

    AND tx.[bedtypcode] = mn.[bedtypcode]

    WHERE mn.[stfips] IS NULL

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • WOW. Lowell and Craig -- you guys really went above and beyond in helping me with this mundane issue. Thank you so much.

  • Amy.G (11/21/2011)


    WOW. Lowell and Craig -- you guys really went above and beyond in helping me with this mundane issue. Thank you so much.

    Amy keep us posted on this! getting feedback and feeling like we helped someone gives us the warm fuzzies.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Amy.G (11/21/2011)


    WOW. Lowell and Craig -- you guys really went above and beyond in helping me with this mundane issue. Thank you so much.

    Our pleasure. When you get those indexes into place and can check out the query Lowell posted, bring back your new estimated execution plan. Should be a lot less embarassing and then we can dig into the meat of this thing and see if there's anything else that would be a bit hard to explain without handing you a thousand point checklist. 🙂

    I would sincerely look into 2-staging this process however. Holding insert/exclusive locks on the Alvis_22 schema table until this completes is just bad form. Do the long work shared first, then do a final 'quick strike' on the table to get your records in.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • I would sincerely look into 2-staging this process however. Holding insert/exclusive locks on the Alvis_22 schema table until this completes is just bad form. Do the long work shared first, then do a final 'quick strike' on the table to get your records in.

    You have now totally lost me.

  • You're comparing calculated values to calculated values. Indexes are no longer a part of the equation. It's nothing but scans. That's a guaranteed performance degradation as more and more rows are added.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply