Fast Insert

  • which way is faster .

    Inserting data into a table having clustered Index or Inserting data into a table then create clustered index.

    Insertion happens from a different table and its about 1million rows

  • If you are sure your source data is unique then First Load data into your table and then create clustered index on it. It would be faster.

    ____________________________________________________________

    AP
  • It's faster to create the clustered index first, because if necessary you can always explicitly sort the data being into cluster key order for the insert. Typically SQL will sort it that way automatically, but check the query plan, and if it doesn't, specify it yourself.

    For example, say new_table is clustered on ( cola, colf ):

    INSERT INTO new_table

    SELECT ...

    FROM ...

    ORDER BY cola, colf

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • As always, "It Depends". For example, no one has mentioned using the high performance "SELECT/INTO" method, which works wonders even if minimal logging isn't possible due to the current Recovery Model. The ONLY way to know is to try it because "your mileage may vary" depending on the nature of the table and a bunch of other variables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Here's a script and its execution plan containing the estimated stats for testing two different ways of inserting data into a table (INSERT vs SELECT INTO). I've used SQL Sentry Plan Explorer to get actual stats, and inserting into a table with a clustered index is faster (in this test) by ~200ms for 100,000 rows than inserting into a table and adding the index after. As Jeff Moden says above, "It Depends", but at least you've got something here to look at.

  • Gregg Dave (5/14/2015)


    which way is faster .

    Inserting data into a table having clustered Index or Inserting data into a table then create clustered index.

    Insertion happens from a different table and its about 1million rows

    As Jeff already said, it depends, suggest you have a look at this article, Minimally Logged Inserts

    😎

  • sickb00000y (5/14/2015)


    Here's a script and its execution plan containing the estimated stats for testing two different ways of inserting data into a table (INSERT vs SELECT INTO). I've used SQL Sentry Plan Explorer to get actual stats, and inserting into a table with a clustered index is faster (in this test) by ~200ms for 100,000 rows than inserting into a table and adding the index after. As Jeff Moden says above, "It Depends", but at least you've got something here to look at.

    Thanks sickb00000y,

    The result was better by 8 seconds when i tried with 5million rows with table having Index.

  • That's not surprising once you think through the process. Loading the table and then clustering it requires all the rows to be inserted twice.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Here are a handful of approaches, listed from what's generally fastest (select / into) to slowest.

    Select into a new table (ie: SELECT * INTO B FROM A).

    Select into an empty table.

    Select into a table with no indexes, foreign keys, or triggers.

    Select into a table with indexs, et all, but order by clustering key.

    Select into a table with indexs,et all, and don't specify order by.

    Basically what you're trying to do is minimize transaction logging, avoid on-the-fly foreign key lookups and index writes, and avoiding page splitting (inserting non-sorted records into a clustered index).

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • That's a good point about the minimal logging.

    But from at least SQL 2008 on, you can get minimal logging on the insert into the existing empty clustered index as well, by doing the following:

    1) Insuring the insert is in clustered key order, as I noted above.

    AND (either

    A) Enable trace flag 610 (this has other considerations and should be vetted before being done).

    OR

    B) Specify the TABLOCK hint in the load.

    INSERT INTO clus_table ( TABLOCK )

    SELECT ...

    FROM ...

    ORDER BY clus_key1, clus_key2

    )

    Edit: Sorry for not mentioning that earlier. I always sort the rows and specify TABLOCK for a new table load by habit. On some servers I have TF610, but not on others, but it still doesn't hurt to specify TABLOCK on table initial load.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply