Bulk Insert

  • We are bulk inserting data in a single table with 'bulk insert' statment. The table in which we are doing this bulk insert has got 1 clustered index on an bigint field and 6 non clustered indexes.
     

    The insertion of first 10-12 millions are done in 7-8 hrs. After that each million takes 3-4 hours. It is very clear that this difference in performance is not gradual, but very much sudden.
     
    We are not sure about the reason behind this sudden degradation in performance. We assume that this is happening because of some configuration/tweaking that needs to be done at the SQL server end.

     

    Configuration of the machine running DB server: Dual processor(2.4 GHz), 2 GB RAM, 512 HDD.

     
    What could be the possible reason for it? Do we need to do some database tweaking for it?
     
    --Deepak

  • - win-permon and sql-profiler can point you in the right direction.

    - post the script (DB-(filegroups) + table + indexes + indexed views)

    - is the db autogrowing during this load ? (do it proactive if you can)

    - hdd-extent size make is as large as you need.

    - if your load-window is large enough, drop the non-clustering indexes before load and create them again after all has been done.

    - sort your rows-to-be-loaded according to the clustering index.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • thanks alzdba.

    it is not viable to drop and recreate the indexes as

    1 . bulk insert is an ongoing process

    2 . certain select queries are also fired on the data loaded.

    on an average 20 million records are inserted in the table. A new table is created everyday. 

    This is the table structure in which data is being inserted

    CREATE TABLE [dbo].[tbl_lm_event_2_1] (

     [id] [bigint] IDENTITY (1, 1) NOT NULL ,

     [server_name] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [date] [datetime] NULL ,

     [DeliveryTime] [datetime] NOT NULL ,

     [DeliveryMicroSeconds] [smallint] NULL ,

     [InjectionTime] [datetime] NULL ,

     [InjectionMicroseconds] [smallint] NULL ,

     [Delay] [int] NULL ,

     [IronPortMID] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [Size] [int] NULL ,

     [RecivingServerIP] [char] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [FromAddress] [varchar] (250) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [Domain] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [NoOfReciepts] [smallint] NULL ,

     [NumberOfAttempts] [smallint] NULL ,

     [ToAddress] [varchar] (250) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [CustSize] [smallint] NULL ,

     [Message_ID] [varchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[tbl_lm_event_2_1] WITH NOCHECK ADD

     CONSTRAINT [PK_tbl_lm_event_2_1_2003_10_29] PRIMARY KEY  CLUSTERED

     (

      [id]&nbsp WITH  FILLFACTOR = 70  ON [PRIMARY]

    GO

    CREATE  INDEX [ix_tbl_lm_event_2_1_DeliveryTime_2003_10_29] ON [dbo].[tbl_lm_event_2_1]([DeliveryTime]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

     CREATE  INDEX [ix_tbl_lm_event_2_1_Size_2003_10_29] ON [dbo].[tbl_lm_event_2_1]([Size]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

     CREATE  INDEX [ix_tbl_lm_event_2_1_FromAddress_2003_10_29] ON [dbo].[tbl_lm_event_2_1]([FromAddress]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

     CREATE  INDEX [ix_tbl_lm_event_2_1_Domain_2003_10_29] ON [dbo].[tbl_lm_event_2_1]([Domain]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

     CREATE  INDEX [ix_tbl_lm_event_2_1_ToAddress_2003_10_29] ON [dbo].[tbl_lm_event_2_1]([ToAddress]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

     CREATE  INDEX [ix_tbl_lm_event_2_1_Message_ID_2003_10_29] ON [dbo].[tbl_lm_event_2_1]([Message_ID]) WITH  FILLFACTOR = 70 ON [PRIMARY]

    GO

    --------------------------------------------------

    strored proc to bulk insert

    CREATE PROCEDURE usp_lm_i_event_data

    @table_name varchar(50), -- name of table

    @data_file_name varchar(500), -- Src File Path

    @format_file_name varchar(500), -- Path of format file

    @rows_count_per_batch int,

    @batch_size int

    AS

    DECLARE @query VARCHAR(1000)

    SET NOCOUNT ON

    SELECT @query = 'BULK INSERT  ' + @table_name + ' FROM ''' + @data_file_name + ''' WITH (FORMATFILE = ''' + @format_file_name +

      ''',rows_per_batch=' + CONVERT(VARCHAR(10),@rows_count_per_batch) + ', batchsize=' +

      CONVERT(VARCHAR(10),@batch_size) + ')'

    EXEC(@query)

    SET NOCOUNT OFF

    GO

    Appreciate you help.

  • nice large index-columns

    My guess is your primary file(-group) is autogrowing.

    <<A new table is created everyday. >>

    If you can, first create the table with the clustered index.

    then load the data

    then add the indexes.

    This way you can hope the table and index file-data-chunks don't get mixed up to much.

    Split your clustering index and the other indexes on different physical disks.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • You should also check TempDB as it will be growing to support the indexes during the inserts.

    Steve Hughes, Magenic Technologies

    steve.hughes@magenic.com

  • Thanks for the input.

    i have added a secondary file group and set the non clustered indexes on it.

    Primary file (-group) is auto-growing.

    The application is basically a for data-mining. We have given the facility to query the data while it is being loaded; loading is a continuous process with a bulk-insert batch size of 100,000. Hence i create the indexes when the table is created.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply