Creating millions of records in one go

  • Our company produces and sells gift vouchers, each having an individual serial number. When we have new stock printed, we have to create a database record for each voucher, as we need to know when it was sold, to whom, and where and when it was redeemed. When we receive new stock, we do it in large quantities - sometimes up to 5 million at a time.

    At the moment, the records are inserted in a routine in the front end (Access adp) which is fed a start number, denomination, quantity and expiry date. It then loops, inserting a record for each serial number, and can take four or five hours to complete.

    I'm sure it would be more efficient to feed these parameters to a stored procedure on the server and let this handle it, and have come up with the following:

    CREATE PROCEDURE dbo.usp_InsertVouchers(

    @StartNumber int,

    @Qty int,

    @ExpiryDate smalldatetime

    )

    AS

    DECLARE @EndNumber Int

    SET @EndNumber = (@StartNumber + @Qty)

    While (@StartNumber < @EndNumber)
    BEGIN

    INSERT INTO dbo.tblVoucher
    (SerialNumber, ExpiryDate)
    VALUES (@StartNumber, @ExpiryDate)

    SELECT @StartNumber = @StartNumber + 1

    End
    GO
    [/code]

    Are there any better ways of doing it?

  • You may want to go through Jeff's article on tally table once and see if you can use some of the logics in ur case.

    http://qa.sqlservercentral.com/articles/T-SQL/62867/%5B/url%5D



    Pradeep Singh

  • Thank you very much - just what I needed!

    I wrote the following stored procedure - it created 2.75m records in 32 seconds!

    CREATE PROCEDURE dbo.uspInsertSerials

    (

    @StartNumber Int,

    @Qty int,

    @ExpiryDate SmallDateTime

    )

    AS

    DECLARE @strSQL Varchar(255)

    --=============================================================================

    -- Create and populate a Tally table

    --=============================================================================

    --===== Conditionally drop

    IF OBJECT_ID('dbo.Tally') IS NOT NULL DROP TABLE dbo.Tally

    --===== Create and populate the Tally table on the fly

    SELECT @strSQL = 'SELECT TOP ' + convert(varchar(20), @Qty) + ' IDENTITY(INT,' + convert(varchar(20),@StartNumber) + ',1) AS N INTO dbo.Tally FROM Master.dbo.SysColumns sc1, Master.dbo.SysColumns sc2'

    EXEC (@strSQL)

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    INSERT INTO tblSerials

    (SerialNumber, ExpiryDate)

    SELECT dbo.udfFormatInt(N,10), @ExpiryDate

    From dbo.Tally

    GO

  • Well, Jeff's queries are like that, Rocket speed execution. 😀



    Pradeep Singh

  • ps (6/22/2009)


    Well, Jeff's queries are like that, Rocket speed execution. 😀

    :blush: Thanks, Pradeep. :blush: That's an awfully nice compliment especially coming from someone like you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (7/20/2009)


    :blush: Thanks, Pradeep. :blush: That's an awfully nice compliment especially coming from someone like you.

    Frankly speaking Jeff, I'm not even fraction of a percent of what you're capable of. Still miles to go!! 🙂



    Pradeep Singh

  • Hey Chris,

    Thanks for posting your code. Lot's of folks just take the money and run. I'd like to return the favor in the form of another enhancement or two for your code. Would you post the code for your udfFormatInt function as it would answer a lot of questions for me. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Heh... couldn't wait. Since this is SQL Server 2000, there's no "programmable" TOP. So, I cheated to avoid the dynamic SQL (not that I'm against using dynamic SQL... it's just not needed here). Details are in the code.

    Also, didn't know what your function was doing for sure but thought it might just be padding leading zeros to the given width of 10 and so replaced the function with a littleinline coding. You also don't need the clustered index because you will be scanning the whole table, no matter what.

    All told, the following version of the code should run about twice as fast... yeah, I know... 32 seconds compared to 4-5 hours was probably good enough :-D...

    CREATE PROCEDURE dbo.uspInsertSerials

    @StartNumber INT,

    @Qty INT,

    @ExpiryDate SMALLDATETIME

    AS

    --===== Suppress the auto display of rowcounts

    -- to suppress false error indications.

    SET NOCOUNT ON

     

    --===== Constrain the number of rows to create

    SET ROWCOUNT @Qty

     

    --===== Build a temporary Tally table starting with "0".

    -- No need for a clustered index because we'll use

    -- the whole table later, anyway.

    IF OBJECT_ID('TempDB..#Tally','U') IS NOT NULL

    DROP TABLE #Tally

     

    SELECT IDENTITY(INT,0,1) AS N

    INTO #Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

     

    --===== Return constraint on number of rows to normal

    SET ROWCOUNT 0

     

    --===== No need for any dynamic SQL if we do the insert like this...

    INSERT INTO dbo.tblSerials

    (SerialNumber, ExpiryDate)

    SELECT SerialNumber = RIGHT('0000000000'+CAST(@StartNumber+N AS VARCHAR(10)),10),

    ExpiryDate = @ExpiryDate

    FROM #Tally

    GO

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff

    You are correct in your guess about the udfFormatInt function:

    CREATE FUNCTION dbo.udfFormatInt

    (@input_number AS INT

    , @length as INT)

    RETURNS VARCHAR(50)

    AS

    /*

    * FUNCTION NAME

    * udfFormatInt

    *

    * DESCRIPTION

    * This function takes an integer and returns it as a string, with leading zeroes up to the length specified (up to 50)

    *

    * PARAMETERS

    * @input_number The integer to be converted

    * @length The length up to which it will be padded with zeroes

    *

    * RETURNS

    * The number converted to a string, with leading zeroes.

    *

    * EXAMPLE

    *

    * DECLARE @num INT

    * SET @num = 77

    * SELECT @num as Number, dbo.udfFormatInt(@num, 3) AS Formatted

    *

    * Number Formatted

    * ----------- --------------------------------------------------

    * 77 077

    *

    * (1 row(s) affected)

    *

    * HISTORY

    * 17-nov-03 PSA Original Version

    * 10/11/2008 CMQ More efficient version - loop removed

    */

    BEGIN

    DECLARE @output_string VARCHAR(50)

    SET @output_string = CAST(@input_number AS VARCHAR(50))

    RETURN RIGHT(REPLICATE('0',@length) + @output_string, @length)

    END

    I will check out your enhancements with pleasure

    Thanks

    Chris

  • I have just tried both your and my procedures on a copy of my live database with the following results:

    My procedure - 500,000 records - 1m 27s

    Your procedure - 500,000 records - 1m 34s

    I don't think the differences are significant, but a hell of a lot better than the several hours it would have took previously!

    Thanks again

  • Chris Quinn (7/21/2009)


    I have just tried both your and my procedures on a copy of my live database with the following results:

    My procedure - 500,000 records - 1m 27s

    Your procedure - 500,000 records - 1m 34s

    I don't think the differences are significant, but a hell of a lot better than the several hours it would have took previously!

    Thanks again

    Must have been something else going on with the machine right then... didn't you say it previously did 2.75 million rows in something like 32 seconds? On my poor ol' machine at home, I did 5 million rows in less time than what you have posted above. For 500,000 rows, the new proc took only 5 seconds and the old proc took 17.

    Heh... no matter though. It's like I said... I just worry about these types of things.

    Thank you very much for the feedback, Chris.

    By the way... I'm impressed! There was actually a nicely docuented header on that function. It's been my experience that people don't take the time to do it right like that. Nicely done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • When I tried it last week, it was inserting into an empty table with no indexes. Today I was working with a copy of the real data.

    It was running on a local instance of SQL2000 (not on the server) and took the times stated for half a million records in each test. As I said - it's the difference between minutes and hours! I'm quite happy to use either method - previously I had to run the procedures out of hours (an was not allowed to do it remotely) - now I can run it and get home nearly on time!

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply