Best way to update 500M records

  • Hardware/Software:

    MS SQL Server 2008 R2 on WinServer 2008 Enterprise Ed. Cluster [2 nodes], 32 CPU, 128 GB RAM

    I have two tables, Tran1 and Tran2:

    CREATE TABLE [dbo].Tran1 (
        [NBSEQUENCE] [numeric](15, 0) NOT NULL,
        ..<100 more fields..
        [nbRptSubCatID] [numeric](6, 0) NULL,
        CONSTRAINT [PK_ARCHIVE_TRANSACTIONS] PRIMARY KEY CLUSTERED ([NBSEQUENCE] ASC)
    )
    -- table has 1.8 billion records

    CREATE TABLE [dbo].Tran2 (
        [TRANSACTIONID] [bigint] NOT NULL,
        ..<100 more fields..
        [RptSubCatID] [int] NULL
    )
    -- table has 1.2 billion records

    Tran2 has a Non-Clustered Unique index on [TRANSACTIONID].

    Both tables hold the same financial transaction data; Tran1.[NBSEQUENCE] contains the same values as Tran2.[TRANSACTIONID], so the two tables can be joined one-to-one.

    Now I need to copy ~ 500 Million values from Tran1.[nbRptSubCatID] to Tran2.[RptSubCatID].

    With a straight UPDATE via JOIN being out of the question, I was thinking about creating a procedure that updates Tran2.[RptSubCatID] in chunks of, say, 5 million records per transaction, committing after each iteration, and so on until completion.

    Can anyone offer an opinion on the best way to do this update? I am curious to hear from someone on the use of a set-based solution here, if any.

    Thanks,

    Sergei Z

  • I think you are on the right track. Get your update query written with the join, then, as you suggested, update X records at a time until done. X depends on the table activity and what other processes are accessing the table. I might look at a smaller number of records at a time, depending on how long each batch takes to update.
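
    A minimal sketch of such a loop (the 5M batch size and the IS NULL restart predicate are assumptions, not from the original post; each UPDATE statement commits on its own, so every batch is its own transaction):

    DECLARE @BatchSize int = 5000000;  -- assumed batch size; tune to your workload
    DECLARE @Rows int = 1;

    WHILE @Rows > 0
    BEGIN
        -- Update one batch; TOP limits the batch, the IS NULL predicate skips rows already done
        UPDATE TOP (@BatchSize) t2
        SET    t2.RptSubCatID = t1.nbRptSubCatID
        FROM   dbo.Tran2 t2
               JOIN dbo.Tran1 t1 ON t1.NBSEQUENCE = t2.TRANSACTIONID
        WHERE  t2.RptSubCatID IS NULL
               AND t1.nbRptSubCatID IS NOT NULL;  -- guard: NULL sources would be re-picked forever

        SET @Rows = @@ROWCOUNT;  -- loop ends when nothing is left to update
    END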


    For faster help in answering any problems, please read How to post data/code on a forum to get the best help - Jeff Moden for the best way to ask your question.

    For performance issues see how we like them posted here: How to Post Performance Problems - Gail Shaw

    Need to split some strings? Jeff Moden's DelimitedSplit8K
    Jeff Moden's Cross tab and Pivots Part 1
    Jeff Moden's Cross tab and Pivots Part 2

  • It also depends on whether you can identify previously updated rows. For instance, since the column you are updating in the second table starts out NULL, you can use that to find what is left.
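
    In other words (assuming the target column really does start out NULL everywhere), the same predicate can double as a progress and restart check, something like:

    -- Rows still to be done are exactly the ones with a NULL target (assumption: column starts NULL)
    SELECT COUNT(*) AS RowsRemaining
    FROM   dbo.Tran2
    WHERE  RptSubCatID IS NULL;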

  • Shame these are not partitioned tables 🙂

    Probably the easiest way to run this process in batches, and even do it in parallel, would be to use the content of NBSEQUENCE, e.g. dividing it into 100 chunks and starting a process addressing each chunk.
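
    A rough sketch of one such chunk (the boundary values are placeholders; derive real slices from MIN/MAX of NBSEQUENCE and hand a different range to each process):

    DECLARE @Lo numeric(15, 0) = 1;         -- placeholder lower bound for this job's slice
    DECLARE @Hi numeric(15, 0) = 20000000;  -- placeholder upper bound

    UPDATE t2
    SET    t2.RptSubCatID = t1.nbRptSubCatID
    FROM   dbo.Tran2 t2
           JOIN dbo.Tran1 t1 ON t1.NBSEQUENCE = t2.TRANSACTIONID
    WHERE  t2.TRANSACTIONID BETWEEN @Lo AND @Hi;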

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • PaulB-TheOneAndOnly (3/26/2012)


    Shame these are not partitioned tables 🙂

    Probably the easiest way to run this process in batches, and even do it in parallel, would be to use the content of NBSEQUENCE, e.g. dividing it into 100 chunks and starting a process addressing each chunk.

    Pablo,

    actually, the TRANSACTIONDATA table is partitioned by the TRANSACTIONDATE field: some 40 partitions, one per month.

  • Sergei Zarembo (3/26/2012)


    PaulB-TheOneAndOnly (3/26/2012)


    Shame these are not partitioned tables 🙂

    Probably the easiest way to run this process in batches, and even do it in parallel, would be to use the content of NBSEQUENCE, e.g. dividing it into 100 chunks and starting a process addressing each chunk.

    Pablo,

    actually, the TRANSACTIONDATA table is partitioned by the TRANSACTIONDATE field: some 40 partitions, one per month.

    Is TRANSACTIONDATA the table expected to get the updates?

    SQL Server 2008 has the ability to escalate locks to "partition" level before locking the whole table - this would help when running jobs in parallel, one hitting each individual partition.
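
    For reference, partition-level escalation is opt-in per table in 2008; the default is still TABLE:

    -- Allow lock escalation to stop at the partition (HoBT) level instead of the whole table
    ALTER TABLE dbo.Tran2 SET (LOCK_ESCALATION = AUTO);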

  • Is TRANSACTIONDATA the table expected to get the updates?

    SQL Server 2008 has the ability to escalate locks to "partition" level before locking the whole table - this would help when running jobs in parallel, one hitting each individual partition.

    Yes, Tran2.[RptSubCatID] is the field getting updated. Looks like I need to read up on **running jobs in parallel, one hitting each individual partition**. Can you recommend some sources?

    Thanks to everyone who has replied!

    Sergei Z

  • Sergei Zarembo (3/26/2012)


    Is TRANSACTIONDATA the table expected to get the updates?

    SQL Server 2008 has the ability to escalate locks to "partition" level before locking the whole table - this would help when running jobs in parallel, one hitting each individual partition.

    Yes, Tran2.[RptSubCatID] is the field getting updated. Looks like I need to read up on **running jobs in parallel, one hitting each individual partition**. Can you recommend some sources?

    Thanks to everyone who has replied!

    Sergei Z

    Just be sure the WHERE clause includes "TRANSACTIONDATE BETWEEN low-limit AND high-limit", where low-limit and high-limit exactly match a specific partition; that should tell SQL Server to hit a single partition rather than the whole table. The WHERE clause should include the same BETWEEN condition for the source table.
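
    As a sketch, one partition's worth would look something like this (the dates are placeholders and must line up exactly with your partition boundaries; that Tran1 also carries a TRANSACTIONDATE column is an assumption):

    -- Hypothetical single-partition update; the date range is a placeholder
    UPDATE t2
    SET    t2.RptSubCatID = t1.nbRptSubCatID
    FROM   dbo.Tran2 t2
           JOIN dbo.Tran1 t1 ON t1.NBSEQUENCE = t2.TRANSACTIONID
    WHERE  t2.TRANSACTIONDATE BETWEEN '20110101' AND '20110131'      -- must match one partition exactly
           AND t1.TRANSACTIONDATE BETWEEN '20110101' AND '20110131'; -- same condition on the source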

  • RE: Just be sure the WHERE clause includes "TRANSACTIONDATE BETWEEN low-limit AND high-limit", where low-limit and high-limit exactly match a specific partition; that should tell SQL Server to hit a single partition rather than the whole table. The WHERE clause should include the same BETWEEN condition for the source table.

    Pablo,

    I understand this part [running the UPDATE against one partition]. I'm not sure how you would run a few partitions in parallel. Was that what you referred to earlier, or did I misread it?

    Thank you.

  • Sergei Zarembo (3/26/2012)

    I understand this part [running the UPDATE against one partition]. I'm not sure how you would run a few partitions in parallel. Was that what you referred to earlier, or did I misread it?

    Sorry, missed that.

    I would do it by starting a separate job for each partition, each one hitting a specific partition.

    Depending on processing power, etc., you should gauge how many processes running at the same time (a.k.a. in parallel) perform best; on one end, one partition at a time, on the other end, 40 jobs, each one hitting one of your 40 partitions. As almost always happens in life, the best alternative will be something in between.
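
    One way to sketch that: wrap the update in a procedure that takes a partition number, so each Agent job just calls it with a different value. (pfTranDate is a made-up partition function name; substitute your real one. $PARTITION maps a date to its partition number.)

    -- Hypothetical per-partition worker; pfTranDate is assumed, not from the thread
    CREATE PROCEDURE dbo.usp_UpdateRptSubCat_Partition
        @PartitionNumber int
    AS
    BEGIN
        SET NOCOUNT ON;

        UPDATE t2
        SET    t2.RptSubCatID = t1.nbRptSubCatID
        FROM   dbo.Tran2 t2
               JOIN dbo.Tran1 t1 ON t1.NBSEQUENCE = t2.TRANSACTIONID
        WHERE  $PARTITION.pfTranDate(t2.TRANSACTIONDATE) = @PartitionNumber;
    END

    Each of the parallel jobs would then run, e.g., EXEC dbo.usp_UpdateRptSubCat_Partition @PartitionNumber = 7; for its own partition.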

  • Pablo, thanks for your prompt response.

    Sergei Z
