Finding the best way...

  • Everyone,

    Please guide me on which approach is better for big databases.

    Scenario:

    I have to compare two of my tables: one is archived and the other is production. I want to update my archive table based on the greater modified date. Let's say I have 10,000 records with a greater modified date, and all of them are in my #temp table.

    NOW I have two ways:

    - I can update those records from the temp table into my archive table

    and

    - I can delete those records from the archive table and INSERT them again from the temp table.

    The record count may also grow beyond 10,000.

    What should I do to keep performance in mind?

    Thanks,

    Noman

  • I would update the rows that exist and insert the ones that don't. Like this:

    UPDATE A
    SET COLUMN_list = T.COLUMN_list
    FROM Archive A
    JOIN #temp T ON A.primary_key = T.primary_key;

    INSERT INTO Archive (COLUMN_list)
    SELECT T.COLUMN_list
    FROM #temp T
    LEFT JOIN Archive A ON T.primary_key = A.primary_key
    WHERE A.primary_key IS NULL;
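    On SQL Server 2008 and later, the same upsert logic can also be written as a single MERGE statement; a sketch only, reusing the placeholder table and column names from the example above:

    ```sql
    -- Sketch: COLUMN_list and primary_key are placeholders, as in the example above.
    MERGE Archive AS A
    USING #temp AS T
        ON A.primary_key = T.primary_key
    WHEN MATCHED THEN
        UPDATE SET COLUMN_list = T.COLUMN_list
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (COLUMN_list) VALUES (T.COLUMN_list);
    ```

    One statement means one scan of the join instead of two, though MERGE has its own quirks, so test it against the two-statement version on your own data.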

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • I like Jack's solution, though I might update, delete the matches, insert the rest. Not sure which is better.
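    The delete-the-matches-then-insert-the-rest variant might be sketched like this, reusing Jack's placeholder names:

    ```sql
    -- Delete archive rows that have a newer copy waiting in #temp...
    DELETE A
    FROM Archive A
    JOIN #temp T ON A.primary_key = T.primary_key;

    -- ...then insert every row from #temp in a single pass.
    INSERT INTO Archive (COLUMN_list)
    SELECT T.COLUMN_list
    FROM #temp T;
    ```

    Note this rewrites matched rows entirely rather than updating them in place, which can mean more logging and index churn than Jack's update/insert pair.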

    Noman - I have run into something very similar in some of the processes we run. After some exhaustive searching, I found a combination of things that helped and that were recommended by a few experts: transactions and truncating. Both appear to speed up a process like yours. I created some sample data (9 columns by 10,000 rows) and then created an archive table that had half as many records. I used Jack's answer as the comparison, and below are the stats:

    Truncate and Transactions: 62ms average total execution time over three trials same data

    Update and then Insert: 168ms average total execution time over three trials same data

    Either way, you can't blink your eyes faster than this finishes with 10,000 records at 9 columns wide.

    BEGIN TRAN;

    -- Truncate inside the transaction so a failed insert can roll the truncate back.
    TRUNCATE TABLE Archive;

    INSERT INTO Archive (COLUMN_list)
    SELECT T.COLUMN_list
    FROM #temp T;

    COMMIT;
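    For what it's worth, averages like the ones above can be gathered with SQL Server's built-in timing output; a sketch of an assumed harness (not the actual test script from this post), using the same placeholder names:

    ```sql
    SET STATISTICS TIME ON;   -- elapsed and CPU times appear in the Messages output

    BEGIN TRAN;
    TRUNCATE TABLE Archive;
    INSERT INTO Archive (COLUMN_list)
    SELECT T.COLUMN_list
    FROM #temp T;
    COMMIT;

    SET STATISTICS TIME OFF;
    ```

    Run each approach several times and average the elapsed times, discarding the first run so compilation and caching don't skew the numbers.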

  • Nice one Wesley. Certainly truncate will be fast. Of course any indexes will affect the solutions as well.


  • Jack, how so? I am not familiar enough with indexes to understand your point. Will they have to be rebuilt after a truncate?

  • Indexes either speed up or slow down the Updates (how's that for being specific) and will slow down the insert, especially for a clustered index.

    For example, with your test data, if you add a clustered index on audit_id, the update/insert approach (which in this case does only an update) averaged around 140ms on my laptop, and the truncate/insert averaged around 100ms. I'd have to run a few more tests to really get a good feel, but that's the general idea.


  • So, if I understand correctly, a good index will likely speed up the update process, but have no bearing on the insert?

    Here's the way I understand it, and I could be wrong, so don't take this as the definitive answer. Check out Gail Shaw's and Kimberly Tripp's blogs; they have some excellent posts on indexes.

    If the columns in the index are being updated then the index will likely slow down the update because SQL Server has to update the index pages as well. In the case of an index on a primary key that will NOT be updated it should "find" the rows to update faster and thus the update will be faster.

    Inserts will probably be a bit slower because the index pages have to be updated as well, particularly if a row lands in the middle of a clustered index. This is why, especially in OLTP systems, it is recommended that your clustered index be on a monotonically increasing value like an identity column or a created date. Think about it like this: if I have a clustered index on last name and I insert 'Aaron, Henry', which belongs at the beginning of the index, SQL Server has to make room for it there (splitting pages if necessary), so that insert will be slower than if I have an identity column as the clustered index, where the new row simply goes at the end.
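    The contrast between the two clustered-index choices might look like this; hypothetical tables for illustration, not from the thread's test data:

    ```sql
    -- New rows always sort last on audit_id, so inserts append at the end
    -- of the clustered index: no mid-index page splits.
    CREATE TABLE Archive_ByIdentity
    (
        audit_id  INT IDENTITY(1,1) NOT NULL,
        last_name VARCHAR(100)      NOT NULL,
        CONSTRAINT PK_Archive_ByIdentity PRIMARY KEY CLUSTERED (audit_id)
    );

    -- New rows land wherever the name sorts: 'Aaron, Henry' goes near the
    -- front, causing page splits as pages fill and making inserts costlier.
    CREATE TABLE Archive_ByName
    (
        audit_id  INT               NOT NULL,
        last_name VARCHAR(100)      NOT NULL,
        CONSTRAINT PK_Archive_ByName PRIMARY KEY CLUSTERED (last_name, audit_id)
    );
    ```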


    Thanks, Jack. Your response, and the links, proved very informative.

  • I sense extreme danger... someone tell me why you'd truncate a perfectly good archive table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

    If you truncate the table, you will lose all the data, including anything that may not be in your current production table. Is that what you really want to do? Also, what if there is a record in your archive table that happens to have a modified date within the range you are pulling down from production, but the record is no longer in production? If you simply delete everything from your archive table within that date range, you are going to lose that record. Again, I am not sure that is what you want. I'd say update the matching records first, then insert the new ones. Just my take, but I am not sure what your exact business needs are.

  • I had thoughts about that too. The conclusion I came to was that it's not truly an archive table, as I know it. An archive table would be filled with data that was correct on a certain date. In the processes I have created recently there needed to be a way to check a table for changes, or multiple tables for changes across one person/place/thing. Triggers are not desired, so what I was left with was comparing an image (archive) of the table to its current state. So that's one reason. The other is what the original poster stated:

    NOW i have two ways:

    - i can update those records from temp table to my archive table

    and

    - i can delete those records from archived and INSERT again from temp to archive table.

    ...what should i do in order to keep performance issue in mind.??

    Jeff, I loved the "I sense extreme danger", almost had coffee in my sinuses.
