SQL to do an upsert or update/insert

  • I am using SQL 2005.

I have two tables: a source table and a destination table.

The source table is refreshed every 30 seconds from a flat file using SSIS, with data that contains updates to existing records as well as new records.

    I want to update/insert to the destination table from the source table. The source table will have approx 10,000 records.

    What is the best way to accomplish this? Or should I say most efficient way?

    Thanks

    Teekay

  • Probably would be faster to truncate the destination table and bulk insert from the source to the destination.

    I really do not see a reason to spend time trying to figure out what to update and what to insert, if you need all the data from the source.

    This really does not even need to be a package. You could do something like

TRUNCATE TABLE MyDatabaseName.dbo.DestinationTable;

INSERT INTO MyDatabaseName.dbo.DestinationTable
SELECT *
FROM MyDatabaseName.dbo.SourceTable WITH (TABLOCK);
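If readers will be hitting the destination while this runs, a variation (just a sketch, reusing the same table names) is to wrap both statements in one transaction so nobody ever sees an empty table; TRUNCATE TABLE can be rolled back inside a transaction in SQL Server:

BEGIN TRANSACTION;

    TRUNCATE TABLE MyDatabaseName.dbo.DestinationTable;  -- rolled back if the insert fails

    INSERT INTO MyDatabaseName.dbo.DestinationTable
    SELECT *
    FROM MyDatabaseName.dbo.SourceTable WITH (TABLOCK);

COMMIT TRANSACTION;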

I forgot to mention that you can set the T-SQL I posted up as a job to run as often as you want.

I do not know what recovery model you have selected for the destination database, but using bulk-logged will give you minimally logged transactions, as you are not replicating. So, you should consider switching to the bulk-logged recovery model for the insert and then switching back to full.

    If you are using the simple recovery model, you needn't worry about minimally logged transactions.
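The switch itself is just a couple of ALTER DATABASE statements (this sketch assumes the database is named MyDatabaseName, as in the code above):

ALTER DATABASE MyDatabaseName SET RECOVERY BULK_LOGGED;

-- run the minimally logged insert here

ALTER DATABASE MyDatabaseName SET RECOVERY FULL;

-- take a log backup afterwards so the log chain stays intact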

I'd typically add a column to the source that loads with a "loaded" value. That way you know what's been sent in. Then you could update xx records with a "processing" or other token. This is in case your process breaks or runs too slow.
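Marking a batch might look something like this (just a sketch; the token column name and batch size are made up, and TOP in UPDATE statements works from SQL 2005 on):

UPDATE TOP (1000) source   -- grabs an arbitrary batch of 1000 'loaded' rows
SET    token = 'processing'
WHERE  token = 'loaded';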

    For Upsert (Pre-2008), I usually match the source with the destination and do updates:

UPDATE dest
SET    dest.destcol = source.sourcecol
FROM   dest
INNER JOIN source
        ON source.sourcepk = dest.destpk
WHERE  source.token = 'processing';

    Once that's done, I delete that data (or set a new token, depending on needs)

DELETE source
FROM   source
INNER JOIN dest
        ON source.sourcepk = dest.destpk
WHERE  source.token = 'processing';

Then everything that's left is an insert, as sketched below.
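A sketch of that last step with the same placeholder names; anything still marked 'processing' in source has no match in dest, so it gets inserted:

INSERT INTO dest (destpk, destcol)
SELECT source.sourcepk, source.sourcecol
FROM   source
WHERE  source.token = 'processing';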

    Note that SSIS can do most of this with better package programming, removing the need for a source table.

To give a more correct answer, I think we need more information about the tables and the data. We have just about enough info to suggest a table swap, but that's about it.

    If you could post the DDL of the tables, some sample data in the form of unioned inserts, and what the expected results would look like, we could give you a much better answer.
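In case it helps, "unioned inserts" means something along these lines (made-up table and columns, purely for illustration):

CREATE TABLE #SampleData (ID INT PRIMARY KEY, SomeValue VARCHAR(60));

INSERT INTO #SampleData (ID, SomeValue)
SELECT 1, 'first row'  UNION ALL
SELECT 2, 'second row' UNION ALL
SELECT 3, 'third row';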

    😎

Just rethinking the problem here, but another solution would be to use DML triggers. This would fit your requirement and give a near-real-time replica of the source table, eliminating the need to run the process every 30 seconds. The triggers should be written to use set-based inserts/updates/deletes; a sketch follows below.

Like Lynn suggested, post your DDL and some test data. We do not know the tables or the data involved. We do not even know if these tables are in the same database, or even on the same instance.
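For illustration only, here is a minimal sketch of such a trigger, assuming hypothetical names SourceTable(sourcepk, sourcecol) and DestinationTable(destpk, destcol):

CREATE TRIGGER trg_SourceTable_Sync
ON dbo.SourceTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- set-based update of destination rows that already exist
    UPDATE dest
    SET    dest.destcol = i.sourcecol
    FROM   dbo.DestinationTable dest
    INNER JOIN inserted i
            ON i.sourcepk = dest.destpk;

    -- set-based insert of source rows the destination has never seen
    INSERT INTO dbo.DestinationTable (destpk, destcol)
    SELECT i.sourcepk, i.sourcecol
    FROM   inserted i
    WHERE  NOT EXISTS (SELECT 1
                       FROM   dbo.DestinationTable d
                       WHERE  d.destpk = i.sourcepk);
END;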

How many rows are in the destination table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

He said there are approx. 10,000 now, though records can be inserted and updated.

  • Thanks Adam, but he said

    "The source table will have approx 10,000 records."

    I want to know how many rows are in the destination table. That will answer a lot of questions on how this should be done.

    --Jeff Moden



Thanks for everyone's input... Sorry for the delayed response... I have been down and out with the flu for the past few days.

The data tables are fairly straightforward (Text (MAX 60), integers, money, and dates). The tables have primary keys. I will be receiving about 10,000 records in the .csv file.

    I am trying to replicate a stage table to a production table (the two tables are identical). I am doing all the conversion/error checking from the .csv files to the stage table using SSIS. Once the data is successfully loaded into the stage table, I want to quickly move the data into production.

SSIS seems to be a lot of work for what I am trying to accomplish. Basically, update all the records in production that are different in stage; any new records in stage that do not exist in production should be inserted.

Deletions are handled separately.

The front end of the database will be a web application, so people will be reading the data sets while I am writing to the database.

So I want to make updating/inserting/deleting to/from the production database as non-invasive as possible. (I may have to do some data caching on the application side to avoid incomplete data sets... but this is just a thought.)

    Thanks again,

    Teekay

  • I am trying to replicate a stage table to a production table (the two tables are identical).

    You're still not telling the whole story here... you already said the staging table has about 10,000 rows... how many does the destination table have? It's very important for this...

    --Jeff Moden



  • Let me ask it a slightly different way... could the content of the staging table be used to replace 100% of the final destination table in production?

    --Jeff Moden



I agree with Jeff. The best advice depends on how you are moving the data.

  • Did I just see him mention that there were at most 60 TEXT fields? As in TEXT, not varchar?

    Wow...that's going to hurt.

    ----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right - what was my emergency again?

  • Thank you all again for your responses....

Jeff - The production table will have an estimated 30,000 rows to start and will grow over time as records are added. We will be archiving data yearly, so I do not foresee the table going beyond 500,000 records (and this is a VERY high estimate).

    Here is how this works... hopefully I can explain this process easily.

A legacy process is sending a .csv file every 30 seconds or so. The .csv file contains the last 7 days of orders. It sends the last 7 days of orders in case the system goes offline or there is a problem; this way, the system can easily recover itself. We may change the number of days to be fewer, but we are in the beginning stages of this project, so we are just testing how this will work.

The .csv file is FTPed, and FileWatcher waits for the transfer to complete. Once the transfer is complete, SSIS deletes all the data in the stage table and imports the data from the .csv file. The SSIS process handles all the data conversion and error checking. Once the data is in the stage table, I want to update the records that have changed in stage to production and insert new records from stage into production. Deletion of records will be handled separately.

Matt - The comment about "TEXT" is incorrect. I am using "VARCHAR(60)" in SQL 2005. However, I am not sure why SSIS does not offer "VARCHAR" as a data type for OLE DB. This is why I made the typo. Sorry for the confusion.

    Hopefully this helps.

    Teekay
