Importing/Updating data into an existing table

  • Oh brilliant minds, young grasshopper seeks your guidance:

     

    Nightly, data is exported from a non-SQL Server database to a text file. Then, via DTS, the data is imported and appended to an existing table in a SQL Server 2000 database. Sometimes the data contains new records; sometimes the data is meant to update existing records. However, the data is merely appended to the existing data, so the result is duplicate records, or multiple records with the same primary field. (The SQL Server database does not have any primary keys.)

     

    How can I set up the import so:

    1) If the primary field exists, then the existing data is updated.

    2) If the primary field does not exist, then the new data is appended (inserted) to the table.

     

    Can this be done directly through the import job?  Or should the data be first imported into a disposable (temp) table and then imported into the existing table?

     

    David

     

     

  • Hi David,

     

    Create a stored procedure that checks whether the primary key already exists and does the update or insert accordingly. Then, in the DTS designer, import the text file into a temp table and call your stored procedure. It would be a different story if you had ETL software such as Data Mirror or Informatica.
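
    A minimal sketch of such a check for a single row; the table, column, and procedure names here are all invented for illustration:

    CREATE PROCEDURE dbo.usp_UpsertOneRow
        @KeyID int,
        @Col1  varchar(50)
    AS
    SET NOCOUNT ON

    -- Update if the key already exists in the target; otherwise insert
    IF EXISTS (SELECT 1 FROM dbo.TargetTable WHERE KeyID = @KeyID)
        UPDATE dbo.TargetTable
        SET    Col1 = @Col1
        WHERE  KeyID = @KeyID
    ELSE
        INSERT INTO dbo.TargetTable (KeyID, Col1)
        VALUES (@KeyID, @Col1)
    GO

    In practice, a set-based version that joins the temp table to the target handles the whole nightly batch in two statements instead of one call per row.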

     

    Hopefully this answers your question; it may not be the only solution.

     

    Cheers

  • You might be able to do this with a data pump task, but I'd follow Lanny's advice. It's easy and quick. Import into a staging table and set up a proc for the last step.

    DTS:

    - truncate staging table

    - import data

    - call proc

    The proc should do an update on PK matches and then an insert of whatever doesn't match. Or, after the update, delete the matches from the staging table and insert everything that's left.
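
    A rough sketch of that second option, assuming a staging table dbo.StagingTable and a target dbo.TargetTable matched on a KeyID column (all names invented for illustration):

    -- 1) Update target rows that match on the key
    UPDATE T
    SET    T.Col1 = S.Col1, T.Col2 = S.Col2
    FROM   dbo.TargetTable AS T
           INNER JOIN dbo.StagingTable AS S
               ON T.KeyID = S.KeyID

    -- 2) Remove the matched rows from staging
    DELETE S
    FROM   dbo.StagingTable AS S
           INNER JOIN dbo.TargetTable AS T
               ON S.KeyID = T.KeyID

    -- 3) Everything left in staging is new; insert it all
    INSERT INTO dbo.TargetTable (KeyID, Col1, Col2)
    SELECT KeyID, Col1, Col2
    FROM   dbo.StagingTable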

  • A Data Driven Query task will do what you want, but it's slow because it processes the source data row by row.  I prefer the staging table/stored procedure approach mentioned above and have used it many times.

    Greg

  • If the nightly extract contains all the information you need in the table, then rather than writing procs to apply all the changes (insert, update, delete), just wipe out the table and insert all the new data. That way, any changes that need to be applied are done in one step.
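
    If the extract really is complete, the final step could be as simple as this sketch (table names are hypothetical):

    -- Wipe the target and reload it from the freshly imported data.
    -- TRUNCATE requires that no foreign keys reference the table;
    -- use DELETE FROM instead if they do.
    TRUNCATE TABLE dbo.TargetTable

    INSERT INTO dbo.TargetTable (KeyID, Col1, Col2)
    SELECT KeyID, Col1, Col2
    FROM   dbo.StagingTable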


    Cheers,

    Ben Sullins
    bensullins.com
    Beer is my primary key...

  • Thanks for the help. Can you provide any how-to steps?

  • Add an Identity column and delete the dupes when the import is done... much faster than trying to build an "UPSERT" (merge) in SQL Server 2000.
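
    A sketch of that approach, assuming an added IDENTITY column named RowID and a key column KeyID (both names invented); the highest RowID for a given key is the most recently imported row:

    -- Delete every row for which a newer row with the same key exists,
    -- leaving only the latest version of each record
    DELETE A
    FROM   dbo.TargetTable AS A
           INNER JOIN dbo.TargetTable AS B
               ON A.KeyID = B.KeyID
    WHERE  A.RowID < B.RowID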

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I can delete the duplicate records, but what I need to do is delete the current record in favor of the imported record when the primary field already exists in the table. The imported record is not necessarily a duplicate; the import may just result in a second record with the same primary field but different values in the remaining fields. Perhaps I could add an IDENTITY column and simply delete a record where the primary field of the current table equals the primary field of the record to be imported and where the IDENTITY column is less than that of the incoming record.

     

    However, the imported data does not have an IDENTITY column and originates from a non-SQL Server database. So would this have to be a two-part process: first import the data, then delete the older record? Or can this all be incorporated into DTS or one script?

  • The best answer was the three-stage DTS procedure. The proc would actually be simple -- an update and an insert statement.

    Let's say your SQL Server table has an identity column and four data columns (ID, C1, C2, C3, C4), and the staging table has just the data columns. Say also that C1 and C2 form the natural key -- these fields identify a duplicate row. So the proc would first update the non-key fields from the duplicates:

    update OT
    set C3 = ST.C3, C4 = ST.C4
    from dbo.OriginalTable as OT
      inner join dbo.StagingTable as ST
        on OT.C1 = ST.C1 and OT.C2 = ST.C2;
    

    Then insert those rows from the staging table that are not duplicates:

    insert into dbo.OriginalTable (C1, C2, C3, C4)
    select C1, C2, C3, C4
    from dbo.StagingTable as ST
    where not exists (
        select *
        from dbo.OriginalTable as OT
        where OT.C1 = ST.C1 and OT.C2 = ST.C2);
    

    That's it. Of course, it would help to index the key field(s) of both tables.
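
    For example, something along these lines (index names are invented):

    CREATE INDEX IX_OriginalTable_Key ON dbo.OriginalTable (C1, C2)
    CREATE INDEX IX_StagingTable_Key ON dbo.StagingTable (C1, C2)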

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Tomm's code for an upsert is good... I think I'd do it in a stored procedure rather than DTS, but it'll do the trick either way.

    I still use BULK INSERT and a dupe delete, though... speed freak, I guess.
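
    For reference, a BULK INSERT load into the staging table might look like this; the file path and terminators are assumptions about the export format:

    BULK INSERT dbo.StagingTable
    FROM 'C:\Exports\nightly.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')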

    --Jeff Moden



  • Thanks for all the input. I think I got it! Thanks again everyone.

  • What did you end up with, David?

    --Jeff Moden



  • I'm just gonna let the nightly import via DTS run and then set up a job to check for and delete the older record. This is the stored procedure that will run after the import.

    CREATE PROCEDURE nwDuplicateCheckerAll_Title
    AS
    SET NOCOUNT ON

    BEGIN TRANSACTION DuplicateChecker

    -- Keep only the most recently imported row for each KeyID/DateAdd pair
    DELETE A
    FROM All_Title A
    INNER JOIN All_Title B
        ON A.KeyID = B.KeyID
        AND A.[DateAdd] = B.[DateAdd]
    WHERE A.LastColumn < B.LastColumn

    COMMIT TRANSACTION DuplicateChecker

    SET NOCOUNT OFF

    I'm treating KeyID and DateAdd as a composite primary key. I added an identity column, "LastColumn". How does it look?

  • So, if I read this correctly, you are inserting all the records into the table, then going through and deleting the older duplicates. This seems straightforward, but I have a few questions:

    - You say you are "treating" KeyID and DateAdd as a composite key. Are they the key or not? Does KeyID normally serve as the primary key? If so, why are you keeping records with the same KeyID but different DateAdds?

    - What is the primary key for All_Title? Why do you disable it to allow duplicates, even if only for a moment?

    - From the time the duplicate records are entered until the time they are removed, the table is in an inconsistent state -- outside the definition of the entity it is supposed to contain. Even if this is only momentary, and you don't foresee anything going wrong between the two operations, wouldn't it be better if the table were never in that state at all?

    - You have apparently created an identity column just to facilitate this operation. Personally, I don't like doing that. I make sure such "utility" data is absolutely necessary and, even when it is, I don't put it in the same table. Doing so makes utility information part of the definition of the entity, and it's not. Even Date_last_modified and User_who_modified information should be placed in separate utility tables designed for that purpose.

    You've had at least a couple of suggestions here that don't raise the concerns I've listed (I'm making some assumptions on the Bulk Insert idea). It's your database, of course, but you asked.

    Tomm Carr
