Bit of a noob with some expereince needs help with duplication

  • I have a problem. Every day my SQL imports a flat file into a table. Everyday that flat file is updated with new data but the problem is the new data is appended tothe bottom. The data already imported from the previous days before are still in the same flat file.

    There i sno way I can get this flat file to only show new data so how can I regularly import data from this flat file but ignore the lines already imported.

    Attemptign to do this on SQL 2005 right now and using DTS.

  • May I suggest you this..

    After importing the data from your flat file, can you move the flat file to a new location (you may need it if necessary) and at the actual location create a empty file.

    Every day, the data would be appended into this file (which contains nothing)..

    Only a work around...


    Bru Medishetty

    Blog -- LearnSQLWithBru

    Join on Facebook Page Facebook.comLearnSQLWithBru

    Twitter -- BruMedishetty

  • Nope, not an option unfortunately, the service populating this file wont allow it.

  • I never import to the final table... I always import to a "staging" table and then use some "upsert" or "merge" code I've written to both pre-validate the data and update the final table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I'm a bit confused as you use SQL 2005 and DTS in one sentence. Hope you mean SSIS?

    Personally I would solve it as Jeff describes, but if creating the staging table is not an option for you, you can do it on the fly and in memory either using a lookup (well suited for smaller tables) or a merge join followed by a conditional split (will also work on larger tables but requires that your data is sorted!

    Regards,

    Willem
    http://wschampheleer.wordpress.com[/url]

  • Actually using a staging table is an option. problem is im such a newbie all the ways you suggested to remove the duplicates etc are chinese to me. Sorry for being such a noob.

  • Ok - let's see if this helps.

    Create a staging table (similar to your final table).

    Load it with the flat file using a data flow task.

    Use an Execute SQL Task to move the data from the staging table to the final table using either a merge statement (requires SQL 2008) or an insert statement. You can copy/paste the statement directly into the task or create a stored procedure.

    Lookup merge in BOL. Baiscally you will do something like this:

    merge into final table t

    using staging table s

    on (t.col1 = s.col1 and t.col2 = s.col2 etc.)

    when not matched then insert statement ;

    If you can't use merge, you'll have to do something like this

    insert into final table

    (col1, col2 etc.)

    select col1, col2 etc.

    from staging table s

    left join final table t

    on t.col1 = s.col1 and t.col2 = s.col2 etc.

    where t.col1 is null ;

    One caveat for both statements: be careful with columns that are nullable; you'll have to take care of that explicitly in the comparison! The reason is that NULL = NULL is NULL instead of TRUE!

    You can either solve this by using something like isnull(t.col1,'') = isnull(s.col1,'') (will vary with the datatype)

    or (my preferred approach)

    (t.col1 = s.col1 or (t.col1 is null and s.col1 is null)

    Regards,

    Willem
    http://wschampheleer.wordpress.com[/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply