Archiving to history without creating duplicate rows

  • Hi All:

    I'm attempting to create a job that will archive online records to a history table. This is the sql that I've come up with:

    update daily_transactions_hist

    select * from daily_transactions

    where transactiondate < dateadd(d, -360, getdate())
    -- and daily_transactions.transactionid <> daily_transactions_hist.transactionid

    Notice that last line is commented out because it does not work. The executable lines work fine. I was looking for a way to prevent creating duplicates in the history table. This norally should not be the case; however, there might be times that some of the data will be need to be moved back to online to research some accounts.

    BTW, the schemas are exactly alike, except for the transactionID which is defined as integer identity PK on the online table.

    Thanks for assistance on this

    William

  • Are you using "UPDATE" to archive ???


    * Noel

  • Hi Nole

    My error, I copied the wrong SQL. The SQL is:

    select * into daily_transactions_hist

    from from daily_transactions

    where transactiondate < dateadd(d, -360, getdate())

    -- and daily_transactions.transactionid daily_transactions_hist.transactionid

    Thanks

    William

  • Your query is not going to work cos this is how it needs to be done.

    Try something like this.

    --Update Existing (If any Data has Changed)

    update daily_transactions_hist set  daily_transactions_hist.Col1 = DLY.Col1,

         daily_transactions_hist.Col2 = DLY.Col2,

         daily_transactions_hist.Col3 = DLY.Col3...

    from daily_transactions

    where transactiondate < dateadd(d, -360, getdate()) --I am assuming u have Index on this Column

    and daily_transactions.PKkeyColumn = daily_transactions_hist.PKkeyColumn

    and (

     isnull(daily_transactions_hist.Col1,'') <> isnull(DLY.Col1,'') OR

     isnull(daily_transactions_hist.Col2,'') <> isnull(DLY.Col2,'') OR...

     ..)

    --Insert Non Existing

    Insert into daily_transactions_hist(PKkeyColumn,Col1,Col2,Col3...)

     select PKkeyColumn,Col1,Col2,Col3... from daily_transactions

     where transactiondate < dateadd(d, -360, getdate())

     and not exists (Select 1 from daily_transactions_hist where daily_transactions.PKkeyColumn = daily_transactions_hist.PKkeyColumn)

     

  • Thanks Sreejith,

    I'll give this a try and let you all know how it works

    William

  • Yep... and keep in mind that SELECT/INTO only works once on any given target table unless you drop the table... but that would defeat the archive, eh

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • ...and SELECT ... INTO ... should never be used, for major concurrency issues, if you are not using TOP 0.

  • If you have a unique index on the history table (you should already as part of good design, I know sometimes it's not possible)  with the [ignore duplicates] option on, then a one line INSERT command will work. 

    You can get more sophisticated with the use of a trigger on the history table.

  • I wouldn't say "never"... they're pretty handy when making a temp table and, since no one but the current session can use the temp table, there's no concurrency issue to worry about. 

    Do you have a link to an article that explains the "TOP 0" thing you're talking about?  Haven't heard about that one, yet.  Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I would say never

    Consider this, what happens when you create a table? The best outcome is that SQL Server just places eXclusive KEY locks on certain tables (ex: sysobjects, sysindexes and syscolumns).

    What happens if SQL Server places eXclusive locks in a transaction? Well, SQL Server maintains these locks until the transaction is either rolled back or committed.

    Imagine:

    BEGIN TRANSACTION
    SELECT someColumns INTO someTable FROM someOtherTable
    (errorhandling)
    COMMIT TRANSACTION

    Imagine that this select lasts, say, 1.5 hours (might be the case in the example of Data Archiveing). During this time you have eXclusive locks on system tables.

    Imagine that now any application/user (whatever) wants to retrieve something?

    Not a too far fetched example: SELECT * FROM INFORMATION_SCHEMA.TABLES

    What happens?

    Concurrency problems. You have lock(s) in the way and (if you are lucky) you have just one user blocked for 1.5 hour that needs to do some work. Of course, if the user needed to be blocked this is correct but does he need to be blocked because you are creating a table implicitly?

    Okay, so TOP 0? Well, outside your transaction, do a SELECT TOP 0 ... INTO ... FROM ... This will give you your table (empty) with the same data-types as the columns in the original table without you having to do a CREATE TABLE (which was why people keep on using ... INTO ... in the first place, right?). Now you can happily use

    BEGIN TRANSACTION
    INSERT INTO someTable SELECT someColumns FROM someOtherTable
    (errorhandling)
    COMMIT TRANSACTION
     

    without sacrificing concurrency nor the extra seconds to make a correct CREATE TABLE.

    I hope at least someone managed to read this far and starts decreasing the unnecessary locks in their database...

    Happy SQLling,

    Hanslindgren!

  • P.S Even when creating temporary tables you lock the system tables. I.e. even though your table isn't visible to anyone else you still create concurrency issues.

    Hanslindgren

  • Why on Earth would you put the creation of a temp table in an explicit transaction?  And, so far as exclusively locking the system tables when you use SELECT/INTO to create a temp table, that hasn't been true since SQL Server 6.5...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • If you ask yourself that question then you haven´t met many developers...  And it is still true that SELECT/INTO creates eXclusive locks, try sp_lock and see for yourself. It places a X keylock in those system tables to guarantee the ACID properties (albeit relaxed for tempdb but some are still in effect). Think especially about the part about Isolation.

    But the orignal post was not about # tables it was about normal tables.

    And of course you might want to place S/I in an exclusive transaction to guarantee that potential side-effects are COMMITTED or ROLLED BACK...

  • First, SELECT/INTO does not lock system tables during a SELECT/INTO with no explicit transaction... (at least, none that I can detect)...

    Test setup:

    First, build yourself a nice medium sized test table... say, 10 million rows ... takes a little over 7 minutes to run on my non-server quality box...

    --===== Make a 10 million row test table with random info in it

     SELECT TOP 10000000

            IDENTITY(INT,1,1) AS RowNum,

            CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT) AS UserID,

            CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))

          + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)) AS SomeValue,

            'A column for kicks' AS Kicks,

            'Still another column just for proofing' AS StillAnother,

            CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY) AS SomeNumber,

            CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME) AS ADate --(>=01/01/2000  <01/01/2010)

       INTO dbo.MediumTest

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    By the way, if you run sp_lock 60 seconds into the run above (I used WAITFOR TIME to start both for test repeatability), there is absolutely no evidence of any exclusive locking on any system table what-so-ever... here's the proof (output from sp_lock)...

    spid

    dbid

    ObjId

    IndId

    Type

    Resource

    Mode

    Status

    My Notes

    51

    4

    0

    0

    DB

    S

    GRANT

    52

    4

    0

    0

    DB

    S

    GRANT

    53

    7

    0

    0

    DB

    S

    GRANT

    55

    1

    85575343

    0

    TAB

    IS

    GRANT

    --Master.dbo.spt_values

    56

    7

    0

    0

    DB

    S

    GRANT

    57

    1

    3

    2

    KEY

    (da016388857b)

    S

    GRANT

    57

    7

    158063749

    0

    TAB

    X

    GRANT

    --Target table of SELECT/INTO

    57

    7

    0

    0

    DB

    [BULK-OP-DB]

    NULL

    GRANT

    57

    7

    0

    0

    DB

    [BULK-OP-LOG]

    NULL

    GRANT

    57

    7

    0

    0

    DB

    S

    GRANT

    Test Code:

    SELECT * INTO SelectIntoTest FROM MediumTest

    I ran the test code (takes about 00:02:10 to run) and sp_lock using WAITFOR TIME where sp_lock was setup to run 60 seconds after the test code start... here's the results from sp_lock... I don't see any locks on the system tables...

    spid

    dbid

    ObjId

    IndId

    Type

    Resource

    Mode

    Status

    My Notes

    51

    4

    0

    0

    DB

    S

    GRANT

    52

    4

    0

    0

    DB

    S

    GRANT

    53

    7

    0

    0

    DB

    [BULK-OP-LOG]

    NULL

    GRANT

    53

    7

    0

    0

    DB

    S

    GRANT

    53

    7

    0

    0

    DB

    [BULK-OP-DB]

    NULL

    GRANT

    53

    7

    158063749

    0

    PAG

    1:834505

    S

    GRANT

    --Source table of SELECT/INTO

    53

    7

    190063863

    0

    TAB

    X

    GRANT

    --Target table of SELECT/INTO

    53

    7

    158063749

    0

    TAB

    IS

    GRANT

    --Source table of SELECT/INTO

    55

    1

    85575343

    0

    TAB

    IS

    GRANT

    --Master.dbo.spt_values

    56

    7

    0

    0

    DB

    S

    GRANT

    57

    7

    0

    0

    DB

    S

    GRANT

    OK... let's do it your way... using a declared transaction.  You are partially correct on SELECT/INTO creating locks on the system tables if the SELECT/INTO is contained in a transaction...

    Test Code:

    BEGIN TRANSACTION

     SELECT * INTO SelectIntoTest FROM MediumTest

    COMMIT

    I say you are partially correct because it, in fact, does not lock the system tables exclusively... only some keys...

    spid

    dbid

    ObjId

    IndId

    Type

    Resource

    Mode

    Status

    My Notes

    7

    7

    0

    0

    DB

    S

    GRANT

    7

    7

    3

    0

    TAB

    IX

    GRANT

    --SysColumns

    7

    7

    3

    1

    KEY

    (6a0021d28c8c)

    U

    WAIT

    --SysColumns

    51

    4

    0

    0

    DB

    S

    GRANT

    52

    4

    0

    0

    DB

    S

    GRANT

    53

    7

    0

    0

    DB

    [BULK-OP-LOG]

    NULL

    GRANT

    53

    7

    0

    0

    DB

    S

    GRANT

    53

    7

    0

    0

    DB

    [BULK-OP-DB]

    NULL

    GRANT

    53

    7

    1

    0

    TAB

    IX

    GRANT

    --SysObjects

    53

    7

    3

    0

    TAB

    IX

    GRANT

    --SysColumns

    53

    7

    2

    0

    TAB

    IX

    GRANT

    --SysIndexes

    53

    7

    3

    2

    KEY

    (8a010cde9fa6)

    X

    GRANT

    --SysColumns

    53

    7

    3

    2

    KEY

    (a701eff1c16a)

    X

    GRANT

    --SysColumns

    53

    7

    3

    1

    KEY

    (6c00aa1a8526)

    X

    GRANT

    --SysColumns

    53

    7

    3

    1

    KEY

    (6f0098ea5b11)

    X

    GRANT

    --SysColumns

    53

    7

    3

    1

    KEY

    (6e007645ee03)

    X

    GRANT

    --SysColumns

    53

    7

    3

    2

    KEY

    (70017d01dc33)

    X

    GRANT

    --SysColumns

    53

    7

    1

    1

    KEY

    (69005bc5d8d5)

    X

    GRANT

    --SysObjects

    53

    7

    3

    2

    KEY

    (d80268bbf092)

    X

    GRANT

    --SysColumns

    53

    7

    0

    0

    PAG

    0.367361111

    X

    GRANT

    53

    7

    3

    1

    KEY

    (7000fd8de7a9)

    X

    GRANT

    --SysColumns

    53

    7

    3

    1

    KEY

    (6b00cf7d399e)

    X

    GRANT

    --SysColumns

    53

    7

    3

    1

    KEY

    (6d00132252bb)

    X

    GRANT

    --SysColumns

    53

    7

    0

    0

    PAG

    1:19589

    X

    GRANT

    53

    7

    0

    0

    IDX

    IDX: 7:222063977

    X

    GRANT

    53

    7

    222063977

    0

    TAB

    Sch-M

    GRANT

    --Target table of SELECT/INTO

    53

    7

    158063749

    0

    TAB

    IS

    GRANT

    --Source table of SELECT/INTO

    53

    7

    3

    2

    KEY

    (5002df2c9849)

    X

    GRANT

    --SysColumns

    53

    7

    3

    2

    KEY

    -4.90269E+11

    X

    GRANT

    --SysColumns

    53

    7

    1

    3

    KEY

    (69002ec5be91)

    X

    GRANT

    53

    7

    3

    2

    KEY

    (6c01d7cba559)

    X

    GRANT

    --SysColumns

    53

    7

    2

    1

    KEY

    (6900786741b2)

    X

    GRANT

    --SysIndexes

    53

    7

    1

    2

    KEY

    (a7027cd184db)

    X

    GRANT

    --SysObjects

    53

    7

    158063749

    0

    PAG

    1:823741

    S

    GRANT

    --Source table of SELECT/INTO

    53

    7

    3

    1

    KEY

    (6a0021d28c8c)

    X

    GRANT

    --SysColumns

    55

    1

    85575343

    0

    TAB

    IS

    GRANT

    --Master.dbo.spt_values

    55

    7

    0

    0

    DB

    S

    GRANT

    56

    7

    0

    0

    DB

    S

    GRANT

    57

    7

    0

    0

    DB

    S

    GRANT

    Now, since those locks include exclusive locks on some keys of SysObjects, you shouldn't be able to make a new table while the SELECT/INTO is running if what you say is correct... but, if you try it, you'll find that you CAN make and select from another table in the presence of those locks!!!  The system is smarter and more flexible than you think!  Try it!!!

    So, saying that you should "never" use SELECT/INTO is absolutely wrong because the locks you talked about just don't matter.  ITS OK TO USE SELECT/INTO ANYTIME AND WITHOUT FEAR!!!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • P.s  Nice trick with the TOP 0 thing... most folks use WHERE 1=0 to do the same thing.  TOP 0 seems more obvious and I like it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply