Trigger to capture changes into an audit table

  • Following is my table definition:

    CREATE TABLE dbo.Employee (
        EmployeeID INT NOT NULL,
        EmpFirstName VARCHAR(50) NOT NULL,
        EmpMiddleName VARCHAR(50) NOT NULL,
        EmpLastName VARCHAR(50) NOT NULL,
        EmpDOB DATETIME NOT NULL,
        EmpSSN VARCHAR(11) NULL,
        DateCreated DATETIME NOT NULL,
        DateModified DATETIME NOT NULL
    );

    I also have an audit table to keep track of changes made to any of the columns in my table above. The audit table's definition is:

    CREATE TABLE dbo.AuditChanges (
        AuditChangeID INT IDENTITY(1, 1) NOT NULL,
        TableName VARCHAR(100) NOT NULL,
        ColumnName VARCHAR(100) NOT NULL,
        OldValue VARCHAR(2000) NOT NULL,
        NewValue VARCHAR(2000) NOT NULL,
        ChangedTime DATETIME NOT NULL
    );

    I need to track changes made to a handful of tables in my database that are heavily used by an application which processes about 50,000 to 80,000 transactions a day. This will translate to around 200,000 to 300,000 changes on some tables.

    I am trying to get this done through triggers. Can someone suggest the best possible generic script to do this across tables? As you can see from the table definition, changes to multiple tables will be logged to the same audit table.

    Given the volume of my transactions, will CDC be a better option? Please advise.

    My CDC implementation may take time, so in the short term I am looking at triggers as a quick solution for immediate use. I appreciate your help and inputs.

  • I cannot find it now, but Sean Lange posted an answer to a question that was nearly identical to this one, and noted all the grievous, serious problems inherent in this design.

    CDC is definitely the way to go, especially given that you need old values vs. new values. Since you'd essentially be unpivoting every updated row's columns into new rows in your audit table, you'd have to pivot them back to visualize them again, so trying to reconstruct the rows that "bob" changed on January 30th at 13:31 would be a hellishly difficult task, one that CDC handles automatically.
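
    To illustrate how painful that reconstruction is, here is a minimal sketch of the pivot-back query against the posted AuditChanges design. Note the assumptions: the posted table has no column identifying which source row changed, so the sketch assumes a hypothetical RecordID column was added, and every audited column needs its own MAX(CASE...) expression plus a correlated lookup to find the value in effect at a point in time:

    DECLARE @EmployeeID INT      = 42;                    -- hypothetical row of interest
    DECLARE @AsOfTime   DATETIME = '2014-01-30T13:31:00'; -- point in time to reconstruct

    SELECT
        MAX(CASE WHEN ac.ColumnName = 'EmpFirstName' THEN ac.NewValue END) AS EmpFirstName,
        MAX(CASE WHEN ac.ColumnName = 'EmpLastName'  THEN ac.NewValue END) AS EmpLastName,
        MAX(CASE WHEN ac.ColumnName = 'EmpSSN'       THEN ac.NewValue END) AS EmpSSN
        -- ...one expression per audited column...
    FROM dbo.AuditChanges AS ac
    WHERE ac.TableName = 'Employee'
      AND ac.RecordID  = @EmployeeID   -- RecordID is assumed; it is not in the posted DDL
      AND ac.ChangedTime = (SELECT MAX(ChangedTime)      -- latest change per column
                            FROM dbo.AuditChanges
                            WHERE TableName   = ac.TableName
                              AND RecordID    = ac.RecordID
                              AND ColumnName  = ac.ColumnName
                              AND ChangedTime <= @AsOfTime);

    And even this only returns columns that have changed at least once; columns still at their inserted values are simply missing from the audit trail.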

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • I believe this is the post that Lowell was referring to. http://qa.sqlservercentral.com/Forums/FindPost1536816.aspx

    Again... run away from this type of approach as fast as you absolutely can. Auditing doesn't do any good if you can't retrieve point-in-time values.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • For a table as small as the given Employee table, I agree. If not CDC, then use full-row auditing instead.

    However, for very wide tables, this method is a real disk saver. For example, I have a legacy table (I didn't design it) that's 140 columns wide, and only 3 or 4 of those columns are typically updated over the lifetime of a row. Writing full-row auditing for this table would be insane.

    I've also seen that many people have had problems with CDC, although I can neither confirm nor deny said problems simply because I've not tried it yet. Do you good folks have a favorite article that explains the pros and cons of CDC, or is Books Online the best bet?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Lowell/Sean/Jeff,

    Thanks for your valuable inputs. The table I mentioned above is just a fake one to simulate my requirement. My actual tables range anywhere from 20 to 80 columns, with updates (up to 300K per day) as indicated in my original post.

    Where does "Change Tracking" option fits into the mix? Can you please throw some light on it as compared to CDC? What are the pros and cons of these two options?

    Your service to the SQL community is highly laudable. I am one of the many beneficiaries, and I truly appreciate it.

    Regards,

    RexHelios

  • RexHelios (2/3/2014)


    I need to track changes made to a handful of tables in my database that are heavily used by an application which processes about 50,000 to 80,000 transactions a day. This will translate to around 200,000 to 300,000 changes on some tables.

    I am trying to get this done through triggers. Can someone suggest the best possible generic script to do this across tables? As you can see from the table definition, changes to multiple tables will be logged to the same audit table.

    Given the volume of my transactions, will CDC be a better option? Please advise.

    My CDC implementation may take time, so in the short term I am looking at triggers as a quick solution for immediate use. I appreciate your help and inputs.

    I'm sure that I'll, once again, be labeled a Luddite, but I just read the overview for CDC and the definitions of the tables involved, as well as the stored procedures, functions, and jobs associated with CDC, and I can tell you that if someone told me I had to use CDC, I'd quit my job for a thousand reasons.

    First of all, I think it's insane to audit anything about inserts except for when they occurred and by whom. Think about it: if you had a database that only had inserts made against it, the database would more than double in size if the tables were CDC-audited, because CDC captures the whole row. Why would anyone do that? If nothing ever changed, then the original insert is still in the original target table. As for deletes, only the original information that has never been updated should be captured, if inserts aren't captured (and they shouldn't be, as previously discussed).

    If you really look things over, a table that is only ever inserted into will take more than twice the disk storage to save and audit the data. That will also affect backup and restore times. If you do an average of just two updates to each row, your data storage requirements will more than quadruple using CDC or whole-row auditing on large tables. For very narrow tables, whole-row auditing can be more efficient but, for crying out loud, don't store both the deleted and inserted versions during updates. Only store the deleted versions of updates.
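
    As a minimal sketch of that principle, here's what a whole-row audit trigger that captures only the "before" image might look like. The dbo.EmployeeAudit table and trigger name are hypothetical (assumed to mirror dbo.Employee plus an action flag and timestamp); nothing below comes from the original post:

    -- Sketch only: dbo.EmployeeAudit is assumed to mirror dbo.Employee,
    -- plus AuditAction CHAR(1) and AuditTime DATETIME columns.
    CREATE TRIGGER trEmployee_Audit
    ON dbo.Employee
    AFTER UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Capture only the "before" image; the current image already
        -- lives in dbo.Employee, and inserts need no row image at all.
        INSERT INTO dbo.EmployeeAudit
            (EmployeeID, EmpFirstName, EmpMiddleName, EmpLastName,
             EmpDOB, EmpSSN, DateCreated, DateModified, AuditAction, AuditTime)
        SELECT d.EmployeeID, d.EmpFirstName, d.EmpMiddleName, d.EmpLastName,
               d.EmpDOB, d.EmpSSN, d.DateCreated, d.DateModified,
               CASE WHEN EXISTS (SELECT 1 FROM inserted) THEN 'U' ELSE 'D' END,
               GETDATE()
        FROM deleted AS d;
    END;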

    Further, and just like any trigger, CDC does not auto-magically add columns to capture tables. That's a manual process, just like it is for any trigger except the so-called "generic triggers" (which have a HUGE performance problem... more on that in a minute).

    To add aggravation to the whole process, you have to use special functions to find rows in a date range, because the bloody capture tables only contain LSNs instead of "occurred at" dates and times. They also talk of log contention, and I certainly don't need that on my high-volume databases.
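
    For reference, the date-range lookup Jeff is describing goes through CDC's mapping functions, roughly like this (a sketch that assumes CDC is already enabled on dbo.Employee with the default capture instance name, dbo_Employee):

    DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);

    -- Wall-clock times must first be mapped to LSNs...
    SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', '2014-01-30T00:00:00');
    SET @to_lsn   = sys.fn_cdc_map_time_to_lsn('largest less than or equal', '2014-01-30T23:59:59');

    -- ...before the capture table can be queried for that range.
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_Employee(@from_lsn, @to_lsn, N'all');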

    Further, the tables being captured are referred to by Object_ID rather than by name. Add a column to a table in the wrong spot using SSMS and the ID changes, because of the behind-the-scenes work of copying the data to a new table and dropping the old one. POOF! Your audit history is gone with it.

    Then there's the table bloat that occurs when updating, say, just 4 columns in a 140 column wide table.

    There's more but those points were enough for me to say "probably never" to CDC.

    The ONLY advantage I see for CDC over triggers is the automatic inclusion of BLOBs, which triggers can't handle directly (although there are some simple workarounds for that). Even then, you have to remember to change a replication setting or your BLOBs will be truncated at 64KB. Whoops!
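
    The setting in question is, I believe, the server-wide "max text repl size" option, which also governs CDC's LOB capture; a sketch (on SQL Server 2008 R2 and later, -1 removes the limit):

    -- Default is 65536 bytes; anything past that gets truncated in the capture.
    EXEC sp_configure 'max text repl size', -1;  -- -1 = no limit (2008 R2+)
    RECONFIGURE;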

    As for the so-called "generic triggers", the HUGE (and I mean SERIOUSLY HUGE) disadvantage is that both the INSERTED and DELETED tables have to be examined for column names. The problem is that the scope of those tables is incredibly narrow; they can't even be seen inside a spot of dynamic SQL. That means both trigger tables must be copied to some form of temporary storage, which takes a huge amount of time and resources.

    One of the absolute worst generic triggers I've ever seen populates a FLAT (Field Level Audit Table) much like the one in the original post on this thread. It's not the FLAT that's bad... it's the generic triggers that most people build or use that are bad. We had a CLR trigger (which someone got from an article right here on SSC) that they put on some of our wider tables because they didn't want to take the time to write a correct trigger (140 columns is an awful lot of code to write for... more on that in a minute, as well). It took 4 minutes to audit a 4-column modification for a lousy 10,000 rows. After I changed it to a proper trigger, that same excursion took only 800ms.
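
    For clarity, the expensive pattern being criticized looks roughly like this (an anti-pattern sketch with hypothetical names, not something to deploy):

    CREATE TRIGGER trEmployee_Generic ON dbo.Employee
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- The killer: dynamic SQL can't see the inserted/deleted pseudo-tables,
        -- so a "generic" trigger must copy both of them on every statement.
        SELECT * INTO #ins FROM inserted;
        SELECT * INTO #del FROM deleted;

        -- The column-name discovery and comparison then run as dynamic SQL
        -- against the temp copies, paying the copy cost every single time.
        DECLARE @sql NVARCHAR(MAX) =
            N'SELECT i.EmployeeID FROM #ins AS i JOIN #del AS d ON d.EmployeeID = i.EmployeeID';
        EXEC (@sql);
    END;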

    As for the FLAT in the original post that everyone is bad-mouthing, I'll say "It Depends". If you have wide tables where only a couple of columns are updated, then the FLAT works great and saves a huge amount of disk space. Yes, it can be difficult to query for the point-in-time status of a given row, but that's not such a difficult problem to overcome. Thank goodness for a couple of very high-speed methods of doing "data smears" to make it all easier.

    As for the columns in that FLAT, you need to add a column to identify whether the action was an INSERT, UPDATE, or DELETE. I strongly recommend that column be only a CHAR(1), using "I", "U", or "D" to describe the action for simplicity's sake.

    As for the OldValue and NewValue columns, save yourself some major headaches: lose the VARCHAR datatype and use SQL_Variant instead. That will auto-magically capture the datatype of the data as well as the data itself, which makes possible joins a whole lot easier, and the triggers will be much faster because you don't have to do any explicit conversions. You can also interrogate the datatype if needed (and that's a rare need indeed).
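
    Putting the last two suggestions together, a revised audit table might look like this (a sketch; the AuditAction column name and the NULLable Old/NewValue are my assumptions, since inserts and deletes would only ever populate one side):

    CREATE TABLE dbo.AuditChanges (
        AuditChangeID INT IDENTITY(1, 1) NOT NULL,
        TableName     VARCHAR(100) NOT NULL,
        ColumnName    VARCHAR(100) NOT NULL,
        AuditAction   CHAR(1)      NOT NULL, -- 'I', 'U', or 'D'
        OldValue      SQL_VARIANT  NULL,     -- NULL for inserts
        NewValue      SQL_VARIANT  NULL,     -- NULL for deletes
        ChangedTime   DATETIME     NOT NULL
    );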

    As for the triggers themselves, like I said, forget about any form of "generic" or "self-healing" trigger. They will make your code run dog damned slow. Instead, write some code to build a proper trigger with separate column interrogation. It's not as difficult as it would seem. If you add or delete a column, just rerun the code to rebuild the trigger; and yes, the code that rebuilds the trigger CAN be generic. If you change the datatype of a column, no worries: the SQL_Variant will take it all in stride without you ever even noticing.
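
    Here's a sketch of the kind of trigger such builder code would emit for the Employee table, against the revised audit table above; the names are illustrative only:

    CREATE TRIGGER trEmployee_FLAT ON dbo.Employee
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- One hard-coded SELECT per audited column, UNION ALLed into a single
        -- INSERT: no dynamic SQL, no copying of inserted/deleted, and columns
        -- that didn't change contribute no rows.
        INSERT INTO dbo.AuditChanges
            (TableName, ColumnName, AuditAction, OldValue, NewValue, ChangedTime)
        SELECT 'Employee', 'EmpLastName', 'U', d.EmpLastName, i.EmpLastName, GETDATE()
        FROM inserted AS i
        JOIN deleted  AS d ON d.EmployeeID = i.EmployeeID
        WHERE i.EmpLastName <> d.EmpLastName
        UNION ALL
        SELECT 'Employee', 'EmpSSN', 'U', d.EmpSSN, i.EmpSSN, GETDATE()
        FROM inserted AS i
        JOIN deleted  AS d ON d.EmployeeID = i.EmployeeID
        -- NULL-safe "changed" test for the NULLable column:
        WHERE ISNULL(NULLIF(i.EmpSSN, d.EmpSSN), NULLIF(d.EmpSSN, i.EmpSSN)) IS NOT NULL;
        -- ...and so on for each remaining audited column.
    END;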

    I've written code to build such triggers generically. Mine is customized to add some columns that you'd never need (an associated Loan_ID, for example). I've got a whole lot on my plate in the next two weeks, but I'll see if I can do a quick mod to the code and try to remember to post it here.

    A word of warning and some very strong, friendly advice: start reading up on how to partition tables using either Partitioned Tables or Partitioned Views, with one file/filegroup combination per month. All audit tables, whole-row or FLAT, get really big in very short order. I also recommend you build such audit tables in a separate database and point synonyms at them. That way you can set files (one monthly partition per file) to Read-Only without having to set your main database to "Single User", which, no matter how brief, will kill ALL other connections to the database.
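
    The synonym part of that advice is a one-liner (a sketch assuming a hypothetical separate AuditDB database holding the audit table):

    -- Application code keeps referencing dbo.AuditChanges in the main database;
    -- the rows actually land in the separately managed audit database.
    CREATE SYNONYM dbo.AuditChanges FOR AuditDB.dbo.AuditChanges;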

    And repeat after me... "I SHALL NOT AUDIT ANY DATA DURING INSERTS OTHER THAN WHEN THEY OCCURRED!!!".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions
