Auto-Reporting all constraint violations in an import

  • I'd like to get a sanity check on this, and any comments regarding 'been there done that watch out for x'.

    Objective:

    Must support legacy business processes.  Need to be able to report ALL constraint violations within a repetitive mass insert/update at once so that they can be submitted in a report to multiple PoCs for correction.

    1. LegacyTable:

    1. No constraints
    2. Manual validation of business/data rules leaves some still broken
    3. An export from the SQL version of the data will be used to snapshot out copies of the old-style table for the legacy FoxPro apps to continue using during the first few phases of the full solution implementation
    4. the constraints are very likely to change, as well as the DB schema to a lesser degree
    5. current business process that must be supported involves getting a price list from accounting in excel, saving it in a 'standard' foxpro .dbf, sending the data over to the office in the UK for approval, then importing as is into the products/pricing table.
    6. Implementing actual constraints into the schema means that the import, upon violation of a any constraint, will blow up and report back only ONE violation, despite the fact that there will almost surely be MANY records with each one violating MANY constraints. 

    Where I'm at now:

    I don't want to force maint of TWO sets of constraints, foreign keys, etc. (one set in the schema, another in an import validation procedure).  I don't want to use triggers for validation instead of constraints.

    dbo.fnHelpConstraint(@TableName)

     table function from a mutilated version of sp_helpconstraint that returns customized data

     returned data includes 'ValidationQuery' field containing a pre-built select statement

    UTIL_ReportConstraintViolations sp

      @ConstraintTable nvarchar(128)

     ,@ErrorTableToCreate_WithOverwrite nvarchar(128)

     ,@ViolationsOccured int output

    I intend to put a before trigger on the insert/update into the products table.  Inside it, I'll need to copy the entire contents of 'inserted' into a table that can be seen outside the trigger, then call UTIL_ReportConstraintViolations to generate a 'static' dataset of all constraint violations for reporting.

    I am still a bit shaky on the trigger/error/begintran/endtran/rollback stuff.  but..

    import process

     wipe report table

     set @Trancount = @@Trancount

     begin tran

     insert new records (before trigger fires)

       Build an out-of-scope visible table from 'inserted'

       using fnHelpConstraint, iterate through the schema-driven validation queries and fill the report table with violations

       if violations did not occur

        insert into products table from inserted

     update existing records (before trigger fires)

       Build an out-of-scope visible table from 'inserted'

       using fnHelpConstraint, iterate through the schema-driven validation queries and fill the report table with violations

       if violations did not occur

        insert into products table from inserted

     if @@trancount <> @Trancount + 1

      commit tran

     if @TranCount <> @Trancount

      warn of error

     if @violationsoccured

      warn of insert and update issues seperately

      advise calling report procedure for correction of issues

     

    Thoughts?

     

    Background:

    I work for a company (8 years) that has horrible business processes (but they do work).  There are no business process analysts, per se.  Any time a business process change (even approved, correct, and well thought out) becomes part of an IT project, the solution ends up in either disuse or misuse.  I am in the process of migrating off of a FoxPro table solution that is being used on a WAN, to SQL.

    It's a complex line to walk, but typically it seems that as long as the legacy-replacement project supports the current business processes, and meets the required functionality, it'll get used.  The trick is to not create another monster... whee.

    There are two products tables that have evolved on their own (i.e. are similar).  The pricing, etc. for ~20 country 'categories' (legal considerations etc) is created by a changing conglomerate cluster of folks scattered across this office, the UK office, and the NZ office (which handles Japanese orders only).  Then they dump the imports and change requests on an old-timer here, who just manually goes out and does most of the imports and changes.  He does have a specific format he accepts data for import/replacement in when they do a full published price list update though.  There is also an app where non-IT people can update, add, delete individual items.  My goal is to support the current business process of price changes.

    If I make it too difficult to do the mass imports, the stack of cards that centers around the old-timer doing his updates will all fall down, and I'll get sucked into the business process.  I do NOT, however, want a SQL schema that has no constraints or a bad design - it'll make any front ends using the products data error-prone.

  • My thoughts are that if you are trying to import data that is either delimited, fixed field, or a combo of both, you should probably take the time to use BCP.  Yeah, I know... a bit cryptic but well worth the effort.  Will create runtime log files, error log files, and put "bad records" in a holding file.  And of the three methods of importing data (the other two are Bulk Insert and DTS), it is the hands down winner.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Won't BCP only record single constraint violations?  I've only used it once, and just did a simple dump OUT.  I should read up on it.

    Thanks

  • It won't actually report what constraint was violated but it will reject any record to the "error log" that either violates a constraint, key, foreign key, data-type, or column width.  Using the -m parameter, you can set it to allow, say, 20 such errors (actually, any number of records that have any error).  If more records than the number you set have errors, the BCP run will shut down.

    Provided that the same number of delimiters occur in the header as what are identified in the Format file, you can even tell it to skip the header without writing a separate program. If a record has the wrong number of delimiters or is missing the "end of record" mark, BCP may take out the bad record and the record that follows until it gets "back in synch" (which would be true of nearly any import except those that read a whole record first).

    The format file can be a real pain to get right but it's well worth it.  You can even tell BCP to skip unwanted fields by formatting the format file correctly.  I've even done doubled quoted comma seperated values with it.

    p.s. I totally agree with your decision not to use triggers.  About the only time I use them is to guarantee referential integrity across remote servers.  Even then, if other forms of "RI" worked, I wouldn't use triggers. 

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • After testing, BCP is fast (no suprise there), and it does give the error log nicely.  I'll likely use it over DTS (it has it's uses but oof I can't stand those wierd no-info errors it gives) for importing history data when it's needed.

    This time however, because there is an explicit ongoing small-import process to update price and VAT changes for the UK office, I gave it a no-constraint mirror of the target table and am going to use that function I wrote to spit out the constraint violation report.  The table can also be used to store the data once validated so that it can be pushed live on a schedule (another need).  Using this method I don't have to tinker with the before triggers, etc.

    The UK office publishes price sheets, where the amounts must match exactly.  Once they print them, it costs a lot of money to throw a bunch away due to a small error.  If it wasn't for there being failed projects in the past by others that have attempted to implement business process changes (i.e. the published price sheet should in fact be a report that gets printed out, not the other way around), then I'd approach it differently.

  • Cmore,

    Here's one more little "secret" of mine ... I never use BCP to import to a "live" table .  I always use it to populate a "staging" table and then do any extra validation I need in the staging table before I move it (usually just an INSERT).  The staging table can have all sorts of extra columns to store pertinent "values" for repeated testing on each batch of records imported.  Cleanup is a snap... I just TRUNCATE the staging table (starts IDs at "1" again which can be a huge help! ) which also keeps a bazillion transactions out of the LDF (log) file of the database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply