Flat File Source Validation based on data.

  • I have a peculiar requirement - I have a flat file with the format.

    COLUMN_NAME,TABLE_NAME,RULE_NO -- Header of FF

    EmpNo,EMPLOYEE,6

    EmpDOB,EMPLOYEE,3

    EmpAge,EMPLOYEE,2

    EmpSalary,EMPLOYEE,3

    Address,EMPLOYEE,7

    -- whereas these 5 rows are Employee Table data

    ID,HR,1

    EmpName,HR,4

    -- these are two rows of HR table

    Concepts or Assumption:

    Employee and HR are two different tables that already exists in DB. They are unique in nature and have no relation between them. Rule No is a column with integer data type. Based on the values of these we have perform anonymization of these columns.

    Now let us assume that EmpNo, EmpDOB and Address are errorneous. Since Address column does not exists in Employee table. The rule numbers are not valid for EmpNo and EmpDOB. (EmpDOB (datetime) and EmpNo (INT) have rule 3 and rule 6 respectively, we cannot apply these rules as 3 is for integer data type and rule no 6 is for character type column - whereas EmpNo is number).

    Data in FF for HR table is perfect and good.

    So the requirement is like this.

    I need to load the HR tables data (2 rows) . I need not load the Employee table data since there is at least one row of bad data. So I should load two rows in the OLE DB destination table(HR table alone).

    How to achieve this ?

  • You could skip data rows until you encounter "EmpName,HR,4" record, insert into HR table

  • In a Data Flow Task use a Flat File Source to read the file and use a Conditional Split Transformation to create an output stream for HR (e.g. RULE_NO == 4).

    On that output stream load the data into your destination table. Ignore the other (default) output stream if you do not need to process it in any way.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply