Importing Poorly Formatted Text File to SQL

  • mxy (1/30/2016)


    I have received file from other team,i will check with them, how it was generated.

    About duplicate columns names, even though column name is repeated value is unique and it can be n number of times per user.

    Yes, we assumed that on the duplicate column names. The question is, what do YOU want to do with them in the output?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • O/p should be in comma separated

    job-title

    Admin, approve_perm

  • I'm glad you guys jumped on this, complete with quirky update and everything. I wish I'd had time to address is properly earlier today. Like Jeff said, this kind of stuff is fun.

  • Jeff Moden (1/30/2016)


    Thank you for the post on how to load the SSIS package. I appreciate the time you took. I'll have to pass, though. I was looking forward to seeing an SSIS solution that had no script component.

    I understand. At this point I would as well so I could learn some new SSIS-fu 😛 From where I sit, this requirement has just enough twists and turns to compel me to bring C# to the party no matter how I slice it.

    I've come up with a couple other ways I could do this in SSIS. One has C# in an extremely limited role but alas still needed. If I come up with a pure SSIS solution or the OP decides the Quirky Update isn't really their cup of tea I'm happy to assist with an SSIS solution.

    Note, this is not to say I couldn't just load the file into a table with an identity-column plus the mame-value data and then pickup the solution in T-SQL using the Quirky Update method you showed but simply replacing BULK INSERT with an SSIS raw-file-loader isn't all that interesting.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • OK, attached is an SSIS Package that uses only native SSIS Tasks and Components and T-SQL, i.e. no C# Script is utilized in the SSIS. Files included in the attached zip file:

    1. ActiveDirectoryLog.txt - sample input file (from OP)

    2. ActiveDirectoryLog2_TABLE.sql - DDL to create destination table in tempdb

    3. ImportActiveDirectoryLog2.dtsx - SSIS Package that loads the sample file into the table in tempdb

    4. ActiveDirectoryLog2_SELECT.sql - query to deliver results in required shape

    This turned out to be remarkably simple once I realized the row-delimiter I should have been using all along was \r\n\r\n, not \r\n 😎

    This is the table definition the file is loaded into (also in the attached zip):

    CREATE TABLE dbo.active_directory_profile(

    profile_id int IDENTITY(1,1) NOT NULL,

    profile varchar(8000) NULL

    );

    And this is the query that brings back the desired output (also in the attached zip):

    WITH cte AS (SELECT adp.profile_id,

    LEFT(x.item,CHARINDEX(': ',x.item)-1) AS [name],

    -- requires dbo.GROUP_CONCAT from http://groupconcat.codeplex.com

    dba.dbo.GROUP_CONCAT(SUBSTRING(x.item,CHARINDEX(': ',x.item)+2,8000)) AS [value]

    FROM dbo.active_directory_profile adp

    -- requires dbo.StringSplit_Multi from http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx

    CROSS APPLY dba.dbo.SplitString_Multi(adp.[profile], CHAR(13)+CHAR(10)) x

    GROUP BY adp.profile_id,

    LEFT(x.item,CHARINDEX(': ',x.item)-1))

    SELECT profile_id,

    MAX(CASE WHEN name = 'dn' THEN [value] END) AS dn,

    MAX(CASE WHEN name = 'objectClass' THEN [value] END) AS objectClass,

    MAX(CASE WHEN name = 'userid' THEN [value] END) AS userid,

    MAX(CASE WHEN name = 'companyVal' THEN [value] END) AS companyVal,

    MAX(CASE WHEN name = 'company' THEN [value] END) AS company,

    MAX(CASE WHEN name = 'job-title' THEN [value] END) AS [job-title],

    MAX(CASE WHEN name = 'mail' THEN [value] END) AS mail,

    MAX(CASE WHEN name = 'createdby' THEN [value] END) AS createdby,

    MAX(CASE WHEN name = 'modifiedby' THEN [value] END) AS modifiedby,

    MAX(CASE WHEN name = 'createdTime' THEN [value] END) AS createdTime,

    MAX(CASE WHEN name = 'modifiedTime' THEN [value] END) AS modifiedTime

    FROM cte

    GROUP BY profile_id;

    Results:

    Note that there are two functions you will need to install for my query above to work, a grouped-string-concatenation-function and a string-splitter-function. One is implemented as a SQLCLR Aggregate and the other as a SQLCLR Table-Valued function. The comments show where you can get them. If SQLCLR makes you twist in your seat you can replace the grouped-string-concatenation-function with the technique that leverages FOR XML and the string-splitter with Mr. Moden's T-SQL splitter.

    Edit: fix tabs in code

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply