Trouble importing a large flat file

  • Hello All,

I have to import a large (1.5 GB) flat file containing roughly 1.8 M rows. This file has roughly 100 "columns"; of these 100 columns I only need roughly 35.

If I attempt to import using the standard import dialog, I get 773,091 rows and then the process stops with the following error:

"SQL Server warning 0x8020200F: Data Flow Task 1: There is a partial row at the end of the file."

Looking at the file, the number of rows imported does not correspond to the rows in the original text file... making it quite difficult to examine the text data to look for the error.

Also... as an aside... in an effort to streamline this process I saved the import task as an SSIS job. If I run this job the only feedback I get is "Success"... obviously this is not accurate.

    I am looking for any advice on how best to import this file as well as how to troubleshoot files of this magnitude in the future.

    Thanks,

    Loonie M.

Typically I've found that there are data problems in files that are not obvious. What I'd suggest is that you look at the package and set the batch size to something small, like 10 rows. Then when it fails, examine the table to see how far it got. Then find that place in the file and examine it.

  • 2 things that I have run into way too often:

    1 - in delimited files, the delimiter can sometimes be in the data, causing the import process to think there are extra columns.

    2 - carriage returns in the data, causing the import process to think there are rows with fewer columns than expected.



    Alvin Ramard
Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

  • All,

Thank you for the suggestions. The issue, though, is that it doesn't fail until row 773,032!! This is a big "chunk" of data to sort through. Plus, if I had to guess, it appears the first records in the SQL Server table are actually close to the last records in the text file???? Which means... I'm not even sure which end to start from.

    The suggestion of the extra carriage return or delimiter may be the answer....but how do you find one "extra" in a 1.8M line text file??

    Still hoping for more insight.

    Thank you,

    Lonnie

Simple... set up a BULK INSERT in T-SQL to read whole lines of data from the file into a scratchpad table. Then do something like the following to find extra delimiters/EOLs...

SELECT DataColName
FROM dbo.ScratchPad
-- every good row should contain exactly the expected number of delimiters
WHERE LEN(DataColName) - LEN(REPLACE(DataColName, ',', '')) <> numberofcommasexpectedforeachrow

The resulting rows will be a problem in one form or another.

    Yep... it violates just about every rule for indexing but that's not a concern because you want to scan the whole table to find all of the problems.
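
In case it helps, a minimal sketch of the whole-line load described above; the file path, table name, and column name are assumptions you would swap for your own:

-- Scratchpad with one wide column so each line of the file lands in a single value
CREATE TABLE dbo.ScratchPad (DataColName VARCHAR(8000));

BULK INSERT dbo.ScratchPad
FROM 'C:\Import\BigFlatFile.txt'   -- assumed path
WITH (
    FIELDTERMINATOR = '|~|',       -- a string that never appears in the data, so the whole line stays in one column
    ROWTERMINATOR   = '\n',
    TABLOCK
);

Once the lines are in, the query above will flag any row whose delimiter count is off.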

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • As a sidebar, this is where BCP and BULK INSERT really shine... you can tell both to automatically sequester bad rows while still processing the good ones with just a couple of simple switch settings.

I believe you can do the same thing in SSIS but, since I don't use SSIS, I haven't a clue as to how to do that in SSIS.
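
For the BULK INSERT side of that, the switch settings in question look roughly like this (the staging table name, path, and terminators are assumptions); bad rows get written to the error file while the good ones keep loading:

BULK INSERT dbo.Staging_FlatFile
FROM 'C:\Import\BigFlatFile.txt'                    -- assumed path
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    ERRORFILE       = 'C:\Import\BigFlatFile.bad',  -- rejected rows are sequestered here for inspection
    MAXERRORS       = 1000                          -- keep loading instead of stopping at the first bad row
);

bcp's equivalents are the -e (error file) and -m (maximum errors) switches.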

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (1/27/2010)


    As a sidebar, this is where BCP and BULK INSERT really shine... you can tell both to automatically sequester bad rows while still processing the good ones with just a couple of simple switch settings.

I believe you can do the same thing in SSIS but, since I don't use SSIS, I haven't a clue as to how to do that in SSIS.

    Correct, this can be done in SSIS also.



    Alvin Ramard
Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

Since I failed at correctly using the bcp syntax to do anything successfully, I switched to SSIS. I've designed my package and have set up the connections, but I see no place to handle the errors... any more help??

    Thank you,

    Lonnie M

  • I don't have SSIS available and it's been several months since I last used it so excuse me if I'm a bit vague.

In the data flow there are 2 outputs available from your data source. One of these is for the error rows.



    Alvin Ramard
Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

If you know the row number causing the failure (773,032), you could try opening the text file in an editor that can handle a file this size, then use CTRL+G to go to the desired row and inspect the data. You should be able to identify the bad character causing the failure and delete it manually.

    --------------------------------------------------------------------
    Alex
    The Data Company

Something else you might want to try, that has worked for me in the past, is to use Access. Given your quantity of data, Access 2007 should be OK. If it imports OK, then double-check that row, make any updates you need to, then export it as a pipe-delimited, text-qualified txt file. This should then import without error.

    Else, best of luck!

    Our new venture, The Data Company

  • Hello All,

I get the number of rows inserted, but this does not correspond to that row number in the text file. It almost looks as if the SQL import starts from the bottom of the file? Either way, the sequential records in the SQL Server table (i.e. 1, 2, 3, 4) are not even in sequence in the text file! I can't find a relationship here at all... although I certainly hope it exists.

  • SQL doesn't start from the bottom of the file.

If you're having issues with determining the row in the text file vs. the last entry in SQL, then perhaps it's a carriage return issue. If you can, try Access 2007 as a "pre-import" tool.

    Our new venture, The Data Company

Rows in a table have no order unless you specify an ORDER BY. If you import into a table that has something like an identity, then you'll have a way to order the rows by insertion order.
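
For example, assuming the staging table has an IDENTITY column (call it RowID) that is left unmapped in the SSIS destination so SQL Server numbers rows as they arrive, you can see exactly how far the load got:

-- RowID is an assumed IDENTITY column on the staging table
SELECT TOP (10) RowID, DataColName
FROM dbo.ScratchPad
ORDER BY RowID DESC;   -- the last rows that made it in before the failure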

  • Hi,

I think you should treat your destination table as a bulk staging table initially. You can try the following things (a sketch of such a table follows this list):

1. Make sure that the columns allow NULL values.

2. If there is a primary key column, try to make it auto-incrementing (an identity).

3. Try to analyze the top five rows of the flat file and their data types. If any columns contain string data, choose an appropriately generous varchar size in the destination table, e.g. varchar(100), since we are using this table as a bulk insert table.

4. Before starting the SSIS package, make sure that your TempDB database size is increased to 100 MB.
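
As a rough sketch of that kind of staging table (the table and column names are placeholders for the roughly 35 columns you actually need):

-- Hypothetical permissive staging table: identity key, everything else NULLable
-- and typed as a generous VARCHAR so rows aren't rejected on data type
CREATE TABLE dbo.Staging_FlatFile
(
    StagingID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Col01     VARCHAR(100) NULL,
    Col02     VARCHAR(100) NULL,
    Col03     VARCHAR(100) NULL
    -- ... add the remaining columns the same way
);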

    Regards,

    MFG
