SSIS and dealing with embedded double quotes

  • One way to speed up the preprocessing of your CSV files is to use Log Parser to convert them to TSV files. A TSV should be easier for SSIS to make use of; Log Parser 2.2

    For example, if input.csv contained this:

    Author, Title,Quote

    "Salinger","The Catcher in the Rye","""People always think something's all true."""

    "Trumbo","Johnny Got His Gun", """S.O.S. Help me."""

    This would change the delimiters to tabs and remove the double quotes.

    LOGPARSER -i:CSV -o:TSV "SELECT Author, Title,Quote INTO output.tsv FROM input.csv"

  • thanks i am going to test that today

  • Building on the other suggestions and comments earlier in the thread, and if you're able to make file change requests, I'd recommend removing the text qualifier (so you don't get the possibility of the qualifier showing up in the middle of the text, bombing out the import like you've seen), but instead of just changing the field delimiter from a comma to some other single character (pipe, etc.), consider a double-character field delimiter.

    I implemented this as a fix for some vendor data a few years back after your same issue came up with them, switched to no text qualifiers and a double-character field delimiter (we're using a tilda-carat: ~^ as a delimiter), and haven't had any issues since. It'll make the files look ugly, but it works.

    Removing text qualifiers and only switching the field delimiter to another character still leaves open the chance that bad data with that new delimiter character can show up and mess with your import (I never believe the response, "That character will never be sent in our data." ... it'll happen eventually). A double-character delimiter reduces the chance that a data field will ever contain that combo to almost nothing.

    Just a thought ... but bad data is still bad data if it's not sent to you in your established/spec format.

  • dg227 (3/10/2011)


    Just a thought ... but bad data is still bad data if it's not sent to you in your established/spec format.

    And you know what they say:

    crap in is crap out 😀

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (3/10/2011)


    dg227 (3/10/2011)


    Just a thought ... but bad data is still bad data if it's not sent to you in your established/spec format.

    And you know what they say:

    crap in is crap out 😀

    So true ...

  • Thanks to all for responding so fast. I talked to the supplier of the file and they made the file adjustments with no problem 😀

    Like I said earlier, I just started a few weeks ago, so i'm learning who to contact for what and their expectations -- some people think IT can do anything with everything.

    But I like the idea of the double delimiter - i'm going to def put that in my toolbox of ideas to throw out when we start the next file import.

  • while loading CSV with double quotes and comma there is one limitation that extra double quotes has been added and the data also enclosed with the double quotes you

    can check in the preview of source file.

    So, add the derived column task and give the below expression:-

    (REPLACE(REPLACE(**RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2) - 1),LEN(COL2) - 2)**," ","@"),"\"\"","\""),"@"," ")

    the bold part removes the data enclosed with double quotes.

    Try this and do let me know if this is helpful

  • sonalijain.engg (3/26/2013)


    while loading CSV with double quotes and comma there is one limitation that extra double quotes has been added and the data also enclosed with the double quotes you

    can check in the preview of source file.

    So, add the derived column task and give the below expression:-

    (REPLACE(REPLACE(**RIGHT(SUBSTRING(TRIM(COL2),1,LEN(COL2) - 1),LEN(COL2) - 2)**," ","@"),"\"\"","\""),"@"," ")

    the bold part removes the data enclosed with double quotes.

    Try this and do let me know if this is helpful

    Useful suggestion, but I do hope the OP solved his problem 2 years ago 😉

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • The soln provided by me is given in the package some people still stuck in such situation as i stucked so i thoght to gv this suggestion n plz dont focus on whether issue is resolved or not focus on the alternate approach:cool:

  • sonalijain.engg (3/27/2013)


    The soln provided by me is given in the package some people still stuck in such situation as i stucked so i thoght to gv this suggestion n plz dont focus on whether issue is resolved or not focus on the alternate approach:cool:

    This is a forum, not Wikipedia. People post problems and others help them solve them.

    So instead of being unpleasant, why don't you focus on helping people who currently have problems?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Useful suggestion, but I do hope the OP solved his problem 2 years ago 😉

    just like most people suggested, i now request for pipe delimiters, but sometimes I do still get files that are comma delimiter and the user extracting the data doesn't know how to change the delimiter. in those cases I have no choice but to suck it up and clean the data up manually.

  • running into this same issue. I read that 2012 could handle column dilimiters embedded in a text however It's not working for me. When i preview the data it woks fine however fails when trying to bring down the file. Also, I'm consuming the files and they are being generated by an ad server. So i cant specify what the delimiter will be. Not exactly sure how to proceed other than to use TALEND. I cant remove them from the data because its the "name" column and if it has double quotes in the name then so be it. This is really starting to make me angry.

    If i end up having to convert this ssis to talend then i'm going to use talend going forward w/ the intent of moving away from ssis totally.

  • i suppose i can write an exe to search for the data issue and replace the double quote w/ two double quotes. its a band aid though

  • mrtrich99 (3/27/2013)


    Useful suggestion, but I do hope the OP solved his problem 2 years ago 😉

    just like most people suggested, i now request for pipe delimiters, but sometimes I do still get files that are comma delimiter and the user extracting the data doesn't know how to change the delimiter. in those cases I have no choice but to suck it up and clean the data up manually.

    Can you provide some sample failing data and a copy of the error message?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Wow, a blast from the past. Although this issue continues to plague me and data imports.

    Now, for a different client, a developer coworker wrote up an ssis with script tasks, that if the file import fails it goes to another directory, so my other files load. Then for any files that fail, since they are csv, we open them up in excel and look for the double quotes and just manually remove them or do a find and replace. It's not fully automated but since only about 1 in 100 files fail, its a systematic process.

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply