How to insert CSV data into DB where some data don't have double quotes

  • Example of data in CSV are as follows:

    "XXX","0001",-990039739 ,0 ,0 ,0 ,0 ,0 ,0

    "ABC"," ",-3422054702 ,0 ,481385 ,0 ,0 ,0 ,0

    "JJZ","0001",0 ,0 ,0 ,0 ,0 ,0 ,0

    Here's my format:

    12.0

    10

    1 SQLCHAR 0 0 "\"" 0 "" ""

    2 SQLCHAR 0 5 "\",\"" 1 OKCCY SQL_Latin1_General_CP1_CI_AS

    3 SQLCHAR 0 6 "\",\"" 2 OKBRNM SQL_Latin1_General_CP1_CI_AS

    4 SQLMONEY 0 19 "\",\"" 3 OKONPX SQL_Latin1_General_CP1_CI_AS

    5 SQLMONEY 0 19 "\",\"" 4 OKONPM SQL_Latin1_General_CP1_CI_AS

    6 SQLMONEY 0 21 "\",\"" 5 OKFRX SQL_Latin1_General_CP1_CI_AS

    7 SQLMONEY 0 21 "\",\"" 6 OKFPX SQL_Latin1_General_CP1_CI_AS

    8 SQLMONEY 0 19 "\",\"" 7 OKMRX SQL_Latin1_General_CP1_CI_AS

    9 SQLMONEY 0 19 "\",\"" 8 OKMPX SQL_Latin1_General_CP1_CI_AS

    10 SQLMONEY 0 5 "\"\r" 9 OKSEQ SQL_Latin1_General_CP1_CI_AS

  • Ken Tot (8/11/2015)


    Example of data in CSV are as follows:

    "XXX","0001",-990039739 ,0 ,0 ,0 ,0 ,0 ,0

    "ABC"," ",-3422054702 ,0 ,481385 ,0 ,0 ,0 ,0

    "JJZ","0001",0 ,0 ,0 ,0 ,0 ,0 ,0

    Here's my format:

    12.0

    10

    1 SQLCHAR 0 0 "\"" 0 "" ""

    2 SQLCHAR 0 5 "\",\"" 1 OKCCY SQL_Latin1_General_CP1_CI_AS

    3 SQLCHAR 0 6 "\",\"" 2 OKBRNM SQL_Latin1_General_CP1_CI_AS

    4 SQLMONEY 0 19 "\",\"" 3 OKONPX SQL_Latin1_General_CP1_CI_AS

    5 SQLMONEY 0 19 "\",\"" 4 OKONPM SQL_Latin1_General_CP1_CI_AS

    6 SQLMONEY 0 21 "\",\"" 5 OKFRX SQL_Latin1_General_CP1_CI_AS

    7 SQLMONEY 0 21 "\",\"" 6 OKFPX SQL_Latin1_General_CP1_CI_AS

    8 SQLMONEY 0 19 "\",\"" 7 OKMRX SQL_Latin1_General_CP1_CI_AS

    9 SQLMONEY 0 19 "\",\"" 8 OKMPX SQL_Latin1_General_CP1_CI_AS

    10 SQLMONEY 0 5 "\"\r" 9 OKSEQ SQL_Latin1_General_CP1_CI_AS

    Two things, Ken. Nothing but the first 3 fields (I count the first quote as a separate field just like is done in the BCP format file above) has quotes as delimiters. Just remove the notations for quotes from all the other fields in that format file and you should be golden.

    As a bit of a sidebar, the file is also a "fixed field" file. It that will always be true and the padding within the numeric fields will always stay the same, you could get much better performance by changing the BCP format file to be position sensitive instead of delimiter sensitive. That will take a bit more work because you'll essentially double the number of fields in the format file (every other field will be a column "0" field to ignore the delimiters) but the performance will make it worth it.

    If you aren't sure about the continued fixed field format, then continue to used the delimited format so you won't be surprised by changes in the future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (8/11/2015)


    Ken Tot (8/11/2015)


    Example of data in CSV are as follows:

    "XXX","0001",-990039739 ,0 ,0 ,0 ,0 ,0 ,0

    "ABC"," ",-3422054702 ,0 ,481385 ,0 ,0 ,0 ,0

    "JJZ","0001",0 ,0 ,0 ,0 ,0 ,0 ,0

    Here's my format:

    12.0

    10

    1 SQLCHAR 0 0 "\"" 0 "" ""

    2 SQLCHAR 0 5 "\",\"" 1 OKCCY SQL_Latin1_General_CP1_CI_AS

    3 SQLCHAR 0 6 "\",\"" 2 OKBRNM SQL_Latin1_General_CP1_CI_AS

    4 SQLMONEY 0 19 "\",\"" 3 OKONPX SQL_Latin1_General_CP1_CI_AS

    5 SQLMONEY 0 19 "\",\"" 4 OKONPM SQL_Latin1_General_CP1_CI_AS

    6 SQLMONEY 0 21 "\",\"" 5 OKFRX SQL_Latin1_General_CP1_CI_AS

    7 SQLMONEY 0 21 "\",\"" 6 OKFPX SQL_Latin1_General_CP1_CI_AS

    8 SQLMONEY 0 19 "\",\"" 7 OKMRX SQL_Latin1_General_CP1_CI_AS

    9 SQLMONEY 0 19 "\",\"" 8 OKMPX SQL_Latin1_General_CP1_CI_AS

    10 SQLMONEY 0 5 "\"\r" 9 OKSEQ SQL_Latin1_General_CP1_CI_AS

    Two things, Ken. Nothing but the first 3 fields (I count the first quote as a separate field just like is done in the BCP format file above) has quotes as delimiters. Just remove the notations for quotes from all the other fields in that format file and you should be golden.

    As a bit of a sidebar, the file is also a "fixed field" file. It that will always be true and the padding within the numeric fields will always stay the same, you could get much better performance by changing the BCP format file to be position sensitive instead of delimiter sensitive. That will take a bit more work because you'll essentially double the number of fields in the format file (every other field will be a column "0" field to ignore the delimiters) but the performance will make it worth it.

    If you aren't sure about the continued fixed field format, then continue to used the delimited format so you won't be surprised by changes in the future.

    Hi I think I'm near to solve the problem. Please check. I have a different error now. Thank you.

    Msg 4823, Level 16, State 1, Line 19

    Cannot bulk load. Invalid column number in the format file

    12.0 => I am using SS2012

    10

    1 SQLCHAR 0 1 "\"" 0 "" => change from 0 to 1

    2 SQLCHAR 0 3 "\",\"" 1 OKCCY SQL_Latin1_General_CP1_CI_AS => 5 to 3 (length in your sample)

    3 SQLCHAR 0 4 "\"," 2 OKBRNM SQL_Latin1_General_CP1_CI_AS => remove last after comma and change 6 to 4

    4 SQLCHAR 0 17 "," 3 OKONPX SQL_Latin1_General_CP1_CI_AS => change 19 to 17 and remove quote

    5 SQLCHAR 0 17 "," 4 OKONPM SQL_Latin1_General_CP1_CI_AS => change to 17 and remove quotes

    6 SQLCHAR 0 19 "," 5 OKFRX SQL_Latin1_General_CP1_CI_AS => change to 19 and remove quotes

    7 SQLCHAR 0 19 "," 6 OKFPX SQL_Latin1_General_CP1_CI_AS => change to 19 and remove quotes

    8 SQLCHAR 0 17 "," 7 OKMRX SQL_Latin1_General_CP1_CI_AS => change to 17 and remove quotes

    9 SQLCHAR 0 17 "," 8 OKMPX SQL_Latin1_General_CP1_CI_AS => change to 17 and remove quotes

    10 SQLCHAR 0 3 "\r" 9 OKSEQ SQL_Latin1_General_CP1_CI_AS => change to 17 and remove quotes

  • Which are you going for? The fixed field lengths or the comma delimited? If the comma delimited, then change all but the first field length to 500 and call it a day.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (8/12/2015)


    Which are you going for? The fixed field lengths or the comma delimited? If the comma delimited, then change all but the first field length to 500 and call it a day.

    12.0

    10

    1 SQLCHAR 0 500 "\"" 0 ""

    2 SQLCHAR 0 500 "\",\"" 1 OKCCY SQL_Latin1_General_CP1_CI_AS

    3 SQLCHAR 0 500 "\"," 2 OKBRNM SQL_Latin1_General_CP1_CI_AS

    4 SQLCHAR 0 500 "," 3 OKONPX SQL_Latin1_General_CP1_CI_AS

    5 SQLCHAR 0 500 "," 4 OKONPM SQL_Latin1_General_CP1_CI_AS

    6 SQLCHAR 0 500 "," 5 OKFRX SQL_Latin1_General_CP1_CI_AS

    7 SQLCHAR 0 500 "," 6 OKFPX SQL_Latin1_General_CP1_CI_AS

    8 SQLCHAR 0 500 "," 7 OKMRX SQL_Latin1_General_CP1_CI_AS

    9 SQLCHAR 0 500 "," 8 OKMPX SQL_Latin1_General_CP1_CI_AS

    10 SQLCHAR 0 500 "\r" 9 OKSEQ SQL_Latin1_General_CP1_CI_AS

    Not working. Still the same error.

  • Jeff,

    Sorry for double posting. Just to let you know those data are standard. The first 2 columns always has double quotes. We extract them from the server.

    Thanks,

    Ken

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply