Importing a CSV into SQL Server Shouldn't Be This Hard

  • Oh, but Steve, you really must experience the joy of trying to import "sometimes-quoted" fields using BULK INSERT!

    BULK INSERT Fun_With_Quotes
    FROM 'C:\Import\venmo.csv'
    WITH ( FORMAT='CSV');
  • Good article!   Great points, great screen shots and well written.  While most of the commentors suggest using other tools (which is fine), they don't tend to address the real issue...  which is why one of the largest software companies in the world can't write intuitive, user friendly software.  Even worse, rather than fix, improve and polish a product, they love to write a completely new one that does the same thing only differently and then abandon the millions of users that adopted the "old" product.  (i.e. charge them to "upgrade")  But, the real question is, how long are we going to keep recommending and supporting such companies and products?

  • Brandon Champion wrote:

    Oh, but Steve, you really must experience the joy of trying to import "sometimes-quoted" fields using BULK INSERT!

    BULK INSERT Fun_With_Quotes
    FROM 'C:\Import\venmo.csv'
    WITH ( FORMAT='CSV');

    I'll pass. At least until you write me a humorous and sad article on how this goes in practice.

  • I agree, why does it have to be so hard?

    I have a 64-bit machine with 64-bit MSOffice, which makes importing excel files impossible, I'm told because ssms is 32-bit.

    So I have to export the excel file to csv.

    And the flat file wizard has never worked, it always crashes, maybe for the same reason?

    So I'm left with the main importer wizard where it's as bad as described.  That hasn't changed at least between version 2012 and 2019.  Really needs a refresh.

  • I have a similar issue uploading disparate csv's.

    The person sending the csv' is creating them on a DeskTop app.

    So, having messed about for a while, I decided to;

    A) write an import conversion using the same DT app - easy as I know it well. Just for fun.

    B) combine the data on my production machine before uploading. Means the work is my end, but under my control. Worked.

  • Boy, this brought back some painful memories for me.  In my case the CSV contained adjacency lists which I was to import into SQL.  My problem was that the people who filled the CSV files put all kinds of strange characters in them, like quotation marks and carriage returns.  These would cause Import Wizard to barf.  Some of the data was much longer than the column it was to go in.

    I finally got around this by using Excel to get rid of the unwanted characters and truncate the data to fit the columns.  This worked, but of course you have to ask why it was so hard in the first place.

  • As an fyi: https://stackoverflow.com/questions/37710776/how-to-install-microsoft-ace-oledb-32-bit-with-office-64-bit

    This should help you get 32bit oledb working on a system with 64bit office.

    I have seriously debated on writing a blog post specifically on this.

    Have fun!

  • As practically everyone here is saying.. this has always been a pain.

    I don't have a problem with CSV files. As long as the file is formatted properly (e.g. consistently)

    A lot of open data source formats (e.g census etc) are provided in CSV.

    A while ago I tried using the 'Export to CSV'  option in ADS for some existing SQL data.  I was trying create data for another system.  I was amazed to discover that some exported 'char' fields were delimited with quotes and other 'char' fields were not delimited.  I was convinced this was a bug.  When I logged the issue I was told that it delimited the field with quotes if it discovered a comma "," in the text of any record for that particular field.  But there was no way to make all the 'char' fields delimited with quotes.  What a mess.  Some 'char' fields formatted one way and others a different way.  I gave up on ADS as a tool for that work.  (Maybe it's been fixed and/or changed by now but I haven't bothered to check)

    I usually take the approach to spend the time to create a solid 'landing zone / staging' table structure first.  I ensure that the table will be bulletproof for the CSV data I am trying to load and then I can choose from some of the already mentioned different methods to load the data.  That upfront (table design) work helps remove the 'loading errors' pain.

    After I know I've got a good solid staging table structure, I've even done silly things like opening up the 'staging' table in SSMS edit mode and then used copy and paste (from the csv file) to insert the rows into the table.  It works for one-off quick loads.  You can even append data with copy and paste.

    Bottom-line... I wish there was a better solution.

     

     

  • On the subject of "sometimes" quoted CSV's...

    I totally agree that it shouldn't be necessary.  MS has had more that 2 decades to make this "simple" stuff right as have a great many others.

    That, not withstanding, MS can't even build a decent string splitter so I don't know why anyone is actually crabbing about the ineptitude of MS and a host of others that can't seem to get it right.

    The work around I use is a bit annoying but it does work all the time.  That is, I make a call to PowerShell to do an IMPORT-CSV followed immediately by an EXPORT-CSV.  I don't use PowerShell (yet... I've got to look at that DBATools functionality that Steve brought up) because things are generally slower in PowerShell but it does seem to do the Import-CSV/Export-CSV rapidly enough to make it worthwhile.

    So, whazitdo?  It correctly encapsulates all the fields in double-quotes and it does it correctly (ie, not doing it again) on fields that are already encapsulated by quotes.  That makes it damned easy to do imports with or without a BCP Format File (although the format does allow you to correctly handle the first and last double quotes on each line).  It might even allow the new CSV format of Bulk Insert to work correctly (I've just not tried it yet).

    Still, it's a bloody damned shame that we have to such silly workarounds in this day and age but consider how long we've also been looking for a BULK EXPORT tool.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Definitely been there, though my favorite was when someone gave us a "ragged right" CSV. So if you had 10 total columns, but some rows only used 8 or so of them, they just stopped - no trailing commas to indicate empty fields, just a CRLF for the next line. SSIS _hates_ these.  To make it more fun, Excel at the time would generate ragged-right CSV files if you saved as CSV. That gave us all sorts of fits because Excel's data type handling is less-than-stellar and we exported to a CSV to avoid some of the issues only to find that those had other issues.

    For imports using PowerShell, the Import-CSV and Import-DbaCsv are nice functions. There's also the ImportExcel module, which does some really neat stuff w/ data and doesn't require a full install of Excel on the machine to do its magic. I've used that quite a bit to generate Excel files, read them, standardize worksheet names for ETL (because we couldn't depend on that sheet containing data to have a consistent name), and do some basic formatting.

    I was trying to fix someone's import/export script for CSV files and we hit some crazy inconsistency with quotes. We had a script that was designed to pull in the file, fix the quotes, then save the data to a new file. The BULK INSERT command could then read that file. Crazy workaround, but it works and hard to justify touching a working process to try to make it better. 🙂

  • logitestus wrote:

    As an fyi: https://stackoverflow.com/questions/37710776/how-to-install-microsoft-ace-oledb-32-bit-with-office-64-bit

    This should help you get 32bit oledb working on a system with 64bit office.

    I have seriously debated on writing a blog post specifically on this.

    Have fun!

     

    blog or send us an article. Likely a few people have this issue.

  • Jeff Moden wrote:

    On the subject of "sometimes" quoted CSV's...

    I totally agree that it shouldn't be necessary.  MS has had more that 2 decades to make this "simple" stuff right as have a great many others.

    Still, it's a bloody damned shame that we have to such silly workarounds in this day and age but consider how long we've also been looking for a BULK EXPORT tool.

    Microsoft Log Parser... updated in 2020, been around for decades.... #dropsmic

  • Another thing that is strange is that in SSMS 18 (17 was the same from memory) if you want to import Excel files you need to save the file into Office 2003 format otherwise it gives an error about not having the correct driver and this has happened across many different devices at every company and at home so was not specific to a certain configuration.

    Having to re-save new Excel files into 2003 format to be able to import using the 2021 version of the SSMS import tools makes no sense whatsoever.

    A workaround I found for getting SSMS to detect correct field sizes and type was to add a new row at the top with more data in the fields and then delete after import as tryin to amend in the wizard doesn't always seem to work very well.

  • I really appreciated this article. It made me feel better about my skills (or lack of!). Reading the other comments about how others are import CSV files, which should be easy, has me wishing I had time to explore more methods that seem to be less annoying.

    The Venmo data reminds me of the crap I find in "database files" I get from insurance companies.

    -Mark
    MSSQL 2019 Standard, Azure Hosted. Techie/Sysadmin by trade; Three years as a "DBA" now.

  • Robin Wilson wrote:

    Another thing that is strange is that in SSMS 18 (17 was the same from memory) if you want to import Excel files you need to save the file into Office 2003 format otherwise it gives an error about not having the correct driver and this has happened across many different devices at every company and at home so was not specific to a certain configuration.

    Exactly! Except I refuse to convert an XLSX back to XLS for importing. I have finally figured out, I think, which "driver" for Excel has to be installed on my workstation so I can import Excel 2016 files. I wipe my workstation and reinstall windows every one-two years so I reinstall that "driver" each time.

    -Mark
    MSSQL 2019 Standard, Azure Hosted. Techie/Sysadmin by trade; Three years as a "DBA" now.

Viewing 15 posts - 31 through 45 (of 61 total)

You must be logged in to reply to this topic. Login to reply