Importing a CSV into SQL Server Shouldn't Be This Hard

  • Thanks for the headline alone. I have been muttering that to myself for years. Lol.

    -- webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Couldn't agree more, Steve. MS has had decades to make this better and has really missed the boat on the Excel-to-SQL trip. Another wonderful related feature is where a column may start out as all numeric, but 1,000 or 30,000 rows into the spreadsheet it switches to alphanumeric data, and I have wound up with NULL values in the rows that held the alphanumeric entries.

    I will take a look at the dbatools import; thanks for the post and for pointing it out.
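
    For anyone else hitting that mixed-type trap, one workaround is to stage every column as text and sort the types out in T-SQL afterwards. A minimal sketch, with hypothetical table and column names (TRY_CONVERT needs SQL Server 2012 or later):

    -- Hypothetical staging table: every column lands as text, so no values
    -- are lost when a column changes type partway through the file.
    CREATE TABLE dbo.Staging_Orders (
        OrderId VARCHAR(50),
        Amount  VARCHAR(50)  -- numeric for thousands of rows, then alphanumeric
    );

    -- TRY_CONVERT returns NULL instead of failing, so the mixed column can be
    -- split into a clean numeric value plus the raw text for inspection.
    SELECT
        OrderId,
        TRY_CONVERT(DECIMAL(18, 2), Amount) AS AmountNumeric,
        CASE WHEN TRY_CONVERT(DECIMAL(18, 2), Amount) IS NULL
             THEN Amount END                AS AmountRawText
    FROM dbo.Staging_Orders;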

  • I have been working with SQL Server for 20 years.  I'm still amazed at how difficult it is to import CSV/Excel files into it.  The wizards are just too difficult, especially if you are trying to get something quickly staged up into a database.  I've probably used a dozen different strategies/tools for getting a CSV file loaded into a database because not one strategy works all of the time.

    Within the past year we received a CSV export from a CRM system that had over 400 CSV files, all with different schemas. I would have given almost anything for a solution where I could point a program at a folder of CSV files and have it all load into a SQL Server database. I tried almost every solution out there to make that process as painless as possible, but not one tool could do all of it. I had to cobble together a few different approaches to get the data loaded into SQL Server. MS Access, Full Convert, and a Java utility I wrote are what saved the day.

  • Like most of us, I've dealt with this junk and had to add conversion steps in an SSIS package, etc. It usually isn't that bad, as long as the format of the incoming file doesn't change. Garbage in, garbage out.

    Cheers

  • B DeLine wrote:

    I have been working with SQL Server for 20 years.  I'm still amazed at how difficult it is to import CSV/Excel files into it.  The wizards are just too difficult, especially if you are trying to get something quickly staged up into a database.  I've probably used a dozen different strategies/tools for getting a CSV file loaded into a database because not one strategy works all of the time.

    Within the past year we received a CSV export from a CRM system that had over 400 CSV files, all with different schemas. I would have given almost anything for a solution where I could point a program at a folder of CSV files and have it all load into a SQL Server database. I tried almost every solution out there to make that process as painless as possible, but not one tool could do all of it. I had to cobble together a few different approaches to get the data loaded into SQL Server. MS Access, Full Convert, and a Java utility I wrote are what saved the day.

    Been there. Maybe not the most elegant solution, but I wrote something that uses OPENROWSET to read the Excel, CSV, etc. file and create a temporary view on a given file so I can get column names and data types. I then use a mapping table to get the data into normalized tables. This is all automated and produces success and exception reports. It feels odd to write this, but the process was rather fun to create.
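
    For anyone curious, a rough sketch of that OPENROWSET pattern. It assumes the ACE OLE DB provider is installed and ad hoc distributed queries are enabled; the file path, sheet name, and table names are made up:

    -- Pull the spreadsheet into a temp table standing in for the "temporary view".
    SELECT *
    INTO #staged
    FROM OPENROWSET(
        'Microsoft.ACE.OLEDB.12.0',
        'Excel 12.0;Database=C:\imports\crm_export.xlsx;HDR=YES',
        'SELECT * FROM [Sheet1$]'
    );

    -- Column names and inferred types can then be read from tempdb metadata
    -- and checked against a mapping table before loading normalized tables.
    SELECT c.name, t.name AS type_name
    FROM tempdb.sys.columns AS c
    JOIN tempdb.sys.types   AS t ON t.user_type_id = c.user_type_id
    WHERE c.object_id = OBJECT_ID('tempdb..#staged');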

    Cheers

  • For one-off issues, there are several free CSV-to-SQL tools on the net that build the insert/update/delete/merge/select script from the CSV data (e.g., bfotool.com/csv-to-sql). That generated script can then be run on your SQL database. Not a good option for an ongoing scheduled process, but it works quickly and easily for those ad hoc types of problems.
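
    To illustrate, the output of such tools is typically a self-contained script along these lines (hypothetical table and data, not taken from the tool named above):

    -- Generated from the CSV's header row and inferred column types.
    CREATE TABLE Customers (
        CustomerId INT,
        Name       VARCHAR(100),
        SignupDate DATE
    );

    -- One VALUES row per line of the source file.
    INSERT INTO Customers (CustomerId, Name, SignupDate) VALUES
        (1, 'Acme Ltd', '2021-03-15'),
        (2, 'Widget Co', '2021-04-02');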

  • Jeremy Sage wrote:

    Am I a dinosaur for defaulting to BULK INSERT for tasks like this? I know it is a manual process, including defining the table in the first place, but I was surprised to see it not mentioned - even if it is to weigh the pros and cons (and tell all of us that use it that we are dinosaurs!).

    A bit. bcp starts to require more setup and more work inside the server. I wanted to just load these files into a table and then select out what I needed. With the time spent writing code, I could likely have just cut and pasted from one Excel sheet to the other, which seems like a crazy way to manage this small amount of data.
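
    For reference, the BULK INSERT route looks roughly like this. The file path and table definition are assumptions, and FORMAT = 'CSV' needs SQL Server 2017 or later:

    -- The target table must already exist and match the file's column layout.
    CREATE TABLE dbo.ImportedRows (
        Col1 VARCHAR(100),
        Col2 VARCHAR(100),
        Col3 VARCHAR(100)
    );

    BULK INSERT dbo.ImportedRows
    FROM 'C:\imports\data.csv'
    WITH (
        FORMAT = 'CSV',         -- handles quoted fields (SQL Server 2017+)
        FIRSTROW = 2,           -- skip the header row
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '0x0a'
    );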

  • Jo Pattyn wrote:

    Still wondering why something as simple as importing a CSV is still so hard using the SQL Server tools.

    Hence why I wrote this. Maybe someone at MS will pay attention.

  • Steve Jones - SSC Editor wrote:

    Hence why I wrote this. Maybe someone at MS will pay attention.

    Microsoft is only interested in Azure now. It is the favoured child while on-premises software has been sent downstairs to collect the ashes. Look at what has happened to Office & Exchange.

    I think that Microsoft will do absolutely nothing about this unless it is a new feature for Azure.

  • Welcome to my world.

    I've been doing ETL development with SSIS for essentially the last 8 years. What you're encountering are issues I've had to deal with on a daily basis.

    Usually what I do is use the "Import Data" task to dump it into a staging table somewhere I don't mind losing it (for example, tempdb). Then I use T-SQL to clean up the data and merge it into my final table.
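
    A condensed sketch of that staging-then-merge pattern, with hypothetical table and column names (the wizard has already dumped the raw file into tempdb.dbo.StagedCustomers as text columns):

    -- Clean in T-SQL and merge into the final table in one statement.
    MERGE dbo.Customers AS tgt
    USING (
        SELECT
            TRY_CONVERT(INT, CustomerId) AS CustomerId,
            LTRIM(RTRIM(Name))           AS Name
        FROM tempdb.dbo.StagedCustomers
        WHERE TRY_CONVERT(INT, CustomerId) IS NOT NULL  -- drop unparseable rows
    ) AS src
        ON tgt.CustomerId = src.CustomerId
    WHEN MATCHED THEN
        UPDATE SET tgt.Name = src.Name
    WHEN NOT MATCHED THEN
        INSERT (CustomerId, Name) VALUES (src.CustomerId, src.Name);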

  • mark.hausmann wrote:

    Compared with Power Query in Excel or Power BI, it really does seem hard to handle CSV here. SSIS is older, but these days you can use a Power Query data source in SSIS and perhaps copy the query from the Excel wizard. I think it may be a little bit of overkill in your case; with the Power Query wizard, your CSV is really easy to handle. Whether you use it together with SSIS would depend on how much automation you need.

    No real automation needed. This was a one-off, or maybe a yearly task, but who knows what the format will look like next year. Really, this is like an ad hoc request I've gotten from managers or business analysts in the past: quickly clean and load this data, because it's not that easy or quick in Excel. I'd expect loading this into a SQL table to be as easy as loading it in R or Python. Actually, I need to try those next and see how easy or hard it is.

  • t. Bleadorn wrote:

    For one-off issues, there are several free CSV-to-SQL tools on the net that build the insert/update/delete/merge/select script from the CSV data (e.g., bfotool.com/csv-to-sql). That generated script can then be run on your SQL database. Not a good option for an ongoing scheduled process, but it works quickly and easily for those ad hoc types of problems.

    Except you have to already know these tools exist. I could search, but that's a whole other project.

    If you wanted to write a few articles on some of these tools, maybe mini reviews, I'd love to see them.

  • sean redmond wrote:

    Steve Jones - SSC Editor wrote:

    Hence why I wrote this. Maybe someone at MS will pay attention.

    Microsoft is only interested in Azure now. It is the favoured child while on-premises software has been sent downstairs to collect the ashes. Look at what has happened to Office & Exchange.

    I think that Microsoft will do absolutely nothing about this unless it is a new feature for Azure.

    Except this is a process that someone would need for an Azure SQL Database. The Import Flat File extension is OK, but I need to be able to use an existing table to make this easy.

  • I went through the same process, and I found that Navicat was the fastest and easiest way to get data into the right data types, and then be able to save that import template to automate future imports.

    It was the ONLY tool that did everything easily out of the box. Good to see some other suggestions here.

    There is also an old ColdFusion script, tmt_massimo_csv.cfc (ColdFusion/Lucee), that tops all of these, and Lucee/CF also has spreadsheet import and export out of the box, which were also much better than any of the tools below. That script was able to read more DIRTY CSV files than any other tool I've used in 20 years of web dev, and combined with the blistering speed of modern CF engines like Lucee, it murders large imports (I was doing 20-30 million rows).

    I still use Lucee (an open-source, free, modern CF engine) and find it one of the best integration tools available on the market today. IMHO it sits beside Python, but with more features: CF can read native .NET and native Java, and it bundles a very expensive Java-.NET bridge for a fraction of its cost. Anyway, this is another go-to for CSV because of that script and the speed at which CF engines like Lucee run (compiling to Java bytecode).

    Tools I found to be rubbish:

    • Microsoft's (all)
    • Redgate
    • ApexSQL
    • any of the other top 5-10 suggestions on this site and others.

    Tools that were passable:

    • JetBrains DataGrip (the best exporter to CSV/TSV, but average import)

    Tools that could have been useful:

    • Microsoft Log Parser (which was updated in 2020, by the way)

  • That's the whole point of the article: not every data import is pretty.

    I've dealt with loyalty agencies where we had to remove Excel headers and footers and all sorts of guff from the prize files, whose format changed EVERY MONTH because the owner of the doc rebuilt it completely by hand each time... #yeah...

    When the data is clean and consistent, you don't need a DBA any more, just an import script.
