EXCEL HELL!!!

  • Good to see someone else has come to the conculsion I came to years ago. Besides, after 15 years of SQL I am still learning and damned if I'm going to have to start from scratch with SISS. Besides, I'm not a developer, I'm an analyst and I shouldn't need to ring a developer to import a spreadsheet.

    I have the problem described by Jeff and more specifically the healthcare data by pharryecoenen. In healthcare you're usually only a minor buyer so you have no control over the format of the file received at all. It is Hobson's choice every time with a spreadhseet designed by Murphy (the one with the law named after him). My solution is a little different ...

    I use the Jet driver (32bit) or OPENROWSET (64 bit or later Office files) - even if the ACE drivers are a half-baked bodgit from Microsoft.

    Instead of direct OPENROWSET I set up the spreadsheet as a linked server. Why? Because:

    EX_SP_TABLES tells me what tabs are there.

    EX_SP_COLUMNS tells me what columns there are.

    Only learnt recently OPENROWSET can use the same commands. Like I said, 15 years and I'm still learning SQL.

    I have an autoindex column and then just yank the whole spreadsheet in. Various parameters (in a table) allow a user to include / exclude various tabs etc to control the import. If the supplier is sensible and the tabs are safe I set IMEX=0 and HDR=YES. If they're unreliable I set IMEX=0 (although bear in mind pharryecoenen's comments on formatting affecting rounding errors and dates) and HDR=NO to yank the whole lot into a temporary table with columns set to varchar(255).

    Once the spreadsheet is in I check all columns for ISNUMERIC(column+e0) to see if numeric or NULL. I assume a whole row that isn't numeric anywhere probably is a header of some sort. They still manage to break it with columns called 1 and so forth but when I'm importing a thousand sheets a month it breaks the back and allows me to manually intervene in the handful that break. Final job is to compare columns to final table and insert where required or add extra columns and e-mail to tell me extra columns were required. It also fills down columns where required but this requires a parameter to tell it which columns to fill down - another opportunity for Captain Cockup to pay a visit.

    Inciidentally, we also receive flat files from a few providers. Same issues. I load the header row, replace the column delineator to make a temp table varchar(255) for each column, suck the file in, compare to what has been before, add ... you get the idea.

    It is messy. It is horrible! BUT we had a team of people working themselves into an early grave processing it all in a few days in Excel before I got hold of it and shoved it all into T-SQL.

    Biggest issue now? Rounding errors where formatting doesn't show decimals and the import uses IMEX=1.

    Works for me .. most of the time!

    SISS? Well, I gave my procs to the SISS development team around about Christmas. Not heard anything yet ...

  • h540663 (6/19/2015)


    Good to see someone else has come to the conculsion I came to years ago. Besides, after 15 years of SQL I am still learning and damned if I'm going to have to start from scratch with SISS. Besides, I'm not a developer, I'm an analyst and I shouldn't need to ring a developer to import a spreadsheet.

    I have the problem described by Jeff and more specifically the healthcare data by pharryecoenen. In healthcare you're usually only a minor buyer so you have no control over the format of the file received at all. It is Hobson's choice every time with a spreadhseet designed by Murphy (the one with the law named after him). My solution is a little different ...

    I use the Jet driver (32bit) or OPENROWSET (64 bit or later Office files) - even if the ACE drivers are a half-baked bodgit from Microsoft.

    Instead of direct OPENROWSET I set up the spreadsheet as a linked server. Why? Because:

    EX_SP_TABLES tells me what tabs are there.

    EX_SP_COLUMNS tells me what columns there are.

    Only learnt recently OPENROWSET can use the same commands. Like I said, 15 years and I'm still learning SQL.

    I have an autoindex column and then just yank the whole spreadsheet in. Various parameters (in a table) allow a user to include / exclude various tabs etc to control the import. If the supplier is sensible and the tabs are safe I set IMEX=0 and HDR=YES. If they're unreliable I set IMEX=0 (although bear in mind pharryecoenen's comments on formatting affecting rounding errors and dates) and HDR=NO to yank the whole lot into a temporary table with columns set to varchar(255).

    Once the spreadsheet is in I check all columns for ISNUMERIC(column+e0) to see if numeric or NULL. I assume a whole row that isn't numeric anywhere probably is a header of some sort. They still manage to break it with columns called 1 and so forth but when I'm importing a thousand sheets a month it breaks the back and allows me to manually intervene in the handful that break. Final job is to compare columns to final table and insert where required or add extra columns and e-mail to tell me extra columns were required. It also fills down columns where required but this requires a parameter to tell it which columns to fill down - another opportunity for Captain Cockup to pay a visit.

    Inciidentally, we also receive flat files from a few providers. Same issues. I load the header row, replace the column delineator to make a temp table varchar(255) for each column, suck the file in, compare to what has been before, add ... you get the idea.

    It is messy. It is horrible! BUT we had a team of people working themselves into an early grave processing it all in a few days in Excel before I got hold of it and shoved it all into T-SQL.

    Biggest issue now? Rounding errors where formatting doesn't show decimals and the import uses IMEX=1.

    Works for me .. most of the time!

    SISS? Well, I gave my procs to the SISS development team around about Christmas. Not heard anything yet ...

    Heh... I agree. Compared to the ACE driver, the Jet driver was awesome. As for SSIS, we don't even have it loaded because, thankfully, my boss (the Enterprise Architect) hates it as much as I do. 😀 Is it useful? A lot of people find that it is but I've never needed it or DTS to get the job done quickly, easily, and accurately.

    And, please... don't even for a second think that I'm badmouthing people that use it. I'm just explaining that I have a personal bit of hate for it and haven't found anything yet that I can't do without it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • re the rounding due to Excel formatting

    and following Eirikur's example

    I decompressed an .xlsx (rename to .zip and unzip) and had a look.

    The sheets ar in \xl\worksheets and contain the raw values unformatted, so maybe that's a line for further experimentation, if it's a 'material' issue.

    Cheers

    Harry

  • That sorts XLSX but I get a whole smorgasboard of formats:

    XLSX

    XLS - yes, still.

    XLSB

    XLSM

    MD...

    And then, occassionally, still, PDF. Still not sure what to do about that one. Print to postscript file and then try and scrape somehow? Not a big enough problem for me to have to fix it yet.

    As for SISS. Well, it does have pretty colours. It automates a lot of parallelism. However, most of the time the gain isn't big enough for me and if I have to be there, let alone look at it then something is seriously wrong. It doesn't seem as clever as Access. For many small businesses and one man bands who do not have access to a team of developers but still need to compete and get a job done it just doesn't make sense, especially as T-SQL can do just about everything I need it to.

    It'll all be obselete before long anyway. You'll throw files / weblinks at some software and it will just load and correlate different elements automatically for analysis, perhaps with a tick-box exercise for where it need manual intervention. I'm looking forward to someone clever like Jeff coming up with that - or beating them to it with T-SQL.

    In the meantime I'll just plod on with my stored procedures!

  • Jeff Moden (6/19/2015)

    Heh... I agree. Compared to the ACE driver, the Jet driver was awesome. As for SSIS, we don't even have it loaded because, thankfully, my boss (the Enterprise Architect) hates it as much as I do. 😀 Is it useful? A lot of people find that it is but I've never needed it or DTS to get the job done quickly, easily, and accurately.

    And, please... don't even for a second think that I'm badmouthing people that use it. I'm just explaining that I have a personal bit of hate for it and haven't found anything yet that I can't do without it.

    I'm sure I saw a letter from Microsoft some years ago saying the ACE driver development was curtailed because it was considered unnecessary with the emphasis on SISS and so forth. I think it was the victim of some internal cuts / restructuring but I might be wrong.

    It works ... just ... most of the time ... with a bit of tuning.

  • h540663 - Monday, June 22, 2015 3:08 AM

    Jeff Moden (6/19/2015)Heh... I agree. Compared to the ACE driver, the Jet driver was awesome. As for SSIS, we don't even have it loaded because, thankfully, my boss (the Enterprise Architect) hates it as much as I do. 😀 Is it useful? A lot of people find that it is but I've never needed it or DTS to get the job done quickly, easily, and accurately.And, please... don't even for a second think that I'm badmouthing people that use it. I'm just explaining that I have a personal bit of hate for it and haven't found anything yet that I can't do without it.

    I'm sure I saw a letter from Microsoft some years ago saying the ACE driver development was curtailed because it was considered unnecessary with the emphasis on SISS and so forth. I think it was the victim of some internal cuts / restructuring but I might be wrong.It works ... just ... most of the time ... with a bit of tuning.

    Just an update on that.  They've continued to release the updated drivers.  They've recently come out with the 2016 version.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I was trying to run Sub Test()
    but I get an error at the line
    ConnString = "Provider=SQLOLEDB;Data Source= GOSTAMUNKTELL\SQLEXPRESS;" & _
    "Initial Catalog=Rapport;" & _
    "Integrated Security=SSPI;"
    what should one use instead?

  • grovelli-262555 - Tuesday, April 11, 2017 8:27 AM

    I was trying to run Sub Test()
    but I get an error at the line
    ConnString = "Provider=SQLOLEDB;Data Source= GOSTAMUNKTELL\SQLEXPRESS;" & _
    "Initial Catalog=Rapport;" & _
    "Integrated Security=SSPI;"
    what should one use instead?

    I don't know.  It's not T-SQL. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I see it is some stuff I have written in VBA.
    Have you changed the code to your local environment?
    Data Source= yourSQLserver
    Initial Catalog=yourdatabasename
    With this code you will logg in with windows authentication.
    Observe that you might be forced to run the application as
    administrator even if you are logged in as administrator.
    Gosta M

  • I currently cannot create any db(against which to run your code)in SQL Server, I'm going to ask the db admin to create a test environment for me.

  • What experience do you have in this area?
    Is the SQL server hosted on your computer or
    on a separate server on your intranet?
    If so you would probably logg in with SQL Server Authentication to the database
    which will change the code.( Server= IP adress)
    Connstring = "Provider=SQLOLEDB;Server=XXX.YY.ZZ.VV;Database=Databasename;User
    ID=XXXXXX;Password=YYYYYY;"

  • Jeff Moden - Monday, April 10, 2017 5:46 PM

    h540663 - Monday, June 22, 2015 3:08 AM

    Jeff Moden (6/19/2015)Heh... I agree. Compared to the ACE driver, the Jet driver was awesome. As for SSIS, we don't even have it loaded because, thankfully, my boss (the Enterprise Architect) hates it as much as I do. 😀 Is it useful? A lot of people find that it is but I've never needed it or DTS to get the job done quickly, easily, and accurately.And, please... don't even for a second think that I'm badmouthing people that use it. I'm just explaining that I have a personal bit of hate for it and haven't found anything yet that I can't do without it.

    I'm sure I saw a letter from Microsoft some years ago saying the ACE driver development was curtailed because it was considered unnecessary with the emphasis on SISS and so forth. I think it was the victim of some internal cuts / restructuring but I might be wrong.It works ... just ... most of the time ... with a bit of tuning.

    Just an update on that.  They've continued to release the updated drivers.  They've recently come out with the 2016 version.

    My heart jumped, as for a moment I thought MS had created new JET drivers. Only ACE though.
    But you can do some preprocessing with Powershell (very VBA alike).
    Some messy work can be found here:
    https://github.com/DutchHarry/Excel2SQL
    Cheers

  • pharryecoenen - Wednesday, April 12, 2017 4:58 AM

    My heart jumped, as for a moment I thought MS had created new JET drivers. Only ACE though.
    But you can do some preprocessing with Powershell (very VBA alike).
    Some messy work can be found here:
    https://github.com/DutchHarry/Excel2SQL
    Cheers

    On that note, I gave an updated version of my "Excel Hell" presentation last night and there was a young lady that says there's a registry hack to allow Jet drivers to work as before but in a 64 bit environment.  She's supposed to send me the hack.  I don't know how useful that will be, though, because I don't recall that Jet ever support the XML type of spreadsheets that came out in the later versions of office.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Wednesday, April 12, 2017 7:38 AM

    pharryecoenen - Wednesday, April 12, 2017 4:58 AM

    My heart jumped, as for a moment I thought MS had created new JET drivers. Only ACE though.
    But you can do some preprocessing with Powershell (very VBA alike).
    Some messy work can be found here:
    https://github.com/DutchHarry/Excel2SQL
    Cheers

    On that note, I gave an updated version of my "Excel Hell" presentation last night and there was a young lady that says there's a registry hack to allow Jet drivers to work as before but in a 64 bit environment.  She's supposed to send me the hack.  I don't know how useful that will be, though, because I don't recall that Jet ever support the XML type of spreadsheets that came out in the later versions of office.

    Does you update "Excel Hell" also work in later versions of SQL (the column set issue ...)?
    Would be very interested to learn from your updates. Is it availlable somewhere?

    And of course, very interested in the JET for x64 too (saves installing a 32-bit SQL instance just for occasional handling some old Excel stuff)
    From top of my head, don't think JET did .xlsx, no. But would be pleased to be wrong.

  • pharryecoenen - Wednesday, April 12, 2017 7:48 AM

    Does you update "Excel Hell" also work in later versions of SQL (the column set issue ...)?
    Would be very interested to learn from your updates. Is it availlable somewhere?

    And of course, very interested in the JET for x64 too (saves installing a 32-bit SQL instance just for occasional handling some old Excel stuff)
    From top of my head, don't think JET did .xlsx, no. But would be pleased to be wrong.

    I've not tried it on 2014 or 2016.  The thing that broke on the original method was that Microsoft made some sort of change to how Global Temp tables are handled between 2088 and 2012.  The previous method ended in a failure reporting some oddity having to do with the use of more than one level of stored procedures where the lowest level is what created the Global Temp table.  I didn't save that error off to anywhere.  I just sighed out load (the politically correct non-HR version of what I actually did and said) and combined the two procs. 

    As for the latest presentation from last night... I just checked the chapter site and there doesn't seem to be an attachment for the meeting anywhere.  I've just sent an email to the chapter officers to alert them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 46 through 60 (of 63 total)

You must be logged in to reply to this topic. Login to reply