How to create connection manager for a HTML file ?

  • I have been sent what I thought was excel files to import into our system with a SSIS package but after more careful examination they are really HTML files with a .xls extension. I have been trying to read them by creating a connection manager but have not had any luck so far. Also if I open the files in Excel and do a Save As and change the type from Web Page (*.htm,*.html) to Excel Workbook (*.xlsx) I can then create a connection manager to access the new data.

    Anybody got any ideas on how I can get that data ?

  • If it's HTML, then it's XML...have you tried parsing it in SSIS using an XML Data Source?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I have not tried that will see if I can get that to work this morning.

    Thanks

  • Well I am not sure this is going to work having errors with file. I created a dataflow and for my Source I used the XML source object then pointed it to one of my files. When I try to generate the XSD in the XML Source Editor I get the following error message (see below) not sure where to go from here.

    ===================================

    'Content-Type' is an unexpected token. The expected token is '"' or '''. Line 6, position 18. (Microsoft Visual Studio)

    ------------------------------

    Program Location:

    at System.Xml.XmlTextReaderImpl.Throw(Exception e)

    at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)

    at System.Xml.XmlTextReaderImpl.ThrowUnexpectedToken(String expectedToken1, String expectedToken2)

    at System.Xml.XmlTextReaderImpl.ParseAttributes()

    at System.Xml.XmlTextReaderImpl.ParseElement()

    at System.Xml.XmlTextReaderImpl.ParseElementContent()

    at System.Xml.XmlTextReaderImpl.Read()

    at System.Xml.Schema.XmlSchemaInference.InferElement(XmlSchemaElement xse, Boolean bCreatingNewType, XmlSchema parentSchema)

    at System.Xml.Schema.XmlSchemaInference.AddElement(String localName, String prefix, String childURI, XmlSchema parentSchema, XmlSchemaObjectCollection addLocation, Int32 positionWithinCollection)

    at System.Xml.Schema.XmlSchemaInference.InferElement(XmlSchemaElement xse, Boolean bCreatingNewType, XmlSchema parentSchema)

    at System.Xml.Schema.XmlSchemaInference.AddElement(String localName, String prefix, String childURI, XmlSchema parentSchema, XmlSchemaObjectCollection addLocation, Int32 positionWithinCollection)

    at System.Xml.Schema.XmlSchemaInference.InferSchema1(XmlReader instanceDocument, XmlSchemaSet schemas)

    at System.Xml.Schema.XmlSchemaInference.InferSchema(XmlReader instanceDocument, XmlSchemaSet schemas)

    at Microsoft.DataTransformationServices.DataFlowUI.XmlSourceConnectionPage.buttonGenerateXsd_Click(Object sender, EventArgs e)

  • Can yoi post the file? If not comfortable posting on the Forum send me a personal message (PM link on posts) and I'll send you my personal email address.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks for the email. Interesting file...it appears the data provider is too clever for everyone's own good. The file is actually an Excel htm file (binary format) changed to have an xls extension, not a plain-text HTML file changed to have an xls extension.

    The data provider probably decided to switch the file extension from htm to xls so that it would guarantee the file would open in Excel when double-clicked, instead of in an internet browser which is the program usually assigned to open files with an htm extension.

    So, unfortunately, I think you are going to have to open the file and save it as an Excel Workbook before working with it. You may be able to use the Excel Object Model to access the file in a Script Task and save it out as an Excel Workbook programmatically, but that's a theory, I have not tried accessing an Excel-htm file using the Object Model and am not even sure it is supported. If it is possible steps would be:

    1. File System Task: rename file from myFile.xls to myFile.htm

    2. Script Task: using Excel Object open file and save it in Excel Workbook to a new filename

    3. Data Flow: process the newly created Excel Workbook per usual methods

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply