data pump into MySQL

  • I have massive amounts of data that I need to put from 2008 into MySQL every hour. I would like to invoke MySQL's LOAD Data functionality (somewhat equivalent to BCP) but LOAD Data apparently only takes data from a flat file. I've read about MySQL ODBC and ADO.NET drivers that do this transparently, but every one I've tried, the load turns out to be a row by row insert. I need the ability to take data either from a 2008 table or more preferably a query and fast load into a MySQL table, directly.

    Pentaho's Kettle product has the ability to pull data from SQL Server and transparently do a fast load into MySQL, but this is pull technology, and I would rather invoke my ETL process from SSIS.

    Anyone have any experience that can chime in on this?

  • If you can use third-party solutions, check the commercial CozyRoc ODBC Destination component. It is part of the SSIS+ library. The component has been tested with MySQL and it provides 30x performance improvement compared to the regular insert.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • cmoy (3/23/2012)


    I have massive amounts of data that I need to put from 2008 into MySQL every hour. I would like to invoke MySQL's LOAD Data functionality (somewhat equivalent to BCP) but LOAD Data apparently only takes data from a flat file. I've read about MySQL ODBC and ADO.NET drivers that do this transparently, but every one I've tried, the load turns out to be a row by row insert. I need the ability to take data either from a 2008 table or more preferably a query and fast load into a MySQL table, directly.

    Pentaho's Kettle product has the ability to pull data from SQL Server and transparently do a fast load into MySQL, but this is pull technology, and I would rather invoke my ETL process from SSIS.

    Anyone have any experience that can chime in on this?

    Did you ever arrive at a solution?

    For SSIS 2005 you would write a Script Component in your Data Flow that worked with one of the native MySQL drivers to insert rows directly, maybe in batches to improve performance simulating a bulk load effect. An OLE DB driver that would work with the OLE DB Destination would be something to explore as well.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I actually did do that for one of my packages, but that was easy because it only involved one table. I would have to hate to write a separate script component for each table if it involved dozens of tables, because if that was the case, I might as well assign this to the C# developers to generate a custom app. There must be an easier way without having to write custom code.

  • With SSIS 2008/2012 it is easy using a builtin component, but with 2005, sadly, it is not. Check Pragmatic Works for a third party component. I think they have one that could help you, but I haven't used it and in general look to builtin solutions before resorting to buying something. As I said, you could also look for an OLE DB driver that would allow you to use the builtin OLE DB Destination component.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (6/29/2012)


    With SSIS 2008/2012 it is easy using a builtin component, but with 2005, sadly, it is not. Check Pragmatic Works for a third party component. I think they have one that could help you, but I haven't used it and in general look to builtin solutions before resorting to buying something. As I said, you could also look for an OLE DB driver that would allow you to use the builtin OLE DB Destination component.

    Actually there is no component from Pragmatic Works that could help you. It is wise to use built-in solutions if possible, but it is even wiser to use third-party components when it can help simplify your projects. There is no OLEDB driver or component, which gives you 30x performance improvement. And for loading massive amounts of data that is what matters most.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • Here is an example for you. It requires no third party components and modification of only a few lines of VB.net code to get working with your chosen ODBC connection, which can be configured to point to MySQL:

    Creating an ODBC Destination with the Script Component

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (6/29/2012)


    Here is an example for you. It requires no third party components and modification of only a few lines of VB.net code to get working with your chosen ODBC connection, which can be configured to point to MySQL:

    Creating an ODBC Destination with the Script Component

    This doesn't support bulk-load and requires modification every time you add/remove columns. Also it is not reusable and from maintenance stand point of view, it is terrible.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • CozyRoc (6/29/2012)


    opc.three (6/29/2012)


    Here is an example for you. It requires no third party components and modification of only a few lines of VB.net code to get working with your chosen ODBC connection, which can be configured to point to MySQL:

    Creating an ODBC Destination with the Script Component

    This doesn't support bulk-load and requires modification every time you add/remove columns. Also it is not reusable and from maintenance stand point of view, it is terrible.

    I disagree. It is included in the product so does not require a cash outlay. Also, does not require any installations on servers or workstations and can be easily modified to insert data in batches as opposed to one row at a time. As for maintenance its an insert statement, what could be easier to add columns to? I would say its a great option, quite the opposite of terrible.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I disagree. It is included in the product so does not require a cash outlay.

    The solution you present requires extra cash outlay in the contractor pocket who is doing the coding. Another extra cash outlay is required to maintain this in-genius piece of art 😉

    Also, does not require any installations on servers or workstations and can be easily modified to insert data in batches as opposed to one row at a time.

    Another over-rated comment, without much meaning. Installation of properly done third-party library takes 2 mins. Another 5 mins are required to license it. So the whole process will take less than 10 mins. The stacking of multiple insert statements will never bring too much performance improvement. Probably around 10-15% at most. Try it for yourself. The biggest issue is the SQL parser, which has to parse your statement for each inserted record. This problem doesn't exist when using the ODBC bulk-load API.

    As for maintenance its an insert statement, what could be easier to add columns to? I would say its a great option, quite the opposite of terrible.

    The easier option is called a third-party component. No coding required. Only point-and-click.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • CozyRoc (6/29/2012)


    I disagree. It is included in the product so does not require a cash outlay.

    The solution you present requires extra cash outlay in the contractor pocket who is doing the coding. Another extra cash outlay is required to maintain this in-genius piece of art 😉

    Also, does not require any installations on servers or workstations and can be easily modified to insert data in batches as opposed to one row at a time.

    Another over-rated comment, without much meaning. Installation of properly done third-party library takes 2 mins. Another 5 mins are required to license it. So the whole process will take less than 10 mins. The stacking of multiple insert statements will never bring too much performance improvement. Probably around 10-15% at most. Try it for yourself. The biggest issue is the SQL parser, which has to parse your statement for each inserted record. This problem doesn't exist when using the ODBC bulk-load API.

    As for maintenance its an insert statement, what could be easier to add columns to? I would say its a great option, quite the opposite of terrible.

    The easier option is called a third-party component. No coding required. Only point-and-click.

    I guess we'll have to just agree to disagree, have a good night 😀

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • CozyRoc (6/29/2012)


    Another over-rated comment, without much meaning. Installation of properly done third-party library takes 2 mins. Another 5 mins are required to license it. So the whole process will take less than 10 mins. The stacking of multiple insert statements will never bring too much performance improvement. Probably around 10-15% at most. Try it for yourself. The biggest issue is the SQL parser, which has to parse your statement for each inserted record. This problem doesn't exist when using the ODBC bulk-load API.

    Getting back to the OPs requirement, which ODBC bulk-load API are you referring to with MySQL?

    Is the bulk load API you're leveraging in MySQL the one where you pass an INSERT statement that looks like this?

    insert into table (col1, col2) values ('val1.1','val1.2'),('val2.1','val2.2');

    Because if it is, which I think it must be, then talk about a simple thing to implement in a Script Component. If this is not the story, please enlighten me.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • The bulk-load API is not related to MySQL. It is ODBC-based API, which uses the low-level ODBC handles and functions. If you are C++ programmer, I'm talking about similar type of handles which were used to do development for Windows back in the old days.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • CozyRoc (7/1/2012)


    The bulk-load API is not related to MySQL. It is ODBC-based API, which uses the low-level ODBC handles and functions. If you are C++ programmer, I'm talking about similar type of handles which were used to do development for Windows back in the old days.

    As I understand it that only applies if the ODBC driver implements the interface defined by the standard. Not all ODBC drivers do.

    Knowing enough about MySQL APIs to be dangerous I would be quite interested to know how you're outperforming the methods mentioned on this thread by 30x. Do you have any documented studies or other references to support your claim of 30x improved performance over these methods mentioned?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Knowing enough about MySQL APIs to be dangerous I would be quite interested to know how you're outperforming the methods mentioned on this thread by 30x. Do you have any documented studies or other references to support your claim of 30x improved performance over these methods mentioned?

    We do not claim something, if we haven't tested it first. We have reputation and we stand behind it. We have tested and our customers have tested and it is approximately 30x times faster compared to the regular insert. You can review all other databases we have tested against on the component documentation page here.

    Feel free to do tests on your own.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply