JUST FED UP with SSIS. Simple copy of 800 tables to another database...

  • I'm new to SSIS but not to SQL. Still, this has been a mess of monumental proportions. Between wrestling with VS 2005, SSMS, SSIS, packages, wizards, protection levels, configurations (I still haven't found a page telling me what they are or where to store them), and the like, it's been a confusing and infuriating MESS.

    For now I'm using a stored procedure to drop tables at my destination, then INSERT INTO from the source. I'm sure it's not the fastest, but it's the most controllable / configurable.
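
    Roughly, each table gets the pattern below (DestDB, SourceDB, and dbo.SomeTable are placeholders for whatever the proc loops over; TRUNCATE instead of DROP would keep the destination's indexes and constraints):

        -- per-table refresh at the destination
        USE DestDB;
        TRUNCATE TABLE dbo.SomeTable;               -- or DROP + recreate, as I'm doing now
        INSERT INTO dbo.SomeTable WITH (TABLOCK)    -- TABLOCK helps bulk performance
        SELECT *
        FROM SourceDB.dbo.SomeTable;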

    Sorry for the shouting. I'm at my wits' end.

    I just want to copy 800 tables 'as is' from one database to another on the same server. Here's what I tried:

    - Used Remote Desktop to log in to the SQL Server

    - Used the 'Export Data' wizard in SSMS to create an SSIS package to export 800+ tables from the source database to the destination database -- I HAD TO MANUALLY CHECK THEM ALL OFF. I also made sure I excluded views, as this stupid thing tries to create views as tables on the destination!!!

    - Set mappings, telling the package to drop tables at the destination before copying

    - Specified 'Rely on server storage and roles for access control' (ServerStorage) as my ProtectionLevel

    - Told it to run as SERVERNAME\Administrator - the account I was logged in as, which has sa-level privileges on the SQL instance

    - Saved the package

    - Ran it manually on the server - it succeeds sometimes, fails other times with "Cannot insert duplicate key row in object 'dbo.abccode' with unique index 'abcidx'." This, even though I told it to drop tables first.

    - When it did run manually, I'd schedule it as a job - AND THE JOB WOULD ALWAYS FAIL.

    I also get other errors along the lines of "could not decrypt XML password", although the error log has scrolled that out of sight now, so I don't have the exact wording.

    I tried building SSIS packages in Visual Studio; the wizard creates a package that runs manually but won't run as a job - and I created it while working on the server, even. I tried various ProtectionLevels, even passwords. There were times when a small package (only 20 tables) would succeed, but the exact same type of package failed with 800 tables.

    I tried writing a stored procedure that uses BULK INSERT but xp_cmdshell is turned off on the server so that option's out as well.

    Is there a better way? Can someone point me to a web page with a simple -- or even READABLE -- explanation of how to do this seemingly simple task? If so I'd be most grateful, and so would my cardiologist.

    Thanks in advance...

  • barry.a.seymour (5/4/2011)

    ...

    I tried writing a stored procedure that uses BULK INSERT but xp_cmdshell is turned off on the server so that option's out as well

    ...

    BULK INSERT is a T-SQL command, so xp_cmdshell is not required to use it.
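
    For example, this runs entirely inside the database engine (the table and path are illustrative; the path is read by the SQL Server service account, on the server):

        -- no shell access needed; BULK INSERT executes in the engine
        BULK INSERT dbo.abccode
        FROM 'D:\xfer\abccode.dat'        -- hypothetical path on the server
        WITH (
            DATAFILETYPE = 'native',      -- matches a bcp -n export
            TABLOCK                       -- allows minimal logging where eligible
        );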

  • Sorry, BULK INSERT was the wrong term. It's been over a week, so I forget. I was trying to write a proc to export the data to a flat file from the source, then truncate the destination tables and import from the file. If you think that's a good way to go, I'm all ears.

    I'm doing this as a data warehousing project; with a few exceptions it's sufficient that tables be updated nightly.

  • Are you doing a full refresh of every table - every time? Are there any additional transformations on the data being performed?

    If this is a straight copy of the production system, I would just forget about any type of integration package or ETL process and just restore a copy of the database.

    If the goal is to be able to get to the point where you can do transformations and incremental loads of data (an ETL process) - then I would recommend starting out a bit slower. Instead of trying to get all tables working now, focus on getting a couple of tables working and understanding how that is set up and managed.

    Once you have something working for a couple of tables, then you can focus on how to get the package installed on the server - passwords and configuration settings working, etc... Then, it is a much simpler process to add more tables and expand upon the project until you have everything completed.

    To start off - here is what I would recommend you do:

    1) Create a new SSIS Project

    2) On the control flow page - add a new data flow

    3) In that data flow, add an OLEDB Source and an OLEDB Destination - connect the source to the destination with a green line.

    4) In the OLEDB Source, define the connection to your production system and specify which table you want to extract (or the SQL command you want to use).

    5) Then, in the destination task - specify the table you want to load the data into. If it doesn't exist, create a new table and use that one. Note: choose the 'Table or view - fast load' access mode, as that usually performs much better.

    At this point, you need to be aware that the OLEDB destination task does not allow you to specify an option to truncate the table. Nor does it allow you to load only new rows, update existing ones, etc... If this is a full refresh, go back to the control flow page and add an Execute SQL Task. Specify the destination connection and put the code in there to truncate the table. Connect that task to the data flow task (on success, green line) so the truncate runs before you try to load data; see the snippet below.
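
    The statement in that Execute SQL Task can be as simple as this (the table name is an example):

        -- runs against the destination connection, before the data flow
        TRUNCATE TABLE dbo.SomeTable;
        -- if foreign keys reference the table, TRUNCATE will fail; use instead:
        -- DELETE FROM dbo.SomeTable;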

    At this point, I would test the package and fix any errors. Then, I would add a couple more tables using the same process until I have a good sample and work on deploying the package.

    Again, I would recommend taking on any new development work - especially with a product or process you are not familiar with - in small steps, until you have a better understanding of that tool/product/process.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey:

    Thanks for the intelligent and well-laid out reply. I've always swum in the shallow end of the SQL pool but now find myself at the deep end! I guess I'll have to slow down to speed up.

    Here's my situation: I have to refresh table data nightly; however, I need to leave everything else (views, stored procedures, and other objects) intact. Optimally I'd have some sort of routine that only transmits inserts, updates, and deletes to the tables (transactional replication?). Note also we're talking more than EIGHT HUNDRED tables here, so sadly your table-by-table approach may not be practical for me.

    I've done UIs in Visual Studio in the past, but I'm having real trouble figuring out how SSIS packages are edited. When I create a copy-database package using the wizard, then open it in VS2005, I see the ForEach Loop container but can't figure out how it determines which tables to copy. I need the ability to control that.

    At this point I'm running my stored procedure that drops / reloads each table, but it's been running for two hours now - I suspect it won't be good to use this method going forward.

    My next avenue of attack is based on the prior comment in this thread. I've figured out how to use BCP to export the data; my new procedure will get each desired table name using a cursor, then import the data into the destination database (with a truncate first, of course). I also have a means to determine whether a table has been updated at all in the past xx days - I can use that to filter out tables I need not update. I'm hoping that'll be faster, avoid transaction logging, be more programmable, etc. The sketch below shows the command-generation piece.
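
    (Since xp_cmdshell is off, the generated bcp lines would run from a batch file or a CmdExec job step; server, database, and path names are placeholders.)

        -- emit one bcp export command per user table in the source database
        DECLARE @t sysname, @cmd varchar(1000);
        DECLARE c CURSOR FAST_FORWARD FOR
            SELECT name FROM SourceDB.sys.tables ORDER BY name;
        OPEN c;
        FETCH NEXT FROM c INTO @t;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @cmd = 'bcp SourceDB.dbo.' + @t + ' out D:\xfer\' + @t
                     + '.dat -n -T -S MYSERVER';  -- native format, trusted connection
            PRINT @cmd;   -- collect the output into the batch file / job step
            FETCH NEXT FROM c INTO @t;
        END
        CLOSE c; DEALLOCATE c;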

    Any further thoughts from you are more than welcome; thanks for taking the time to help.

    Barry

  • barry.a.seymour (5/4/2011)


    Here's my situation: I have to refresh table data nightly; however, I need to leave everything else (views, stored procedures, and other objects) intact.

    I personally would look into a transactional replication for this. It sounds like it would be far less painful to administrate and it will still do what you need.

    When I create a copy-database package using the wizard, then open it in VS2005, I see the ForEach Loop container but can't figure out how it determines which tables to copy. I need the ability to control that.

    Open that object and look at the second tab down on the left; it describes what the loop is enumerating. If you screenshot it and attach the JPG, we can probably help you further with that.

    At this point I'm running my stored procedure that drops / reloads each table, but it's been running for two hours now - I suspect it won't be good to use this method going forward.

    Ow, no, that sounds very painful, especially without bulk methods.

    My next avenue of attack is based on the prior comment in this thread. I've figured out how to use BCP to export the data; my new procedure will get each desired table name using a cursor, then import the data into the destination database (with a truncate first, of course). I also have a means to determine whether a table has been updated at all in the past xx days - I can use that to filter out tables I need not update. I'm hoping that'll be faster, avoid transaction logging, be more programmable, etc.

    I would seriously look into replication instead of ETL/SSIS here. I'm all for SSIS, but continuously copying tables wholesale is usually a job for a different tool. If you go so far as to use transactional replication, it'll push the data over incrementally during the day too, saving you an overnight load process that could be very heavy, especially for large tables or data that doesn't change often. Creating an 800-table snapshot can be very intensive.

    EDIT: The copy tables wizard is supposed to be a one-off kind of scenario where you need to refresh secondary databases when you make schema changes and the like to the primary database. I wouldn't use it for something like this.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Craig: Thanks for the input. I like the idea of replication, but I don't want to do it continuously during the day. The goal here is to offload as much from the production database during the day as possible, including generating reports. I want to update the data warehouse at night, then report off of it during the day.

    That said, it sounds like a nightly replication run might do the trick, and might involve transferring smaller amounts of data -- but is replication designed to work that way? Would the additional CPU load required to compare/evaluate all rows offset the savings in the amount of raw data being copied?

    Thanks in advance...

  • I have to agree with Craig here, well - sort of. SSIS is not going to be the right tool, but I also don't think replication is what you want either.

    It really depends on how this other system is going to be used. If it is going to be used as a reporting system, then yes - replication would be a much better option. If it is going to be used for development work - then probably not.

    Now, if SSIS were your option - then the number of tables shouldn't matter. If you have 800 tables, then you have 800 tables to build into the process. However, in most cases not all 800 tables would need to be refreshed on a regular basis either - especially system configuration tables, lookup tables, application/user configuration, etc...

    Those tables would not need to be refreshed - just initially loaded since they control how the application/system works. You would need to identify the key tables that need to be refreshed and build your process for those tables.

    Is that more work? Of course it is - but if that is what you have to do to get the job done right, then that is what you have to do.

    FWIW - if you compare SSIS to any other ETL tool (Informatica, for one), you would find that SSIS is really not any more difficult than those tools, and in a lot of ways much simpler. And a lot less expensive 😉

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • It's a reporting system. Cracking the books now, figuring out if I can do replication in a date window, i.e. 'only replicate between 9pm and 5am'...

  • bimplebean (5/4/2011)


    Craig: Thanks for the input. I like the idea of replication, but I don't want to do it continuously during the day. The goal here is to offload as much from the production database during the day as possible, including generating reports. I want to update the data warehouse at night, then report off of it during the day.

    That said, it sounds like a nightly replication run might do the trick, and might involve transferring smaller amounts of data -- but is replication designed to work that way? Would the additional CPU load required to compare/evaluate all rows offset the savings in the amount of raw data being copied?

    Thanks in advance...

    Not sure I understand what you are trying to accomplish here. Are you looking to build a reporting system with the exact same structure as production? Or, are you building a data warehouse? Or, are you looking at using this copy of production to populate the data warehouse?

    If you are looking at building a copy of production that can be used to populate a data warehouse, run reports, etc... then here is how I have that setup now:

    1) Nightly restore of production databases to the report server

    2) When restores are complete, start the ETL process to read the restored database and populate the data warehouse.

    The restored copies are then set to read-only and used for reporting during business hours. The data warehouse has what it needs after the ETL process completes. Step 1 is plain T-SQL, along the lines of the sketch below.
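
    (File paths and logical file names here are examples.)

        -- refresh the reporting copy from last night's full backup
        RESTORE DATABASE ProdCopy
            FROM DISK = N'D:\Backups\Prod_Full.bak'    -- hypothetical path
            WITH REPLACE,
                 MOVE N'Prod_Data' TO N'E:\Data\ProdCopy.mdf',
                 MOVE N'Prod_Log'  TO N'F:\Log\ProdCopy.ldf';

        -- lock it down for reporting
        ALTER DATABASE ProdCopy SET READ_ONLY;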

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I want to do a nightly copy of database tables to another database, which I will then use to support reports. Just extract and load, no transform. Right now all reporting is off the production database, so we've decided the best way forward is to get a copy of the data into a different database, point our reports there (with a few exceptions), then update the data nightly. Same schema, but a different database.

    I tried creating an initial publication snapshot earlier but it brought the system to its knees. Waiting until 5pm to start that...

  • By the way, THANKS, all you folks! You've been most helpful, if for no other reason than I have some new ideas and I don't feel so ALL ALONE! 😛

  • Another choice, especially if they're all coming from the same source database, would be log shipping.

    You CAN tell transactional replication to wait a while, but depending on volume you're going to end up with the log from heck, and I really would avoid that.

    Log shipping would allow you to push your logs over to the reporting server, then have it process the logs at night.

    An older article comparing log shipping and replication:

    http://qa.sqlservercentral.com/articles/Replication/logshippingvsreplication/1399/

    Otherwise, you're going to want to review each table individually to determine which ones you can easily get delta information from and which ones you'll basically truncate-and-reload... and yes, it's going to be a monster of a package. I'd recommend breaking it up into multiple packages just for maintainability. The query below is one way to start that triage.
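
    Something like this against sys.dm_db_index_usage_stats shows when each table was last written (the DMV resets on instance restart, so treat it as a rough guide, not gospel):

        -- run in the source database: tables with no writes in 7 days are
        -- candidates to skip on most nights
        SELECT o.name AS table_name,
               MAX(u.last_user_update) AS last_write
        FROM sys.objects o
        LEFT JOIN sys.dm_db_index_usage_stats u
               ON u.object_id = o.object_id
              AND u.database_id = DB_ID()
        WHERE o.type = 'U'
        GROUP BY o.name
        HAVING MAX(u.last_user_update) IS NULL
            OR MAX(u.last_user_update) < DATEADD(day, -7, GETDATE());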


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • bimplebean (5/4/2011)


    I want to do a nightly copy of database tables to another database, which I will then use to support reports. Just extract and load, no transform. Right now all reporting is off the production database, so we've decided the best way forward is to get a copy of the data into a different database, point our reports there (with a few exceptions), then update the data nightly. Same schema, but a different database.

    I tried creating an initial publication snapshot earlier but it brought the system to its knees. Waiting until 5pm to start that...

    That right there is key. Have you considered putting all the procs and other objects you need to support this into a 'coding' database, simply backing up and restoring the primary to the new server, and having all the 'coding' objects look into the restored DB? That would keep you from doing serious headbanging while preserving all the reporting-level work you need.

    You usually end up jumping through hoops when you need to deal with schema-level changes for reporting, like extra indexing and the like. If you're not looking to support table-level modifications, this may be your 'easy out' for now, until you can get the rest of the necessary maintenance into place. The indirection can be as simple as synonyms; see the sketch below.
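
    (Database and table names here are examples.)

        -- ReportCode holds the procs/views; the data lives in the restored copy
        USE ReportCode;
        CREATE SYNONYM dbo.Orders FOR ProdCopy.dbo.Orders;
        -- reporting procs reference dbo.Orders and survive each refresh,
        -- because only the ProdCopy database is replaced nightly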


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Another option (requires Enterprise Edition) would be database mirroring. Once you have that set up, you would create a database snapshot of the mirrored database and run your reports off of that database.

    You would need a separate database for the report code that looked to the snapshot database.

    There are limitations with this. Specifically, if your database is fairly large and has a lot of transactions, you might need to drop and recreate the snapshot more frequently; the snapshot's sparse file can cause problems when it grows too large.

    One advantage of database snapshots is that they are by definition read-only. Because they are read-only, you can get better performance: SQL Server will not take out any locks, not even shared read locks, since they are not necessary. Creating (and refreshing) the snapshot is a single statement; see the sketch below.
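
    (The logical file name and paths are examples; a snapshot needs one sparse file per data file in the source database.)

        -- point-in-time, read-only view of the mirrored database for reporting
        CREATE DATABASE ProdCopy_Snap ON
            ( NAME = N'Prod_Data',                      -- logical name of the source data file
              FILENAME = N'E:\Snap\ProdCopy_Snap.ss' )  -- sparse file; grows as the source changes
        AS SNAPSHOT OF ProdCopy;

        -- to refresh: DROP DATABASE ProdCopy_Snap; then recreate it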

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs
