SSIS: Select Query on multiple databases.

  • Hi,

    I am new in SSIS and seeking help to fatch data from tables on multiple DB's. I want to do this in SSIS package.Can any one suggest that how to apply Join operations on multiple databases in task in SSIS.

    The scenario is like this:

    we have 2 databases like db1 and db2 where we have 2 tables like db1..tbl1 and db2..tbl2 on diffrent server. I need to apply join on db1..tbl1 and db2..tbl2.

    Something like this

    Select tbl1.Col1, tbl2.Col1

    From db1..tbl1 AS tbl1 INNER JOIN db2..tbl2 AS tbl2

    WHERE tbl1.Col2 = tbl2.col2

    thanks for help in advance.

  • If the databases are on the same instance, then use four-part naming to refer to the tables that you need in your join.

    If the dbs are on different servers, then set up a linked server and then use the four-part name to refer to the object.

    If you can't make changes to the server, then you could always setup seperated data-sources for each databsae in SSIS and join the data as part of the SSIS transformation. This way would not be as quick as using four-part names and joins..

  • Hi steveb,

    Thanks for your quick responce. The DB's (DB1 and DB2) are not on the same instance also I don't have permission for creating linked server.

    I can fire only select query on DB2.

  • You can use a merge join transformation in a data flow. Both inputs to the merge join need to be sorted.

    Create connection managers for DB1 and DB2 and create data sources for both. If you use a select statement with an order by in each data source you can go to the data source advanced editor and identify the columns and their sort order. This is usually more efficient than using the SSIS sort task.

    Then go back to the merge join transformation and identify the columns to join on and the desired output columns, and map the output to your destination.

    This article uses flat files as an example but the concept is the same:

    http://www.mssqltips.com/tip.asp?tip=1322

  • Is there other way other than use merge join ? i don't know what's wrong in my SP or SSIS package. when i use merge join component (with multiple instance, in some case i use more than 2 linked server objects) and it takes too long. the data it self approximately 10million records each month or may be more especially in 3rd and 4th quarter, the process is to generate monthly reports.

    is it normal for 10million rows, takes 7-8 hours processed ? any sugesstion ?

    ilustration :

    select a.colx, b.coly, c.colz

    from db1.dbo.tbl1 A

    inner join linked1.db2.dbo.tbl2 B on a.id = b.id

    left join sub linked2.db3.dbo.tbl3 C on a.idkey = c.idkey

    where substring(B.coly, 3, 3) IN (select lookupid from linked2.db4.dbo.tbllookup)

  • depending on the power of your machine, you could use the lookup transform instead of the merge join. lookup is *much* faster, but it also requires you to cache all the contents of the lookup in memory for it to perform efficiently.

    i had one system where i had three lookup transforms running at the same time, and the combined size of rows of the transforms was about 15 million records. this took up around 6GB of memory, just to give you a bit of a ballpark figure.

    in both join cases, choose to redirect row on lookup error, which for the lookup transform, is the way of saying "send the rows which don't find a match here"

    for the INNER JOIN, use the success case as your connector to the next step. for the LEFT JOIN, use the failure case as your connector to the next step.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply