Combine 12 different datasets from 12 different Datasources

  • I have a big one here: I have to create a report that uses 12 different datasets and data sources.

    The main reason for this is that the database is very old. The 12 different data sources are end-of-month snapshots. The report contains a Date parameter, and depending on the date, the code in Report Builder opens the right snapshots. Now my dilemma is this.

    Each dataset is sorted by year, but you can have years in dataset 4 that are not in dataset 1, or one dataset can contain a year that is not in another dataset, etc.

    All I want to do is grab the info from all 12 datasets and combine it into one. I cannot use Lookup/MultiLookup because they only return years from dataset 1 that match.

    Example:

    Dataset 1

    Year

    1999

    2000

    2003

    2005

    2007

    Dataset 2

    2008

    2009

    2003

    Dataset 3

    1985

    1986

    1987

    1988

    All I want to do is this with all the datasets:

    Year

    1985

    1986

    1987

    1988

    1999

    2000

    2003

    2005

    2007

    2008

    2009

    I'm really stuck on this one, lol.
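    What's wanted is the sorted, distinct union of the Year columns, which is exactly what Lookup/MultiLookup cannot produce (they only match against dataset 1). As a minimal illustration of the merge itself, in plain Python rather than SSRS code:

```python
# Illustrative only: merging the Year columns of several datasets into one
# sorted, de-duplicated list -- the "union" the report needs.
dataset1 = [1999, 2000, 2003, 2005, 2007]
dataset2 = [2008, 2009, 2003]
dataset3 = [1985, 1986, 1987, 1988]

def combine_years(*datasets):
    """Return the sorted, distinct union of all the year lists."""
    merged = set()
    for ds in datasets:
        merged.update(ds)   # set union drops duplicates such as 2003
    return sorted(merged)

print(combine_years(dataset1, dataset2, dataset3))
# [1985, 1986, 1987, 1988, 1999, 2000, 2003, 2005, 2007, 2008, 2009]
```

    This is the semantics of SQL's UNION (distinct) rather than UNION ALL; duplicates like 2003 appear only once.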

    Thank you,

    JD

  • If the columns are all the same, this sounds like an "ETL" process in the making.

    Use SSIS to combine the data from the results into a "Staging table" which could become a DataMart (or Data Warehouse).

    gsc_dba

  • If they all have the same structure, I would use UNION/UNION ALL to combine them into a single dataset.

  • OK, I already tried the ETL approach and it doesn't work. The main reason is that each of the 12 datasets has its own data source, and the data source changes depending on your parameter.

    Let me explain:

    I have 300 databases, and all follow a standard naming convention:

    Databasename-month-lastdayofmonth-year, so there are 12 databases for each year; now multiply that by 25 years of history. When I pick a date, the code grabs the 12 data sources for the year. If the date of a data source is less than the parameterized date, it loads the database into one of the 12 data sources; otherwise it loads the dummy database. It then calculates the beginning and end of the month from the data source date and uses those to get the data for that month. The database is old, and I'm not wasting time on data warehousing because a new version of the database will be created in SQL Server 2008 R2. The datasets run in a report I've created, and the total project is 50 GB with each of the 12 data sources attached to its own dataset. Once the parameterized date changes, all 12 data sources refresh with new database names.

    I know that if I could make the data sources change dynamically as well, I could create a merged dataset. Everything is done in Reporting Services on SQL Server 2008 R2.

    😀
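    The naming convention described above (month, last day of month, year) means the 12 candidate database names for any parameter year can be derived mechanically. A sketch of that derivation, in Python for illustration only; the "SNAP-" prefix and "MM-dd-yyyy" format are taken from the connection expression posted later in the thread, so treat them as assumptions here:

```python
# Sketch (not report code): derive the 12 monthly snapshot database names
# for a given year, using the last calendar day of each month.
# Prefix "SNAP-" and format "MM-dd-yyyy" are assumed from later in the thread.
import calendar

def snapshot_names(year):
    names = []
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]  # last day of the month
        names.append("SNAP-%02d-%02d-%04d" % (month, last_day, year))
    return names

print(snapshot_names(1999)[0])  # SNAP-01-31-1999
print(snapshot_names(1999)[1])  # SNAP-02-28-1999
```

    Note that `calendar.monthrange` handles leap years automatically, so February 2000 yields `SNAP-02-29-2000`.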

  • peseta30 (5/8/2012)


    [full post quoted above - snipped]

    Are the table structures the same? Can you do what Daniel suggested and UNION ALL the results into a global temp table or staging table?

    gsc_dba

  • All the tables have the same structure, but that's not the problem: the database names for all 12 data sources change when the date parameter changes, therefore I can't use UNION, because each data source is a monthly snapshot. So 12 snapshots are created every year, and we have 25+ years of history; 12 snapshots x 25 years is 300 databases at 5 GB each, and each database has its own name, like Snapshot-01-31-1999, Snapshot-02-28-1999, etc. Each snapshot contains tables with the same structure, but what makes them different is that each snapshot is frozen data that can't be modified in any way.

    When the date parameter changes, all 12 data sources switch to the right year using date calculations. For example, if I enter 09-30-2005, the code in each data source starts switching snapshots by first calculating the beginning of the year, then taking its month and calculating the last day of that month within the same year as the date parameter. Once done, it compares that last day of the month to the date parameter: if it is later, it loads a dummy snapshot that is completely blank; if it is earlier or equal, it loads the real snapshot.

    The best way to explain it is to compare it to a 5-disc CD player: 5 CDs, 5 different artists, each with 8 different songs.

    The date parameter is your disc # (or your artist); once that is entered, the machine looks for the disc and plays the 8 songs on that disc.

    The songs are the snapshots, because all the songs have different names and each song is loaded one at a time.

    But each time the disc # changes, a new set of songs/snapshots is loaded one by one, and that is what my report does in order to work with 50-70 GB of data at a time. :w00t: I did the loading part, but I can't find any info on how to combine all 12 datasets into one when all the information is sent from the server and stored in each dataset separately. This would be a huge plus among the many things that can be done in Reporting Services, but maybe Microsoft doesn't want us to push Reporting Services to its limits. This is done in SQL Server 2008 R2 using Report Builder 3.0. :hehe:
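    The switching rule described above (real snapshot if its month-end is on or before the parameter date, otherwise a blank dummy) can be sketched like this; the names and the "Dummy" placeholder mirror the post, but this is illustrative Python, not the report's VB code:

```python
# Sketch of the described per-month switching rule: for each month of the
# parameter's year, pick the real snapshot if its month-end is on or before
# the parameter date, otherwise pick the blank "Dummy" snapshot.
import calendar
from datetime import date

def pick_snapshots(param_date):
    picks = []
    for month in range(1, 13):
        last_day = calendar.monthrange(param_date.year, month)[1]
        month_end = date(param_date.year, month, last_day)
        if month_end <= param_date:
            picks.append("Snapshot-%02d-%02d-%04d"
                         % (month, last_day, param_date.year))
        else:
            picks.append("Dummy")  # blank snapshot for future months
    return picks

picks = pick_snapshots(date(2005, 9, 30))
# Jan..Sep 2005 resolve to real snapshots, Oct..Dec to the dummy
```

    With the 09-30-2005 example from the post, nine data sources load real snapshots and the remaining three load the dummy.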

  • It could be that I have been a little short on sleep the last few days, but your explanation still doesn't tell me why you can't use UNION.

    However, your structure sounds like a maintenance and reporting nightmare. Regardless of the solution to your current problem I believe you would be better served by a single database with a well designed partition structure.

  • I believe a combination of the advice given is the "best practice" route to go, but I appreciate you are looking for a simpler/quicker solution.

    Based on the volume of data and the processes you have described, you need an ETL process combining the data into a single source.

    You could then implement a partition scheme as Daniel mentioned - but it does need to be in a single database.

    gsc_dba

  • lol SSCrazy, I'll explain why I can't use UNION. One simple reason: UNION is used within the same database. You can do multiple UNIONs, but that's no good if I have to use it in a stored procedure. As I mentioned before, the name of every database that could potentially be loaded changes when the user changes the date, meaning UNION is worthless in this case. Why? Because of the numerous different combinations of database names that could be loaded. Don't get me wrong, the report only takes 40 seconds to run.

    OK, here is how each data source loads its database:

    First it grabs the date and takes the year, then finds the first month of that year and the last day of that month. After that it combines the result and compares it to the date in the parameter: if the date found is lower, it looks on the server for the database and loads it; if it is greater, it loads a dummy.

    While the first data source is running its SQL query, Datasource2 runs, finds the second month of the year, and does the same thing, and so on until the last month is loaded. In the background, all the queries are running while each data source is being loaded.

    Here is the data source connection expression that calls the custom VB code:

    =IIF(
        Code.GetLastDayInMonth(
            DateAdd(DateInterval.Month, 0,
                DateAdd(DateInterval.Second, 0,
                    DateAdd(DateInterval.Year,
                        DateDiff(DateInterval.Year, Date.MinValue, CDate(Parameters!Valdate.Value)),
                        Date.MinValue))))
        <= Code.GetLastDayInMonth(CDate(Parameters!Valdate.Value)),
        Code.GETSNAP("SNAP-" & CStr(Format$(
            Code.GetLastDayInMonth(
                DateAdd(DateInterval.Month, 0,
                    DateAdd(DateInterval.Second, 0,
                        DateAdd(DateInterval.Year,
                            DateDiff(DateInterval.Year, Date.MinValue, CDate(Parameters!Valdate.Value)),
                            Date.MinValue)))),
            "MM-dd-yyyy"))),
        Code.GETSNAP("Dummy"))

    and here is the customized one that gets called:

    Function GetLastDayInMonth(ByVal dtDate As Date) As Date
        ' Example: for #2009-02-20# we want the last day of month 02
        ' (i.e. the date of the last day in Feb).
        Return DateAdd(DateInterval.Day, _
            Day(DateAdd(DateInterval.Month, 1, dtDate)) * -1, _
            DateAdd(DateInterval.Month, 1, dtDate))
    End Function

    Function GETSNAP(ByVal s As String) As String
        ' Build the connection string, swapping in the snapshot database name.
        Dim strBuilder As New System.Text.StringBuilder("Data Source=Server;Initial Catalog=dbname")
        strBuilder.Replace("dbname", s.ToString)
        Return strBuilder.ToString()
    End Function
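    The month-end arithmetic in GetLastDayInMonth can be sanity-checked with an equivalent sketch in Python. This is not a literal translation of the VB DateAdd trick (Python's datetime has no month-add with clamping); it instead steps to the first day of the next month and goes back one day, which yields the same result:

```python
# Hypothetical Python equivalent of GetLastDayInMonth, for checking the
# arithmetic: first day of the following month, minus one day.
from datetime import date, timedelta

def get_last_day_in_month(d):
    if d.month == 12:
        first_of_next = date(d.year + 1, 1, 1)   # roll over to January
    else:
        first_of_next = date(d.year, d.month + 1, 1)
    return first_of_next - timedelta(days=1)     # last day of d's month

print(get_last_day_in_month(date(2009, 2, 20)))  # 2009-02-28
```

    It agrees with the VB version on month ends, year ends, and leap years.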

    I hope you understand why I cannot use it: it's different databases, not tables, that get switched every time a different date is entered.

    If you think this is a nightmare, try coding for Walmart or the airlines, where you have to deal with multiple operating systems, connect to all different types of RDBMS from Teradata, SQL Server, Oracle, Pervasive, etc., and create reports trying to gather all the info from every single store or location into one place.

    Now, why can't I use ETL? Because, as you mentioned, it needs to be in one database, but then I can't run it, because I will have a lot of users trying to access the ETL all at once. But you gave me an idea: maybe if I use a temp table with the username in its name, that could work. I will give that a try.
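    The per-user temp table idea could be sketched as deriving a staging table name from the username so concurrent users don't collide. The name pattern below is purely illustrative, not something from the report:

```python
# Sketch of the per-user staging-table idea: derive a table name from the
# username so each user's run writes to its own staging table.
# The "##Staging_" pattern is a hypothetical example, not tested code.
import re

def staging_table_name(username):
    # keep only characters that are safe in a SQL Server identifier
    safe = re.sub(r"[^A-Za-z0-9_]", "_", username)
    return "##Staging_" + safe

print(staging_table_name("DOMAIN\\jd"))  # ##Staging_DOMAIN_jd
```

    Sanitizing the username matters because Windows logins contain backslashes, which are not valid in an unquoted identifier.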

    SSCrazy, even though the report takes 40 seconds to run, that is quite fast for the amount of data getting filtered. Also, challenging jobs like this one are the ones that really test your skills and make you more and more creative and a lot better at solving problems. 🙂
