What is the best way to aggregate data from remote servers?

  • Hi,

    I'm building a sharepoint where the customer list is obtained from BDC, which needs a data source that collects data from six different databases over several sites. Some of these are connected by very slow WAN links, or have very old servers. Politics are involved, so all I can do is query the databases using a read-only account, from my own server.

    The fastest way I've been able to do it is using a SP to grab each customer ID, name and postcode from a remote table using OPENQUERY, then store the results in a local table; repeat for each database, and then finally do some clever reformatting to generate a table which looks like this:

    [font="Courier New"]

    CUSTNAME POSTCODE TABLE1ID TABLE2ID TABLE3ID TABLE4ID TABLE5ID TABLE6ID

    CUSTOMER Limited SO123ME Customer1 CUST1 C100001

    CUSTOMER LTD SO123ME Customer1 CUST1 C100001 [/font]

    It isn't very fast, and as you can guess it's not perfect because the different databases may have the same customer but store their names differently (yes, I know this is a bad example because I could use a replace function, and in fact I do where possible, but it can't catch everything).

    At least I can use the CUSTNAME column as a key column in this setup, and users can do a partial name search to obtain a list of records that match.

    If you guys were doing this, how would you go about it?

  • If the names "look" different on different servers, then CustName cannot be your key, since it will not help you identify cust1 on server1 and cust1a on server2 as being the same customer. That's where your first challenge is (making sure you have a cross-reference ability among your data sources). If there isn't one readily available, then I'd build one (mapping the customers to their matches on different servers.)

    Once you have that - then I might roughly use your approach: use OPENQUERY to pull back FILTERED rows from each data set, pulling only the key info I need and the info required to aggregate, and then perform the final aggregation locally. If possible - I'd aggregate on the remote server FIRST, and bring back pre-aggregated data (if you can come up with a way to make the calculation work with aggregating semi-aggregated data).

    If that's still slow, then perhaps look at your slowest links, and figure out how to asynchronously get the data from them (something like replication or mirroring might work for that), so that the data is retrieved quickly, even if it's not the freshest data (you replication could fall behind, but you would still get a report out fairly quickly). Depends of course on what the report requirements are, etc...

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I'm assuming that Customer IDs aren't shared between databases. Is that correct? If so, then best-match on name is possibly all you can do. You might match against phone, e-mail, or some such, as well, if that's an option.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi,

    What would be the advantage of pulling back filtered rowsets?

    I need to build ID Enumerator, Finder and SpecificFinder methods; surely if I'm going to aggregate the customer tables from all six systems properly then I can't do that without grabbing all records from all tables.

    Quite right about being unable to correlate the data properly; however the idea of doing it this way is that we can see how many systems have different names for the same customer and then we can just do a clean-up.

    The long term plan is for it to show only one row for each customer, but we're happy for it to show

    Customer 1 Ltd

    Customer1 ltd

    CUSTOMER ONE LIMITED

    etc while that sanity check takes place. Either way, there will still only be one unique instance of each customer name, which is why I think we'll be able to use it as the primary key.

    I have a function called Neaten which strips out white spaces, capitalises the name, changes Ltd. to LIMITED, replaces ampersands with AND etc. Getting a list of distinct customer names and matching codes was dead easy (SELECT DISTINCT....).

    Writing the pivot table was extremely tedious but not especially difficult.

  • Hi,

    I've got it working rather nicely with SPs that query the remote tables, then update local tables with the query results.

    This is blisteringly fast from a client PC but I do need drop the table every night; the next time someone or something executes the SP the table is recreated and then populated from the dynamic sql outside of core hours so Sharepoint never realises it's going on.

    Is there an efficient way to do this without dropping the table though?

    I looked at the SQL 2005 crippled alternative to MERGE and I have my reservations about using it in the SPs called by MOSS 2007.

    Not that I'm a cynic, but Sharepoint has time and again demonstrated to me, a habit of throwing its teddy out of the pram as soon as I try to do anything vaguely complex in a SP.

  • Instead of referencing the tables directly - perhaps set up a view to act as your data source in this case. This way you can build the new table while the existing one is in place, then switch the view's definition (sub MS type of speed, and then drop the table).

    That way - the SP is always referencing something valid.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Here's what I need to get just to get a list of support calls from our support system:

    create table #SupportSystem_Calls

    (

    [CallRef] varchar(8),

    [CustomerRef] varchar(16),

    [NAME] varchar(120),

    [Opened] datetime,

    [Closed] datetime,

    [Actioned] datetime,

    [PRODUCTREF] varchar(16),

    [ProductName] varchar(40),

    [VERSIONREF] varchar(16),

    [Type] varchar(32),

    [Status] varchar(8000),

    [Priority] varchar(32),

    [TITLE] varchar(250),

    [CALLER] varchar(40),

    [HANDLERTAG] varchar(8),

    [CUSTSOWNREF] varchar(16),

    [SERIALNO] varchar(40),

    [SLA] varchar(32),

    [ActionCategory] varchar(32),

    [szemail] varchar(120),

    [szphone] varchar(20),

    [szline] varchar(16),

    [szresolvedby] varchar(8),

    [SourceTable] varchar(8),

    [Customer Name] varchar(120)

    )

    INSERT into #SupportSystem_Calls

    SELECT CallRef, CustomerRef, NAME, Opened, Closed, Actioned, PRODUCTREF, ProductName, VERSIONREF, Type, Status, Priority, TITLE, CALLER,

    HANDLERTAG, CUSTSOWNREF, SERIALNO, SLA, ActionCategory, szemail, szphone, szline, szresolvedby, SourceTable, dbo.Neaten(NAME)

    AS [Customer Name]

    FROM OPENQUERY([SupportServer],

    'SELECT SupportDB.Archived_Calls.REFERENCE AS CallRef, SupportDB.Archived_Calls.CUSTOMERREF AS CustomerRef, CUSTOMERTABLE_1.NAME, DATEADD(hh,

    SupportDB.Archived_Calls.OPENHOURS, DATEADD(n, SupportDB.Archived_Calls.OPENMINUTES, SupportDB.Archived_Calls.OPENDATE))

    AS Opened, DATEADD(hh, SupportDB.Archived_Calls.CLOSEHOURS, DATEADD(n, SupportDB.Archived_Calls.CLOSEMINUTES,

    SupportDB.Archived_Calls.CLOSEDATE)) AS Closed, DATEADD(hh, SupportDB.Archived_Calls.ACTIONHOURS, DATEADD(n,

    SupportDB.Archived_Calls.ACTIONMINUTES, SupportDB.Archived_Calls.ACTIONDATE)) AS Actioned, SupportDB.Archived_Calls.PRODUCTREF, SupportDB.SUPPROD.NAME AS ProductName,

    SupportDB.Archived_Calls.VERSIONREF, SUPCATEG_2.CATEGORY AS Type, REPLACE(SUPCATEG_3.CATEGORY, ''£ '', '''') AS Status,

    SUPCATEG_4.CATEGORY AS Priority, SupportDB.Archived_Calls.TITLE, SupportDB.Archived_Calls.CALLER,

    SupportDB.Archived_Calls.HANDLERTAG, SupportDB.Archived_Calls.CUSTSOWNREF, SupportDB.Archived_Calls.SERIALNO,

    SUPCATEG_1.CATEGORY AS SLA, SUPCATEG_5.CATEGORY AS ActionCategory, SupportDB.Archived_Calls.szemail,

    SupportDB.Archived_Calls.szphone, SupportDB.Archived_Calls.szline, SupportDB.Archived_Calls.szresolvedby,

    ''Archived'' AS SourceTable

    FROM SupportDB.Archived_Calls INNER JOIN

    SupportDB.SUPCATEG AS SUPCATEG_5 ON SupportDB.Archived_Calls.ACTION = SUPCATEG_5.NO INNER JOIN

    SupportDB.SUPCATEG AS SUPCATEG_1 ON SupportDB.Archived_Calls.CATAREA = SUPCATEG_1.NO INNER JOIN

    SupportDB.SUPCATEG AS SUPCATEG_2 ON SupportDB.Archived_Calls.CATTYPE = SUPCATEG_2.NO INNER JOIN

    SupportDB.SUPCATEG AS SUPCATEG_3 ON SupportDB.Archived_Calls.CATSTATUS = SUPCATEG_3.NO INNER JOIN

    SupportDB.SUPCATEG AS SUPCATEG_4 ON SupportDB.Archived_Calls.CATPRIORITY = SUPCATEG_4.NO INNER JOIN

    SupportDB.CUSTOMERTABLE AS CUSTOMERTABLE_1 ON SupportDB.Archived_Calls.CUSTOMERREF = CUSTOMERTABLE_1.REFERENCE INNER JOIN

    SupportDB.SUPPROD ON SupportDB.Archived_Calls.PRODUCTREF = SupportDB.SUPPROD.REFERENCE ORDER BY Closed DESC')

    INSERT into #SupportSystem_Calls

    SELECT CallRef, CustomerRef, NAME, Opened, Closed, Actioned, PRODUCTREF, ProductName, VERSIONREF, Type, Status, Priority, TITLE, CALLER,

    HANDLERTAG, CUSTSOWNREF, SERIALNO, SLA, ActionCategory, szemail, szphone, szline, szresolvedby, SourceTable, dbo.Neaten(NAME)

    AS [Customer Name]

    FROM OPENQUERY([SupportServer],

    'SELECT SupportDB.Calls.REFERENCE AS CallRef, SupportDB.Calls.CUSTOMERREF AS CustomerRef, SupportDB.CUSTOMERTABLE.NAME, DATEADD(hh,

    SupportDB.Calls.OPENHOURS, DATEADD(n, SupportDB.Calls.OPENMINUTES, SupportDB.Calls.OPENDATE)) AS Opened, DATEADD(hh,

    SupportDB.Calls.CLOSEHOURS, DATEADD(n, SupportDB.Calls.CLOSEMINUTES, SupportDB.Calls.CLOSEDATE)) AS Closed, DATEADD(hh,

    SupportDB.Calls.ACTIONHOURS, DATEADD(n, SupportDB.Calls.ACTIONMINUTES, SupportDB.Calls.ACTIONDATE)) AS Actioned,

    SupportDB.Calls.PRODUCTREF,SupportDB.SUPPROD.NAME AS ProductName, SupportDB.Calls.VERSIONREF, SUPCATEG_2.CATEGORY AS Type, REPLACE(SUPCATEG_3.CATEGORY, ''£ '',

    '''') AS Status, SUPCATEG_4.CATEGORY AS Priority, SupportDB.Calls.TITLE, SupportDB.Calls.CALLER, SupportDB.Calls.HANDLERTAG,

    SupportDB.Calls.CUSTSOWNREF, SupportDB.Calls.SERIALNO, SUPCATEG_1.CATEGORY AS SLA,

    SupportDB.SUPCATEG.CATEGORY AS ActionCategory, SupportDB.Calls.szemail, SupportDB.Calls.szphone, SupportDB.Calls.szline,

    SupportDB.Calls.szresolvedby,

    ''Live'' AS SourceTable

    FROM SupportDB.Calls INNER JOIN

    SupportDB.SUPCATEG ON SupportDB.Calls.ACTION = SupportDB.SUPCATEG.NO INNER JOIN

    SupportDB.SUPCATEG AS SUPCATEG_1 ON SupportDB.Calls.CATAREA = SUPCATEG_1.NO INNER JOIN

    SupportDB.SUPCATEG AS SUPCATEG_2 ON SupportDB.Calls.CATTYPE = SUPCATEG_2.NO INNER JOIN

    SupportDB.SUPCATEG AS SUPCATEG_3 ON SupportDB.Calls.CATSTATUS = SUPCATEG_3.NO INNER JOIN

    SupportDB.SUPCATEG AS SUPCATEG_4 ON SupportDB.Calls.CATPRIORITY = SUPCATEG_4.NO INNER JOIN

    SupportDB.CUSTOMERTABLE ON SupportDB.Calls.CUSTOMERREF = SupportDB.CUSTOMERTABLE.REFERENCE INNER JOIN

    SupportDB.SUPPROD ON SupportDB.Calls.PRODUCTREF = SupportDB.SUPPROD.REFERENCE ORDER BY Actioned DESC')

    INSERT into #SupportSystem_Calls

    SELECT CallRef, CustomerRef, NAME, Opened, Closed, Actioned, PRODUCTREF, 'Unspecified' AS ProductName, VERSIONREF, Type, Status, Priority, TITLE,

    CALLER, HANDLERTAG, CUSTSOWNREF, SERIALNO, SLA, ActionCategory, szemail, szphone, szline, szresolvedby, 'Unknown' AS SourceTable,

    dbo.Neaten(NAME) AS [Customer Name]

    FROM SupportDB.Archived_Calls

    WHERE (PRODUCTREF = '')

    SELECT TOP 100 PERCENT * FROM #SupportSystem_Calls

    ORDER BY CallRef ASC

    DROP TABLE #SupportSystem_Calls

    I tried configuring a remote view to retrieve this dataset. It took 8 seconds to retrieve 3600 records, from a SQL Server instance hosted on the server below mine in the rack and both servers are connected via the same gigabit switch.

    I tried using SP and it is faster - 3 seconds to retrieve the dataset. But considering the TINY size of the dataset, it's still too slow.

    About the only way that does seem to return the dataset without latency, is if I create a local table and then use SP to retrieve data from it; this reduces the lag considerably.

    But then it doesn't update itself, so I need to be able to run an upsert within the SP, for instance like this:

    INSERT into #HOTHTMP

    SELECT * FROM dbo.[Recent HOTH Transactions]

    UPDATE SupportCalls

    SET [Opened] = #TMP.[Opened],

    [Closed] = #TMP.[Closed],

    [Actioned] = #TMP.[Actioned],

    [Status] = #TMP.[Status],

    [TITLE] = #TMP.[TITLE],

    [SLA] = #TMP.[SLA],

    [ActionCategory] = #TMP.[ActionCategory],

    [Customer Name] = #TMP.[Customer Name],

    [HANDLERTAG] = #TMP.[HANDLERTAG],

    [szresolvedby] = #TMP.[szresolvedby],

    [SourceTable] = #TMP.[SourceTable],

    [szline] = #TMP.[szline]

    FROM #TMP WHERE SupportCalls.CallRef = #TMP.CallRef

    INSERT into SupportCalls

    SELECT * FROM #TMP WHERE CallRef NOT IN(SELECT DISTINCT CallRef FROM SupportCalls)

    SELECT * FROM SupportCalls

    DROP TABLE #HOTHTMP

    When I restricted the contents of the #TMP table so it only showed the last 50 modified records from the live table and the last 5 updated records from the archived table, I cut the execution time of the UPSERT by 80% and it still ensures that the Business Data List in Sharepoint is up to date.

    However, when I execute the final SELECT statement, every record that wasn't updated shows NULL in the twelve columns used by the update routine.

    If I can get the UPSERT to work and then return the complete updated recordset in the final statement of the SP, then I would expect that to be the most efficient solution for Sharepoint.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply