Optimising a Distributed Query

  • Hi Guys,

    I'm writing a hospital departmental database. I want to create a UDF that calculates a patient's age. Patient Demographics are stored on another server, and I want to retrieve the patient's DOB to do the calculation.

    At the moment the code looks like this :

    CREATE FUNCTION dbo.GetDOB(@PID as varchar(12))

    RETURNS datetime AS

    BEGIN

    DECLARE @DOB as datetime;

    SELECT @DOB = DOB FROM <serverName>.Demographics.dbo.Demographics WHERE Patient_ID = @PID

    RETURN @DOB

    END

    (Patient_ID is an indexed varchar(12))

    Where <servername> is a configured link server. The problem is the query is taking 7 1/2 minutes to execute per patient! If I make a remote call to a stored procedure its almost instantaneous, but this returns the whole patient record, and I've had difficulties extracting the DOB (using Openrowset/OpenQuery).

    I've set the collation option on the linked server configuration page and thats knocked about 15 seconds off. I am reluctant to write an sp on the Demographics server to return the DOB. If we could get this working using purely T-SQL it would be of great benefit.

    Any ideas?

     

     

     

  • Why implement it as a function?

    What happens if you just do

    SELECT @DOB = DOB FROM <serverName>.Demographics.dbo.Demographics WHERE Patient_ID = @PID

    ..?

    /Kenneth

  • I'm basing the time calculation running the query through query analyser without the function wrapping, and using a literal Patient ID

    I.e.

    Select DOB from Birmb91s.Demographics.dbo.Demographics Where Patient_ID='123456'

    And this is about 6m45s since I set the collation option on.

    7m30 without.

     

  • It sounds like the entire remote table is fetched across the wire before the search is done..?

    I don't do much linked servers, so I'm not sure what causes this, I only know it does happen sometimes.. What kind of difficulties did you ahve with openquery/openrowset? I assume those went quicker, thought?

    /Kenneth

  • OpenRowset and OpenQuery were proving stressful. Because I was passing in an input variable into the query I had to throw an EXEC round the command, and then the whole thing turned into quotes city.

    OpenQuery doesn't accept variables. I think the major problem was getting TSQL to execute the query and return the DOB to a variable.

    I'm surprised Microsoft didn't provide more support for distributed Querying, as it would make more sense to retrieve data from other sources than having to keep a copy locally.

    I found your point interesting regarding returning the whole table. I've been looking into the FIRSTROWFAST/HINT INDEX commands, but T-SQL throws a fit about using them on distributed queries. One of my colleagues thought that the delay was down to the Server not using the Indexs for the query.

    What do you think?

     

     

  • It does seem that the entire remote table is being transfered in order to pick the one row.

    What do you mean by setting the "collation option". Did you turn on the "Collation Compatible" or "Use Remote Collate" option?

    Do you get the same problem if you execute the select statement in QA outside of the function with a specific Patient_ID?

    I wonder what collation is used when variables are involved.

  • If the remote stored procedure call is faster, how about a stored procedure that returns the DOB in an output parameter?

    I have also seen that slow linked server response is often because the entire table is returned instead of the row you're looking for.

  • Have a look at the join hint "REMOTE".  If the cause of the delay is because the query is dragging all the data from the remote server and then applying the WHERE condition, this hint might help.

  • Happycat59,

    I've set the collation cimpatible option. It was suggested in a microsoft "Optimising Distributed queries" article. Should I select "Use Remote Collation" too? I have tested the basic query inside Query Analyser without variables and its 6m30s.

    Ahhhh.. I've just set use Remote Collation and the q

  • The REMOTE hint has nothing to do with collations.  It tells the query optimiser to let the remote server process as much of the query as it can...this should mean that the lookup of the one record you are after will be done remotely instead of having all of the data sent to the local server and then the local server finds the one record.

    You should be able to verify this by monitoring the network traffic between the two servers.  I suspect that at present you will have one heck of a lot of traffic happening.

  • Oops.. looks like my message got chopped.

    Turning on Use Remote Collation did the trick. Initially it didn't, but I set it when I had query analyser open, so its possible it was using the settings it had loaded. Query runs in 2 secs, and our department are now dead chuffed with UDFs!

    Thanks HappyCat59 and the other contributors to this thread.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply