Design approaches to improving a slow query

  • Hi,

    We have a query that is not performing well. Below is a simplified version of it to illustrate the problem.

    [font="Courier New"]SELECT

    B.*,

    X.XYZ

    FROM BigTable B

    JOIN (

    SELECT

    COALESCE(B3.ColA, B2.ColA) AS XYZ

    FROM BigTable2 B2

    LEFT MERGE JOIN BigTable3 B3 ON B2.Id=B3.Id

    WHERE B3.NameSearch LIKE @ClientName+'%'

    OR (B3.NameSearch IS NULL AND B2.NameSearch LIKE @ClientName + '%')

    ) AS X ON B.Id=X.Id

    WHERE

    (various filters on B and X, including X.XYZ)

    ORDER BY

    (various options on B and X.XYZ)[/font]

    Note that all tables have > 1 million rows:

    - Query is run across indexed-views, so we have denormalised the tables

    - The COALESCE is slower when the tables are directly JOINed with B (rather than via the derived table), and the COALESCE is run within the main SELECT (note the need to ORDER BY the derived column is mandatory)

    - No difference in query plan when using CROSS APPLY to 'join' X with B

    - Note that BigTable, BigTable2 and BigTable3 have been clustered, and benefit from the MERGE join

    - NameSearch is across a nonclustered index

    Ideally I want to avoid reporting tables, due to their maintenance costs. The PK on BigTable is 1:1 with both BigTable2 and BigTable3.

    Final approach we're testing is whether the COALESCE can be bypassed by using a UNION ALL within the derived table, where the tables that have been UNIONed are effectively the same recordset as X.

    Are there any other design approaches to the improving the query that seem worth pursuing?

    At present, the query plan for the above shows it is not linearly scalable, and pulls in all the records from BigTable2 and BigTable3, before joining to BigTable; OR, it processes all records in BigTable when joining to X. Ideally I'd like the query to use index-seeks for implementing the WHERE Clause, and then a JOIN on BigTable which doesn't involve a table scan. It may not be possible.

    Thanks in advance,

    regards

    Carl

  • Oh, two additional pieces of information.

    1. The query needs to run in milliseconds. Without the LEFT OUTER JOIN within X it does. With the JOIN, the results (obviously dependent on the query), and can range from <10 seconds > than 5 minutes. But it's scanning.

    2. The actual implementation of the query uses dynamic SQL to handle the various choices of ORDER BY and the WHERE clause, but none of these, nor the dynamic sql, have proven to be a factor.

    C

  • Please see this article

    http://qa.sqlservercentral.com/articles/SQLServerCentral/66909/

    Note that BigTable, BigTable2 and BigTable3 have been clustered, and benefit from the MERGE join

    Ordinarily, you shouldnt really need to provide a merge query hint.

    Remove it , and any other hints, you have put in place, before posting the plan.



    Clear Sky SQL
    My Blog[/url]

  • Thanks. I'll repost the query, plus the resources to support it, but need to depersonalise the information beforehand. I agree with your point on MERGE, and usually avoid query hints, but the query is about x10 faster when it's used. (Also, am using WITH (NOEXPAND)) 🙂

    I'm wondering whether the approach of the query to the problem is right. I suspect that merging two large tables and JOINing on the result with a third large table is probably always a bad thing 🙂

    C

  • Cool,

    WITH (NOEXPAND) is different. Post the plan with that on , IMO its not a hint more of an instruction.



    Clear Sky SQL
    My Blog[/url]

  • the derived table which contains

    SELECT

    COALESCE(B3.ColA, B2.ColA) AS XYZ

    FROM BigTable2 B2

    LEFT MERGE JOIN BigTable3 B3 ON B2.Id=B3.Id

    WHERE B3.NameSearch LIKE @ClientName+'%'

    OR (B3.NameSearch IS NULL AND B2.NameSearch LIKE @ClientName + '%')

    Surely going for table/index scan here , and this could be the culprit here.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • I agree with Bhuvnesh that your join on B3 and B2 is probably what is causing the problem. (Of course this would be easier to test with sample data.) Assuming that you have an index on the NameSearch column in both B3 and B2, this query will probably perform much better.

    WITH C AS (

    SELECT

    B.*,

    IsNull(

    ( SELECT TOP 1 ColA FROM BigTable3 WHERE ID = B.ID AND NameSearch LIKE @ClientName + '%' )

    , ( SELECT TOP 1 ColA FROM BigTable2 WHERE ID = B.ID AND NameSearch LIKE @ClientName + '%' )

    ) AS XYZ

    FROM BigTable B

    WHERE

    (various filters on B)

    )

    SELECT *

    FROM C

    WHERE XYZ IS NOT NULL

    AND (various additional filters on B and XYZ)

    ORDER BY

    (various options on B and XYZ)

    I'm guessing that your query was doing an index scan on B3 and this rewrite should allow it to do an index seek.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • If you're using dynamic SQL anyway, why not remove all references to B3 when "B3.NameSearch" is NULL, since in that situation B3 isn't needed/used anyway?

    Scott Pletcher, SQL Server MVP 2008-2010

  • Definitely haven't tried this option, will test when I get into the office later, and post back then! Thanks.

    Actually, we're pushing back on the whole requirement to search for what is the same fact in two locations, since removing the requirement also makes the query perform faster 🙂 But that option perhaps lies outside the remit of a SQL technical forum

    Carl

    drew.allen (10/4/2010)


    I agree with Bhuvnesh that your join on B3 and B2 is probably what is causing the problem. (Of course this would be easier to test with sample data.) Assuming that you have an index on the NameSearch column in both B3 and B2, this query will probably perform much better.

    WITH C AS (

    SELECT

    B.*,

    IsNull(

    ( SELECT TOP 1 ColA FROM BigTable3 WHERE ID = B.ID AND NameSearch LIKE @ClientName + '%' )

    , ( SELECT TOP 1 ColA FROM BigTable2 WHERE ID = B.ID AND NameSearch LIKE @ClientName + '%' )

    ) AS XYZ

    FROM BigTable B

    WHERE

    (various filters on B)

    )

    SELECT *

    FROM C

    WHERE XYZ IS NOT NULL

    AND (various additional filters on B and XYZ)

    ORDER BY

    (various options on B and XYZ)

    I'm guessing that your query was doing an index scan on B3 and this rewrite should allow it to do an index seek.

    Drew

  • That's an excellent point, and we're doing that. The initial compilation of each version of the query isn't a particular overhead, and it gives us the flexibility, as you say, to remove references to WHERE clauses, and in some places, the need to JOIN to some tables at all.

    The original version of the query used a construction like

    WHERE

    (@SomeParam IS NULL OR MyCol LIKE @SomeParam+'%')

    AND

    (@SomeParam2 IS NULL OR MyCol2 LIKE @SomeParam2+'%')

    etc

    and the same for the ORDER BY, in its attempt to avoid dynamic SQL.

    BigTable+BigTable2+BigTable3 were also originally 10 tables, and replacing these with 3 indexed views improved performance considerably, and generated much simpler query plans 🙂

    Carl

    scott.pletcher (10/4/2010)


    If you're using dynamic SQL anyway, why not remove all references to B3 when "B3.NameSearch" is NULL, since in that situation B3 isn't needed/used anyway?

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply