Design approaches to improving a slow query

Question

Post reply

Design approaches to improving a slow query

Carl Rayer

Valued Member

Points: 67
More actions
October 4, 2010 at 3:21 am

#387932

Hi,

We have a query that is not performing well. Below is a simplified version of it to illustrate the problem.

[font="Courier New"]SELECT

B.*,

X.XYZ

FROM BigTable B

JOIN (

SELECT

COALESCE(B3.ColA, B2.ColA) AS XYZ

FROM BigTable2 B2

LEFT MERGE JOIN BigTable3 B3 ON B2.Id=B3.Id

WHERE B3.NameSearch LIKE @ClientName+'%'

OR (B3.NameSearch IS NULL AND B2.NameSearch LIKE @ClientName + '%')

) AS X ON B.Id=X.Id

WHERE

(various filters on B and X, including X.XYZ)

ORDER BY

(various options on B and X.XYZ)[/font]

Note that all tables have > 1 million rows:

- Query is run across indexed-views, so we have denormalised the tables

- The COALESCE is slower when the tables are directly JOINed with B (rather than via the derived table), and the COALESCE is run within the main SELECT (note the need to ORDER BY the derived column is mandatory)

- No difference in query plan when using CROSS APPLY to 'join' X with B

- Note that BigTable, BigTable2 and BigTable3 have been clustered, and benefit from the MERGE join

- NameSearch is across a nonclustered index

Ideally I want to avoid reporting tables, due to their maintenance costs. The PK on BigTable is 1:1 with both BigTable2 and BigTable3.

Final approach we're testing is whether the COALESCE can be bypassed by using a UNION ALL within the derived table, where the tables that have been UNIONed are effectively the same recordset as X.

Are there any other design approaches to the improving the query that seem worth pursuing?

At present, the query plan for the above shows it is not linearly scalable, and pulls in all the records from BigTable2 and BigTable3, before joining to BigTable; OR, it processes all records in BigTable when joining to X. Ideally I'd like the query to use index-seeks for implementing the WHERE Clause, and then a JOIN on BigTable which doesn't involve a table scan. It may not be possible.

Thanks in advance,

regards

Carl

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Carl Rayer Valued Member Points: 67 More actions · Answer 1

Oh, two additional pieces of information.

1. The query needs to run in milliseconds. Without the LEFT OUTER JOIN within X it does. With the JOIN, the results (obviously dependent on the query), and can range from <10 seconds > than 5 minutes. But it's scanning.

2. The actual implementation of the query uses dynamic SQL to handle the various choices of ORDER BY and the WHERE clause, but none of these, nor the dynamic sql, have proven to be a factor.

C

Dave Ballantyne SSC-Dedicated Points: 33556 More actions · Answer 2

Please see this article

http://qa.sqlservercentral.com/articles/SQLServerCentral/66909/

Note that BigTable, BigTable2 and BigTable3 have been clustered, and benefit from the MERGE join

Ordinarily, you shouldnt really need to provide a merge query hint.

Remove it , and any other hints, you have put in place, before posting the plan.

Clear Sky SQL
My Blog[/url]

Carl Rayer Valued Member Points: 67 More actions · Answer 3

Thanks. I'll repost the query, plus the resources to support it, but need to depersonalise the information beforehand. I agree with your point on MERGE, and usually avoid query hints, but the query is about x10 faster when it's used. (Also, am using WITH (NOEXPAND)) 🙂

I'm wondering whether the approach of the query to the problem is right. I suspect that merging two large tables and JOINing on the result with a third large table is probably always a bad thing 🙂

C

Dave Ballantyne SSC-Dedicated Points: 33556 More actions · Answer 4

Dave Ballantyne

SSC-Dedicated

Points: 33556

October 4, 2010 at 3:50 am

#1230883

Cool,

WITH (NOEXPAND) is different. Post the plan with that on , IMO its not a hint more of an instruction.

Clear Sky SQL
My Blog[/url]

Bhuvnesh SSC Guru Points: 59351 More actions · Answer 5

the derived table which contains

SELECT

COALESCE(B3.ColA, B2.ColA) AS XYZ

FROM BigTable2 B2

LEFT MERGE JOIN BigTable3 B3 ON B2.Id=B3.Id

WHERE B3.NameSearch LIKE @ClientName+'%'

OR (B3.NameSearch IS NULL AND B2.NameSearch LIKE @ClientName + '%')

Surely going for table/index scan here , and this could be the culprit here.

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

drew.allen SSC Guru Points: 76776 More actions · Answer 6

I agree with Bhuvnesh that your join on B3 and B2 is probably what is causing the problem. (Of course this would be easier to test with sample data.) Assuming that you have an index on the NameSearch column in both B3 and B2, this query will probably perform much better.

WITH C AS (



	SELECT

	B.*,

	IsNull(

		  ( SELECT TOP 1 ColA FROM BigTable3 WHERE ID = B.ID AND NameSearch LIKE @ClientName + '%' )

		, ( SELECT TOP 1 ColA FROM BigTable2 WHERE ID = B.ID AND NameSearch LIKE @ClientName + '%' )

		) AS XYZ

	FROM BigTable B

	WHERE

		(various filters on B)

	)

SELECT *

FROM C

WHERE XYZ IS NOT NULL

AND (various additional filters on B and XYZ)

ORDER BY

(various options on B and XYZ)

I'm guessing that your query was doing an index scan on B3 and this rewrite should allow it to do an index seek.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

scott.pletcher SSCertifiable Points: 5958 More actions · Answer 7

scott.pletcher

SSCertifiable

Points: 5958

October 4, 2010 at 4:48 pm

#1231369

If you're using dynamic SQL anyway, why not remove all references to B3 when "B3.NameSearch" is NULL, since in that situation B3 isn't needed/used anyway?

Scott Pletcher, SQL Server MVP 2008-2010

Carl Rayer Valued Member Points: 67 More actions · Answer 8

Definitely haven't tried this option, will test when I get into the office later, and post back then! Thanks.

Actually, we're pushing back on the whole requirement to search for what is the same fact in two locations, since removing the requirement also makes the query perform faster 🙂 But that option perhaps lies outside the remit of a SQL technical forum

Carl

drew.allen (10/4/2010)

I agree with Bhuvnesh that your join on B3 and B2 is probably what is causing the problem. (Of course this would be easier to test with sample data.) Assuming that you have an index on the NameSearch column in both B3 and B2, this query will probably perform much better.

WITH C AS (
SELECT B.*, IsNull( ( SELECT TOP 1 ColA FROM BigTable3 WHERE ID = B.ID AND NameSearch LIKE @ClientName + '%' ) , ( SELECT TOP 1 ColA FROM BigTable2 WHERE ID = B.ID AND NameSearch LIKE @ClientName + '%' ) ) AS XYZ FROM BigTable B WHERE (various filters on B) ) SELECT * FROM C WHERE XYZ IS NOT NULL AND (various additional filters on B and XYZ) ORDER BY
(various options on B and XYZ)

I'm guessing that your query was doing an index scan on B3 and this rewrite should allow it to do an index seek.

Drew

Carl Rayer Valued Member Points: 67 More actions · Answer 9

That's an excellent point, and we're doing that. The initial compilation of each version of the query isn't a particular overhead, and it gives us the flexibility, as you say, to remove references to WHERE clauses, and in some places, the need to JOIN to some tables at all.

The original version of the query used a construction like

WHERE

(@SomeParam IS NULL OR MyCol LIKE @SomeParam+'%')

AND

(@SomeParam2 IS NULL OR MyCol2 LIKE @SomeParam2+'%')

etc

and the same for the ORDER BY, in its attempt to avoid dynamic SQL.

BigTable+BigTable2+BigTable3 were also originally 10 tables, and replacing these with 3 indexed views improved performance considerably, and generated much simpler query plans 🙂

Carl

scott.pletcher (10/4/2010)

If you're using dynamic SQL anyway, why not remove all references to B3 when "B3.NameSearch" is NULL, since in that situation B3 isn't needed/used anyway?