Dynamic Queries (not Dynamic SQL)

  • The following 'treatise' comes about as a result of a gut instinct that the mechanism employed by a (name withheld to protect the guilty) developer was suboptimal and there must be a better, faster, stronger way to do the same thing.

    The premise:

    Create a single query with multiple filter criteria that allows the return of a result set of rows matching those criteria that have specified values and allowing the return of any value for which the criteria is specified as NULL.

    If a table has columns A & B and the filter is A=@A, B=@B:

When @a=123, @b=456 returns rows WHERE A=123 & B=456

When @a=123, @b=NULL returns rows WHERE A=123 & B=any value

It is a type of generic query that is useful for generating result sets for reports, lists, etc. when multiple criteria MAY be used but not necessarily, and it precludes coding separate queries for each possible combination of filter criteria.
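In sketch form, the premise looks like this (a minimal illustration only; `SomeTable` is a hypothetical table with columns A and B):

```sql
-- Generic 'catch-all' filter: each criterion is ignored when its
-- parameter is NULL, so one query covers every combination.
SELECT *
FROM SomeTable
WHERE (A = @A OR @A IS NULL)
  AND (B = @B OR @B IS NULL)
```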

A little web searching and posts within this forum produced four basic static (i.e. not dynamic SQL) queries that generated the desired results. I am sure there are more variations, algorithms and mechanisms. I am sure they will magically find their way here, too! 😉

    Disclaimer: This is a quick & dirty. No warranty written or implied. Demo use only. Don't try this at home. No user serviceable parts inside -- get the idea?

    Build the test table:

    -- Run this as is on a NON-PRODUCTION server

-- Use the SELECT'd values of A, B & C as the variable values for @varA, @varB & @varC, respectively, in the second code section.

    -- NULL each variable and combinations of variables to alter the query plan.

    -- Compare the estimated query plans - note the index used in each

    -- Drop indexes (commands at end of this script) to alter the plans & rerun queries

    -- This may/should be executed several times to generate new values

    -- The value of @maxRows may be altered to generate larger/smaller tables

    USE tempdb

    go

    RAISERROR('This may take a moment...', 10, 1) WITH NOWAIT

    go

    SET ANSI_NULLS ON

    SET QUOTED_IDENTIFIER ON

    SET NOCOUNT ON

    IF OBJECT_ID('dbo.tempTbl') IS NOT NULL

    DROP TABLE dbo.tempTbl

    CREATE TABLE dbo.tempTbl

    (

    PK int IDENTITY(1,1) NOT NULL,

    A int NOT NULL,

    B int NOT NULL,

    C int NOT NULL,

    CONSTRAINT PK_tempTbl

    PRIMARY KEY CLUSTERED (PK ASC)

    )

    DECLARE

    @maxRows int,

    @rndA int,

    @rndB int,

    @rndC int,

    @cnt int

    SET @cnt = 0

    -- Alter @maxrows to change the table row count

    SET @maxRows = 50000

    -- slap some values in the table - not efficient but it works

    WHILE @cnt < @maxRows

    BEGIN

    SET @rndA = CAST(RAND() * 101010.0 AS int)

    SET @rndB = CAST(RAND() * RAND(@rndA) * 3633027.3 AS int)

    SET @rndC = CAST(RAND() * RAND(@rndB) * 211.7 AS int)

    INSERT INTO dbo.tempTbl

    (

    A,

    B,

    C

    )

    VALUES

    (

    @rndA,

    @rndB,

    @rndC

    )

    SET @cnt = @cnt + 1

    END

    -- Index this puppy six ways from sunday!

    CREATE NONCLUSTERED INDEX A

    ON TempTbl (A)

    CREATE NONCLUSTERED INDEX B

    ON TempTbl (B)

    CREATE NONCLUSTERED INDEX C

    ON TempTbl (C)

    CREATE NONCLUSTERED INDEX AB

    ON TempTbl (A, B)

    CREATE NONCLUSTERED INDEX AC

    ON TempTbl (A, C)

    CREATE NONCLUSTERED INDEX BA

    ON TempTbl (B, A)

    CREATE NONCLUSTERED INDEX BC

    ON TempTbl (B, C)

CREATE NONCLUSTERED INDEX CA

ON TempTbl (C, A)

CREATE NONCLUSTERED INDEX CB

ON TempTbl (C, B)

    CREATE NONCLUSTERED INDEX ABC

    ON TempTbl (A, B, C)

    CREATE NONCLUSTERED INDEX ACB

    ON TempTbl (A, C, B)

    CREATE NONCLUSTERED INDEX BCA

    ON TempTbl (B, C, A)

    CREATE NONCLUSTERED INDEX BAC

    ON TempTbl (B, A, C)

    CREATE NONCLUSTERED INDEX CAB

    ON TempTbl (C, A, B)

    CREATE NONCLUSTERED INDEX CBA

    ON TempTbl (C, B, A)

    SELECT TOP 1 *

    FROM tempTbl

    WHERE pk = CAST(RAND() * 50000.0 AS int)

    -- Drop indexes individually & rerun query

    -- DROP INDEX dbo.tempTbl.A

    -- DROP INDEX dbo.tempTbl.B

    -- DROP INDEX dbo.tempTbl.C

    -- DROP INDEX dbo.tempTbl.AB

    -- DROP INDEX dbo.tempTbl.AC

    -- DROP INDEX dbo.tempTbl.BA

    -- DROP INDEX dbo.tempTbl.BC

    -- DROP INDEX dbo.tempTbl.CA

    -- DROP INDEX dbo.tempTbl.CB

    -- DROP INDEX dbo.tempTbl.ABC

    -- DROP INDEX dbo.tempTbl.ACB

    -- DROP INDEX dbo.tempTbl.BAC

    -- DROP INDEX dbo.tempTbl.BCA

    -- DROP INDEX dbo.tempTbl.CAB

    -- DROP INDEX dbo.tempTbl.CBA

    ---------------------------

    -- End of test table build

    ---------------------------

    OK, run the above to generate the test table. While that's going you can copy the below into a new Query Analyzer window. When the table generation is complete it will return a row. Use the values (or rerun the SELECT at the bottom for new values) to assign to the variables below:

    -- NULL each variable and combinations of variables to alter the query plan.

    -- Compare the estimated query plans (note: a third option was added just for kicks)

    -- Also execute the queries proper & note the IO statistics on the messages tab

    --

    SET NOCOUNT OFF

    USE tempdb

    DECLARE

    @varA int,

    @varB int,

    @varC int

    -- replace the values below with those returned by the table generation script

    SET @varA = 94240

    SET @varB = 1486844

    SET @varC = 14

    SELECT

    *

    FROM tempTbl

    WHERE

    (A = @varA OR @varA IS NULL) AND

    (B = @varB OR @varB IS NULL) AND

    (C = @varC OR @varC IS NULL)

    SELECT

    *

    FROM tempTbl

    WHERE

    (A = @varA AND @varA IS NOT NULL) AND

    (B = @varB AND @varB IS NOT NULL) AND

    (C = @varC AND @varC IS NOT NULL)

    SELECT

    *

    FROM tempTbl

    WHERE

    A = COALESCE(@varA, A) AND

    B = COALESCE(@varB, B) AND

    C = COALESCE(@varC, C)

    SELECT

    *

    FROM tempTbl

    WHERE

    A = ISNULL(@varA, A) AND

    B = ISNULL(@varB, B) AND

    C = ISNULL(@varC, C)

    -----------------------------------

    -- End of test script

    -----------------------------------

    OK, the table has been generated and the values returned copied to their respective variables, right?

    Now click the "Display Estimated Execution Plan" button and review the results. Note the index used by each query. Note the differences in the execution paths. Note the difference in the COALESCE() vs the ISNULL() versions - not terribly surprising, I guess...

    Now start playing with the indices. Drop the index used by the 'most efficient' query and check the plans again. Note any trends in the query plans. Repeat this delete-rerun process until you are exhausted, lunch time or the boss wants you to give an end-user the sa password (whichever comes first).

    In my fiddling with it, one of the queries consistently beat the others by approximately 20x until all of the indices were dropped. The IS NOT NULL version...

I did not try sp_executesql or EXEC query versions but it would be interesting to see how they fared.
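For anyone curious, a dynamic-SQL version via sp_executesql might look something like the sketch below. It is untested against the scripts above and only an illustration: it reuses the test table and variable names from the script, builds the WHERE clause from the non-NULL parameters only, and passes the values as real parameters rather than concatenating them in.

```sql
-- Hedged sketch: assemble the WHERE clause from non-NULL parameters,
-- then run it as a parameterised batch via sp_executesql.
DECLARE
    @varA  int,
    @varB  int,
    @varC  int,
    @sql   nvarchar(2000),
    @where nvarchar(1000)

SET @varA = 94240
SET @varB = NULL
SET @varC = 14

SET @where = N''
IF @varA IS NOT NULL SET @where = @where + N' AND A = @pA'
IF @varB IS NOT NULL SET @where = @where + N' AND B = @pB'
IF @varC IS NOT NULL SET @where = @where + N' AND C = @pC'

-- The 1 = 1 stub lets every criterion start with AND
SET @sql = N'SELECT * FROM dbo.tempTbl WHERE 1 = 1' + @where

EXEC sp_executesql @sql,
    N'@pA int, @pB int, @pC int',
    @pA = @varA, @pB = @varB, @pC = @varC
```

One side effect worth noting: each distinct combination of populated parameters produces a different statement text, so each gets its own plan rather than one shared plan for all combinations.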

    Play on!

    Art

  • Most interesting ...

I guess the most likely indexing regime would be the single index per column (i.e. INDEX A, B and C), and in that mode the IS NOT NULL method cost is only about 1.4% of the batch in my testing. The IS NULL and IsNull() methods are almost identical (a 0.01% difference, which could well be a rounding issue); COALESCE was marginally worse. Looking at the detail of the clustered index scan that still takes place on the PK, it would seem that this might be because T-SQL does not natively understand COALESCE, as it seems to have translated it into IFs, whereas the IsNull() is shown as just that. That would fit with the MS view of the world -- why natively support part of the SQL standard rather than your own proprietary function!

But the result is that the conveniently neat shorthand of the IsNull() method must be replaced by the wordier but massively more efficient IS NOT NULL one.

Thanks for taking the time to set up the tests.

  • The problem with all of those 'all-in-one' type queries is that they make query plan reuse a very dodgy affair. What may be suitable for one set of parameters often isn't for another set.

The construct (ColA = @a OR @a IS NULL) AND (ColB = @b OR @b IS NULL) AND ... and similar ones cannot be properly analysed by the optimiser, and it tends to assume that the second half of the OR is true for all rows, or false for all rows. As you can imagine, this doesn't lend itself well to optimal plans.

(btw, the second query gives a different result from the others for the case where @varA and @varB have values and @varC is NULL. It returns 0 rows whereas the other three return 310 in my test)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • There is another issue with the IS NOT NULL version too. NULLing all variables returns an empty set.

Back to the drawing board... glad I didn't erase it!

  • No wonder it was faster ....

If any of the variables are NULL it will be doing a compare with a NULL, which is a no-no -- perhaps this simple switch would make a difference (I have trashed the test table already at my end...)

    SELECT

    *

    FROM tempTbl

    WHERE

    (@varA IS NOT NULL AND A = @varA) AND

    (@varB IS NOT NULL AND B = @varB) AND

    (@varC IS NOT NULL AND C = @varC)

As that would possibly not bother doing the A = @varA comparison if it has already seen that @varA was NULL.

  • James Horsley (11/16/2007)


    No wonder it was faster ....

If any of the variables are NULL it will be doing a compare with a NULL, which is a no-no -- perhaps this simple switch would make a difference (I have trashed the test table already at my end...)

    Same thing. If any of the variables are NULL, the resultset has no records.

As that would possibly not bother doing the A = @varA comparison if it has already seen that @varA was NULL.

    SQL doesn't use short-circuit evaluation of the form that some programming languages do. The order that criteria are specified in the where clause has no effect on the order in which they are evaluated. The evaluation order depends on the available indexes, the distribution of data and the plan that the optimiser comes up with, based on those.
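A quick way to see this for yourself (a sketch only; whether the error actually fires depends on the plan the optimiser picks, which is exactly the point):

```sql
-- The textual order of these predicates does not guarantee evaluation
-- order, so the CAST may still be attempted against the non-numeric row
-- and raise a conversion error despite the ISNUMERIC 'guard'.
CREATE TABLE #mixed (val varchar(10))
INSERT INTO #mixed (val) VALUES ('1')
INSERT INTO #mixed (val) VALUES ('abc')

-- Looks 'safe', but ISNUMERIC is not promised to be checked first
SELECT *
FROM #mixed
WHERE ISNUMERIC(val) = 1
  AND CAST(val AS int) > 0

DROP TABLE #mixed
```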

It's a fairly common thing I hear, about putting constant expressions first in the where clause, or putting the null comparisons first. I wrote a more detailed piece on this, complete with exec plans: Order of execution

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • So MS could add a performance boost in SQL2009 by short circuiting ...

  • I haven't given up but it appears my gut instinct cannot be satisfied (yet)...

I snagged the 'IS NOT NULL' approach from the URL mentioned above (http://www.sommarskog.se/dyn-search.html). Two problems surfaced with that. The first is that the code provided required at least one of the filter parameters to be populated. I was playing with that code in order to allow it to return all rows if all of the parameters were NULL.

    The second problem was that the code I snipped for the test script was the wrong code (doh!). In order to make the query work the WHERE clause needs to take into consideration, via a separate comparison, the condition when all parameters are NULL. The corrected clause is:

    WHERE

    (A = @varA AND @varA IS NOT NULL) OR

    (B = @varB AND @varB IS NOT NULL) OR

    (C = @varC AND @varC IS NOT NULL) OR

    (@varA IS NULL AND @varB IS NULL AND @varC IS NULL)

    On SQL2000, this change made the query fall behind the original IS NULL version. Interestingly, on SQL2005, all four versions were equal regardless of indexing.

    Another irritating but probably not as critical issue is the distribution of values within the test table. For any value in column A, the values in B and C would be the same for all rows.

    The new testTbl row generation I'm using produces a better distribution:

    DECLARE

    @maxRows int,

    @rndA int,

    @rndB int,

    @rndC int,

    @incVal int

    SET @incVal = 1

    -- Alter @maxrows to change the table row count

    SET @maxRows = 50000

    -- slap some values in the table

    INSERT INTO dbo.tempTbl (A, B, C)

    SELECT 0, 0, 0

    WHILE @incVal <= @maxRows

    BEGIN

    INSERT

    INTO dbo.tempTbl (A, B, C)

    SELECT 0, 0, 0

    FROM

    (

    SELECT PK + 1 dtCol

    FROM dbo.tempTbl

    WHERE PK <= @maxRows - @incVal) dt

    SET @incVal = @incVal * 2

    END

    UPDATE dbo.tempTbl

    SET

    A = FLOOR(RAND(PK) * 633027.3),

    B = FLOOR(1 / (RAND(PK) / PK)),

    C = FLOOR((RAND() / PK) * 1111111.1)

    Of course, I have not beaten this horse sufficiently and will continue to do so until I either have a better solution or dog food...

    Art

  • R2ro (11/21/2007)


    ...

    WHERE

(A = @varA AND @varA IS NOT NULL) OR --line1

(B = @varB AND @varB IS NOT NULL) OR --line2

    (C = @varC AND @varC IS NOT NULL) OR

    (@varA IS NULL AND @varB IS NULL AND @varC IS NULL)

    ....

    Art

This STILL doesn't meet your initial criteria because the logic is entirely wrong. What happens when only @varC is NULL? You get all records that match either line1 or line2, but not necessarily both. This particular horse is lame there, doc; time to apply "equine medicine", to quote a Far Side strip of yore....
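A tiny repro of the point (a sketch; the table and values are made up purely for illustration):

```sql
-- A row matching only ONE supplied criterion should not qualify when
-- both @varA and @varB have values, but the OR form lets it through.
CREATE TABLE #t (A int, B int, C int)
INSERT INTO #t (A, B, C) VALUES (1, 99, 5)  -- matches A only

DECLARE @varA int, @varB int, @varC int
SET @varA = 1
SET @varB = 2
SET @varC = NULL

-- OR form: returns the row even though B (99) <> @varB (2)
SELECT *
FROM #t
WHERE (A = @varA AND @varA IS NOT NULL) OR
      (B = @varB AND @varB IS NOT NULL) OR
      (C = @varC AND @varC IS NOT NULL) OR
      (@varA IS NULL AND @varB IS NULL AND @varC IS NULL)

DROP TABLE #t
```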

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • James Horsley (11/19/2007)


    So MS could add a performance boost in SQL2009 by short circuiting ...

    I don't think so.

Bear in mind that SQL is a declarative language, not a procedural one. You state what you want and the engine takes care of the details of how. The order of evaluation is part of the how. In a procedural language, that's the responsibility of the programmer. In a declarative language it is not.

The query engine does do a form of short-circuiting, evaluating some criteria before others. Only rows that satisfy the earlier applied conditions are evaluated for later ones. It's just that the query writer has little control over which criteria get evaluated in which order.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I don't think so.

Surely it isn't beyond the realms of what the optimiser is doing... but as databases are one area where MS does have real competition, I guess if it was easy it would have been done!

  • So that we're on the same page, what are you thinking about when you say short-circuit?

What C++ (and other C-like languages) have, where if the first expression determines the outcome of a condition, the second is not evaluated?

    Something else?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
Yes, that's just what I mean - so if you have, say, IF (A != B OR C != D), then there is no need to evaluate C != D if A did not equal B, because we are OR'ing.

  • But that means that the order of statements in the where clause now becomes important. That will absolutely cripple the optimiser as it will not be allowed to look for plans that evaluate the third statement (say of 4) in the where clause first, in the case that there is an effective index available on the third but none on the others.

The way the optimiser currently works is that it will check which of the conditions in the where clause (or in the from clause) it can use to quickly reduce the number of rows involved in later operations. A table scan followed by a filter is less desirable (and has a higher cost) than an index seek.

    At the moment, the optimiser is a little shaky with constant evaluations in OR statements, where the other part of the OR references a column. Maybe that will change in future versions.

    Remember as I said before. SQL's a declarative language, not a procedural language. In a declarative language (like prolog as well) you don't tell the execution engine what to do. You just tell it what you want out.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
