UDF's vs. SP's and optimization

  • Consider a stored procedure with optional parameters that is involved in a select. We often end up with where clauses like this:

    where (@P1 is null or table.p1=@P1) and

    (@P2 is null or table.p2=@P2) and

    (@P3 is null or table.p3=@P3)

    With stored procedures, this typically compiles into table scans because of the OR, and even if the actual parameter values at run time (i.e. they are non-null) would fully cover an index, performance stinks.

    This comes up in stored procedures that run as a result of user data entry, e.g. into a web page, and is one reason we have been using dynamic SQL for them instead, since then you can omit the clause entirely when the parameter is null, as opposed to relying on the "OR".
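    To illustrate the dynamic-SQL approach, here is a minimal sketch in Python (the table and column names are just placeholders): a clause for a null parameter is simply never added to the statement, so no OR is needed.

    ```python
    # Sketch: build a WHERE clause that omits conditions for null (None) parameters,
    # instead of emitting "(@P is null OR col = @P)" for every one.
    def build_query(table, params):
        """params: dict of column -> value (None means 'not supplied')."""
        clauses = []
        args = []
        for col, val in params.items():
            if val is None:
                continue                     # null parameter: drop the clause entirely
            clauses.append(f"{col} = ?")     # parameter marker, not string concatenation
            args.append(val)
        sql = f"select * from {table}"
        if clauses:
            sql += " where " + " and ".join(clauses)
        return sql, args

    # Only the non-null parameters appear in the generated statement:
    sql, args = build_query("Orders", {"RefNbr": 2123456, "CustCd": None, "UnitCd": "LUN"})
    # sql  -> "select * from Orders where RefNbr = ? and UnitCd = ?"
    # args -> [2123456, "LUN"]
    ```

    Using parameter markers rather than concatenating the values keeps the generated statement safe and lets the server cache a plan per statement shape.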

    When you do the same in a UDF, however, so far it SEEMS as if it is not pre-optimized, and instead looks at the values at run time and makes a good index choice. Presumably you are paying the price of the optimizer then as well.

    Is that true, that query paths in UDF's are not precompiled in any fashion? Or is it some kind of statement specific decision?

  • Not entirely sure of the answer but have to disagree with the where clause.

    I have yet to have that cause a performance issue.

    I use it a lot (SQL2K) without any problems. Maybe you have something else which is causing an issue.

    One of the guys here said he did have problems with it on SQL 7. Could not increase the speed. As soon as he moved it onto 2000, it was fine.

    Are you using 7 or 2k?

    Cheers,

    Crispin

    I can't die, there are too many people who still have to meet me! It's not a bug, SQL just misunderstood me!

  • I am using SQL2000.

    In terms of the where clause, just run the script below and look at the query plans. The one with the OR gives a scan; the one without gives a seek. Though this seems to belie my original findings (in much more complicated tables) where the UDF was faster, since in this case the UDF also gives a scan.

    I guess this example shows that dynamic SQL is a better choice than either other approach, due to the ability to eliminate the OR clause.

    PS. I realize that there's no stored procedure in the below, but your question related to the WHERE.

    ------

    use Northwind
    go

    create function lef (@OrderId int)
    returns table
    as
    return (select * from [Order Details] where (@OrderId is null or OrderId = @OrderId))
    go

    declare @OrderId int
    set @OrderId = 1

    select * from [Order Details] where (@OrderId is null or OrderId = @OrderId)
    select * from [Order Details] where (OrderId = @OrderId)
    select * from lef(@OrderId)

    While the plan is as you said, have a look at the times. Dynamic SQL is expensive. The fastest option is still having the ORs.

    UDF becomes the slowest.

    
    
    declare @OrderID int
    set @OrderId = 1

    Declare
    @vX Varchar(1000)

    Set @vX = 'select * from [Order Details] where (OrderId=' + CAST(@OrderId AS VARCHAR) + ')'

    Exec (@vX)
    select * from [Order Details] where (@OrderId is null or OrderId=@OrderId)
    select * from [Order Details] where (OrderId=@OrderId)
    select * from lef(@OrderId)
     
    
    Dur  Time  Reads  Writes  Statement
    0    0     2      0       Prepare Str
    0    0     54     0       Exec Str
    0    0     13     0       ORed Select
    0    0     2      0       Basic Select
    16   16    15     0       UDF Select

    Hope the formatting lives....

    Cheers,

    Crispin

    Maybe a bit better



  • Well, no. Sorry, I don't buy it. The Northwind database is tiny, and on tiny tables scans are often faster, and compile times significant in comparison to execution.

    When I run similar queries against hundred-million-row tables, the difference between a table scan (or index scan) and a seek is a bit larger, to say the least. When we're talking many seconds of run time even for a seek (or minutes for a scan), a few milliseconds for the compilation is completely irrelevant.

    All general statements are false, including "dynamic SQL is expensive". Dynamic SQL costs a compile; if execution time is comparable to the compile time, then it is expensive. If execution always takes much longer than the compile, and the compile can make execution more efficient, then the compile is not only irrelevant in terms of actual cost, it actually makes the whole thing much faster.

  • hmmm, I'm gonna have to trust you on this one as I do not have a table with a couple million rows.... 🙁

    I agree that if you're taking a couple of seconds then the small amount of time spent on the execution plan is irrelevant.

    As is always said, I would always go for the fastest method.

    Odd thing though which is similar to this.

    One of the guys came for help with a proc which was slow (26 seconds to return 200-odd rows).

    The main killer was a function in the select.

    The function returned a BIT. The contents of the function was: If Exists(Select xxx) return 1 else return 0.

    Taking the exact statement out and using it in a CASE in the select brought the time down to a shade over 2 seconds.

    The statement was using the same index in both(?) cases.

    Spose it boils down to "Use the fastest method"....

    Cheers,

    Crispin


  • Hello!

    What do you actually mean by this:

    quote:


    All general statements are false


    You are stating a paradox 😉

    If you want to eliminate the OR in your statements try:

    WHERE (ISNULL(@P1,table.p1) = table.p1)

    AND (ISNULL(@P2,table.p2) = table.p2)

    AND (ISNULL(@P3,table.p3) = table.p3)

    I'm not sure if it will change your query execution though; it depends on the statistics of your table columns and the indices you have on them.

    Have you tried forcing index use?

    Regards, Hans!

  • instead of

    where (@P1 is null or table.p1=@P1) and

    (@P2 is null or table.p2=@P2) and

    (@P3 is null or table.p3=@P3)

    Try using

    where table.p1 = isnull(@P1, table.p1) and

    table.p2 = isnull(@P2, table.p2) and

    table.p3 = isnull(@P3, table.p3)

  • I guess "All general statements are false" might be a paradox, but I like the statement.

    The IsNull usage is an intriguing syntax to get rid of the OR, but I think it's just hiding it. I tried a few real-life queries against largish tables and got the same query plan (more or less) and the same execution time, at least in the cases I tried.

    I just did a real world example. The table Orders has an index over UnitCd, CustCd, RefNbr and is large enough I can see real differences (just in case the query plan is misleading) if it uses it or not.

    declare @RefNbr int
    declare @CustCd int
    declare @UnitCd char(3)

    set @RefNbr = 2123456
    set @CustCd = 123
    set @UnitCd = 'LUN'

    select *
    from Orders
    where (RefNbr = @RefNbr and CustCd = @CustCd and UnitCd = @UnitCd)

    The above uses the index. Replace the select with either of:

    select *
    from Orders
    where (@RefNbr is null or RefNbr = @RefNbr) and
          (@CustCd is null or CustCd = @CustCd) and
          (@UnitCd is null or UnitCd = @UnitCd)

    select *
    from Orders
    where RefNbr = IsNull(@RefNbr, RefNbr) and CustCd = IsNull(@CustCd, CustCd) and UnitCd = IsNull(@UnitCd, UnitCd)

    and it stops using the index, and does a scan. However, if you use the function:

    create function lef (@RefNbr int, @CustCd int, @UnitCd char(3))
    returns table
    as return
    (
        select OrderCd
        from Orders
        where (@RefNbr is null or RefNbr = @RefNbr) and
              (@CustCd is null or CustCd = @CustCd) and
              (@UnitCd is null or UnitCd = @UnitCd)
    )

    And then the select:

    select * from lef(123456,123456,'LUN')

    It uses the same index as the first select, doing a seek, and is very fast. This relates to my original question: something different is happening in UDF optimization, in this case something smarter. However, in experimenting I also tried:

    select * from lef(123456,Null,'LUN')

    and it is still using the index. Of the three columns in the index, this call covers the 1st and 3rd. It's questionable whether it should have used the index, but it's not a wrong decision.

    If I do instead:

    select * from lef(123456,Null,NULL)

    That select's parameters cover only the 3rd column in the index, so it should not use it, and indeed it does not. So it is definitely re-optimizing the UDF each time it runs, kind of as though it were dynamic SQL.

    The interesting thing about this, however, is that the UDF still contains the OR. The engine appears to be smarter about optimizing the execution plan for the UDF than it is for straight SQL, with no stored procedure involved. Something about the UDF compile is actually removing the OR, whereas the straight SQL statement keeps the OR and does the index scan.

    As someone famous once said, "use the fastest method".

    But I'm really intrigued that the optimization of the UDF appears to be superior to that of the same SQL used alone.

    Nice thread. I like the comments about dynamic vs. stored proc. There could well be times when a specifically built dynamic statement has a better query plan than a more generic (or hacked) stored proc. You do pay for the compile time, but hopefully that's a tiny bit compared to the real work. Somewhere in the middle might be using the WITH RECOMPILE option on the proc to get a new plan each time.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • Indeed. We're very sensitive to the Dynamic vs. non issue because about 90% of what we do is DSS type stuff, that is basically:

    - Put up a screen with a ton of parameters

    - Turn the parameters into a monster query

    - Display the results

    Some of these queries, especially if a lot of parameters are specified, can have very complex where clauses. However, the more typical user responses are to enter just a few parameters. By having the code be smart enough to leave off those clauses, and in some cases whole joins, we can save huge amounts of execution time.

    Some of my programmers prefer to build the queries as stored procedures with all the logic there (as above, checking for null parameters, or defaulting them to infinite from/to ranges). This means simpler code (i.e. web code) with a static query. But for big tables (and most of our commonly used ones are rather large) building them dynamically performs much faster.

    That is what was intriguing about the UDF -- it seemed, at least in some cases, to have the best of both. I've been thinking about it, and trying to think back (far back) to my days working on compilers (the language kind). My guess is that the real difference is in the parameter specification. Two SQL statements, e.g.

    set @x=1

    select... where something=@x

    are, I think, optimized completely independently of each other, so the compiler treats @x as a run-time variable. With a UDF, however:

    select * from lef(1)

    the parameter is specified, and the compiler (despite it being a variable) treats it as a constant. So in a clause like

    where (@x is null or x=@x)

    it can see that "@x is null" is really an expression of constants, tell whether it is true or false, and collapse the OR away.

    I think (but would have to experiment) that the stored procedure case, with WITH RECOMPILE, would not be as good as the UDF either. The SP would be recompiled, but not in the context of the specific invocation parameters. That's the advantage the optimizer has with the UDF: it actually looks at the passed parameters and does (my guess) a literal replacement of them before compiling, so what it's seeing is

    where (1 is null or x=@x)

    turned into

    where x=@x

    Now this just moves the dynamic SQL up a layer (or down, depending on your viewpoint).

    It might be interesting (if I ever find time) to experiment with this a bit in terms of (for example) UDFs inside stored procedures. Are they compiled only once there? I'd bet they are, with the parameters used in the invocation.

  • Hi Ferguson!

    Looking in BOL for executing UDFs, it says that you can use the WITH RECOMPILE keywords, and indicates that a new plan will be forced to be compiled. If the UDF were not precompiled this would be useless, since it would then have to be compiled every time anyway. OK, it does explicitly say that WITH RECOMPILE is valid for an SP, but with SP3 it should be valid also for UDFs.

    Hmm, I don't know...

  • I have noticed the very same behavior that is described by Ferguson. I would also echo many of Ferguson's comments. I would really like to understand what's going on behind the scenes.

    Creator of SQLFacts, a free suite of tools for SQL Server database professionals.

    I can't explain what's going on in the UDF, but I can state emphatically that you should avoid "WHERE (@p is NULL or field = @p)" in stored procedures. The procedure can have only one execution plan, and my guess is that the compiler assumes all parameters will be non-null and chooses indexes accordingly. Typically these procedures work fine when all parameters are supplied, but performance goes to hell when some are null, which makes the chosen indexes useless.

    The most extreme example I've dealt with had execution time go from 10 seconds to 34 minutes.

    Using ISNULL() is not a solution, because using any kind of function will prevent optimized use of indexes.

    The solutions I've seen are either dynamic SQL (especially using sp_executesql instead of EXEC), or creating 4 or 8 (I've never seen anyone go to 16 or beyond) versions of the query and using IF to pick one based on the parameters.
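    The pick-a-version-with-IF approach can be sketched like this (Python, with hypothetical column names; with two optional parameters you need 2^2 = 4 prewritten variants, mirroring the IF ladder in a proc):

    ```python
    # Sketch: dispatch to one of 2^n prewritten queries based on which of the
    # n optional parameters are actually supplied. Each variant gets its own
    # execution plan, so none is stuck with a one-size-fits-all scan plan.
    QUERIES = {
        (False, False): "select * from Orders",
        (True,  False): "select * from Orders where RefNbr = @RefNbr",
        (False, True):  "select * from Orders where CustCd = @CustCd",
        (True,  True):  "select * from Orders where RefNbr = @RefNbr and CustCd = @CustCd",
    }

    def pick_query(ref_nbr, cust_cd):
        """Choose the variant matching the pattern of supplied (non-null) parameters."""
        return QUERIES[(ref_nbr is not None, cust_cd is not None)]

    # pick_query(2123456, None) -> "select * from Orders where RefNbr = @RefNbr"
    ```

    The drawback is obvious: the number of variants doubles with every optional parameter, which is why nobody goes much past 8.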

    I've never tested this in a UDF, and I'll be thrilled if that is a solution, but I wanted to make it clear that the WHERE clause construction in Ferguson's original post is a huge mistake.

    As for worrying about the compilation overhead, keep in mind that Ferguson said these queries come from web pages. You're not going to get complaints that the next page came up a few hundred milliseconds slower because it had to compile the SQL, but you'll get lots of complaints when the page comes up empty because the query timed out. If you use sp_executesql, the execution plans for the most common forms of the query may be cached and it won't have to be recompiled anyway (not true if you use EXEC).

  • Re Scott's: The procedure can have only one execution plan, and my guess is that the compiler assumes all parameters will be non-null and choses indexes accordingly.

    I agree with you completely, except that I think you mean "may be null" as opposed to "will be non-null". If it assumed they were non-null it could choose an index, but the plan would then fail to work when a parameter wasn't non-null.

    If you choose a scan, it works regardless of whether the parameters are null or not.
