How to understand SQL Server statistics information and the query cost of an execution plan

  • In figure 1 there are two SQL statements. Both join two tables: one table is on the current database server and the other is on another database server. The first SQL statement performs the join through a SQL Server linked server; the second uses OPENQUERY over the linked server to get the same data. Figures 2 to 4 show their execution plans and statistics information (IO/time). Now I have some questions below.

    1. Can we use Estimated Subtree Cost to estimate the total cost of executing the SQL? If not, and we write the SQL in different ways to get the same result, how do we estimate which way (which SQL statement) is best? Is there a value for this in the SQL Server execution plan?

    2. Can we use the Query cost (relative to the batch) percentage to decide which one is best?

    3. Which values are used to compute the query cost shown in the execution plan?

    4. In figure 4, the second SQL statement shows 100,518 logical reads and a scan count of 23,107; both are much higher than for the first SQL statement. Why is its query cost 15% while the first statement's is 85%? Normally more IO means more time, so why is its IO higher but its query cost lower?

    5. The two SQL statements only join two tables and never reference a table named Worktable, so why does a worktable appear in their statistics information?

    6. In figure 5 I only retrieved data from one table. How should I understand the relationship among part 1 to part 4? Which parts show the detail information and which show the summary, and how are they related?

    thanks everyone for your kind help and instruction!

     

  • 1. I don't advise it.  Those are Estimated costs, and the estimates are only as good as your statistics, and things get even worse when a linked server is involved.  Use the actual execution plan (although even then, the percentages are based on the estimated plan, not the actual).
    2. See (1) above
    3. I don't know.  I don't even know whether Microsoft publish that information.
    4. See (3) above
    5. The work table is something internal to the query execution engine (that isn't the right term for it).  One reason why a work table is created is if stale statistics lead to an insufficient memory grant and data has to spill to tempdb.  That may be why you've got an exclamation mark on the final operator in the plan.  (By the way, it's much better if you post the plan itself (.sqlplan file) rather than pictures of it.)
    6. I don't know, because I can't see what you're executing.  Possibly parts 1 and 2 are where you're setting statistics IO on?

    John

  • Actually, when I executed these SQL statements I clicked the menu item named Include Actual Execution Plan, and I don't know why it still shows Estimated Subtree Cost etc.

  • Yes, with the Actual execution plan, you get estimated and actual values.  By comparing them you can get some idea of how accurate your statistics are.

    John

  • 892717952 wrote:

    In figure 1 there are two SQL statements. Both join two tables: one is on the current database server and the other is on another database server. The first statement performs the join through a SQL Server linked server; the second uses OPENQUERY over the linked server to get the same data. Figures 2 to 4 show the execution plans and statistics information (IO/time).


    I advise using SET STATISTICS IO ON and SET STATISTICS TIME ON. They give you simple yet meaningful information for optimizing your query or choosing the best-performing version of it.

    Referring to the execution plan is also a good option, but my point is that if something can be achieved with a small knife, there is no need for a big axe.
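
    As a minimal sketch (dbo.LocalTable, RemoteTable and LinkedSrv below are placeholders, not the actual objects from the figures):

        -- Turn on per-statement IO and timing output for this session
        SET STATISTICS IO ON;
        SET STATISTICS TIME ON;

        -- Run the candidate statements back to back and compare the
        -- "logical reads", "CPU time" and "elapsed time" lines in the Messages tab.
        SELECT l.col1, COUNT(*) AS cnt
        FROM dbo.LocalTable AS l
        JOIN LinkedSrv.RemoteDb.dbo.RemoteTable AS r ON r.id = l.id
        GROUP BY l.col1;

        SELECT l.col1, COUNT(*) AS cnt
        FROM dbo.LocalTable AS l
        JOIN OPENQUERY(LinkedSrv,
                'SELECT id FROM RemoteDb.dbo.RemoteTable WHERE status = 1') AS r
            ON r.id = l.id
        GROUP BY l.col1;

        -- Turn the output back off when you are done
        SET STATISTICS IO OFF;
        SET STATISTICS TIME OFF;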

    One more thing to remember: never rely solely on the estimated plan, and if you find a huge difference between the estimated and actual plans, you may need to update the statistics and/or clear the plan cache.

  • So, execution plans are a poor way to measure query behavior. The best way to measure query behavior is by capturing execution time, I/O and CPU. Execution plans are how you understand why you're seeing specific behaviors, not a measure. Now, you can use the values in execution plans to understand how the optimizer is arriving at a plan and which of those plans the optimizer thinks will be better. However, you still need to do the measures to truly know which is which.

    As far as estimated and actual plans, a bunch of us are working with Microsoft to get rid of these terms. They're horribly misleading. Here's the truth of the matter.

    There are execution plans.

    There are execution plans plus runtime metrics.

    The first of these things, execution plans, are what are currently referred to as "estimated" execution plans. They're not. They are execution plans. The only "estimate" about them is what the optimizer is doing.

    Then, you can capture runtime metrics and overlay them on an execution plan. This is what is currently referred to as an "actual" plan. There's nothing "actual" about it. The plan is the same as the "estimated" plan (recompiles notwithstanding, and differences in the time and location of plan generation notwithstanding; these are the only differences). Then it has runtime metrics, the "actuals", layered on top.

    So, shortest possible answer to all your questions: No. Don't use the plans as measuring sticks. They are behavior explainers.
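
    For illustration only, one way to capture those runtime measures after the fact is sys.dm_exec_query_stats (a sketch; it only sees statements whose plans are still in cache):

        -- Aggregated runtime measures for cached statements, highest total CPU first
        SELECT TOP (10)
               qs.execution_count,
               qs.total_worker_time   AS total_cpu_microseconds,
               qs.total_elapsed_time  AS total_elapsed_microseconds,
               qs.total_logical_reads,
               qs.total_physical_reads,
               SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                         ((CASE qs.statement_end_offset
                               WHEN -1 THEN DATALENGTH(st.text)
                               ELSE qs.statement_end_offset
                           END - qs.statement_start_offset) / 2) + 1) AS statement_text
        FROM sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
        ORDER BY qs.total_worker_time DESC;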


  • Brahmanand Shukla wrote:

    I advise using SET STATISTICS IO ON and SET STATISTICS TIME ON. They give you simple yet meaningful information for optimizing your query or choosing the best-performing version of it.

    Be a little cautious here. SET STATISTICS IO & TIME actually adds a little overhead to the measurements and can skew some results, especially on queries that are already running fast. This is because the cost of returning the information from IO & TIME adds to the runtime measure of the query. A much more accurate and safer measure is to use Extended Events. I reserve STATISTICS IO for when I want to see individual IO across objects in a query.
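
    A minimal sketch of such an Extended Events session (the session name and file name are placeholders; add a predicate to limit it to the session or database you're testing from):

        -- Capture completed statements with duration, CPU, reads and writes
        CREATE EVENT SESSION QueryCompare ON SERVER
        ADD EVENT sqlserver.sql_statement_completed
            (ACTION (sqlserver.sql_text))
        ADD TARGET package0.event_file (SET filename = N'QueryCompare.xel');

        ALTER EVENT SESSION QueryCompare ON SERVER STATE = START;
        -- run the statements you want to compare, then stop the session
        ALTER EVENT SESSION QueryCompare ON SERVER STATE = STOP;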

    Brahmanand Shukla wrote:

    One more thing to remember: never rely solely on the estimated plan, and if you find a huge difference between the estimated and actual plans, you may need to update the statistics and/or clear the plan cache.

    Again, some degree of caution. Differences between estimates & actuals could be more than just statistics. It can be caused by the lack of statistics (table variables, multi-statement UDFs) or code that precludes the use of statistics or even forces particular behaviors (FAST 1 for example). So immediately charging off to update stats may not be the answer.

    Also, instead of clearing the plan cache, we could look to removing a single plan from cache, a sniper rifle instead of an atom bomb. DBCC FREEPROCCACHE(@planhandle)  will remove just one plan, a much safer suggestion.
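
    As a sketch of that approach (the LIKE filter is just a placeholder for identifying the statement whose plan you want to evict):

        -- Find the plan_handle of the one statement whose plan should be evicted
        DECLARE @planhandle varbinary(64);

        SELECT TOP (1) @planhandle = cp.plan_handle
        FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
        WHERE st.text LIKE N'%OPENQUERY%';   -- placeholder filter

        -- Remove only that plan from the cache
        IF @planhandle IS NOT NULL
            DBCC FREEPROCCACHE (@planhandle);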


  • Nicely explained, Grant!

    In addition to what you mentioned, I would like to add one more point.

    One of the top contributors to poor query performance is a high number of physical reads. More physical reads mean more CPU and more processing time. If we can limit physical reads through proper index selection, that alone solves more than half of the performance problems. A lot of performance issues arise around physical reads, either due to a missing or improperly chosen index or due to limited physical memory.
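
    Purely as an illustrative sketch (the table and column names are made up), a covering index that supports the join and filter lets the query be answered from far fewer pages:

        -- Hypothetical covering index: the key supports the join/filter columns,
        -- INCLUDE carries the columns the query selects, so the base table
        -- does not have to be touched at all.
        CREATE NONCLUSTERED INDEX IX_LocalTable_id_status
            ON dbo.LocalTable (id, status)
            INCLUDE (col1, amount);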

    STATISTICS IO and TIME are not recommended to be kept ON at all times, as you rightly mentioned, and should be set OFF as soon as we have the stats. But I feel there is no harm in using them, with due care for the production load etc.

    Be it Profiler, Extended Events, execution plans or STATISTICS IO/TIME, every option has some associated overhead and should be used with due care in production.

    I hope I'm making sense !

  • Brahmanand Shukla wrote:

    STATISTICS IO and TIME are not recommended to be kept ON at all times, as you rightly mentioned, and should be set OFF as soon as we have the stats. But I feel there is no harm in using them, with due care for the production load etc.

    Be it Profiler, Extended Events, execution plans or STATISTICS IO/TIME, every option has some associated overhead and should be used with due care in production.

    I hope I'm making sense !

    Absolutely. The caveat I will add is that when you capture STATS IO/TIME, the time measurement is impacted by the capture. In short, not only is there a little more overhead than using ExEvents (which, as you note, does have some overhead, all observer effect rules are in place), but the time measurement is made less than accurate. That's the reason I've stopped using these except when I want the object-level IO.


  • Thanks everyone for your good suggestions; they have made execution plans much clearer to me. But I have one question about physical reads and logical reads when executing two SELECT statements.

    If we write the SQL in two different ways to get the same result, and the physical read count of the first script is much higher than the second's, but its logical reads are much lower than the second's, how do we evaluate which script performs better? Thanks!

  • Logical reads mean reading from memory and physical reads mean reading from disk. Physical reads involve disk IO and require more CPU and processing time to load the pages into memory. For example, if your query requests 10 pages and all 10 pages are found in memory (logical reads), then there won't be any disk-based reads (physical reads).

    Based on what you have mentioned, if the same query on the same data set is giving different logical reads, it means the indexes of the database with the high logical/physical reads are fragmented, which causes more pages to be read from memory/disk. You should also check the PLE (Page Life Expectancy) of both instances.
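
    For reference, PLE can be read from the buffer manager counters on each instance (values are in seconds):

        -- Page Life Expectancy from the buffer manager performance counters
        SELECT [object_name], counter_name, cntr_value AS ple_seconds
        FROM sys.dm_os_performance_counters
        WHERE counter_name LIKE N'Page life expectancy%'
          AND [object_name] LIKE N'%Buffer Manager%';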

    I would also recommend comparing the hardware infrastructure of both servers, as well as the SQL Server instance and database settings, before you compare the stats, plans or performance.

  • Dear Brahmanand Shukla,  thank you for your kind help!

    I meant that I wrote two SQL statements to get the same result. If the number of physical reads plus logical reads of the first statement is bigger than the number of physical reads plus logical reads of the second statement, can we say the performance of the first statement is worse than the second? Thanks!

  • 892717952 wrote:

    Dear Brahmanand Shukla,  thank you for your kind help!

    I meant that I wrote two SQL statements to get the same result. If the number of physical reads plus logical reads of the first statement is bigger than the number of physical reads plus logical reads of the second statement, can we say the performance of the first statement is worse than the second? Thanks!

    Absolutely correct in most cases!

    Fewer physical/logical reads mean you have less-fragmented indexes and/or a strong filter clause.

    I saw your query. Your second query, with OPENQUERY, has a WHERE condition, so only the filtered rows are used in the JOIN and for grouping. That is why it's performing better.

    As a suggestion, avoid using a linked server directly in joins unless it's just a one-time task. Instead, dump the filtered output into a temporary table and then use the temporary table in the join, along the lines of the sketch below.
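
    A sketch of that pattern, with placeholder names for the linked server, remote table and local table:

        -- Pull only the filtered rows across the linked server once
        SELECT *
        INTO #RemoteRows
        FROM OPENQUERY(LinkedSrv,
                'SELECT id, amount FROM RemoteDb.dbo.RemoteTable WHERE status = 1');

        -- Join locally against the temporary table
        SELECT l.col1, SUM(r.amount) AS total_amount
        FROM dbo.LocalTable AS l
        JOIN #RemoteRows AS r ON r.id = l.id
        GROUP BY l.col1;

        DROP TABLE #RemoteRows;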

  • Dear Brahmanand Shukla,

    Got it, thank you for your kind help!

     

     
