Updating stats makes things worse

  • We have an older 2008 R2 Standard server whose database has been migrated to SQL Server 2012 Enterprise. Very little maintenance was ever done on the older server, yet a particular query takes about 1 minute to run there; on the new server it takes over 10 minutes. Also, on the old server, if you rebuild the indexes or update the stats, the query starts taking over 10 minutes as well.

    The strange thing is that if you drop the stats on the new server and script them out from the original server (just for 1 table), the query runs in 1 minute again.
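
    For reference, the way we scripted the stats: SSMS's Generate Scripts wizard (with "Script statistics and histograms" enabled) emits each statistic as an undocumented STATS_STREAM blob, roughly like this (the statistic name here is made up and the hex blob is abbreviated):

        -- Shape of a scripted statistic; applies the old server's histogram as-is
        UPDATE STATISTICS [dbo].[Gifts]([IX_Gifts_GiftDate])
        WITH STATS_STREAM = 0x0100..., -- opaque binary blob generated by SSMS
             ROWCOUNT = 1234567, PAGECOUNT = 8910;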

    Has anyone come across this behavior before?

  • I think you will find that if you set the database compatibility level (in the options of the database you are running the query against) to SQL Server 2008 R2 (the old version), your query will again run in the expected one minute.  From what I have read, the execution plan is altered significantly under the newer level, and the query therefore takes longer to run.  The only other option is to completely rewrite the query so it optimizes better on the newer version.
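
    For example (the database name here is just a placeholder):

        -- Compatibility level 100 corresponds to SQL Server 2008 / 2008 R2
        ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 100;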

  • RVSC48 - Monday, November 20, 2017 8:32 AM

    I think you will find that if you set the database compatibility level (in the options of the database you are running the query against) to SQL Server 2008 R2 (the old version), your query will again run in the expected one minute.  From what I have read, the execution plan is altered significantly under the newer level, and the query therefore takes longer to run.  The only other option is to completely rewrite the query so it optimizes better on the newer version.

    But the new cardinality estimator (CE) didn't arrive until SQL Server 2014, so that may not apply in this case.

    Sue

  • I agree with Sue here; the new CE wasn't introduced in 2012 - it came with 2014.  Plus, the OP indicated that updating statistics on the 2008 R2 server caused the 10-minute run time as well.

    Back on topic, I would start by looking at the execution plan before and after updating the statistics.  Updating statistics shouldn't hurt performance that badly, but it obviously can happen.  I had it happen on one of my systems: a process that normally took between 15 and 50 seconds (depending on how busy the system was at the time) suddenly jumped to taking 10+ minutes after updating the statistics one day.  Adding 2 indexes to 1 table used by the query dropped it back down to between 15 and 50 seconds.  Later on, the indexes got disabled on one of the tables, and the process jumped back up to taking excessively long to complete.  Rebuilding the 2 indexes solved it again.

    In my case, I believe the problem was that query compilation was timing out; SQL Server didn't have enough time to find the optimal plan and was just using a "good enough" plan.  I didn't investigate far enough to confirm that, as I was more concerned with fixing the performance issue.  This blog post may help:
    https://www.brentozar.com/blitzcache/compilation-timeout/
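
    If you want to check whether the optimizer is timing out on your query, here's a sketch of a plan-cache search (the StatementOptmEarlyAbortReason attribute is the marker the blog post above describes):

        -- Find cached plans the optimizer abandoned early with "TimeOut"
        WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
        SELECT TOP (20) cp.usecounts, qp.query_plan
        FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
        WHERE qp.query_plan.exist('//StmtSimple[@StatementOptmEarlyAbortReason="TimeOut"]') = 1;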

    Are there any indexes you could add to improve the performance of the query?  If so, I would recommend adding them on a test version of the system and doing some performance testing on everything else that uses that table (other jobs, stored procedures, applications, reports, etc.) to make sure the new index(es) don't hurt their performance too badly.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • If you can, post the actual execution plans of the query when it runs in 1 minute and when it runs in 10 minutes. 

    That could provide some clues about what's happening.
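
    If you're not sure how to capture them: turn on "Include Actual Execution Plan" (Ctrl+M) in SSMS before running the query, or wrap the run like this and save the XML it returns:

        -- Returns the actual (runtime) plan XML alongside the query results
        SET STATISTICS XML ON;
        -- ... run the problem query here ...
        SET STATISTICS XML OFF;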

    Cheers!

  • Sounds like sampled stats might be resulting in better plans than full-scan stats (given that the index rebuild has the same effect). It's certainly possible, but it could also be a number of other things. The trick is going to be capturing the execution plans, comparing the plan generated when the query runs fast against the one generated when it runs slow, and understanding which row counts produce the faster plan.

    Also, it's possible that changes in the optimizer between 2008 and 2012 could cause the regression. It's not just the CE that can lead to this kind of issue (although the CE changes in 2014 are very likely to).
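
    One quick way to see which situation you're in is to compare rows sampled against total rows for the statistics involved. A sketch (sys.dm_db_stats_properties needs 2008 R2 SP2 / 2012 SP1 or later; I'm using dbo.Gifts as a placeholder for the table whose stats you swapped):

        -- Rows vs. rows sampled, and last update time, for every statistic on the table
        SELECT s.name, sp.last_updated, sp.rows, sp.rows_sampled
        FROM sys.stats AS s
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        WHERE s.object_id = OBJECT_ID(N'dbo.Gifts');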

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Jacob Wilkins - Monday, November 20, 2017 9:33 AM

    If you can, post the actual execution plans of the query when it runs in 1 minute and when it runs in 10 minutes. 

    That could provide some clues about what's happening.

    Cheers!

    So we moved the original database to a new SQL Server 2014 instance, ran the query, and got the following execution plan (9m:40s):
    https://www.brentozar.com/pastetheplan/?id=rycQrqxeM

    We dropped the stats and copied them over from the old server (just for the Gifts table), and it runs in 1:40!
    https://www.brentozar.com/pastetheplan/?id=rJ7kS5leM

  • jcourtjr 29296 - Monday, November 20, 2017 10:59 AM

    So we moved the original database to a new SQL Server 2014 instance, ran the query, and got the following execution plan (9m:40s):
    https://www.brentozar.com/pastetheplan/?id=rycQrqxeM

    We dropped the stats and copied them over from the old server (just for the Gifts table), and it runs in 1:40!
    https://www.brentozar.com/pastetheplan/?id=rJ7kS5leM

    Yes, but we don't know for certain what the cause is. It may be the new cardinality estimation engine, since you're now saying 2014; you were referring to 2012 before. Which is it?
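
    If 2014's new CE is in play, one quick differential test (a sketch; QUERYTRACEON requires sysadmin-level rights, and the table and columns here are placeholders for your actual query) is to rerun under the legacy estimator and compare plans:

        -- Trace flag 9481 forces the pre-2014 (legacy) cardinality estimator for this query
        SELECT g.GiftId
        FROM dbo.Gifts AS g
        WHERE g.GiftDate >= '20170101'
        OPTION (QUERYTRACEON 9481);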

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Grant Fritchey - Monday, November 20, 2017 11:06 AM

    jcourtjr 29296 - Monday, November 20, 2017 10:59 AM

    So we moved the original database to a new SQL Server 2014 instance, ran the query, and got the following execution plan (9m:40s):
    https://www.brentozar.com/pastetheplan/?id=rycQrqxeM

    We dropped the stats and copied them over from the old server (just for the Gifts table), and it runs in 1:40!
    https://www.brentozar.com/pastetheplan/?id=rJ7kS5leM

    Yes, but we don't know for certain what the cause is. It may be the new cardinality estimation engine, since you're now saying 2014; you were referring to 2012 before. Which is it?

    So the original issue: the query ran on the older 2008 R2 server, which had seen no index or stats maintenance for years, in about 1:40. When we copied that database to a 2012 server and then updated stats and rebuilt indexes, it started running in 10 minutes.

    We took the 2008 R2 database and restored it to a 2014 server (as a test). We updated the stats and rebuilt the indexes, and the query ran in 9:40; then we dropped the stats, copied them over from the 2008 R2 server, reran the query, and it ran in 1:40.

    In the meantime, I will see if I can get the actual execution plans from the original 2008 R2 and 2012 servers.
    -John

  • It looks like straight-up differences in row counts. The fast plan estimates 1 row; the slow one, 278.402. In one operator I saw an estimated number of rows of 514,908 with the actual being 2,844,351,792. That's a HUGE disparity. By the way, a table scan retrieves 5,000 rows from those 2.8 billion. I'm amazed this query runs as fast as it does.

    Both plans are timing out during optimization, so neither is necessarily optimal. What I'm seeing, though, is that the more accurate row counts in the second plan are leading to table spools in an attempt to minimize the number of times it goes back to the well. I suspect that, in this instance, it's not helping, but the optimizer timed out getting to the plan, so it's just the cheapest plan found so far.

    However, the statistics still look out of date, because there is again a VERY wide disparity, with the slower plan showing 4,135,392 actual rows against an estimate of 516,924. Now, that's on a table scan. But that brings up another question: one table scan is for 2.8 billion rows, the other for only 4 million. Are we looking at the same data sets between these two plans? They look very different.

    For us to help, and for you to get your answer, we need to be clear that we're comparing apples to apples: the same databases, structures, and data on both sides. If there's variation beyond the 2008-to-2012 move, it just makes everything harder.

    After we verify that we're looking at apples on both sides, I'd suggest updating statistics WITH FULLSCAN to get them as accurate as possible, then capturing and posting the plan again. Wild disparities like these will prevent the optimizer from compiling anything accurate.
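
    Something like this, run for each table involved in the query (dbo.Gifts being the obvious starting point):

        -- Rebuild every statistic on the table by reading all rows, not a sample
        UPDATE STATISTICS dbo.Gifts WITH FULLSCAN;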

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • By the way, the plan comparison feature in SSMS 2017 was a lifesaver for quickly working through these plans.

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • I keep going back to it. You also have some wild disparity in the other direction: an estimate of 500k rows against an actual of 55. With statistics that completely out of whack, the poor optimizer can't make any good choices for you.

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Grant Fritchey - Monday, November 20, 2017 1:14 PM

    By the way, the plan comparison feature in SSMS 2017 was a lifesaver for quickly working through these plans.

    Thanks for the reminder - I always forget about that feature. It is great for something such as this.

    Sue

  • Also, just to be frank: with the query as it is, even with statistics tailored to this query as closely as you can get them, it's going to be difficult or impossible for SQL Server to estimate the rows coming out of a lot of the operators.

    There are a lot of ORs, some of them nested, and the join criteria are not exactly trivial either. These are the sorts of things that give the optimizer fits. As you're seeing, you might get "acceptable" performance from some very specific statistics that happen to give you the query plan you like, but unless you use something like a plan guide, you're going to be at the mercy of the optimizer, which will be very, very confused by this query.

    You'll likely be better off rewriting the query, investigating approaches like splitting it into multiple queries; see the sketch below for the general idea.
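
    As a sketch of the kind of rewrite that can help (the table and columns are placeholders, not from your actual query), turning a multi-OR predicate into separate branches gives the optimizer a much simpler estimation problem per branch:

        -- Instead of: SELECT g.GiftId FROM dbo.Gifts AS g WHERE g.GiftType = 'A' OR g.Amount > 1000
        SELECT g.GiftId FROM dbo.Gifts AS g WHERE g.GiftType = 'A'
        UNION   -- UNION, not UNION ALL, so rows matching both branches aren't doubled
        SELECT g.GiftId FROM dbo.Gifts AS g WHERE g.Amount > 1000;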

    Cheers!
