Strange Query Optimizer Behavior

  • This is SQL Server 2012 with SP1, Developer Edition, running on VMware (4 cores, 64 GB RAM).

    This is a select statement against one large (200M+ rows) table that has zip code as one column. The clustered index is on zip + a unique int PIN column. I have observed this behavior with one additional column in the WHERE clause as well as several different criteria.

    Select statements against a small zip code range return in 10 seconds or less. Nationwide select statements, however, use a clustered index seek on the zip code range, which makes no sense to me since that range covers the entire table.

    Adding a WITH (FORCESCAN) hint makes the overall execution time much faster (from 8 minutes to 3 minutes, for example), but these queries are controlled by an application (i.e. not easy to change), so I would like to understand why the query optimizer would pick a seek over a scan in this case, given that there are full statistics on the zip column.
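
    Roughly the shape of the statements involved, for illustration (table and column names are made up, since I can't post the real schema):

    SELECT pin, zip, col01, col08
    FROM dbo.BigTable
    WHERE zip BETWEEN '00501' AND '99950'   -- "nationwide" range, i.e. effectively the whole table
      AND col01 = 'SomeValue';

    -- The same statement with the hint that works around the seek:
    SELECT pin, zip, col01, col08
    FROM dbo.BigTable WITH (FORCESCAN)
    WHERE zip BETWEEN '00501' AND '99950'
      AND col01 = 'SomeValue';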

    I would appreciate any insight that the community might have in this matter.

  • 1) Please post sample actual queries for both.

    2) Please attach actual query plans for both too.

    3) 3 minutes to scan 200M rows is HORRIBLE performance, even if the rows are fat!! :w00t: Low RAM and poor IO, or blocking issues?

    4) Did you try OPTION (RECOMPILE) on each query? Could be simple parameter sniffing/plan cache issue.
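
    A rough sketch of the point-4 test, using a made-up query shape (adjust names to your schema):

    SELECT pin, zip, col01, col08
    FROM dbo.BigTable
    WHERE zip BETWEEN '00501' AND '99950'
      AND col01 = 'SomeValue'
    OPTION (RECOMPILE);   -- forces a fresh plan for this execution, ruling out a stale cached plan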

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I'll have to see if the higher ups are okay with me posting that.

    Definitely not blocking, because this is a test database with no other queries running against it (I created it to test strategies for improving performance, such as compression, changing the clustered index key, etc.). It is also running on a dev/staging server, so it is not a heavily utilized VM.

    Backend is EMC SAN so I'd be surprised if IO is the issue, although it is reading a good amount of data (each test query returns several million rows into a temp table).

    I don't think parameter sniffing is a factor, since I'm running it as ad hoc T-SQL and clearing the cache in between tests (again, NOT a production box). To me it just looks like the query optimizer is making a really poor plan choice.
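
    For reference, something along these lines is what I mean by clearing the cache between tests (test box only):

    CHECKPOINT;              -- flush dirty pages so the next command can drop them
    DBCC DROPCLEANBUFFERS;   -- start each test with a cold buffer pool
    DBCC FREEPROCCACHE;      -- discard cached plans so every run compiles fresh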

    TheSQLGuru (1/7/2014)


    1) Please post sample actual queries for both.

    2) Please attach actual query plans for both too.

    3) 3 minutes to scan 200M rows is HORRIBLE performance, even if the rows are fat!! :w00t: Low RAM and poor IO, or blocking issues?

    4) Did you try OPTION (RECOMPILE) on each query? Could be simple parameter sniffing/plan cache issue.

  • Added sqlplan as requested (with some name changes).

    The table is roughly 68 GB if that helps (this is with page compression).

    Looking at both plans in SQL Sentry Plan Explorer, I can see that the optimizer thinks the cost of the index seek plan is slightly lower, so this could just be a case of cost-based optimization.
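
    One way to put measured numbers next to those estimated costs is to run both forms with statistics enabled (a sketch, reusing the made-up query shape from the first post):

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Plan the optimizer picks on its own (clustered index seek):
    SELECT pin, zip, col01, col08
    FROM dbo.BigTable
    WHERE zip BETWEEN '00501' AND '99950'
      AND col01 = 'SomeValue';

    -- Forced scan for comparison:
    SELECT pin, zip, col01, col08
    FROM dbo.BigTable WITH (FORCESCAN)
    WHERE zip BETWEEN '00501' AND '99950'
      AND col01 = 'SomeValue';

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;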

  • Backend is EMC SAN so I'd be surprised if IO is the issue, although it is reading a good amount of data (each test query returns several million rows into a temp table).

    HAHAHAHAHAHA ... oh, you were being serious, weren't you? 😀 I have had to have quite a number of talks with clients (including C-level management) about how their 6, 7 or even 8 figure SAN is CAUSING SQL Server to be slow instead of helping to make it rip along!!!

    Oh, and returning millions of rows into a temp table alone has the ability to skew runtimes by the 200%+ that you are seeing. Sometimes writes go fast on a SAN and sometimes all the caches are full and you get floppy-disk speeds. :blink:

    I will review the plans and post separately on them.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • 1) 68GB is pretty damn fat for just 200M+ rows!!

    2) Those are estimated plans. Not helpful for what I need to see. Can you provide actuals?

    3) I would play around with variations of compound statistics on zip, col01 and col08. Depending on how selective the value distribution is for those columns and what your normal query patterns look like, different orderings of those 3 fields could really help the optimizer out, I bet (see the sketch after this list).

    4) interesting identity definition!! 🙂
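
    A rough sketch of what I mean in point 3 (names are placeholders, adjust to the real schema):

    CREATE STATISTICS st_zip_col01_col08
        ON dbo.BigTable (zip, col01, col08)
        WITH FULLSCAN;

    -- Only the leading column gets a histogram, so the ordering matters;
    -- trying the other orderings is part of the experiment:
    CREATE STATISTICS st_col08_col01_zip
        ON dbo.BigTable (col08, col01, zip)
        WITH FULLSCAN;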

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I'll see about posting actual plans later today. The table these queries are going against has 177 columns, the majority of which could potentially be used in the WHERE clause. The only definite field is, of course, zip. Solving this with permutations of statistics would become impractical, I believe.

    Personally, my preferred solution for this is to just use a columnstore index on the whole table, as testing against that worked pretty well. Unfortunately, this table gets nightly updates, so that is not possible with SQL Server 2012. I am trying to give the application the best performance I can with the least amount of application change in the short term, while the more extensive changes are planned out.
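
    For reference, this is the 2012 limitation I mean, sketched with illustrative names:

    -- Once a nonclustered columnstore index exists, the base table becomes read-only:
    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_BigTable
        ON dbo.BigTable (zip, col01, col08 /* ...plus the other queried columns... */);

    -- Nightly updates would then fail unless the index is disabled first and rebuilt afterwards:
    ALTER INDEX ncci_BigTable ON dbo.BigTable DISABLE;
    -- ...apply the nightly updates here...
    ALTER INDEX ncci_BigTable ON dbo.BigTable REBUILD;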

    I've only used HP and EMC SANs so far in my career, so you probably have more experience there than I do. According to the hosting company, these supposedly use SSD for your "hottest" data, and so on down the line through 3 tiers of disk performance. Staging is on its own pool of disks, so I don't think it is being interfered with by prod or the various other pools carved out of the SAN.

    TheSQLGuru (1/7/2014)


    1) 68GB is pretty damn fat for just 200M+ rows!!

    2) Those are estimated plans. Not helpful for what I need to see. Can you provide actuals?

    3) I would play around with variations of compound statistics on zip, col01 and col08. Depending on how selective the value distribution is for those columns and what your normal query patterns look like, different orderings of those 3 fields could really help the optimizer out, I bet.

    4) interesting identity definition!! 🙂

  • Actual plans.

    Looks like this is just cost-based optimization, which is interesting but somewhat surprising.

  • Ernest Libertucci (1/8/2014)


    Actual plans.

    Looks like this is just cost-based optimization, which is interesting but somewhat surprising.

    1) It is the 15% mismatch between estimated and actual rows that is getting you here (and yes, in the grand scheme of things that is a pretty small mismatch!!). If the estimated rows were closer the query cost would rise to over 10000 and you would get the scan.

    2) Why is DOP only 1?!?!

    3) I wonder if you are getting hit with too much nested loop prefetch. In a non-production environment, try adding this to your query (without the forcescan):

    OPTION (RECOMPILE, QUERYTRACEON 8744)

    See if that gets you less reads and better seek performance on the nested loop operations.

    4) you didn't use OPTION (RECOMPILE) on the non-scan query. Hmm, I think I recall you saying you flushed the cache between runs maybe? Was just wondering if that would make a difference in estimates.

    5) Lots of columns that could be predicates is a tough nut to crack. You are correct - adding in a slew of multi-column stats may not be workable. But you never know. If I were you, I would do a proof of concept in any case with the columns involved in the current query.

    6) Depending on how many days you have in your table, you COULD use a SQL 2012 columnstore index though. CSIs are fully partition aware/enabled. You can do up to 15K partitions if need be, which normally allows for daily partitioning, so a simple ETL process can do the daily population into a staging partition, create the columnstore index on it, and swap it in. Voila!! Note that if you need to do ANY updates to existing data you are screwed ...
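
    A rough sketch of that load-and-swap pattern (every object name and the partition number are illustrative; the staging table must mirror the big table's structure, including its indexes, and carry a check constraint matching the target partition's boundaries):

    -- 1) Bulk load the new day's rows into the empty staging table:
    INSERT INTO dbo.BigTable_Stage WITH (TABLOCK)
    SELECT * FROM etl.DailyFeed;

    -- 2) Build the columnstore index on the small staging table only:
    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_BigTable_Stage
        ON dbo.BigTable_Stage (zip, col01, col08 /* ...remaining columns... */);

    -- 3) Switch the staged data into the matching (empty) partition of the big table;
    --    a metadata-only operation, so it is effectively instant:
    ALTER TABLE dbo.BigTable_Stage
        SWITCH TO dbo.BigTable PARTITION 372;   -- partition number for the new day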

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • 1) Strangely, I had just created the clustered index (zip, pin) on the table, so it should have had 100% knowledge (statistics) of that field (i.e. the range it is seeking encompasses the entire table).

    2) Not sure about that; the server does not have MAXDOP set and I did not use any MAXDOP hint.

    3) This did not seem to improve things, but thanks because I had not previously heard of that trace flag. Also, changing the text of the query is what I'm trying to avoid, since that would be an application change.

    4) Correct, I'm clearing the plan cache in between runs, although since the text of the two queries is different, in theory they would not share an execution plan anyway, since the query hash would be different.

    5) Stats exist for the predicate, although I have not verified they are FULLSCAN at present. I may test this if I have time.

    6) Unfortunately, this is not a situation with time-based incremental loads, and the nightly updates are not predictable, so partitioning is not really an option. I do have a plan that would circumvent this, but it is something I need to work out with the ETL/application teams in order to implement, so it is not a short-term solution (I am the DBA, of course).

    Thanks for taking the time to reply, and by the way I enjoyed your presentation at 2013 PASS 🙂

    TheSQLGuru (1/8/2014)


    Ernest Libertucci (1/8/2014)


    Actual plans.

    Looks like this is just cost-based optimization, which is interesting but somewhat surprising.

    1) It is the 15% mismatch between estimated and actual rows that is getting you here (and yes, in the grand scheme of things that is a pretty small mismatch!!). If the estimated rows were closer the query cost would rise to over 10000 and you would get the scan.

    2) Why is DOP only 1?!?!

    3) I wonder if you are getting hit with too much nested loop prefetch. In a non-production environment, try adding this to your query (without the forcescan):

    OPTION (RECOMPILE, QUERYTRACEON 8744)

    See if that gets you less reads and better seek performance on the nested loop operations.

    4) you didn't use OPTION (RECOMPILE) on the non-scan query. Hmm, I think I recall you saying you flushed the cache between runs maybe? Was just wondering if that would make a difference in estimates.

    5) Lots of columns that could be predicates is a tough nut to crack. You are correct - adding in a slew of multi-column stats may not be workable. But you never know. If I were you, I would do a proof of concept in any case with the columns involved in the current query.

    6) Depending on how many days you have in your table, you COULD use a SQL 2012 columnstore index though. CSIs are fully partition aware/enabled. You can do up to 15K partitions if need be, which normally allows for daily partitioning, so a simple ETL process can do the daily population into a staging partition, create the columnstore index on it, and swap it in. Voila!! Note that if you need to do ANY updates to existing data you are screwed ...

  • I wonder if this is as simple as putting a constraint on zip. Will test tomorrow.
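
    If it helps, a minimal sketch of that idea (the constraint name and zip bounds are assumptions):

    ALTER TABLE dbo.BigTable WITH CHECK
        ADD CONSTRAINT CK_BigTable_zip CHECK (zip BETWEEN '00001' AND '99999');
    -- A trusted CHECK constraint gives the optimizer a hard bound on the column's domain;
    -- whether that changes this particular plan is something the test will show.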

  • I had totally forgotten that you cannot change the query code. With that limitation, combined with the fact that the queries can use a wide array of columns (and values/ranges for those columns), I think you are absolutely stuck with just one option: a slew of multi-column statistics that will hopefully get you very close estimates and the best plan most of the time. Plan guides are out. And even stats could be unhelpful in some cases without OPTION (RECOMPILE) on every query sent in by the app, because auto-parameterization could get a plan cached for the first set of values of a given WHERE clause that is then reused for an identical query with different filter values. Bummer ...

    Oh, I did not present at PASS 2013. I did present at 23 SQL Saturdays last year though! 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Well, thanks for looking into this, anyway, I appreciate it.

    And, my mistake, it was SQL Saturday in Miami that I saw your presentation!

  • A little bit of analysis of your problem: you say that the query may return several million records. Let's say there are 2,000,000 results. A 5-digit zip code divides your data into at most about 100,000 regions (actually fewer, because not all 5-digit numbers are valid zip codes). On a nationwide search, each zip code will hold on average around 20 matching records. In other words, knowing the zip code does little good, since if the records are spread evenly across zip codes, just about every zip code will need to be searched. The optimizer then has a choice: it can either scan the entire table, or it can use some other index to narrow the results and then look up each matching row in a loop. Even if there is an index available, 2,000,000 results mean 2,000,000 lookups. Even if this is only 1% of the data, the server must still loop through each item and do a lookup. At some point the cost of all those lookups exceeds the cost of reading the entire table. That is why the optimizer chooses a full table scan on a large search.

    If you are searching a smaller range of zip codes, the zip code information, in combination with the clustered index, is useful and allows the server to read only a small portion of the table.
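
    A back-of-the-envelope illustration of that tipping point (the page counts and lookup depth below are rough assumptions, not measurements from this system):

    DECLARE @rows_returned    bigint = 2000000;               -- rows the query returns
    DECLARE @pages_in_table   bigint = 68 * 1024 * 1024 / 8;  -- 68 GB of 8 KB pages, roughly 8.9M
    DECLARE @pages_per_lookup int    = 4;                     -- approximate b-tree depth touched per lookup

    SELECT scan_pages   = @pages_in_table,                    -- one mostly sequential pass
           lookup_pages = @rows_returned * @pages_per_lookup; -- mostly random reads

    -- Once lookup_pages approaches scan_pages (and those reads are random rather than
    -- sequential), the full scan wins, which is the behavior described above.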

    I think one of the rules that database developers should chisel in stone and hand to any business asking for ad hoc queries is this:

    Large table size + full ad hoc query capability = slow performance :crying:
