SQL Statistics on a sequential DATE table

  • I use a DATE table in a database for weather reporting.

    The date table is a typical one, and the code below generates an example populated with data from 2007-01-01 through 2014-12-31 (I have kept it simple, just 2 columns, for this post):

    --drop table Dates2
    go

    CREATE TABLE [dbo].[Dates2](
        [DateValue] [datetime] NOT NULL,
        [day] [int] NULL,
        CONSTRAINT [PK_Dates2] PRIMARY KEY CLUSTERED
        (
            [DateValue] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    go

    declare @CurrentDate datetime = '2007-1-1'
    while (@CurrentDate < '2015-1-1')
    begin
        insert into Dates2 (DateValue, [day])
        values (@CurrentDate, datepart(day, @CurrentDate))
        set @CurrentDate = dateadd(day, 1, @CurrentDate)
    end
    go

    UPDATE STATISTICS Dates2 WITH FULLSCAN, ALL
    go

    DBCC SHOW_STATISTICS (Dates2, PK_Dates2) WITH HISTOGRAM
    go
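    A quick sanity check of what the loop loads (the row count matters for the estimates discussed below):

    select count(*) as total_rows,
        min(DateValue) as first_day,
        max(DateValue) as last_day
    from Dates2
    -- 2922 rows: 2007-01-01 through 2014-12-31 (365 * 8 days + 2 leap days)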

    Now, my question: I know there is exactly 1 record for each day, and the statistics that are generated, although the histogram steps fall at irregular intervals, are accurate for each period.

    So for any period I know the exact row count; for example, between Jan 1, 2012 and Jan 21, 2012 there are exactly 21 records:

    select * from dates2 where datevalue between '2012-01-01' and '2012-01-21'

    If you turn on the Actual Execution Plan and hover over the index seek, the estimated number of rows will be correct or very close to it (20.5446).
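    (If you prefer not to hover in the GUI, the same figure can be read from the plan XML; a quick sketch:)

    set statistics xml on
    select * from dates2 where datevalue between '2012-01-01' and '2012-01-21'
    set statistics xml off
    -- the EstimateRows attribute on the index seek in the returned showplan XML holds the 20.5446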

    But do the same for these queries, which use variables:

    declare @then DateTime
    declare @Today DateTime

    set @Today = dateadd(dd, datediff(dd, 0, getdate()), 0) -- today at midnight
    set @then = dateadd(day, -6, @Today)

    select @Today
    select DateAdd(day, -6, @Today)

    select * from dates2 where datevalue between '2012-02-22' and '2012-03-11'
    select * from dates2 where datevalue between DateAdd(day, -6, @Today) and @Today
    select * from dates2 where datevalue between @then and @Today
    select * from dates2 where datevalue between '2012-03-07' and '2012-03-13'

    The hard-coded values (for me) return an estimated row count of 20.5446, but the ones with the variables return an estimated row count of 262.98: more than 10 times as many.

    I am using SQL Server 2012 RTM for this test (but I tested with SQL Server 2008 SP1 and got similar results).

    This makes all my joins involving this table look like their statistics are wrong, and updating statistics, dropping them, freeing the proc cache, and restarting SQL Server all give the same figures.
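    For reference, these are the kinds of commands I tried, none of which changed the estimate:

    UPDATE STATISTICS Dates2 WITH FULLSCAN
    DBCC FREEPROCCACHE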

    Any clues why the variables, which are probably how most people query tables, behave so differently?

    Chris

  • Hello,

    I've tested your data and I get the same results as you on my SS2K8 R2 RTM.

    I can easily explain the estimated ~20 rows: the literal endpoints can be looked up directly in the histogram, and with one row per day the seek's estimate lands very close to the actual count.

    For the 262.98 estimated rows, it seems the engine cannot resolve a precise estimate from the histogram when the endpoints are unknown at compile time, so it falls back on its default guesses: BETWEEN is treated as two inequality predicates, each guessed at 30% selectivity, and multiplying the two gives 9% of the table's rows. (Looking at the histogram, all rows between steps are unique, which makes me wonder why the engine doesn't use the steps at all; checking the XML plan may give you more details.)
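    The arithmetic checks out exactly (a quick verification against the Dates2 table above):

    select count(*) from Dates2                -- 2922 rows
    select count(*) * 0.30 * 0.30 from Dates2  -- 262.98: the two 30% guesses multiplied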

    Only when I take this into account can I reproduce the plan's estimated row count.

    But I would need to dig much deeper, and I can't right now.

    Hope that will shed some light for you.

  • This: http://sqlinthewild.co.za/index.php/2008/02/25/parameter-sniffing-pt-2/

    The use of variables is not all that common. More common would be parameters, which behave similarly to constants when it comes to row estimations

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    Thank you, Gila,

    I will need to update that part of the sniffing issue...

  • GilaMonster (3/15/2012)


    This: http://sqlinthewild.co.za/index.php/2008/02/25/parameter-sniffing-pt-2/

    The use of variables is not all that common. More common would be parameters, which behave similarly to constants when it comes to row estimations

    Thanks for the link; that explains it to me quite well. I have never used query hints and always thought RECOMPILE was bad.

    Without the recompile hint:

    declare @Today DateTime
    set @Today = dbo.F_START_OF_DAY(GETDATE())

    SELECT Dates.DateValue as Date,
        isnull(WeatherDailyStats.RainFall, 0) as Rainfall
    FROM Dates
    LEFT OUTER JOIN WeatherDailyStats ON Dates.DateValue = WeatherDailyStats.Date
    WHERE Dates.DateValue between DATEADD(day, -6, @Today) and GETDATE()

    and this produced this query plan

    and with the recompile hint:

    declare @Today DateTime
    set @Today = dbo.F_START_OF_DAY(GETDATE())

    SELECT Dates.DateValue as Date,
        isnull(WeatherDailyStats.RainFall, 0) as Rainfall
    FROM Dates
    LEFT OUTER JOIN WeatherDailyStats ON Dates.DateValue = WeatherDailyStats.Date
    WHERE Dates.DateValue between DATEADD(day, -6, @Today) and GETDATE()
    option (recompile)

    As you can see, quite a different plan is generated. Running them both and comparing the estimated cost of each plan:

    • The first query plan was rated at 74% of the batch, the second at 26%
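    The I/O and timing figures below were presumably captured with the session statistics switched on (my assumption about how they were produced):

    set statistics io on
    set statistics time on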

    First Query

    Table 'WeatherDailyStats'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Dates'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Elapsed time = 32 ms.

    Second Query

    Table 'WeatherDailyStats'. Scan count 0, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Dates'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Elapsed time = 44 ms.

    I assume the second is slower since it has more logical reads, so technically the first query is better, even though its statistics are not correct and it is costed at 74% of the batch.

    If I were running this query thousands of times, I would go with the first option.

    Chris

  • 4 or 14 logical reads are comparable, and the timings are too close to draw any broad conclusions about which is better. At that speed, the measuring inaccuracies are significant.

    Personally I'd use neither and make sure my query (which I assume will be in a procedure) uses the parameters that are passed in, not variables assigned within the procedure (if possible).
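    A minimal sketch of that pattern, using a hypothetical procedure name against Chris's Dates2 table:

    create procedure dbo.GetDateRange
        @From datetime,
        @To datetime
    as
    -- the optimizer sniffs @From/@To at compile time, so the estimate
    -- comes from the histogram, just as it does with literals
    select * from dates2 where datevalue between @From and @To
    go

    exec dbo.GetDateRange '2012-01-01', '2012-01-21'
    -- estimated rows come from the histogram (20.5446 for this range), not the 9% guess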

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
