Help with the query and index.

  • I've a table which has about 15 columns, I'll just put 3 columns here in the script that we would be using in the script.

    CREATE TABLE "schema"."tblTest"

    (

    "TestId" uniqueidentifier NOT NULL ,

    "StartDateTime" datetime2 NOT NULL ,

    "EndDateTime" datetime2 NULL

    )

    TestId is PK with a clustered index. Also, there is 1 nonclustered index on StartDateTime. No index is present on Enddatetime.

    The query I need to rewrite\optimise is like this:

    SELECT TOP (500) test.[TestId]

    FROM tblTest as test

    WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;

    The table will have about 1 million records, and about 1/16th (about 6-7%) of that would be part of the output at any given time. Right now this query does clustered index scan. I would like it to do seeks instead.

    Solutions I tried:

    1) Add a computed column with definition: (case when EndDateTime is not null then EndDateTime Else StartDateTime End) and have an index on this. This works perfect but I cannot use this because we use microsoft sync framework that does not like computed columns.

    2) Indexed view with computed column. Again, we cannot use this as we need to support SQLExpress.

    3) Completely re-write the query by splitting it into 2 queries and union the 2 results. Create 2 separate indexes on enddatetime and startdatetime. This seems little overkill maybe.

    ;with cte as

    (SELECT test.[TestId]

    FROM schema.tblTest as test

    WHERE test.EndDateTime < @Date

    union all

    SELECT test.[TestId]

    FROM schema.tblTest as test

    WHERE test.EndDateTime is null and test.StartDateTime < @Date)

    select top 500 testid from cte

    Any help on finding a better solution would be appreciated.

    Thanks in advance.

  • Why not just split the two conditions? No need for a cte here.

    SELECT TOP (500) test.[TestId]

    FROM tblTest as test

    WHERE test.EndDateTime < @Date OR test.StartDateTime < @Date;

    The next question is, which 500 rows do you want? Do you care? Is it ok if the order changes randomly? Unless you add an order by to this query you have no way of knowing which 500 rows it will return.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (8/7/2013)


    Why not just split the two conditions? No need for a cte here.

    SELECT TOP (500) test.[TestId]

    FROM tblTest as test

    WHERE test.EndDateTime < @Date OR test.StartDateTime < @Date;

    The next question is, which 500 rows do you want? Do you care? Is it ok if the order changes randomly? Unless you add an order by to this query you have no way of knowing which 500 rows it will return.

    Thanks Sean.

    We have to use StartDateTime only if Enddatetime is null. We are doing batching that's why 500. That's configurable and we do not care about ordering.

  • Solutions I tried:

    1) Add a computed column with definition: (case when EndDateTime is not null then EndDateTime Else StartDateTime End) and have an index on this. This works perfect but I cannot use this because we use microsoft sync framework that does not like computed columns.

    2) Indexed view with computed column. Again, we cannot use this as we need to support SQLExpress.

    3) Completely re-write the query by splitting it into 2 queries and union the 2 results. Create 2 separate indexes on enddatetime and startdatetime. This seems little overkill maybe.

    ;with cte as

    (SELECT test.[TestId]

    FROM schema.tblTest as test

    WHERE test.EndDateTime < @Date

    union all

    SELECT test.[TestId]

    FROM schema.tblTest as test

    WHERE test.EndDateTime is null and test.StartDateTime < @Date)

    select top 500 testid from cte

    Any help on finding a better solution would be appreciated.

    Thanks in advance.

    What is wrong with your solution 3? It seems like an ok solution to me?

    Another possibility could be to create a new regular column with a non-clustered index and let a trigger populate it with (case when EndDateTime is not null then EndDateTime Else StartDateTime End)

    This would of course increase the storage requirements and slow down inserts, but at lest the select would be extremely fast.

    /SG

  • Do you really need only the TestId column in the result set? You don't all the columns?

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Thank you everybody for the reply. Sorry for replying late as I got caught up in something else.

    @stefan,

    I don't really mind increased storage as long as I get best performance for reads. Normal column with trigger is a good idea. We can't really use triggers as we have distributed environment, data is sync'd by MS sync framework.

    @scott,

    I only need TestId as I want to delete those older tests.

  • You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.

    SQL will only be able to do a seek on that index if it's headed by EndDateTime.

    So, you could:

    1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.

    OR

    2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.

    Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher (8/9/2013)


    You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.

    SQL will only be able to do a seek on that index if it's headed by EndDateTime.

    So, you could:

    1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.

    OR

    2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.

    Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.

    Thanks Scott. I tried option 1 and it still does a scan on the non-clustered index with enddatetime.

  • mayurkb (8/9/2013)


    ScottPletcher (8/9/2013)


    You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.

    SQL will only be able to do a seek on that index if it's headed by EndDateTime.

    So, you could:

    1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.

    OR

    2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.

    Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.

    Thanks Scott. I tried option 1 and it still does a scan on the non-clustered index with enddatetime.

    And your problem is what exactly?

    IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL

    DROP TABLE #SomeTestTable;

    --===== Declare some obviously named variables

    DECLARE @NumberOfRows INT,

    @StartDate DATEtime,

    @EndDate DATEtime,

    @Days DECIMAL(10,2), --This is still the "range"

    @StartValue INT,

    @EndValue INT,

    @Range INT;

    --===== Preset the variables to known values

    SELECT @NumberOfRows = 1000000, -- a million rows

    @StartDate = '2010', --Inclusive

    @EndDate = '2020', --Exclusive

    @Days = DATEDIFF(dd,@StartDate,@EndDate),

    @StartValue = 100,

    @EndValue = 900,

    @Range = @EndValue - @StartValue + 1;

    --===== Create "random constrained" integers within

    -- the parameters identified in the variables above.

    SELECT

    ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),

    StartDateTime,

    EndDateTime = DATEADD(DD,ABS(CHECKSUM(NEWID()))%100,StartDateTime)

    INTO #SomeTestTable

    FROM (

    SELECT TOP (@NumberOfRows)

    StartDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID())) % @Days,@StartDate)

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ) d;

    UPDATE t SET EndDateTime = NULL

    FROM #SomeTestTable t

    WHERE ID IN (

    SELECT TOP 100000 ID FROM #SomeTestTable ORDER BY NEWID())

    CREATE UNIQUE CLUSTERED INDEX ucx_ID ON #SomeTestTable (ID)

    CREATE INDEX ix_StartDateTime ON #SomeTestTable (StartDateTime)

    CREATE INDEX ix_EndDateTime ON #SomeTestTable (EndDateTime)

    CREATE INDEX ix_StartDateTimeEndDateTime ON #SomeTestTable (StartDateTime,EndDateTime)

    CREATE INDEX ix_EndDateTimeStartDateTime ON #SomeTestTable (EndDateTime,StartDateTime)

    -- #########################################################################

    --

    -- Q1 Index scan ix_EndDateTimeStartDateTime; Duration about 60ms

    DECLARE @Date DATE = '20120513'

    SELECT TOP (10000) test.ID

    FROM #SomeTestTable as test

    WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;

    go 5

    --

    -- Q2 Index seek ix_EndDateTime; Duration about 40ms

    DECLARE @Date DATE = '20120513'

    SELECT TOP (10000) test.ID

    FROM #SomeTestTable as test

    WHERE (test.EndDateTime < @Date)

    go 5

    --

    -- Q3 index scan ix_EndDateTimeStartDateTime; Duration about 50ms

    DECLARE @Date DATE = '20120513'

    SELECT TOP (10000) test.ID

    FROM #SomeTestTable as test

    WHERE (test.EndDateTime < @Date) OR (test.EndDateTime IS NULL AND test.StartDateTime < @Date)

    go 5

    --

    -- Q4 index scan ix_EndDateTimeStartDateTime; Duration too short to measure

    DECLARE @Date DATE = '20120513'

    SELECT TOP (500) test.ID

    FROM #SomeTestTable as test

    WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;

    go 5

    --

    -- Q5 index seek ix_EndDateTime; Duration too short to measure

    DECLARE @Date DATE = '20120513'

    SELECT TOP (500) test.ID

    FROM #SomeTestTable as test

    WHERE (test.EndDateTime < @Date)

    go 5

    --

    -- Q6 Index scan ix_EndDateTimeStartDateTime; Duration too short to measure

    DECLARE @Date DATE = '20120513'

    SELECT TOP (500) test.ID

    FROM #SomeTestTable as test

    WHERE (test.EndDateTime < @Date) OR (test.EndDateTime IS NULL AND test.StartDateTime < @Date)

    go 5


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Hmm, interesting. SQL's still "dumber" than I thought :-).

    We'll need to UNION the query results instead of using "OR".

    Also, as always, we want the variable to be exactly the same data type as the column, to insure that no implicit conversions will be needed.

    DECLARE @Date datetime2

    SET @Date = '20120513'

    SELECT TOP (500) test.ID

    FROM (

    -- index seek

    SELECT TOP (500) test.ID

    FROM #SomeTestTable as test

    WHERE (test.EndDateTime < @Date)

    UNION --sort and eliminate dups, since we're sorting in the outer query anyway

    -- index seek

    SELECT TOP (500) test.ID

    FROM #SomeTestTable as test

    WHERE test.EndDateTime IS NULL AND test.StartDateTime < @Date

    ) AS derived

    ORDER BY

    test.ID

    Too bad SQL can't do that itself with the "OR" there, but I know it has to allow for far more complex variations where such a clean translation might not be possible.

    Edit: Changed "datetime" to "datetime2" to match the OP's table defs.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply