Help with the query and index.

Question

Post reply

Help with the query and index.

mayurkb

Ten Centuries

Points: 1155
More actions
August 7, 2013 at 9:03 am

#281672

I've a table which has about 15 columns, I'll just put 3 columns here in the script that we would be using in the script.

CREATE TABLE "schema"."tblTest"

(

"TestId" uniqueidentifier NOT NULL ,

"StartDateTime" datetime2 NOT NULL ,

"EndDateTime" datetime2 NULL

)

TestId is PK with a clustered index. Also, there is 1 nonclustered index on StartDateTime. No index is present on Enddatetime.

The query I need to rewrite\optimise is like this:

SELECT TOP (500) test.[TestId]

FROM tblTest as test

WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;

The table will have about 1 million records, and about 1/16th (about 6-7%) of that would be part of the output at any given time. Right now this query does clustered index scan. I would like it to do seeks instead.

Solutions I tried:

1) Add a computed column with definition: (case when EndDateTime is not null then EndDateTime Else StartDateTime End) and have an index on this. This works perfect but I cannot use this because we use microsoft sync framework that does not like computed columns.

2) Indexed view with computed column. Again, we cannot use this as we need to support SQLExpress.

3) Completely re-write the query by splitting it into 2 queries and union the 2 results. Create 2 separate indexes on enddatetime and startdatetime. This seems little overkill maybe.

;with cte as

(SELECT test.[TestId]

FROM schema.tblTest as test

WHERE test.EndDateTime < @Date

union all

SELECT test.[TestId]

FROM schema.tblTest as test

WHERE test.EndDateTime is null and test.StartDateTime < @Date)

select top 500 testid from cte

Any help on finding a better solution would be appreciated.

Thanks in advance.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286562 More actions · Answer 1

Why not just split the two conditions? No need for a cte here.



SELECT TOP (500) test.[TestId]

FROM tblTest as test

WHERE test.EndDateTime < @Date OR test.StartDateTime < @Date;

The next question is, which 500 rows do you want? Do you care? Is it ok if the order changes randomly? Unless you add an order by to this query you have no way of knowing which 500 rows it will return.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

mayurkb Ten Centuries Points: 1155 More actions · Answer 2

Sean Lange (8/7/2013)

Why not just split the two conditions? No need for a cte here.

SELECT TOP (500) test.[TestId] FROM tblTest as test WHERE test.EndDateTime < @Date OR test.StartDateTime < @Date;

The next question is, which 500 rows do you want? Do you care? Is it ok if the order changes randomly? Unless you add an order by to this query you have no way of knowing which 500 rows it will return.

Thanks Sean.

We have to use StartDateTime only if Enddatetime is null. We are doing batching that's why 500. That's configurable and we do not care about ordering.

Stefan_G SSCertifiable Points: 6609 More actions · Answer 3

Solutions I tried:

1) Add a computed column with definition: (case when EndDateTime is not null then EndDateTime Else StartDateTime End) and have an index on this. This works perfect but I cannot use this because we use microsoft sync framework that does not like computed columns.

2) Indexed view with computed column. Again, we cannot use this as we need to support SQLExpress.

3) Completely re-write the query by splitting it into 2 queries and union the 2 results. Create 2 separate indexes on enddatetime and startdatetime. This seems little overkill maybe.

;with cte as

(SELECT test.[TestId]

FROM schema.tblTest as test

WHERE test.EndDateTime < @Date

union all

SELECT test.[TestId]

FROM schema.tblTest as test

WHERE test.EndDateTime is null and test.StartDateTime < @Date)

select top 500 testid from cte

Any help on finding a better solution would be appreciated.

Thanks in advance.

What is wrong with your solution 3? It seems like an ok solution to me?

Another possibility could be to create a new regular column with a non-clustered index and let a trigger populate it with (case when EndDateTime is not null then EndDateTime Else StartDateTime End)

This would of course increase the storage requirements and slow down inserts, but at lest the select would be extremely fast.

/SG

ScottPletcher SSC Guru Points: 100262 More actions · Answer 4

ScottPletcher

SSC Guru

Points: 100262

August 7, 2013 at 4:06 pm

#1639590

Do you really need only the TestId column in the result set? You don't all the columns?

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

mayurkb Ten Centuries Points: 1155 More actions · Answer 5

Thank you everybody for the reply. Sorry for replying late as I got caught up in something else.

@stefan,

I don't really mind increased storage as long as I get best performance for reads. Normal column with trigger is a good idea. We can't really use triggers as we have distributed environment, data is sync'd by MS sync framework.

@scott,

I only need TestId as I want to delete those older tests.

ScottPletcher SSC Guru Points: 100262 More actions · Answer 6

You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.

SQL will only be able to do a seek on that index if it's headed by EndDateTime.

So, you could:

1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.

OR

2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.

Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

mayurkb Ten Centuries Points: 1155 More actions · Answer 7

ScottPletcher (8/9/2013)

You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.

SQL will only be able to do a seek on that index if it's headed by EndDateTime.

So, you could:

1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.

OR

2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.

Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.

Thanks Scott. I tried option 1 and it still does a scan on the non-clustered index with enddatetime.

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 8

mayurkb (8/9/2013)

ScottPletcher (8/9/2013)

You don't need separate indexes on StartDateTime and EndDateTime, you need them in the same index.

SQL will only be able to do a seek on that index if it's headed by EndDateTime.

So, you could:

1) Create a new index on ( EndDateTime, StartDateTime). You can write the combined conditions as an "OR" and SQL should be able to do seeks on that index. For the purposes of the SELECT, this is the best option.

OR

2) Add the EndDateTime to the existing nonclustered index with StartDateTime. SQL will have to scan the entire index, but not the main table itself.

Btw, if possible, determine if SQL is actually using the existing nonclustered index -- if not, get rid of it. Naturally that would negate option 2 and leave only option 1.

Thanks Scott. I tried option 1 and it still does a scan on the non-clustered index with enddatetime.

And your problem is what exactly?

IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL



      DROP TABLE #SomeTestTable;        

--===== Declare some obviously named variables

DECLARE @NumberOfRows INT,

        @StartDate    DATEtime,

        @EndDate      DATEtime,

        @Days         DECIMAL(10,2), --This is still the "range"

        @StartValue   INT,

        @EndValue     INT,

        @Range        INT;

--===== Preset the variables to known values

SELECT @NumberOfRows = 1000000, -- a million rows

        @StartDate    = '2010', --Inclusive

        @EndDate      = '2020', --Exclusive

        @Days         = DATEDIFF(dd,@StartDate,@EndDate),

        @StartValue   = 100,

        @EndValue     = 900,

        @Range        = @EndValue - @StartValue + 1;

--===== Create "random constrained" integers within

     -- the parameters identified in the variables above.

SELECT

      ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),

      StartDateTime,

      EndDateTime = DATEADD(DD,ABS(CHECKSUM(NEWID()))%100,StartDateTime)

   INTO #SomeTestTable

FROM (

SELECT TOP (@NumberOfRows)

        StartDateTime =  DATEADD(DAY,ABS(CHECKSUM(NEWID())) % @Days,@StartDate)

   FROM sys.all_columns ac1

  CROSS JOIN  sys.all_columns ac2

) d;

UPDATE t SET EndDateTime = NULL

FROM #SomeTestTable t

WHERE ID IN (

SELECT TOP 100000 ID FROM #SomeTestTable ORDER BY  NEWID()) 

CREATE UNIQUE CLUSTERED  INDEX ucx_ID ON #SomeTestTable (ID)

CREATE INDEX ix_StartDateTime ON #SomeTestTable (StartDateTime)

CREATE INDEX ix_EndDateTime ON #SomeTestTable (EndDateTime)

CREATE INDEX ix_StartDateTimeEndDateTime ON #SomeTestTable (StartDateTime,EndDateTime)

CREATE INDEX ix_EndDateTimeStartDateTime ON #SomeTestTable (EndDateTime,StartDateTime) 

-- #########################################################################

--

-- Q1 Index scan ix_EndDateTimeStartDateTime; Duration about 60ms

DECLARE @Date DATE =  '20120513'

SELECT TOP (10000) test.ID

FROM #SomeTestTable as test

WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;

go 5

--

-- Q2 Index seek ix_EndDateTime; Duration about 40ms

DECLARE @Date DATE =  '20120513'

SELECT TOP (10000) test.ID

FROM #SomeTestTable as test

WHERE (test.EndDateTime < @Date)

go 5

--

-- Q3 index scan ix_EndDateTimeStartDateTime; Duration about 50ms

DECLARE @Date DATE =  '20120513'

SELECT TOP (10000) test.ID

FROM #SomeTestTable as test

WHERE (test.EndDateTime < @Date)  OR (test.EndDateTime IS NULL  AND test.StartDateTime  < @Date)

go 5

--

-- Q4 index scan ix_EndDateTimeStartDateTime; Duration too short to measure

DECLARE @Date DATE =  '20120513'

SELECT TOP (500) test.ID

FROM #SomeTestTable as test

WHERE COALESCE(test.EndDateTime, test.StartDateTime) < @Date;

go 5

--

-- Q5 index seek ix_EndDateTime; Duration too short to measure

DECLARE @Date DATE =  '20120513'

SELECT TOP (500) test.ID

FROM #SomeTestTable as test

WHERE (test.EndDateTime < @Date)

go 5

--

-- Q6 Index scan ix_EndDateTimeStartDateTime; Duration too short to measure

DECLARE @Date DATE =  '20120513'

SELECT TOP (500) test.ID

FROM #SomeTestTable as test

WHERE (test.EndDateTime < @Date)  OR (test.EndDateTime IS NULL  AND test.StartDateTime  < @Date)

go 5

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

ScottPletcher SSC Guru Points: 100262 More actions · Answer 9

Hmm, interesting. SQL's still "dumber" than I thought :-).

We'll need to UNION the query results instead of using "OR".

Also, as always, we want the variable to be exactly the same data type as the column, to insure that no implicit conversions will be needed.



DECLARE @Date datetime2

SET @Date =  '20120513'

SELECT TOP (500) test.ID

FROM (

    -- index seek

    SELECT TOP (500) test.ID

    FROM #SomeTestTable as test

    WHERE (test.EndDateTime < @Date)

    UNION --sort and eliminate dups, since we're sorting in the outer query anyway

    -- index seek

    SELECT TOP (500) test.ID

    FROM #SomeTestTable as test

    WHERE test.EndDateTime IS NULL AND test.StartDateTime  < @Date

) AS derived

ORDER BY

    test.ID

Too bad SQL can't do that itself with the "OR" there, but I know it has to allow for far more complex variations where such a clean translation might not be possible.

Edit: Changed "datetime" to "datetime2" to match the OP's table defs.

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!