Advantage to creating an Index when searching for NULLs?

Question

Post reply

Advantage to creating an Index when searching for NULLs?

ChrisMoix-87856

SSCertifiable

Points: 7288
More actions
July 12, 2007 at 1:50 pm

#117655

Hello,

I was just asked a question by my boss. Is there an advantage to indexing a column if you are going to search on it like this:

SELECT * FROM tblA

WHERE column1 IS NULL

Assuming that it is selective (few NULL values compared to the overall number of rows in the table).

Is it likely that creating an index in tblA.column1 will help the query?

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110192 More actions · Answer 1

Sergiy

SSC Guru

Points: 110192

July 12, 2007 at 2:37 pm

#719100

Why don't you run a simple test and show results to your boss?

_____________
Code for TallyGenerator

Ben Sullins-437405 SSCertifiable Points: 5397 More actions · Answer 2

"Non-sargable search arguments in the WHERE clause, such as "IS NULL", "", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'" generally prevents (but not always) the query optimizer from using an index to perform a search." So no...it won't help because it most likely will not be used anyway...

src: http://www.sql-server-performance.com/transact_sql.asp

Cheers,

Ben Sullins
bensullins.com
Beer is my primary key...

Sergiy SSC Guru Points: 110192 More actions · Answer 3

Sergiy

SSC Guru

Points: 110192

July 12, 2007 at 4:10 pm

#719129

Ben, I would suggest to you as well to run a simple test.

You cannot believe how much stuff you can find in Internet is absolute cr.p.

And you probably missed this: "generally prevents (but not always)"

_____________
Code for TallyGenerator

Vladan SSC-Insane Points: 22072 More actions · Answer 4

Vladan

SSC-Insane

Points: 22072

July 13, 2007 at 5:39 am

#719253

I agree that test is the best way to find out.

I will not quote any authorities (true or would-be), but based on my experience, if the table is large and does not contain many NULLs in the column, such index would help a lot.

ChrisMoix-87856 SSCertifiable Points: 7288 More actions · Answer 5

I would try to simulate it, but the scenario involves data to be populated in a table in the future. We're talking about 100s of thousands of rows, with only a few hundred rows will NULL values in the column we're looking for.

When my boss asked me the question, I thought the index wouldn't help. I just wanted to find out if anyone had any experience or knew of any definitive answer (I tried googling it, but I got a bunch of non-related results).

Ben, thanks for posting a constructive answer.

John Rowan SSC Guru Points: 56440 More actions · Answer 6

Your best bet here really is to run a test of your own as Sergiy and Vladan have stated. While you may find sources that say that it may or may not help, I would advise taking it with a grain of salt and testing it for yourself especially since this would be an easy mock up for you. Even with 1 million rows, you could create a simple test to prove or disprove the effects of adding the index with relative ease. As a rule, even when it makes total sense that a certain indexing scenario should provide improvements, I always run a test so I have proof that a change that I recommend is going to help.

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Andy Warren SSC Guru Points: 119747 More actions · Answer 7

I've got a small table with about 9k rows, and it has a column that has about 3k of nulls. When I select top 50 where column is a null, I get nice index seek against the column index. Remove the top and it changes to a clustered scan. No final answer there, but it would appear in my test that the index does help for the Top N cases.

I think its fair to point out the value of testing, but it's also nice to just ask a question and have a conversation about it.

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

Sergiy SSC Guru Points: 110192 More actions · Answer 8

It's not exacly about TOP N.

It's about shortest way to the target.

When you do SELECT * in a query involving non-clustered index seek SQL Server needs to perform bookmark lookup in order to return all requested values.

In case of TOP 50 the cost of index seek + lookup is smaller than the cost of clustered index scan.

If you don't have TOP 50 statistics will tell optimiser that it's gonna be lookup for 30% of rows and it's gonna be faster to perform single clustered index scan than index seek with following bookmark lookup.

Change your index used in WHERE clause to be clustered and it will be "clustered index seek" in both occasions.

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

And, if you do have a clustered index in the right spot...

GO

--===== Create and populate a 1,000,000 row test table.

-- Column RowNum has a range of 1 to 1,000,000 unique numbers

-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

-- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings

-- Column "SomeNumber has a range of 0.0000 to 99.9999 non-unique numbers

-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

-- Takes about 42 seconds to execute.

SELECT TOP 1000000

RowNum = IDENTITY(INT,1,1),

SomeInt = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),

SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))

+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),

SomeCSV = 'Part1,Part2,Part3,Part4,Part5,Part6,Part7,Part8,Part9,Part10',

SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),

SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)

INTO dbo.JBMTest

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

--===== A table is not properly formed unless a Primary Key has been assigned

ALTER TABLE dbo.JBMTest

ADD PRIMARY KEY CLUSTERED (RowNum)

--===== Create an index on the test column

CREATE INDEX IX_JBMTest_SomeInt ON dbo.JBMTest (SomeInt)

--===== Change some of the data in the test column to NULL (more than 30%)

UPDATE dbo.JBMTest

SET SomeInt = NULL

WHERE SomeInt <= 16000

--======================================================================

-- Run some tests... "*" sucks...

--======================================================================

SET STATISTICS IO ON

SELECT * --===== Test1

FROM dbo.JBMTest

WHERE SomeInt IS NULL

SELECT RowNum,SomeInt --===== Test2

FROM dbo.JBMTest

WHERE SomeInt IS NULL

SELECT TOP 50 * --===== Test3

FROM dbo.JBMTest

WHERE SomeInt IS NULL

SELECT TOP 50 RowNum,SomeInt --===== Test4

FROM dbo.JBMTest

WHERE SomeInt IS NULL

SET STATISTICS IO OFF

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

David McFarland SSChampion Points: 11815 More actions · Answer 10

David McFarland

SSChampion

Points: 11815

July 13, 2007 at 11:27 pm

#719469

Of course if you drop the indexes altogether, 3 of the 4 tests should be slightly faster than with the indexes (the second won't be, as the columns are covered). That obviously only applies in certain data scenarios, but this happens to be one of them.

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 11

Exactly. For example, having read the cited article before, I was never able to understand why something like "in" and "exists" are sargable but "not in" and "not exists" are not. The "not" only takes the result of the "in" or "exists" and inverts it. If one is sargable, the other has to be.

As for the specific question, I would fully expect an index to help. Null values would appear before any other value so the optimizer should know that as soon as a non-null appears, the search is over.

However, yes, do perform a test. My expectations have been dashed before...

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms