specific character validation

Question

Post reply

specific character validation

Robert Klimes

SSCoach

Points: 17865
More actions
September 8, 2008 at 11:43 am

#192437

I have a an issue were i want to check if specific characters or strings exist in a string based on values in a table.

create table validname (id int primary(1,1), name varchar(20) CONSTRAINT [PK_validname] PRIMARY KEY CLUSTERED ([id] ASC)) insert into validname(name) select 'Bob' union select 'B0B' union select 'B@B' union select 'boooob' create table badname (bad varchar(10) CONSTRAINT [PK_badname] PRIMARY KEY CLUSTERED ([bad] ASC)) insert into badname select '0' union select '@' union select 'oooo'

I use the following query to do the comparison and it runs well

select A.name from validname as A,badname as B where A.id = 1 and (A.name like '%' + B.bad + '%')

but now I have a requirement to add an index on the name field in valid name which causes the query to perform very badly.

create index name_A on validname (name asc) include (id)

in this example both run similar but production validname table has millions of rows. The execution shows that with the index on the name column, it does and index seek on name_a as well as a constant scan. without name_A it does index seek on PK_validname and no constant scan.

I am not quite sure why/if constant scan would cause this to perform poorly. Is it looping through every value in the name_a index and then comparing that to each value in the badname table?

is there a better way to handle the badname string check?

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

I'd move the part about validname like '%' + badname + '%' to the Join clause, but otherwise it looks fine.

I created a 100-thousand row copy of your table and ran it with and without that index. In my test, without index ran in 60 to 65 milliseconds, while with index ran in 10-15 milliseconds, CPU time.

Are you sure you're comparing the exact same query with and without the index? You didn't accidentally remove part of the Where clause or something like that?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Michael Earl-395764 SSC Guru Points: 53873 More actions · Answer 2

An index is not going to help you because your criteria begins with a wildcard. So, the query needs to evaluate for every row regardless of an index existing.

In the execution with the index, the optimizer seems to think that the index is going to help so it is using the index and then must be looking up the value in the heap or clustered index. If you specify fullscan on your clustered index, it will probably go back to the performance before the index.

If you added the index to help this process, remove it. It is not going to help.

Robert Klimes SSCoach Points: 17865 More actions · Answer 3

In my production environment the application runs query so it hasn't changed with or with out index. and the difference in time for me is 10 ms without name index and application times out after 30 sec when i have the name index created. The validname table has over 32mil rows. In dev environment, only 41000 rows but the difference in time is significant (8ms without index, 500ms with index)

did you mean moving to join clause like so?



select A.name from validname as A

join badname as B

on A.name like '%' + B.bad + '%'

where A.id = 1

If so, I myself prefer this format but unfortunately it does nothing to change the execution plan.

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]

Robert Klimes SSCoach Points: 17865 More actions · Answer 4

Thanks Michael. Adding the index hint to the query forced it to use the primary key thus performance is good(at least in test environment)



select A.name from validname with (index (PK_validname))as A,badname as B where A.id = 1 and (A.name like '%' + B.bad + '%')

btw, I knew that have a wildcard would cause the index to not be useful thus I was surprised to see the optimizer choose that index instead of the primary key. The name index is to help another query.

thanks again

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]

GSquared SSC Guru Points: 260824 More actions · Answer 5

I can't duplicate the behavior you're describing with what I have for testing. Might be something specific to your data.

Are the stats up-to-date on your tables? Are the indexes heavily fragmented?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Robert Klimes SSCoach Points: 17865 More actions · Answer 6

Robert Klimes

SSCoach

Points: 17865

September 9, 2008 at 10:42 am

#869169

the stats on the name column were last updated August 26 and fragmentation is quite high, but gets reindexed usually three time a week.

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]