substring or like

Question

Post reply

substring or like

garryurban

Old Hand

Points: 393
More actions
January 24, 2006 at 5:43 am

#113169

I have queries that nearly always based on part-code

is it better to use substring or like ?

e.g

where substring(partCode,1,4)='ukd5'

or

where partcode like 'ukd5%'

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

David Burrows SSC Guru Points: 65111 More actions · Answer 1

Depends if the table has indexes

If not then either will use a Table Scan

If the column is indexed SUBSTRING will use Index Scan and LIKE will use Index Seek

If the table has index (but the column has not) then both will use Index Scan

Any combination can affect performance and you should check the execution plan(s)

Far away is close at hand in the images of elsewhere.
Anon.

garryurban Old Hand Points: 393 More actions · Answer 2

garryurban

Old Hand

Points: 393

January 24, 2006 at 6:18 am

#616634

thank you David,

In this context, where there is a clustered index on the column, the LIKE uses the clustered index seek and is over 20 times faster

3 secs for the seek compared to 1min 11 seconds for the scan

happy days!

cliffb SSCarpal Tunnel Points: 4547 More actions · Answer 3

cliffb

SSCarpal Tunnel

Points: 4547

January 25, 2006 at 7:07 am

#616884

It depends on the functionality you are looking for. If you trying to find all records where partcode starts with, contains, or ends with some value, then the LIKE operator is the correct choice. If you want to find the value at an exact position within the part code, SUBSTRING is a better choice.

paulhumphris Hall of Fame Points: 3496 More actions · Answer 4

paulhumphris

Hall of Fame

Points: 3496

January 25, 2006 at 9:22 am

#616943

If this was something you were going to be regularly searching for, you could create a calculated column with a formula of:

substring(PartCode, 1, 4)

Then build an index on this calculated column - the database will then use the index to search instead of performing a table scan.

garryurban Old Hand Points: 393 More actions · Answer 5

ah - this might help me

the part code is normally 10 chars long (alphanumeric)

but is normally used in first 3, first 4, first 7, or whole 10

so if i made calculated colums for 3,4,7 then created an index for each of these the database would choose the most optimal index to use depending on what I was searching for or would i have specy the calulated column in the where clause?

paulhumphris Hall of Fame Points: 3496 More actions · Answer 6

Just specify the Calculated column in the where clause and the query optimiser should choose the appropriate index. You can check by looking at the Execution Plan.

Beware that creating too many indexes will have a performance impact on your Inserts, Updates & Deletes since these new indexes will have to be kept up to date whenever your change the data.

garryurban Old Hand Points: 393 More actions · Answer 7

garryurban

Old Hand

Points: 393

January 25, 2006 at 9:39 am

#616955

well we are fortunate - no updates or deletes, just inserts once a quarter (aprx 2.4 million records)

is there any benefit in not bothering with calculated column, and just have a fixed column for each ?

paulhumphris Hall of Fame Points: 3496 More actions · Answer 8

The data in calculated columns will not add to the storage requirement for the table - it is in effect virtual data. However, by creating the index you will persist the value of the calculated column in the index even though the value is not stored in the table.

With a table of this size it would probably be beneficial to drop the index prior to the quarterly data population and recreate it after the import completes.

garryurban Old Hand Points: 393 More actions · Answer 9

i just got back to this issue

I prepared a test:

Table: 76 million rows

Query: Matches 360k rows and sums up into 26 rows

like 'ukd5%' - indexed partCode column: 8secs

= 'ukd5' - indexed calculated column of substring(partCode,1,4): 7secs

= 'ukd5' - indexed normalised column of substring(partCode,1,4): 7secs

so the '=' is marginally faster than the 'like',

and there was no direct improvement from normalising the column in this case