substring or like

  • I have queries that nearly always based on part-code

    is it better to use substring or like ?

     

    e.g

    where substring(partCode,1,4)='ukd5'

    or

    where partcode like 'ukd5%'

     

     

  • Depends if the table has indexes

    If not then either will use a Table Scan

    If the column is indexed SUBSTRING will use Index Scan and LIKE will use Index Seek

    If the table has index (but the column has not) then both will use Index Scan

    Any combination can affect performance and you should check the execution plan(s)

    Far away is close at hand in the images of elsewhere.
    Anon.

  • thank you David,

     

    In this context, where there is a clustered index on the column, the LIKE uses the clustered index seek and is over 20 times faster

    3 secs for the seek compared to 1min 11 seconds for the scan

     

    happy days!

     

  • It depends on the functionality you are looking for. If you trying to find all records where partcode starts with, contains, or ends with some value, then the LIKE operator is the correct choice. If you want to find the value at an exact position within the part code, SUBSTRING is a better choice.

  • If this was something you were going to be regularly searching for, you could create a calculated column with a formula of:

    substring(PartCode, 1, 4)

    Then build an index on this calculated column - the database will then use the index to search instead of performing a table scan.

  • ah - this might help me

    the part code is normally 10 chars long (alphanumeric)

    but is normally used in first 3, first 4, first 7, or whole 10

    so if i made calculated colums for 3,4,7 then created an index for each of these the database would choose the most optimal index to use depending on what I was searching for or would i have specy the calulated column in the where clause?

  • Just specify the Calculated column in the where clause and the query optimiser should choose the appropriate index.  You can check by looking at the Execution Plan.

    Beware that creating too many indexes will have a performance impact on your Inserts, Updates & Deletes since these new indexes will have to be kept up to date whenever your change the data.

  • well we are fortunate - no updates or deletes, just inserts once a quarter (aprx 2.4 million records)

     

    is there any benefit in not bothering with calculated column, and just have a fixed column for each ?

     

     

  • The data in calculated columns will not add to the storage requirement for the table - it is in effect virtual data.  However, by creating the index you will persist the value of the calculated column in the index even though the value is not stored in the table.

    With a table of this size it would probably be beneficial to drop the index prior to the quarterly data population and recreate it after the import completes.

  • i just got back to this issue

    I prepared a test:

    Table: 76 million rows

    Query: Matches 360k rows and sums up into 26 rows

    like 'ukd5%' - indexed partCode column: 8secs

    = 'ukd5' - indexed calculated column of substring(partCode,1,4): 7secs

    = 'ukd5' - indexed normalised column of substring(partCode,1,4): 7secs

    so the '=' is marginally faster than the 'like',

    and there was no direct improvement from normalising the column in this case

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply