Searching with like operator(billions of rows)

  • Hello,

    We have two tables and we are searching whether column a in table2 contains values in a column of table1.

    e.g.

    Select t1.id,t1.col1,t1.col2,t2.col1

    from

    table1 t1,

    table2 t2

    where

    substring([t1.Col1],1,charindex(',',[t1.Col1],1)-1) like '%' + t2.col2 + '%'

    We have created a clustered index in columns id & col1 of table1 (col1,id)

    and a clustered index on col2 of table2

    t1.Col1 is nvarchar 510 and

    t2,col2 is nvarchar 100.

    Table1 (t1.col1) has 10200000 rows.

    What is the best way to optimize this query and get the results in minimum TAT.

  • Possibly fulltext indexing would help you.

    But your design is wrong , there is no magic bullet for this one.

    Please post some sample data for further advice.



    Clear Sky SQL
    My Blog[/url]

  • Table1 contains name & address concatenated together.

    ID Col1

    1 Name1,address1

    2 name2,address2..

    There are certain set of words stored in col2 of table2(~300) and those words are required to be searched in Name part of the sample data above.

    Therefore I am searching it using sub string.

  • Filter Index

    Regards,
    Syed Jahanzaib Bin Hassan
    BSCS | MCTS | MCITP | OCA | OCP | OCE | SCJP | IBMCDBA

    My Blog
    www.aureus-salah.com

  • SQLRO (6/29/2011)


    Table1 contains name & address concatenated together.

    ID Col1

    1 Name1,address1

    2 name2,address2..

    There are certain set of words stored in col2 of table2(~300) and those words are required to be searched in Name part of the sample data above.

    Therefore I am searching it using sub string.

    Considering your clarification I would recommend you to change your query to use proper JOINs (INNER JOIN) / EXISTS operator and instead of comparing the search operator using substring and other operators use only LIKE operator. Something like this... but remember using the full text search will have better performance than any other option.

    select a.*

    from Table1 a

    WHERE exists(select 1 from Table2 b where a.Name like '%'+ b.Col2 +'%')

    Abhijit - http://abhijitmore.wordpress.com

  • Syed Jahanzaib Bin hassan (6/29/2011)


    Filter Index

    how would that help?

  • Thanks Abhijeet. Can you tell me how that works internally.

  • If it were me I would create an index on col1 and the ID (covered index) on both tables and see how that works. Other than that, I think a full text index would be your only other option.

    The probability of survival is inversely proportional to the angle of arrival.

  • Yes. I did exactly that. It took 2.51 hrs to return 150K rows. I think it also depends on what I select in output list. I think selecting only id in the output column will be much faster.

  • SQLRO (6/29/2011)


    Thanks Abhijeet. Can you tell me how that works internally.

    Internally it checks the existenace of records for matching criteria and even we can introduce some indexes to get more performance.

    Abhijit - http://abhijitmore.wordpress.com

  • SQLRO (6/29/2011)


    Yes. I did exactly that. It took 2.51 hrs to return 150K rows. I think it also depends on what I select in output list. I think selecting only id in the output column will be much faster.

    Sir, you said you created a "clustered index" that is NOT a covered index and would result in a table scan. If the table is wide you are scanning a lot more pages than would be the case if it were NON clustered index.

    The probability of survival is inversely proportional to the angle of arrival.

  • steveb. (6/29/2011)


    Syed Jahanzaib Bin hassan (6/29/2011)


    Filter Index

    how would that help?

    Can't help. Looks like he's slipping back into his old habit of giving completely useless advice without really trying to help (yes I still see them I just stopped bugging you on everyone of them).

  • Here is your query rewritten using ANSI-92 JOINS:

    SELECT t1.id,

    t1.col1,

    t1.col2,

    t2.col1

    FROM table1 t1

    CROSS JOIN table2 t2

    WHERE SUBSTRING(t1.Col1, 1, CHARINDEX(',', t1.Col1, 1) - 1) LIKE '%' + t2.col2 + '%' ;

    No amount of indexing will help you avoid scans with a query like that.

    You'll need to change your approach before you have a shot at a well-performing query. For table1, if you can make schema changes and normalize your comma-delimited lists that would be my first move. For table2, full-text looks like a good option to explore given your use of wildcards.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply