Searching with like operator(billions of rows)

Question

Post reply

Searching with like operator(billions of rows)

SQLRO

SSCrazy

Points: 2923
More actions
June 29, 2011 at 3:28 am

#242876

Hello,

We have two tables and we are searching whether column a in table2 contains values in a column of table1.

e.g.

Select t1.id,t1.col1,t1.col2,t2.col1

from

table1 t1,

table2 t2

where

substring([t1.Col1],1,charindex(',',[t1.Col1],1)-1) like '%' + t2.col2 + '%'

We have created a clustered index in columns id & col1 of table1 (col1,id)

and a clustered index on col2 of table2

t1.Col1 is nvarchar 510 and

t2,col2 is nvarchar 100.

Table1 (t1.col1) has 10200000 rows.

What is the best way to optimize this query and get the results in minimum TAT.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Dave Ballantyne SSC-Dedicated Points: 33556 More actions · Answer 1

Dave Ballantyne

SSC-Dedicated

Points: 33556

June 29, 2011 at 3:54 am

#1345760

Possibly fulltext indexing would help you.

But your design is wrong , there is no magic bullet for this one.

Please post some sample data for further advice.

Clear Sky SQL
My Blog[/url]

SQLRO SSCrazy Points: 2923 More actions · Answer 2

SQLRO

SSCrazy

Points: 2923

June 29, 2011 at 4:02 am

#1345768

Table1 contains name & address concatenated together.

ID Col1

1 Name1,address1

2 name2,address2..

There are certain set of words stored in col2 of table2(~300) and those words are required to be searched in Name part of the sample data above.

Therefore I am searching it using sub string.

Syed Jahanzaib Bin hassan SSChampion Points: 11410 More actions · Answer 3

Syed Jahanzaib Bin hassan

SSChampion

Points: 11410

June 29, 2011 at 4:06 am

#1345771

Filter Index

My Blog
www.aureus-salah.com

Abhijit More SSCertifiable Points: 6897 More actions · Answer 4

SQLRO (6/29/2011)

Table1 contains name & address concatenated together.

ID Col1

1 Name1,address1

2 name2,address2..

There are certain set of words stored in col2 of table2(~300) and those words are required to be searched in Name part of the sample data above.

Therefore I am searching it using sub string.

Considering your clarification I would recommend you to change your query to use proper JOINs (INNER JOIN) / EXISTS operator and instead of comparing the search operator using substring and other operators use only LIKE operator. Something like this... but remember using the full text search will have better performance than any other option.

select a.*

from Table1 a

WHERE exists(select 1 from Table2 b where a.Name like '%'+ b.Col2 +'%')

Abhijit - http://abhijitmore.wordpress.com

steveb. SSC-Forever Points: 46733 More actions · Answer 5

steveb.

SSC-Forever

Points: 46733

June 29, 2011 at 4:12 am

#1345773

Syed Jahanzaib Bin hassan (6/29/2011)

Filter Index

how would that help?

SQLRO SSCrazy Points: 2923 More actions · Answer 6

SQLRO

SSCrazy

Points: 2923

June 29, 2011 at 4:21 am

#1345781

Thanks Abhijeet. Can you tell me how that works internally.

sturner SSC-Insane Points: 22411 More actions · Answer 7

sturner

SSC-Insane

Points: 22411

June 29, 2011 at 6:54 am

#1345883

If it were me I would create an index on col1 and the ID (covered index) on both tables and see how that works. Other than that, I think a full text index would be your only other option.

The probability of survival is inversely proportional to the angle of arrival.

SQLRO SSCrazy Points: 2923 More actions · Answer 8

SQLRO

SSCrazy

Points: 2923

June 29, 2011 at 6:59 am

#1345891

Yes. I did exactly that. It took 2.51 hrs to return 150K rows. I think it also depends on what I select in output list. I think selecting only id in the output column will be much faster.

Abhijit More SSCertifiable Points: 6897 More actions · Answer 9

Abhijit More

SSCertifiable

Points: 6897

June 29, 2011 at 7:02 am

#1345894

SQLRO (6/29/2011)

Thanks Abhijeet. Can you tell me how that works internally.

Internally it checks the existenace of records for matching criteria and even we can introduce some indexes to get more performance.

Abhijit - http://abhijitmore.wordpress.com

sturner SSC-Insane Points: 22411 More actions · Answer 10

SQLRO (6/29/2011)

Yes. I did exactly that. It took 2.51 hrs to return 150K rows. I think it also depends on what I select in output list. I think selecting only id in the output column will be much faster.

Sir, you said you created a "clustered index" that is NOT a covered index and would result in a table scan. If the table is wide you are scanning a lot more pages than would be the case if it were NON clustered index.

The probability of survival is inversely proportional to the angle of arrival.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 11

Ninja's_RGR'us

SSC Guru

Points: 294069

June 29, 2011 at 7:18 am

#1345908

steveb. (6/29/2011)

Syed Jahanzaib Bin hassan (6/29/2011)

Filter Index

how would that help?

Can't help. Looks like he's slipping back into his old habit of giving completely useless advice without really trying to help (yes I still see them I just stopped bugging you on everyone of them).

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 12

Here is your query rewritten using ANSI-92 JOINS:

SELECT t1.id,



        t1.col1,

        t1.col2,

        t2.col1

FROM    table1 t1

        CROSS JOIN table2 t2

WHERE SUBSTRING(t1.Col1, 1, CHARINDEX(',', t1.Col1, 1) - 1) LIKE '%' + t2.col2 + '%' ;

No amount of indexing will help you avoid scans with a query like that.

You'll need to change your approach before you have a shot at a well-performing query. For table1, if you can make schema changes and normalize your comma-delimited lists that would be my first move. For table2, full-text looks like a good option to explore given your use of wildcards.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato