Find records in table that match patterns in different table

  • I had a co-worker ask me to help him solve a problem and this was the best solution I could come up with. It worked, but I figured with a large amount of records, using a cross join would start to hurt the performance. I just wanted to know if anyone had ideas on a perhaps better performing solution.

    The problem was he had a table of customers with descriptions on them and needed to find the ones whose description matched a few strings he was looking for anywhere in the description. It's not a big deal if the customer returns more than once if the description matches multiple patterns.

    Here is some data with my solution:

    create table #myData

    (

    userid int,

    descr varchar(6)

    )

    go

    insert into #myData

    select 1, 'abc123'

    union all select 2, 'abcdef'

    union all select 3, 'lmnop'

    union all select 4, '123456'

    union all select 5, 'a1b2c3'

    union all select 6, 'qwerty'

    go

    create table #patterns

    (

    pat varchar(5)

    )

    go

    insert into #patterns

    select '%abc%'

    union all select '%123%'

    union all select '%xyz%'

    union all select '%ert%'

    select *

    from #myData myData

    cross join #patterns patterns

    where PATINDEX(patterns.pat, myData.descr) > 0

  • Rory, how big are the 2 tables?

  • There was only about 6,000 in the customer table and 7 patterns we were looking for so it really wasn't an issue. This just got my curiosity up and I couldn't think of another solution and was just wondering if anyone else would have approached this in a different manner.

  • roryp 96873 (4/19/2012)


    There was only about 6,000 in the customer table and 7 patterns we were looking for so it really wasn't an issue. This just got my curiosity up and I couldn't think of another solution and was just wondering if anyone else would have approached this in a different manner.

    if you want to use some dynamic sql its nice and fast.

    DECLARE @SQL VARCHAR(MAX)

    DECLARE @pat VARCHAR (MAX)

    SET @SQL = 'SELECT * FROM RandomGuid R WHERE RANDOM LIKE '

    SET @pat = STUFF((Select 'OR RANDOM LIKE ' + '''' + pat + '''' + ' ' FROM #patterns FOR XML PATH('')),1,15,'')

    SET @SQL = @SQL + @pat

    --SELECT @SQL

    EXEC (@SQL)

    its faster by quite a bit on my million row giud table since im running your patterns all at once instead of one at a time and as the number of patterns or customers increases the time savings will increase (relative to cross join). you would need to replace the table (RandomGuid) and column (RANDOM) to meet your needs.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Thanks Hector, I like that approach. I've used dynamic SQL so little that I forget about it sometimes.

  • Did you adjust the original query so you're doing an apples-to-apples comparison?

    SELECT *

    FROM #myData md

    CROSS JOIN #patterns p

    WHERE

    md.descr LIKE p.pat

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher (4/19/2012)


    Did you adjust the original query so you're doing an apples-to-apples comparison?

    SELECT *

    FROM #myData md

    CROSS JOIN #patterns p

    WHERE

    md.descr LIKE p.pat

    still comparing the entire table 4 times, a little faster than the patindex but not much. even with out millisecond accurecy (i can post it if you want) cross joining (or cross applying) the #patterns takes almost 10 seconds while the dynamic build of the where clause to directly compare the filters is comming in at about 3 seconds on a million rows.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • I expected that, just wanted to be fair about it 🙂 .

    The optimizer still needs improvement.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ========== ORIGINAL METHOD ======================================================

    SQL Server Execution Times:

    CPU time = 9251 ms, elapsed time = 9267 ms.

    ========== With out PATINDEX =======================================================

    SQL Server Execution Times:

    CPU time = 9360 ms, elapsed time = 9500 ms.

    ========== Dynamic Sql ===============================================================

    SQL Server Execution Times:

    CPU time = 2761 ms, elapsed time = 2866 ms.

    for giggles and to run the stats, using pat index is faster, not by much but faster.

    here is what i ran

    --CREATE the data

    CREATE TABLE Random (col VARCHAR(36))

    INSERT INTO Random

    SELECT TOP 1000000 CONVERT(VARCHAR(36),NEWID()) AS name

    FROM sys.all_columns sc1,sys.all_columns sc2

    --Create our pattern table

    create table #patterns

    (

    pat varchar(5)

    )

    insert into #patterns

    select '%abc%'

    union all select '%123%'

    union all select '%xyz%'

    union all select '%ert%'

    SET NOCOUNT ON --make every thing quiet, does not matter very much in this case.

    DECLARE @Bitbucket VARCHAR(50) -- kill off any output

    PRINT '========== ORIGINAL METHOD ======================================================'

    SET STATISTICS TIME ON;

    select @Bitbucket = col

    from Random r

    cross join #patterns patterns

    where PATINDEX(patterns.pat, r.col) > 0

    SET STATISTICS TIME OFF;

    PRINT '========== With out PATINDEX ======================================================='

    SET STATISTICS TIME ON;

    SELECT @Bitbucket = col

    FROM Random r

    CROSS JOIN #patterns p

    WHERE r.col LIKE p.pat

    SET STATISTICS TIME OFF;

    PRINT '========== Dynamic Sql ==============================================================='

    SET STATISTICS TIME ON;

    DECLARE @SQL VARCHAR(MAX)

    DECLARE @pat VARCHAR (MAX)

    SET @SQL = 'DECLARE @Bitbucket VARCHAR(50); SELECT @Bitbucket = col FROM Random R WHERE col LIKE ' + STUFF((Select ' OR col LIKE ' + '''' + pat + '''' FROM #patterns FOR XML PATH('')),1,13,'')

    EXEC (@SQL)

    SET STATISTICS TIME OFF;


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply