Advanced string search.....

  • I need help / any ideas to perform advanced search on a string...

    For ex: I have a string like below.

    "SQL Server 2005 Express Edition is the next version of MSDE and is a free, easy-to-use, lightweight, and embeddable version of SQL Server 2005."

    User's search would be Express near4("next")

    Sentence should have "Express" word and "next" word should be there in either left or right side 4 words of "Express".

    In the above sentence "next" is the 4th word (right side) from "Express" word so this should come as result...

    If user enters Express near3("next") -- above sentence should not come because "next" word is not there 3 words....either side

  • I've never used full-text search, but I think that it has the capabilities that you are looking for. It's certainly a good place to start.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Hi Allen

    Thanks for your reply...

    Definitely FTS can do nearness search....but we cannot control that it should be within 3 words or 4 words ....beside to the main word....

    Also I will have many many databases in a server.... so maintenance of FTS is difficult... and FTS index will have some latency in updating it...

    Thanks,

    Surya Prasad.

  • declare @searchTable table( TextValue varchar(max))

    insert into @searchTable values ('SQL Server 2005 Express Edition is the next version of MSDE and is a free, easy-to-use, lightweight, and embeddable version of SQL Server 2005.')

    select * from @searchTable T

    cross apply dbo.fnsplit(' ', T.TextValue ) D

    cross apply dbo.fnsplit(' ', T.TextValue) N

    where

    D.Data = 'Express' -- target

    and N.Data = 'next' -- 2nd target

    and N.ItemNum >=D.ItemNum - 4 -- N.Data Items that are within 4 below D.Data

    and N.ItemNum <=D.ItemNum + 4 -- N.Data Items that are within 4 above D.Data

    --- supporting function:

    CREATE FUNCTION [dbo].[fnsplit](

    @Delimiter nvarchar(max)

    ,@String nvarchar(max)

    )

    RETURNS TABLE AS

    RETURN

    SELECT

    SUBSTRING(@String, Numbers.N+1, CHARINDEX(@Delimiter, @String + @Delimiter, Numbers.N+1) - Numbers.N-1) AS Data

    , Numbers.N as Pos

    ,( Numbers.N) - LEN(REPLACE(LEFT(@String, Numbers.N), @Delimiter, N'')) + 1 AS ItemNum

    FROM dbo.Tally Numbers with (nolock) -- http://qa.sqlservercentral.com/articles/T-SQL/63003/ (mines Zero based)

    WHERE Numbers.N <= LEN(@String)

    AND SUBSTRING(@Delimiter + @String, Numbers.N+1, 1) = @Delimiter;

    You will still need to deal with the hard work of punctuation and multiple spaces.

    Strip them out before hand or alter the split function so that instead of a delimiter it uses a pattern matching (PatIndex('[^\W]%')) to locate next Punctuation or Space(s).

    You also have to decide what the result of 2+ matches in one string? This currently will give multiple results.

  • Thanks David....

    Actually I started working on this and I am going in the similar directions that you mentioned.... In fact I am extracting words and their positions into other table when actual data is loaded...

    Once I got that data then it is all number game to do search......

    As Your idea is also in the similar directions it is giving me a comfort feeling that I am going in right direction..... Thanks a lot....:-)

    I may need one more suggestion from you....

    Number of words that I need to search is not always one, user can enter multiple words.... with dynamic SQL this can be done .... but any other idea will be helpful....

    Once again thanks for your reply and idea.

    Thanks for SQLServerCentral.com too....:-)

  • [p]you can change the where clause to a like if you input is like Express near3("next,SQL"):[/p]

    and ','+@searchString +',' like '%,'+N.Data+',%' -- commas are so substring words don't match - also input is stripped of any whitespace

    it is a little harder if you want to find matching string phrase "next version"

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply