Full-text indexes

  • Hello, I have two questions concerning full-text indexes.

    If I run the following queries on a table ftc which is full-text indexed I get:-

    1. select * from ftc order by pk

    pk Txt

    ----------- -------------

    1 AA.MK

    2 AA,MK

    3 AA\MK

    4 AA/MK

    5 100W

    (5 row(s) affected)

    2. select * from ftc where contains(Txt,'"MK*"') order by pk

    pk Txt

    ----------- -------------

    2 AA,MK

    4 AA/MK

    (2 row(s) affected)

    3. select * from ftc where contains(Txt,'"AA,MK*"') order by pk

    pk Txt

    ----------- -------------

    2 AA,MK

    4 AA/MK

    (2 row(s) affected)

    4. select * from ftc where contains(Txt,'"AA.MK*"') order by pk

    pk Txt

    ----------- ---------------

    1 AA.MK

    (1 row(s) affected)

    5. select * from ftc where contains(Txt,'"1W*"') order by pk

    pk Txt

    ----------- ---------------

    5 100W

    (1 row(s) affected)

    So 2 questions:-

    1.

    (See queries 1 - 4 above ) I am attempting to do a 'prefix-term' search where records are returned where Txt contains the search string (= 'MK' ) or a word which begins with the search string.

    When the txt field is split into separate words for a particular record why do some punctuation marks cause the string do be delimited (e.g. '/' )and others do not (e.g. '\').

    Is there a definative list documented which outline what characters cause a string to be delimited?

    2.

    (See query 5 above) Why is there a record returned in query 5? The string '100W' neither is '1W' or begins with '1W' so why is Txt = '100W' in the resultset?

    Thanks for any ideas.

    Regards,

    Andy Jones

    andyj93@hotmail.com

    .

  • Jon usually handles our full text search stuff, I sent him a note. If he's busy one of us will dig into it in a day or so!

    Andy

  • A couple of things you might want to look at for your full text querying approach.

    I believe you may only search for suffixes, no prefix searches are allowed. Another reason you may be getting off the wall results is that your contains clause cannot contain noise words. This will either cause an error or give you the erroneous results you are experiencing. It has been a little while since I have realyy dug deep into full text indexing, but give my article a quick read for info and references.

    Hope it helps...

    http://qa.sqlservercentral.com/columnists/jwiner/20010422025723_1.asp

  • Thanks for the response Jon,

    I'm trying to do a prefix term search as described in BOL:-

    <prefix_term>

    Specifies a match of words or phrases beginning with the specified text. Enclose a prefix term in double quotation marks ("") and add an asterisk (*) before the ending quotation mark, so that all text starting with the simple term specified before the asterisk is matched.........

    With regards to noise words I don't believe any of:-

    AA.MK

    AA,MK

    AA\MK

    AA/MK

    100W

    are noise words????

    The select 5. above is giving me extra results than expected not less as would happen with noise words. I can't see how 100W can be returned?

    Regards,

    Andy Jones

    andyj93@hotmail.com

    .

  • Not sure I understand exactly why it is happening either.

    Maybe try deleting the full text catalog and rebuilding it. Maybe something is corrupt. Just a guess.

    What is the data type of the 'txt' column?

  • txt is of datatype varchar(50). I don't think the 100W problem is a corruption as the problem first surfaced in a database of electrical components when entering 50W would return entries containing 500W and 5000W. I created the simple table ftc outlined below as a test to see what was going on, it seems to be a consistent that unexpected records are returned when numerics are mid-string???

    Regards,

    Andy Jones

    andyj93@hotmail.com

    .

  • Also it only brings back unexpected results when the character at the end of the string is W, Y or Z !! For example - search this website from the search link with the string "<five>W*". <five>W appears nowhere on this page but it will be returned in the results as 50W is on this page.

    Regards,

    Andy Jones

    andyj93@hotmail.com

    .

  • nb: In the example above replace <five> by the numeric. I didn't want to write it explicitly as then you would expect this page to be returned in the search example.

    Regards,

    Andy Jones

    andyj93@hotmail.com

    .

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply