Full Text Index ignoring words and not returning results

  • I have an odd problem with a full-text indexed table, and worse still, I can replicate it using a brand new table with a single row of data.

    Essentially the keyword CONTAINS() is not returning rows if certain words are mentioned in the criteria, however I am not using stoplists or stopwords.

    The example I have is a field with a full-text index defined on it called [coursetitle] and with a single row in it with the text 'Certificate in Will Drafting'

    If I use the clauses WHERE CONTAINS(coursetitle, '"certificate"'), WHERE CONTAINS(coursetitle, '"drafting'") or WHERE CONTAINS(coursetitle, '"certificate" and "drafting"') get the record returned, however if I add in the word "will" then nothing is returned. e.g. CONTAINS(coursetitle, '"will"'), CONTAINS(coursetitle, '"certificate" and "will"')

    It is causing problems on our websites as the presence of such a word in the search criteria is suppressing information that should be found.

    I have even rebuilt the FTI from scratch in case it had become corrupt and queried sys.fulltext_stopwords in case anything was in there, but it is empty.

    What am I missing? :crazy:

    Thanks in advance.

    Colin


    ---------------------------------------
    It is by caffeine alone I set my mind in motion.
    It is by the Beans of Java that thoughts acquire speed,
    the hands acquire shaking, the shaking becomes a warning.
    It is by caffeine alone I set my mind in motion.

  • Colin,

    In 2005 "will" is in default noise list therefore FTS just ignores it (certainly it ignores ' "certificate" AND "will" ' also). I do not have right now 2008 in front of me but IMO you should find this file on server (from BOL- FTDATA\ FTNoiseThesaurusBak ) to be sure it's empty (regardless of sys.fulltext_stopwords results).

  • Okay, now I'm getting more confused. According to BOL, the 2005 noise word file has been superseded with stopwords and stoplists in SQL Server 2008.

    The test database I am working on is level 100 and created as new on SQL Server 2008.

    I have queried sys.fulltext_stopwords and it comes up empty.

    sys.dm_fts_index_keywords clearly indicates that only the words certificate and drafting have been indexed.

    sys.dm_fts_parser shows that certificate and drafting are exact matches, will is a noise word (I can understand why as in the phrase "I will go shopping", but annoying as I am referring to it in the context of "last will and testament").

    As a result, by including 'AND "will"' into the search criteria it is failing the match.

    This is where I am getting confused. If sys.dm_fts_parser is indicating that "will" is a noise word, surely it will ignore it when searching, but this does not appear to be the case.

    It might help if I could get to the noise word list programmatically and eliminate these words from my search condition, but how I do this is currently evading me.

    I seem to have two options, either set the stoplist to off, or create an empty stoplist and have the FTI use this, however this then indexes every word, but it works; the problem here is now that if someone searches for "certificate in a will" it fails as it does not match, and back to square one.

    We eliminate what we believe are stop words, but obviously aren't catching them all. It is becoming very apparent that I need access to the noiseword table to ensure that I don't use them in the search condition, but again, how do I get to this list in code ....

    ... or am I missing something glaringly obvious ... again :rolleyes:


    ---------------------------------------
    It is by caffeine alone I set my mind in motion.
    It is by the Beans of Java that thoughts acquire speed,
    the hands acquire shaking, the shaking becomes a warning.
    It is by caffeine alone I set my mind in motion.

  • Sorted it!

    Took me a while to find the system view sys.fulltext_system_stopwords which gave me the list I needed. Having used this to exclude words from my search criteria and have now also learning how to create, manipulate and use my own custom stoplists along the way.

    I never even realised they were there - doh!

    :hehe:


    ---------------------------------------
    It is by caffeine alone I set my mind in motion.
    It is by the Beans of Java that thoughts acquire speed,
    the hands acquire shaking, the shaking becomes a warning.
    It is by caffeine alone I set my mind in motion.

  • Shame on me- completely forgot that there are actually 2 stop lists- system and DB.

    Thanks for reminding 🙂

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply