String Manipulation

  • I have two tables

    1. Noise Table that has

    ID INT

    Word VARCHAR(50)

    2. Key Table that has

    ID INT

    Word VARCHAR(500)

    WordCategory INT

    Now what my application does is it saves words marked as Noise into Noise table and Words marked for Searching into Key table.

    And now when a user wants to search all the words in a sentence like He will type some free form text like 'I want to know the best cricketers of 2007'

    Now what my stored procedure has to do is it should first check if there are any noise words in this sentence and remove them and then search for the records in Word table which have matches in that table and retrieve all the rows that match any of the words in this sentence.

    Also one more additional requirement is if the user types in something like Golfers in the above example and if one row in Word table has cricketer and not cricketers, it should retrieve that records also.

    Prasad Bhogadi
    www.inforaise.com

  • One way to isolate records from table Noise matching pattern is something like:

    SELECT Word

    from Noise

    WHERE 'user entry' like '%' + Word + '%'

    This result set you can put into cursor and then replace from 'user entry' with empty string.

    After that you can search Word table in the same way.

  • i think what you are asking for is exactly what full text indexing actually does.

    it creates the word list for you, maintains an editable list of noise words, and handles word variations like work, working and worked.

    althought here's not nearly as many posts here about it, it's really easy to set up, and even easier in sql2005.

    read a bit form books on line and confirm that this is really what you wanted to do, but without re-inventing the wheel:

    Full-Text Catalogs and Indexes

    A Microsoft® SQL Server™ 2000 full-text index provides efficient support for sophisticated word searches in character string data. The full-text index stores information about significant words and their location within a given column. This information is used to quickly complete full-text queries that search for rows with particular words or combinations of words.

    Full-text indexes are contained in full-text catalogs. Each database can contain one or more full-text catalogs. A catalog cannot belong to multiple databases and each catalog can contain full-text indexes for one or more tables. A table can only have one full-text index, so each table with a full-text index belongs to only one full-text catalog.

    Full-text catalogs and indexes are not stored in the database to which they belong. The catalogs and indexes are managed separately by the Microsoft Search service.

    A full-text index must be defined on a base table; it cannot be defined on a view, system table, or temporary table. A full-text index definition includes:

    A column that uniquely identifies each row in the table (primary or candidate key) and does not allow NULLs.

    One or more character string columns covered by the index.

    The full-text index is populated with the key values. The entry for each key has information about the significant words (noise-words or stop-words are stripped out) that are associated with the key, the column they are in, and their location in the column.

    Formatted text strings, such as Microsoft® Word™ document files or HTML files, cannot be stored in character string or Unicode columns because many of the bytes in these files contain data structures that do not form valid characters. Database applications may still have a need to access this data and apply full-text searches to it. Many sites store this type of data in image columns, because image columns do not require that each byte form a valid character. SQL Server 2000 introduces the ability to perform full-text searches against these types of data stored in image columns. SQL Server 2000 supplies filters that allow it to extract the textual data from Microsoft Office™ files (.doc, .xls, and .ppt files), text files (.txt files), and HTML files (.htm files). When you design the table, in addition to the image column that holds the data, you include a binding column to hold the file extension for the format of data stored in the image column. You can create a full-text index that references both the image column and the binding column to enable full-text searches on the textual information stored in the image column. The SQL Server 2000 full-text search engine uses the file extension information from the binding column to select the proper filter to extract the textual data from the column.

    Full-text indexing is the component that implements two Transact-SQL predicates for testing rows against a full-text search condition:

    CONTAINS

    FREETEXT

    Transact-SQL also has two functions that return a set of rows that match a full-text search condition:

    CONTAINSTABLE

    FREETEXTTABLE

    Internally, SQL Server sends the search condition to the Microsoft Search service. The Microsoft Search service finds all the keys that match the full-text search condition and returns them to SQL Server. SQL Server then uses the list of keys to determine which table rows are to be processed.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks for the reply. I actually did it using a cursor and charindex. However I am wondering if this could be accomplished without using a cursor.

    Prasad Bhogadi
    www.inforaise.com

  • Second approach is to extract words from user entry and insert into temporary table so after that you can join it with your noise and key tables.

    declare @UserEntry nvarchar(4000)

    declare @Position int

    declare @Word nvarchar(100)

    declare @Words table (Word nvarchar(100))

    set nocount on

    set @UserEntry = N'some words'

    while charindex(' ' , @UserEntry) != 0

    BEGIN

    select @Position = charindex(' ' , @UserEntry)

    select @Word = substring(@UserEntry, 1, @Position - 1)

    insert into @Words values (@Word)

    select @UserEntry = stuff(@UserEntry, 1, @Position, '')

    END

    This is solution if you don't want to activate full-text search.

  • Notwithstanding the above, I think this might work

    You will need a Numbers table (search this site for examples)

    SELECT k.[ID],k.Word,k.WordCategory

    FROM (SELECT SUBSTRING(' ' + @String + ' ',

    n.Number + DATALENGTH(' '),

    CHARINDEX(' ', ' ' + @String + ' ', n.Number + DATALENGTH(' ')) - n.Number - DATALENGTH(' ')) AS [Word]

    FROM msp.dbo.Numbers n

    WHERE n.Number > 0

    AND n.Number < DATALENGTH(' ' + @String + ' ')

    AND SUBSTRING(' ' + @String + ' ', n.Number, DATALENGTH(' ')) = ' ') a

    INNER JOIN [Key] k

    ON k.Word = a.Word

    OR (RIGHT(a.Word,1) = 's' AND k.Word = LEFT(a.Word,LEN(a.Word)-1))

    OR (RIGHT(k.Word,1) = 's' AND a.Word = LEFT(k.Word,LEN(k.Word)-1))

    WHERE NOT EXISTS (SELECT * FROM [Noise] n WHERE n.Word = a.Word)

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Hello David,

    Thank you for the solution. However I completed it using a cursor and Select statement and seems to be performing fine. I know that Cursors kill performance however I am content with performance as compared to the other solution.

    Thanks Again

    Prasad Bhogadi
    www.inforaise.com

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply