Text search inside attached html files

  • Good Day,

    I am totally new to databases. I have a question for SQL Server 2005.

    Here goes:

    I have a table with few columns, in one column i am having html file attachments. I want the user to search the text inside these html files to retrieve the data. Let me more precise.

    I have a search form, User input the search text, it searches the relevant text inside the attached html files and populate their hyperlinks in list view with highlighted search words.

    Need some clues that how to do this.

    Thanks in advance.

  • Look at full-text search in Books Online. You can create full-text indexes on textual data (stored in [var]char, [var]binary or XML columns).

    ML

    ---
    Matija Lah, SQL Server MVP
    http://milambda.blogspot.com

  • Thanks for your reply.

    Actually I got the basic information for full text search. Now I am confused between data types. That what data type I need to use in my scenario. like I am implementing this project by file system. so in this case I need to use VARCHAR or VARBINARY. And whether I need to strip the text from html for search or query the search as is.

    Thanks.

  • If all your HTML documents are well-formed XML, you can use the XML data type.

    If you plan on using other binary types (doc, pdf, etc.) you should use the varbinary(max) data type. Also note that in this case you need an extra column to hold the BLOB type (the file extension - e.g. "html", "doc", "pdf", etc.) which you also need to specify when creating the full-text index.

    Word breakers used for indexing varbinary columns come with a slight overhead when the index is built/updated, and since HTML is basically character data you could also use a [n]varchar(max) column - provided that no binary data will be used in the foreseable future.

    Frankly, I think varbinary(max) (w/ the type column) is your best option.

    ML

    ---
    Matija Lah, SQL Server MVP
    http://milambda.blogspot.com

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply