Document storage / retrieval.

  • Hi Guys,

    I am a web programmer, not a DBA, but I am obviously required to use SQL and interact with databases.

    Sometimes - in this case - I am required to develop the DB along with the application.

    A (very simple) Document Management System in this case is required and ultimately my question is:

    What is the best way to store the documents and allow for ease of searching / retrieval?

    My initial thought was to store ALL documents as IMAGE datatypes and provide relevant columns for storing, and later searching, metadata (i.e. fileName, fileDescription, fileType, etc.).
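
    A rough sketch of this single-table idea, for illustration only. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so BLOB takes the place of IMAGE. The table name documents, the documentId and fileContent columns, and both helper functions are assumptions made for the example, not an existing schema.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; BLOB plays the role of IMAGE.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        documentId      INTEGER PRIMARY KEY,
        fileName        TEXT NOT NULL,
        fileDescription TEXT,
        fileType        TEXT NOT NULL,
        fileContent     BLOB NOT NULL
    )
    """
)


def store_document(file_name, description, file_type, content):
    """Insert one document with its metadata and return the generated id."""
    cur = conn.execute(
        "INSERT INTO documents (fileName, fileDescription, fileType, fileContent) "
        "VALUES (?, ?, ?, ?)",
        (file_name, description, file_type, content),
    )
    conn.commit()
    return cur.lastrowid


def find_by_description(term):
    """Search the metadata only; the binary content is never inspected."""
    rows = conn.execute(
        "SELECT documentId, fileName FROM documents "
        "WHERE fileDescription LIKE ? ORDER BY documentId",
        (f"%{term}%",),
    )
    return rows.fetchall()


# Placeholder bytes; a real caller would pass the uploaded file's contents.
doc_id = store_document("plan.doc", "Site drainage plan", "doc", b"\xd0\xcf\x11\xe0 fake bytes")
matches = find_by_description("drainage")
```

    Note that a search here can only ever match what someone typed into the metadata columns, which is the limitation the rest of this question is about.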

    I asked a colleague their thoughts and they responded with:

    * If the file checked in is a Word doc, use the COM interop objects for Word to save it as a temp .txt file and then load that file into an ntext column in a table that relates to the master document. This will let you use the full-text search in MS SQL Server (CONTAINS and FREETEXT keywords). The downside of this is that full-text search requires regular index maintenance.

    I can see great benefits in allowing searches of the internal text of documents.

    But if it is going to be limited to certain filetypes - what about the others? PDF / PowerPoint / CAD documents?

    Is it better to provide great search features for filetypes that have an appropriate COM object to utilise, or is it better to use one process for ALL filetypes and keep the system internals nice and simple?

    Or is there some other idea out there that we have not thought to utilise that we should be at the very least considering?

    Thanks for all your assistance...


    Gavin Baumanis

    Smith and Wesson. The original point and click device.

  • Hi Gavin,

    This may be the second post because something strange happened to my last one.

    My company provides access to documents via the web and we use a number of approaches.  The following approach is simple and yet extremely effective:-

    Keep the documents in their original format if possible!

    Create a table containing virtual folder path, real folder path and a path reference code

    Create a table containing document name and path reference code

    Create a table containing document name, keyword and description.

    The idea of the path reference is to make moving documents around easier. The idea of the virtual path is so that, if you wish, you can do a Response.Redirect to the virtual directory and file name.
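
    The three tables above could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module in place of SQL Server; every table name, column name, and sample path is an assumption made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per folder: where it lives on disk and how the web sees it.
    CREATE TABLE paths (
        pathRef     TEXT PRIMARY KEY,
        virtualPath TEXT NOT NULL,
        realPath    TEXT NOT NULL
    );
    -- Documents point at a folder by reference code, never by literal path.
    CREATE TABLE documents (
        documentName TEXT PRIMARY KEY,
        pathRef      TEXT NOT NULL REFERENCES paths(pathRef)
    );
    CREATE TABLE documentKeywords (
        documentName TEXT NOT NULL REFERENCES documents(documentName),
        keyword      TEXT NOT NULL,
        description  TEXT
    );
    """
)

# Sample data (hypothetical folder and file names).
conn.execute("INSERT INTO paths VALUES (?, ?, ?)", ("HR", "/docs/hr", r"D:\files\hr"))
conn.execute("INSERT INTO documents VALUES (?, ?)", ("leave-policy.pdf", "HR"))
conn.execute(
    "INSERT INTO documentKeywords VALUES (?, ?, ?)",
    ("leave-policy.pdf", "leave", "Annual leave policy"),
)


def virtual_urls_for_keyword(keyword):
    """Return the virtual path of every document tagged with the keyword."""
    rows = conn.execute(
        """
        SELECT p.virtualPath || '/' || d.documentName
        FROM documentKeywords k
        JOIN documents d ON d.documentName = k.documentName
        JOIN paths p ON p.pathRef = d.pathRef
        WHERE k.keyword = ?
        ORDER BY d.documentName
        """,
        (keyword,),
    )
    return [row[0] for row in rows]


before = virtual_urls_for_keyword("leave")

# Moving the whole folder touches one row in paths, not every document row.
conn.execute(
    "UPDATE paths SET virtualPath = ?, realPath = ? WHERE pathRef = ?",
    ("/archive/hr", r"E:\archive\hr", "HR"),
)
after = virtual_urls_for_keyword("leave")
```

    The value returned by virtual_urls_for_keyword is what the web page would redirect to; the files themselves stay on disk in their original format.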

     

    Hope this is of some help

    Quis custodiet ipsos custodes.

  • Just to add a bit to what Shaun said: if you want full-text searching of the documents, you can create a table with the document id and a TEXT field containing the document text. As your colleague mentioned, this is easy enough to do with Word documents; for things like PDFs you'll need a converter. We do this with MS's Filtdump tool. This will, however, increase complexity, as you will have to create some sort of document loader, which will need to be used for every document put into the system.
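
    A minimal sketch of what such a document loader might look like, assuming a side table keyed by document id. It uses Python's built-in sqlite3 module rather than SQL Server, and a plain LIKE query stands in for CONTAINS / FREETEXT. The EXTRACTORS registry, the table names, and the functions are all hypothetical; the only extractor shown handles plain text, and a Word or PDF converter would be registered the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (
        documentId INTEGER PRIMARY KEY,
        fileName   TEXT NOT NULL,
        fileType   TEXT NOT NULL
    );
    -- Extracted text lives apart from the master document row.
    CREATE TABLE documentText (
        documentId INTEGER PRIMARY KEY REFERENCES documents(documentId),
        bodyText   TEXT NOT NULL
    );
    """
)


def extract_plain_text(raw):
    """Trivial extractor: treat the bytes as UTF-8 text."""
    return raw.decode("utf-8", errors="replace")


# Hypothetical registry: file type -> function turning raw bytes into text.
# A real system would plug a Word or PDF converter in here.
EXTRACTORS = {"txt": extract_plain_text}


def load_document(file_name, raw):
    """Register a document; store searchable text only if the type is supported."""
    file_type = file_name.rsplit(".", 1)[-1].lower()
    cur = conn.execute(
        "INSERT INTO documents (fileName, fileType) VALUES (?, ?)",
        (file_name, file_type),
    )
    doc_id = cur.lastrowid
    extractor = EXTRACTORS.get(file_type)
    if extractor is not None:
        conn.execute(
            "INSERT INTO documentText VALUES (?, ?)", (doc_id, extractor(raw))
        )
    conn.commit()
    return doc_id


def search_text(term):
    """Substring search over extracted text (stand-in for full-text search)."""
    rows = conn.execute(
        """
        SELECT d.fileName
        FROM documentText t
        JOIN documents d ON d.documentId = t.documentId
        WHERE t.bodyText LIKE ?
        ORDER BY d.fileName
        """,
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


notes_id = load_document("notes.txt", b"Pump station maintenance schedule")
cad_id = load_document("site.dwg", b"\x00\x01 binary CAD data")
hits = search_text("maintenance")
```

    File types without an extractor are still registered, just without searchable text, so one loader can accept every document while content search covers only the supported types.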
