Digitizing Documents and Repository

  • Just want to gather some ideas. Our company is looking at going paperless in the near future and digitizing documents in our Finance/Legal departments and beyond. This does not involve multimedia at this time. I imagine we're looking at ~10-20 TB of information to store in a database repository. What would be the best repository for this purpose, i.e. SQL Server or Oracle? SharePoint would be the front end.

    We are currently an MS SQL shop running clustered SQL Server 2005 with SAN LUNs attached to each node. I believe 2 TB is the maximum size of a LUN on any given node; that's what we've configured here. Does anyone have suggestions on how best to store all this data? I'm sure search crawls will be tedious on databases with over 2 TB of data.

    We are running 64-bit server hardware. Data compression will be a factor in the design, especially since the data will most likely be stored in XML format. SQL Server 2008 has some advantages here, but is it the best solution for storing terabytes of data in a completely paperless environment? Any thoughts or ideas would be greatly appreciated.

  • If you are planning to use SharePoint and its document management solution, then you are tied to SQL Server, since that is SharePoint's back end.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • Thanks for your response, Jack. If SharePoint ties us to SQL Server, is storing terabytes of data practical for SQL Server under normal operation, or should we investigate a different front end for a document management solution?

  • I've never worked with databases that size, but with the proper configuration I don't see why SQL Server wouldn't scale to that. At that point I think it's more of a storage issue than a querying issue, as the workload probably would not be highly transactional.

    Jack Corbett
    Consultant - Straight Path Solutions

  • SQL Server will scale via federated servers. Naturally, you will want to partition the database and use multiple filegroups and files so that the 2 TB limit is not an issue.

    You may want to start federated so the foundation is laid for growth.

    I have used SQL Server for databases over 1 TB without federating and it works fine, even better on 64-bit.
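
To put rough numbers on the multiple-files suggestion above, here is a minimal Python sketch of the arithmetic. It assumes one data file per LUN, takes the 2 TB LUN cap from the original question, and introduces a hypothetical `fill_ratio` headroom figure that is not from this thread; treat it as a planning aid, not a sizing rule.

```python
import math


def files_needed(total_tb: float, lun_cap_tb: float = 2.0, fill_ratio: float = 0.8) -> int:
    """Estimate how many data files (one per LUN) are needed to hold total_tb.

    lun_cap_tb : per-LUN size cap (2 TB in the original question).
    fill_ratio : assumed fraction of each LUN that will actually be filled,
                 leaving headroom for growth (hypothetical value).
    """
    if total_tb <= 0 or lun_cap_tb <= 0 or not 0 < fill_ratio <= 1:
        raise ValueError("sizes must be positive and fill_ratio in (0, 1]")
    usable_per_lun = lun_cap_tb * fill_ratio
    return math.ceil(total_tb / usable_per_lun)


if __name__ == "__main__":
    # The 10-20 TB range comes from the original question.
    for total in (10, 20):
        print(f"{total} TB -> at least {files_needed(total)} files/LUNs")
```

With no headroom (`fill_ratio=1.0`) the count is simply the total divided by the LUN cap, rounded up.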

  • I would suggest contacting SQLCAT (the Microsoft SQL Server Customer Advisory Team).

    They may have more experience with that kind of volume and may be in close contact with the SharePoint team.

    Johan

  • We have SharePoint, and the total database size is approaching 1 TB. It runs on a 4-CPU database server with 32 GB of RAM. However, our SharePoint system is made up of many databases, and we intend to split it across multiple database servers and web farms. We store many documents in it and see heavy disk access.

  • We use a content management system called ApplicationXtender to manage electronic documents. In our case, well over half of our documents come from external sources, and the paper must be scanned and indexed. Only the document index data is stored in the SQL database. The actual document files are stored on either magnetic or optical media in TIF format. It would be impractical and unnecessary to OCR and full-text index all of these externally sourced documents. We tag each document with 5 to 10 identifiers that can be used to retrieve it: fields like acct#, taxid, name, doc type, doc date, etc.
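
The index-only approach described above can be sketched roughly as follows. This is a minimal illustration, not ApplicationXtender's actual schema: it uses Python's built-in `sqlite3` as a stand-in for SQL Server, and the table name, column names, and helper functions are all hypothetical. The point is that the database holds only a handful of identifying fields plus a pointer to the scanned file, which lives outside the database.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a small index table; the scanned files stay on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS doc_index (
            doc_id    INTEGER PRIMARY KEY,
            acct_no   TEXT NOT NULL,
            tax_id    TEXT,
            name      TEXT,
            doc_type  TEXT NOT NULL,
            doc_date  TEXT NOT NULL,  -- ISO 8601 date string
            file_path TEXT NOT NULL   -- where the scanned image is stored
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS ix_doc_acct ON doc_index (acct_no, doc_type)"
    )
    return conn


def add_document(conn, acct_no, tax_id, name, doc_type, doc_date, file_path) -> int:
    """Record the identifiers for one scanned document and return its id."""
    cur = conn.execute(
        "INSERT INTO doc_index (acct_no, tax_id, name, doc_type, doc_date, file_path) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (acct_no, tax_id, name, doc_type, doc_date, file_path),
    )
    conn.commit()
    return cur.lastrowid


def find_documents(conn, acct_no, doc_type=None):
    """Return file paths for an account, optionally filtered by type, oldest first."""
    sql = "SELECT file_path FROM doc_index WHERE acct_no = ?"
    params = [acct_no]
    if doc_type is not None:
        sql += " AND doc_type = ?"
        params.append(doc_type)
    sql += " ORDER BY doc_date"
    return [row[0] for row in conn.execute(sql, params)]
```

Retrieval then becomes a lookup on the indexed fields followed by opening the file at the returned path, so the database stays small even when the document store runs to terabytes.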
