Image File storing-Database vs File system

  • Hello,

    We are architecting a system that will potentially process over 2 million records a day at peak usage. Each record contains 4 image files about 70K, 60K and 170K and 180 K.

    However, I am tending towards application just storing pointers to the actual image if we are going to use the current database and sql server 2005. My main reason for this is: concern regarding I/O for processing so many BLOB records and increased time in database maintenance window. We as it is have no defined window for maintenance. So, I try to do as much online reindexing and DBCC check when I can sneak in ( at least once a week ). But with the current size estimate it will be almost 1 TB data added every 8 days if we store the images in the database server, 4 TB addiotional data to be maintained for 30 days retention period. Also, without significant maintenance window, can't reindex the tables regularly as it is not online operation. So, I am concerned about performance/maintenance/Backup restore strategy change for over all database system.

    Please let me know your views/thoughts as I need to reevaluate if I am not on the right track.

    Thanks,

    Neepa

  • Your best bet will probably be having just pointers in your database to the actual image location.

    Managing Databases with BLOB's and at the rate you state is a nightmare.

    Having external disk space for the images provides you with great flexibility in adding and managing space requirements.

    Also, if images need to moved from one location to another, a simple database update is all that is needed.

    You obviously need to have in place an image retrieval system that would allow those images to be displayed again in a client system (or webbased) without compromising any security on the actual images.

    Obviously a backup mechanism for the actual images is also required....

    _______________________________________________________________________
    For better assistance in answering your questions, click here[/url]

  • I tend to agree with Richard. While for performance reasons it's shown that in the db at these sizes is quicker than having things served off the db filesystem (with FILESTREAM), you can't distribute the load if they're in the db.

    Backup/restore is easier, but as you mentioned, it's more data, and copies you need to keep. I don't think the indexing is a big deal. You don't need to put the BLOBs in your main tables, you can link them in a horizontally partitioned table.

    I'd put them on the filesystem, especially since you can then distribute them to another machine(s), even something like a web server. IIS could store things quickly, even if you had a client app, you could http retrieve them quickly from an internal IIS server.

    The one thing I'd caution is that you need to write routines that double check paths v actual storage and verify things are correct. The biggest issue is that you'll get something out of sync and will appear to lose images.

  • Just got the updated requirements. According to that the storage requirements can vary from 150 GB to 400 GB a day. I don't think that I can incorporate that kind of structure in my current database environment where basically there is a single physical server right now. ( sql server 2005 32 GB RAM, Enterprise version, 4 quadcore CPU and local storage ). Even we go to SAN, 3-4 TB of drastic difference is difficult to deal with database.

    Could anyone tell me what is required for the file system for effective file retrival? There will be millions of files stored on the file system with the pointers on database. I am concerned that if the retrieval is not efficiently done, we will see 'ASYNCH NETWORK I/O' waits ( or may be I am thinking way too ahead for now ).

    Thanks,

    Neepa

  • You can actually go with mountpoints and hence not deal with one big drive or drives for the images. Just have different Mountpoints and with those either balance or divide the images among them. This allows also for quick addition of extra space by adding additional mountpoints.

    As part of the data storage, it would point to the current place on the file system, As for retrieval, from a SAN you should have no issues, unless you are retrieving dozens/hundreds of images at a time (which is usually not the case).

    _______________________________________________________________________
    For better assistance in answering your questions, click here[/url]

  • What's your client? How are you thinking to pull them? The client app gets a path and ????

    I think an HTTP request from an internal or external IIS server is probably the quickest, especially if there are multiple requests for the same images and IIS can cache things.

    There are dedicated image servers that have API access, or you can use SMB/file operations to pull from a UNC/mount point.

  • Thanks Steve and Richard for your guidance.

    I have started getting more info. These are actual PDF documents that we have to store.

    Filestream would have been the better bet but since we can migrate yet to 2008, that is not even an option.

    The developers are trying to insert docs in sql server through vb.net application. This is going to be an internal application. I guess speed is a concern but when we talk about storing the data in the database for such an estimated volume, I am not ready to have that stored in database with my current set up.

    Thanks,

    Neepa.

  • Take a look at http://research.microsoft.com/apps/pubs/default.aspx?id=64525 (To Blob or not to Blob) from Microsoft research.

    On the machines used for this study, the paper shows that images under 256K are always processed faster if stored in the database, and if over 1MB then they are always processed faster if stored in the file system. As machine, memory and I/O performance increases, I would expect the values of these thresholds to rise.

    There may be factors other than raw speed that influence your decision, but the important message for you is that your potential largest object of 180K will always be processed faster if stored in the database.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • That's an interesting read, though a simple example. They' cover most of the issues from file size, but they have the files on the same server as the SQL box, and they don't use IIS, which could introduce some caching.

    I'd be curious about your testing of an implementation if you can post back what you find out and what you choose.

  • We have way too many variables right now in the picture in general:

    -Moving to virtual enviroment

    -Possibly upgrading from 2005 to 2008 also from 32 bit to 64 bit sql server version

    -Taking on additional 2-3 fold transactions due to aquisitions/merger

    -On top of it adding the file processing on the same server as core business transactions are happening

    -Trying to find money for sql licensing if we want to upgrade as the person who purchased Enterprise licenses last time did not buy software assurance.

    We are going to try prototype (hopefully) both the ways. Keeping the entire image file on the file system ( 1 file contains several images and text ) and store the pointers within the database OR parsing images, text etc and store it in database separately.

    My concern is the data storage if stored on database. Currently we have local disk storage attached to the server which total could provide upto 2 TB. So, if the current estimated volumn is correct, I am out of disk space within days. If I move the server to virtual environment where I have enough storage, I am not sure whether I will get desired I/O output in current set up. Either way I see it as a Fryer or Frying Pan situation but am hoping to get it right and learn a lot from this.

    I will definitely keep sharing the outcomes..

    Neepa

  • In your situation I would warn my management of the risks and unknowns, and try to manage their expectations.

    IMHO you have too many variables to predict performance, CPU and I/O load, and disk storage. I think you are taking the right technical approach in doing some prototyping, but your management need to know you cannot recommend a solution until more work is done.

    They then have some choices... a) Wait until you have done your work; b) Hire consultants who have done this type of work before and can give quick answers (If they do not know anybody, start with your Microsoft account manager); c) They can mandate a solution for you to implement, with the knowledge that it might not work.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • One aspect you need to consider is DR. We used to hold user documents outside the database in the file system, but as the document count grew to a few million our DR strategy started to unwind.

    At that stage we were on W2003 and used WANSYNC to keep the file system data aligned across two sites. Basically, WANSYNC could not cope with the number of files we had and our latency requirements.

    The WANSYNC problem was one of the drivers for us moving to keeping the user documents inside the database. SQL Server is very capable of replicating a many 100 million row table across sites with minimal latency, so our 5m or so user documents when held within the database suddenly became a non-issue.

    The situation may be different with W2008 R2, but do not be surprised if the tool you use to synchronise the file system across sites has an upper limit on the number of files it can cope with.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply