Parallel file operations

  • That's what I'm going to do.

    I was just not sure how many files and how configured would be the optimal config, because I was not sure how SQL assigns threads to service IO requests.

    The first GilaMonster's answer satisfied me in 100% - pointed me to very valuable source. Prior to reading that's blog I've dig probably few dozens articles, a few books, and all information I was able to found was not precise, and inconsistent. Just exactly what Bob Dorr said in his blog - a myth spreading around, even in some MS materials.

    I'm still not sure whether if I shall use drive letters or mount point. I would like to use mount points but I don't know how to check if SQL uses more than thread when doing IO. Any clue ?

    Regards,

    Slawek

  • .. and this:

    Well, if you convince me that fixing indexes will improve SAN performance, then I'll drop all my tests with multiple files and start fixing indexes 🙂

    was just a joke...

    Regards,

    Slawek.

  • Hi Slawek,

    You might also want to look at Paul Randal's take on multiple data files

    http://www.sqlskills.com/blogs/paul/CommentView,guid,e7bbe28b-a876-4e17-ac94-deec72e19b29.aspx#commentstart

    plus, might want to look at this:

    http://www.sqlskills.com/blogs/paul/2008/01/29/TheFrustrationOfSweepingGeneralizationsFollowOnFromSearchEngineQA12.aspx

    Paul's basically saying that unless you have specific issues at the very high end, then really this sort of creating multiple files will be of no benefit. Also, the original articles on spliting the data files was based on tempdb.

    There are also other issues you may face in regard to uneven file growth, which can actually force single threading on just one of the files, actually adversely affecting performance. There's more about this issue here (at the SQLCAT team):

    http://sqlcat.com/technicalnotes/archive/2008/03/07/How-many-files-should-a-database-have-part-1-olap-workloads.aspx

    HTH

  • Slawek (5/22/2008)


    That's what I'm going to do.

    I was just not sure how many files and how configured would be the optimal config, because I was not sure how SQL assigns threads to service IO requests.

    The first GilaMonster's answer satisfied me in 100% - pointed me to very valuable source. Prior to reading that's blog I've dig probably few dozens articles, a few books, and all information I was able to found was not precise, and inconsistent. Just exactly what Bob Dorr said in his blog - a myth spreading around, even in some MS materials.

    I'm still not sure whether if I shall use drive letters or mount point. I would like to use mount points but I don't know how to check if SQL uses more than thread when doing IO. Any clue ?

    Regards,

    Slawek

    Sorry, I don't know if there are any IO differences under the covers when mount points are used as opposed to drive letters. My 'guess' is no difference, but it is just that - a guess.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi Slawek,

    I for one think you're going down a wrong path in trying to get more I/O throughput by splitting onto multiple file groups. In my limited experience, multiple file groups are only useful if they get you more spindles or if you're experiencing very specific types of resource conflicts - and there's just no way you're getting those conflicts at 30mbps and 2k IOPS. I'm just guessing here, but if you've got 800mbps and 70kIOPS bandwidth, you're not low on spindles 😉

    The fact that you were able to saturate the SAN with SQL/IO isn't really meaningful. You can saturate anything with SQL/IO, but your choice of settings (read vs. write, sequential vs. random) obviously doesn't reflect what's happening in your application.

    So help us understand what exactly the configuration is:

    How many arrays are you using?

    What type of RAID is each array?

    How many spindles are in each array?

    What are you using each array for (e.g. logs, tempdb, data files)?

    What type of application are you optimizing for (OLTP or warehouse)?

    If a warehouse, are you trying to saturate the SAN during a large data load or during reporting?

    Here's why we need to know this. Let's say you have a 13 disk RAID 5, and you're throwing your tempdb, logs, and data files on that array. Now SQL/IO will slam this with a few threads doing large sequential I/Os. But in your real world application, logs shouldn't be on a RAID 5. Log writes are usually really small, so the overhead of the checksum will kill you. Even if it's a lot of RAID 10 spindles, having logs and warehouse data files on the same array will hurt, because the characteristics of logs are small writes and the characteristics of warehouse data IOs are large sequential reads/writes. Why move the heads to do a small write when they're already positioned to do the next 64kb read? So just moving logs onto their own RAID 1 or RAID 10 could dramatically increase SQL's ability to saturate that data file array.

    Now the reason you're not making any progress with these posts is that the solution to "saturating" your SAN almost certainly won't be file groups. Go ahead and try - it won't hurt - but I doubt it will change anything at all. If you're trying to saturate OLTP or warehouse reporting, then try a load testing suite to generate sufficient simultaneous requests to the database. If you're trying to saturate your SAN for warehouse loading, then you'll probably have to make some code changes - such as switching to bulk-logged recovery mode and using select into instead of insert/select.

  • And to answer your specific question, you do not need to do multiple file groups to get SQL to parallelize your IO. We run a DW on a single file group, logs on their own RAID 10, tempdb on its own RAID 10, and data files on their own RAID 5. With a 2x quad core, we have optimized the load to do nearly everything with bulk-logged operations, we have pre-sized our data files, logs, and tempdb, and we have no problems saturating the IO channel.

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply