Parallel file operations

  • Hi All,

    Having database spreaded across several files is it enough for SQL Server to assign separate worker thread per file if all the files are in the same filegroup, or do I need to put files in separate filegroups ?

    In other words - does SQL assign worker thread per file, or per filegroup ? Having 4+ CPU machine and database in 4 files located in 2 filegroups does SQL do parallel file operations using 2 or 4 processors ?

    Regards,

    Slawek

  • MS SQL 2005 will use multiple threads even with a single filegroup. You can increase performance in some instances by spreading your queries over several filegroups because you can reduce the amount of contention for drive heads.

    If you are trying to figure out how to configure your filegroups and drives, be less concerned about the threading and more concerned about trying to find queries that you can reduce contention by moving tables or indexes away from each other. For example, if you have an insert happening regularly on one table, but a query running against another set of tables, you may want to move the table being regularly inserted into to another filegroup located on another set of drives. Yes, you would need enough processor power for your SQL server to be able to run both operations at the same time, but it handles the threading pretty well so this should not be your chief concern.

    Just don't go crazy - creating more and more file groups gets a diminishing return.

  • Make sure you're read this:

    http://blogs.msdn.com/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx

    SQL does not use 1 thread for I/O per data file.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks GilaMonster - that was exactly what I was looking for.

    To sum up:

    Single worker per separate IO path, up to maximum of number of CPU cores or I/O affinity. Unique disk drive letter is that what counts as separate IO path. No confirmation whether mount point counts as separate IO path or not.

    Regards,

    Slawek

  • Looks about right.

    I wouldn't worry about splittng files/filegroups unless you're noticing some form of IO contention

    high numbers of pending IOs within SQL, high disk queue length (though that counter's fairly hard to interpret properly), high sec/read or sec/write, etc.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Fixing poorly written queries and finding indexing opportunities will almost certainly provide WAY more benefit than worrying about how many files to use in a file group. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/20/2008)


    Fixing poorly written queries and finding indexing opportunities will almost certainly provide WAY more benefit than worrying about how many files to use in a file group. 🙂

    QFT.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    Well, if you convince me that fixing indexes will improve SAN performance, then I'll drop all my tests with multiple files and start fixing indexes 🙂

    My problem is that single thread is not able to saturate SAN network. With 3 threads or more I'm starting to see SAN hardware bottleneck (tests were done using SQLIO and SQLSIM). So I believe I really need to split the database across multiple files. And I don't want to split the data per object - which is the case with FILEGROUPS.

    Anyway - thank you all for your help.

    Regards,

    Slawek

  • Slawek (5/20/2008)


    Hi,

    Well, if you convince me that fixing indexes will improve SAN performance, then I'll drop all my tests with multiple files and start fixing indexes 🙂

    My problem is that single thread is not able to saturate SAN network. With 3 threads or more I'm starting to see SAN hardware bottleneck (tests were done using SQLIO and SQLSIM). So I believe I really need to split the database across multiple files. And I don't want to split the data per object - which is the case with FILEGROUPS.

    Anyway - thank you all for your help.

    Regards,

    Slawek

    Nope... that's not what's being said... what is being said is that you shouldn't try to do hardware fixes for a software problem. Sure, you may get a little gain and a temporary reprieve, but it's usually poorly written code that looks like an I/O bottle neck... no matter how good the SAN or how well configured it is, non-performant code will suck the life out of it. Once the bad code has been rewritten to work with performance and scalability in mind, then you can add some indexes that will actually do something for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • What we're saying is that you should only do hardware fixes if you know it's a hardware bottleneck, and you've already optimised the queries as much as possible and tuned the indexes so that they are effective for the workload.

    Lots of people I've seen throw expensive hardware at a performance problem and then complain when the performance doesn't improve.

    The correct order for fixing performance problems is something like this

    1) Application - make sure it's not doing stupid things, like joining tables client side or retrieving the same data 10 times, etc

    2) Queries - Make sure your queries are as fast and as optimal as possible

    3) Indexes - Ensure they support the workload

    4) Hardware - Throw better hardware at the problem.

    Are you seeing IO botlenecks with regular application workloads? SQLIOSim is supposed to saturate the IO system. that's the point of it.

    What IO stats are you seeing at peak workload?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You are definitely software ortodox 🙂

    I have hardware which is capable of doing 800 MBPS and 70k IOPS (at least that's what manufacturer claims, which I don't believe) why should I leave it as it is, operating at 30 MBps and 2k IOPS ? By using 4 thread I can easily push to 80-100 MBps and 8k IOPS without even touching internal DB objects. LOG writing performance will not be improved, but at least long scans will be much faster.

    Index/query tuning has nothing to do with that, although I know that it gives much better results - with few exceptions.

    Regards,

    Slawek

  • Slawek (5/21/2008)


    You are definitely software ortodox 🙂

    No, we've just done a fair bit of performance tuning in the past and know where the issues usually lie and where the improvements come from.

    I have hardware which is capable of doing 800 MBPS and 70k IOPS (at least that's what manufacturer claims, which I don't believe) why should I leave it as it is, operating at 30 MBps and 2k IOPS ?

    Because a well-tuned database (unless you're running amazon or yahoo) doesn't usually need to be doing 70k IOPS.

    What's your average sec/read, sec/write? What do you see the disk queue length at (sustained, not spikes)? What's your disk % idle time

    Are you seeing pending IOs in SQL?

    What RAID level are you running on the SAN?

    Move the log file onto separate physical devices from the data files. If you're seeing signs of IO bottlenecks, then look at splitting your data files up onto separate devices. If you're not seeing signs of IO bottlenecks during normal operation, then splitting the data files up and apart is not going to achieve much

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    1. Regardless the database requires 70k IOPS or not leaving hardware operating at 10% or 25% of its limits is a simple waste of resources, especially I don't need to spend single penny for pushing the limits.

    2. Spreading the database across multiple files is in my case the simplest possible 'tuning' just because I have free HW resources - and in my opinion leaving the DB as it is now and starting digging and altering tables and/or indexes is again waste of time, especially I will be moving the DB from one server to another.

    I will have to recreate the empty database, and then I will load it using app backup and not SQL, because this is requirement of another upgrade process.

    Recreating empty database in multiple files is the simplest possible option to get a few more horsepower, I just wanted to create better config using minimum efforts, and not to solve some particular performance problems.

    3. Queries are generated by ERP applicatin an I don't have any possibility to change the syntax, except adding hints or plan guides.

    4. There are queries (like loading DW tables ones) which simply cannot be optimized because:

    a) the whole record is copied

    b) timestamp field is used to determine which records shall be copied and which not.

    What do you think - is it better to scan the entire table every DW load, or is it better to create non clustered index on timestamp column covering all table fields, and force server to update this index every single record update ? Which will have bigger impact on server performance ?

    I know that playing with queries and indexes or optimizing queries is the best method.

    I really appreciate your help but I just have different goal - to create better performing HW+SW configuration and not to solve any particular performance problems.

  • Slawek (5/21/2008)


    What do you think - is it better to scan the entire table every DW load, or is it better to create non clustered index on timestamp column covering all table fields, and force server to update this index every single record update ? Which will have bigger impact on server performance ?

    Would have to see the queries to say anything for sure. Generally, index update overheads are small, unless you have 20-30 indexes.

    Also, depends on the choice of cluster (if any), on the covering/noncovering nature of the index and the selectivity of the data.

    Give it a try and see. If the index doesn't help, or hinders updates, remove it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I am not sure anyone here is espousing NOT using multiple files to spread the IO load and capacity. By all means do it since you seem to be lucky enough to have excess IOPS available. We were just saying that there is usually more to be gained (as in orders of magnitude improved performance) by refactoring bad code and implementing needed indexes.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply