Multiple files / filegroups

  • Hi All

    Some background:

    I'm in the process of migrating a DW from SQL server 2000, to SQL 2005 SP2. On the new server I have new hardawre HPbl45p 4*AMD Opteron dual core CPU's, 48GB RAM, W2K3 64bit SP2 and SQL 2005 64 bit Sp2. All this in a 2 node cluster setup, with three SQL instances installed as "virtual serveres". So one cluster node has two instances running while the other cluster node has one instance running.

    Disks are SAN connected each instance has 3 logical drives, data for all db's, log for all db's and temp (data & log).

    The SAN LUN's are shared with all other servers, and that is a constraint that cannot change for now. But we have around 120 spindles, and VRAID1 for all LUN's on the production server.

    The question:

    The DW db's are between 40 and 70 GB, so not extremely big. On the old server we run with one file group and one file.

    Will there from a performance point of view be any benefits in splitting the DB's over severeal data files on the same LUN comapred to having one large file?

    Secondly could there be a performance benefit, in creating several filegroups and placing logically seperate table objects in each group. Even though the files would end up on the same LUN.

    I have two seperate load proceeses running which have seperate object access patterns. So these objects would be candidates for file group seperation.

    Any one here with real life experience in this, is several filegroups really worth the hazzle and effort, compared to the simplicity of a one file group one file setup?

    Best regards

    //SUN

  • If they're on the same LUN, not much. You'll only see a real performance gain (in physical IO speed) if the files are on separate LUNs

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • With your SAN an HP EVA - single disk group with 120 spindles, it won't make any difference at all. All LUN's are presented from the same set of spindles - so even having multiple LUN's won't make a difference.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • 1) I believe you can benefit from multiple files/groups. At least consider placing indexes in one and data in another. Also consider isolating out any very large single tables you may have. Your load group splitting could be helpful also.

    2) Try placing tempdb logs on the log partition. In addition do you have the ability to modify SAN cache and queue lengths per LUN? If so they should be different for data and log stripes.

    3) Ashamed you can do explicit RAID configurations for data/log sets.

    4) Place tempdb on multiple identical files - one per CPU.

    5) Did you sector align the partitions? What SAN stripe size was used? What NTFS format size was used?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi All

    Thanks for the input, in regards to splitting tempdb into several files then I have done this, but only into four files and only for the data files. Four files since I have two instances running on the same box, so together the utilize the 8 CPU's. Is it also beneficial to do the same for the log files of tempdb? Putting tempdb log files on the logfile LUN might also be interesting!

    SCCrazy I have 9 LUN's three per instance but all on the same SAN disk group with around 120 spindles.

    To Jeff: The key to me is, if the DB engine in some way will utilize the fact that it has several files i.e OS file handles for performing multiple disk IO's at the same time. Or would all this just be queueing up in the SAN fabric, or SAN (controller / disks) where all the IO anyway would have to go throgh in the end?

    What are windows 2003 server's limitations and posibilities in this regard, lets say one is writing a 2 TB file to a single file on 1 LUN on a single SAN disk group with 100 spindles

    compared to

    Writing 2 1 TB files to 2 LUN's on a single SAN disk group with 100 spindles

    Which one would complete fastest?

    To SQLGuru: in regards to your last question the the NTFS format size is just windows 2003 default, by this do you mean the block size?

    In reagard to the options on the HP EVA 8000, then this is a entry to mid range SAN with limited low level optimization options, so I don't think I can do much in that regard.

    //SUN

  • 1) tempdb should have only 1 log file.

    2) 64K is typically the 'generically best' option for NTFS allocation unit size when formatting the partitions. Default isn't good (4K IIRC). 8K could be optimal for log partitions in some cases.

    3) Unless you explicitly used affinity mask for each of your instances you have no control over which cpu/core is used by which sql executable.

    4) File groups aren't just about writing a single 2TB file vice 2 1TB files. There is a lot that goes on under the covers to optimize I/O throughput. Think writing to 100 20GB files at once. I would bet that would be significantly faster than 1 2TB file on your SAN due to the number of spindles you have.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi SQL Guru

    I'll look into the allocation unit size, but a bit late for changing that since the cluster boxes are 5 min from going live 🙁

    In regards to the disk write example then it was just made for explaining the idea, I have no doubt that spreading the write into many small files might make it faster. But I guess that could be tested easily with SQLIO in different scenarios. But I think the key here is the multi threading of any application that uses multiple files. I guess SQL servers storage engine have this kind of multi threading build in.

    Anyway my plan is to make a SQL IO baseline test, before going live with the cluster boxes. So I at least have a picture of how the disk performance looked like when going live.

    //SUN

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply