Database Design - Disk/File placement

  • Hey all,

    Thanks for looking.

    I am looking at ways of improving the performance of a large database (550GB).

    I have a few questions around file placement and disk allocation, for which I have not been able to find definitive answers.

    To give you a brief overview of my particular scenario: poor initial design in one area of the database has resulted in a single table accounting for approximately 1/3 of the database size (200GB). I am unfortunately not able to change this, so I am trying to manage it as best as possible.

    The program accessing the underlying database also issues a lot of cursors, which perform very poorly, and with the volume of transactions hitting the larger table, there tend to be disk I/O issues stemming primarily from this table and tempdb.

    I already separated tempdb onto a dedicated drive.

    My first question: I have read that there is an overhead from the regular growth and expansion of tempdb, which can be overcome by simply setting tempdb to a large initial size.

    While I can appreciate the theory behind this, is there any degradation in performance from simply having a large default tempdb?
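    For clarity, this is the kind of presizing I mean, as a rough sketch (tempdev/templog should be the default logical file names, but check yours; the sizes are just placeholders):

    -- Presize tempdb so it doesn't rely on autogrow; adjust sizes to your workload
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 20GB, FILEGROWTH = 1GB);
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 5GB, FILEGROWTH = 512MB);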

    Secondly, we have 4 underlying 250GB disks making up one single logical (~1TB) drive, on which the database mdf file has been placed.

    Is it possible to get enhanced performance by splitting the database onto 4 files (one on each disk), and leaving them as 4 physical/logical drives?

    Or would the disk heads be used to the same effect with the existing configuration anyway?

    Is it better to have 4 equally sized partitions, or would I be better placed adding the most heavily used tables onto their own disks as well?

    I will be trialling these changes in our test environment, but I would appreciate some advice/recommendations anyway!

    Thanks in advance for any responses.

    🙂

  • Just to clarify, are you speaking of striping the tempdb mdf over the drives to match the number of processors?

    Instead of having one big TempDB file, you could use multiple TempDB data files, one for each CPU you have in your server. That should help the TempDB.
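    Something along these lines, just as a sketch (file names, paths and sizes are placeholders; keep all the files the same size):

    -- Add data files so there is one per CPU (e.g. 3 extra files on a 4-CPU box)
    ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 10GB, FILEGROWTH = 1GB);
    ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'T:\tempdb3.ndf', SIZE = 10GB, FILEGROWTH = 1GB);
    ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'T:\tempdb4.ndf', SIZE = 10GB, FILEGROWTH = 1GB);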

    Your user DB will still have I/O issues since you have just 4 disks. If you could get an external drive, it could help quite a bit. Maybe you can look at compression. Check this white paper.

    Just my 2 cents

    -Roy

    My personal view...

    First, check which tables in your database are consuming most of the space. Google it; you will find queries that list tables sorted by size.
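    One variant of such a query, as a sketch (double-check it on your version):

    -- Reserved space per table, biggest first
    SELECT t.name AS table_name,
           SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb,
           SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS row_count
    FROM sys.dm_db_partition_stats AS ps
    JOIN sys.tables AS t ON t.object_id = ps.object_id
    GROUP BY t.name
    ORDER BY reserved_mb DESC;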

    I am sure there will be some tables that are really heavy on space usage, holding historical data.

    Move those tables to one or more ndf files, and then you can also configure the backups to take either a full backup or a particular filegroup backup.
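    As a rough sketch (database, filegroup, file and path names are all placeholders):

    -- New filegroup with an ndf on its own disk
    ALTER DATABASE MyDb ADD FILEGROUP HistoryFG;
    ALTER DATABASE MyDb ADD FILE
        (NAME = MyDb_History1, FILENAME = 'E:\Data\MyDb_History1.ndf', SIZE = 50GB)
        TO FILEGROUP HistoryFG;
    -- Once the tables live there, you can back up just that filegroup
    BACKUP DATABASE MyDb FILEGROUP = 'HistoryFG' TO DISK = 'F:\Backups\MyDb_HistoryFG.bak';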

    Hope this is what you were looking for.

    ----------
    Ashish

    Sorry, I should have said the disks for the data partition (which the mdf is sitting on at present) are in a RAID 10 configuration, not just 4 joined disks... the usable part of the config is 4 disks, however.

    We have 4 quad-core processors. Does this theoretically mean I can split the tempdb across 16 drives if I wanted to, or is the performance improvement limited to physical processors (i.e. 4)?

    Ashish, yeah, I have the list of largest tables (the largest is the 200GB one mentioned in the OP). There is a lot of historical transactional data in this table, but there isn't a simple way of partitioning it due to the nature of the data, and I am also not able to change the structure of the data.

    Katy, my intention was to split the user db (mdf, new ndf, new ndf, new ndf) onto multiple files and spread it across 4 disks. It does not solve the main problem, which is the cursors being thrown at it from the application, but I suspect it would help with the disk I/O to some extent if I choose the right tables to go in the right files?

    Roy, is there a big improvement from splitting up the tempdb to match the processors?

    I'd like to trial them anyway to see what improvements can be added to the environment.

    Thanks 🙂

  • I have had my own issues with tables that are disproportionately larger than the rest of the data tables so I sympathize. The tack I took to mitigate this (translated to your data file):

    1) You want four physically separate disks. One will be for the log, one will be for the main .mdf with everything BUT the large table, and the other two will be for the large table.

    2) Create two new filegroups for the large table, and put one of the two disks into each of them.

    3) Create the large table's clustered index on one of the two new filegroups. This will move it from the original .mdf onto its own volume.

    4) Drop and recreate all non-clustered indexes on the other filegroup. This way you have the table on one of the two filegroups/disks and its indexes on the other. Make sure that each new disk is, at a minimum, about double the size of what it will hold (the table being moved, or the space reserved for the indexes, depending upon which drive we are talking about), plus about 10 percent. A rough T-SQL version of steps 2 to 4 follows below.
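    All database, filegroup, drive, column and index names here are placeholders, and this assumes the clustered index isn't backing a PRIMARY KEY constraint:

    -- Step 2: two new filegroups, one disk in each
    ALTER DATABASE MyDb ADD FILEGROUP BigTableData;
    ALTER DATABASE MyDb ADD FILE (NAME = BigTableData1, FILENAME = 'G:\BigTableData1.ndf', SIZE = 250GB) TO FILEGROUP BigTableData;
    ALTER DATABASE MyDb ADD FILEGROUP BigTableIdx;
    ALTER DATABASE MyDb ADD FILE (NAME = BigTableIdx1, FILENAME = 'H:\BigTableIdx1.ndf', SIZE = 100GB) TO FILEGROUP BigTableIdx;

    -- Step 3: rebuilding the clustered index ON the new filegroup moves the table there
    CREATE CLUSTERED INDEX CIX_BigTable ON dbo.BigTable (SomeKeyColumn)
    WITH (DROP_EXISTING = ON) ON BigTableData;

    -- Step 4: recreate each non-clustered index on the other filegroup
    CREATE NONCLUSTERED INDEX IX_BigTable_Col1 ON dbo.BigTable (Col1)
    WITH (DROP_EXISTING = ON) ON BigTableIdx;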

    What this does for you is give you parallel I/O for hitting the table versus the rest of the system. And since bookmark lookups are inevitable, separating the indexes from the table speeds up the access to the underlying rows.

    Don't know if you have the physical disks to do this but that's what we did and it worked great for performance.

    Ian, it does help with performance. What you gain by doing this is reduced contention in tempdb. But make sure that all the files are of equal size. As a general rule of thumb, create one file for each CPU.

    I would really recommend trying to get an external drive along with an IO controller.

    Log files, as a general rule, should be kept on a separate drive, basically because log access is sequential read and write while user data file access is random.

    -Roy

  • Excellent, thanks guys.

    Sorry, I really didn't give enough info earlier.

    I do have separate physical drives for the log files (ldf's) and other separate drives for the operating system, backups and the system swap file.

    The disks I mentioned were only for managing the data portion of the single user database (or for doing something with the system drives if it can improve performance).

    Many thanks for the tips, Jeff, they should come in very handy for trialling some new configurations in the test system. And thanks again, Roy, I will look at splitting the tempdb onto separate files to see if that gives some success 🙂

    Does anyone know whether the quad-core processors in my system should mean that I go with 4 separate files per physical processor or just 1 per physical processor?

    A quad-core processor is counted as 4 processors.
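    If you want to double-check what the server itself reports, a quick query (the DMV exists in 2005 and up):

    SELECT cpu_count, hyperthread_ratio FROM sys.dm_os_sys_info;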

    Check this link for more tips on tempDb performance.

    -Roy

  • Ian,

    A couple of points here:

    First, before jumping out and creating lots of tempdb files, it would be good to read up on what this solves and to understand what it is doing. These links will help:

    http://www.sqlskills.com/BLOGS/PAUL/post/Search-Engine-QA-12-Should-you-create-multiple-files-for-a-user-DB-on-a-multi-core-box.aspx

    http://www.sqlskills.com/BLOGS/PAUL/post/A-SQL-Server-DBA-myth-a-day-(1230)-tempdb-should-always-have-one-data-file-per-processor-core.aspx

    http://blogs.msdn.com/b/sqlserverstorageengine/archive/2009/01/11/tempdb-monitoring-and-troubleshooting-allocation-bottleneck.aspx?wa=wsignin1.0

    Regarding the file split, are you going to have more disk to throw at this or are you only going to partition the same # of physical disks differently so that you see it as multiple drives? Curious.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

    OK, one more link that will be helpful with both the tempdb and file-splitting information:

    http://blogs.msdn.com/b/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx

    Hope this helps.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Thanks David,

    Yeah, it will be the same number of physical disks for the user db, but by splitting particular tables or parts onto allotted disks, I'm assuming greater efficiency simply through better management of the files/data?

    For the tempdb, I may be able to stretch to an additional disk if I can verify that it gives a big improvement.

    I have read up on splitting the tempdb onto multiple files, but it is still good to get other people's experiences... I'll check out the links you posted, thanks! 🙂

    Some very nice links there, thanks 🙂

    I have read a few things contradicting the sqlskills article too, just as the author suggests, so the sqlskills link clears it up nicely...

  • My experience with this kind of thing is that allowing RAID to handle the load balancing across multiple spindles is USUALLY better than trying to manually allocate the data extents to specific file systems.

    That said, if you are copying data from one large table into several smaller ones, then it might be worth splitting it into 2 logical disks (RAID 10 across the 4 physical disks) and splitting the database into 2 filegroups, one for each logical drive, dedicating one of the filegroups to the large table.

    Make sure that your log file is on a different physical spindle from the data files, and that needs to be RAID 1. As log files are generally written sequentially, you may get performance degradation using striping.

    All in all, this is an area where you need to get a test platform and try a number of configurations. You may be surprised by some of the results!

    Edit: change the can to may to emphasise that this is only a potential risk

  • Ian Ritchie (8/12/2010)


    Some very nice links there, thanks 🙂

    I have read a few things contradicting the sqlskills article too, just as the author suggests, so the sqlskills link clears it up nicely...

    Given that Paul wrote the storage engine code, I tend to agree with him over any of the other stuff out there. 😛

    I will say too that splitting tempdb is not always going to help. Running some of the queries specified in the links above will help you analyze that a bit.
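    For example, something along these lines shows whether sessions are waiting on tempdb pages right now (a sketch; tempdb is database_id 2, and resource_description is db_id:file_id:page_id):

    -- Live check for PAGELATCH waits on tempdb pages
    SELECT session_id, wait_type, wait_duration_ms, resource_description
    FROM sys.dm_os_waiting_tasks
    WHERE wait_type LIKE 'PAGELATCH%'
      AND resource_description LIKE '2:%';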

    It would be good if you could post back as you progress through this. The experience you gain will be good to share. Thanks!

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
