Database Design - Disk/File placement

  • Hey all,

    Thanks for looking.

    I am looking at ways of improving the performance of a large database (550GB).

    I have a few questions around file placement and disk allocation, for which I have not been able to find definitive answers.

    To give you a brief overview of my particular scenario: poor initial design in one area of the database has resulted in a single table accounting for approximately 1/3 of the database size (200GB). I am unfortunately not able to change this, so I am trying to manage it as best as possible.

    The program accessing the underlying database also issues a lot of cursors, which perform very poorly, and with the volume of transactions hitting the larger table, there tend to be disk I/O issues stemming primarily from this table and tempdb.

    I already separated tempdb onto a dedicated drive.

    My first question: I have read that there is an overhead from the regular growth and expansion of tempdb, which can be overcome by simply setting tempdb to a large initial size.

    While I can appreciate the theory behind this, is there any degradation in performance from simply having a large default tempdb?
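    For clarity, this is the kind of presizing I mean, as a rough sketch (tempdev/templog should be the default logical file names, but check yours; the sizes are just placeholders):

    -- Presize tempdb so it doesn't rely on autogrow; adjust sizes to your workload
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 20GB, FILEGROWTH = 1GB);
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 5GB, FILEGROWTH = 512MB);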

    Secondly, we have 4 underlying 250GB disks making up one single logical (~1TB) drive, on which the database mdf file has been placed.

    Is it possible to get enhanced performance by splitting the database onto 4 files (one on each disk), and leaving them as 4 physical/logical drives?

    Or would the disk heads be used to the same effect with the existing configuration anyway?

    Is it better to have 4 equally sized partitions, or would I be better placed adding the most heavily used tables onto their own disks as well?

    I will be trialling these changes in our test environment, but I would appreciate some advice/recommendations anyway!

    Thanks in advance for any responses.

    🙂

  • Just to clarify, are you speaking of striping the tempdb mdf over the drives to match the number of processors?

    Instead of having one big TempDB file, you could use multiple TempDB data files, one for each CPU you have in your server. That should help the TempDB.
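    Something along these lines, just as a sketch (file names, paths and sizes are placeholders; keep all the files the same size):

    -- Add data files so there is one per CPU (e.g. 3 extra files on a 4-CPU box)
    ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 10GB, FILEGROWTH = 1GB);
    ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'T:\tempdb3.ndf', SIZE = 10GB, FILEGROWTH = 1GB);
    ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'T:\tempdb4.ndf', SIZE = 10GB, FILEGROWTH = 1GB);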

    Your user DB will still have I/O issues since you have just 4 disks. If you could get an external drive, it could help quite a bit. Maybe you can look at compression. Check this white paper.

    Just my 2 cents

    -Roy

    My personal view...

    First, check which tables in your database are consuming most of the space. Google it; you will find queries that list tables sorted by size.
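    One variant of such a query, as a sketch (double-check it on your version):

    -- Reserved space per table, biggest first
    SELECT t.name AS table_name,
           SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb,
           SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS row_count
    FROM sys.dm_db_partition_stats AS ps
    JOIN sys.tables AS t ON t.object_id = ps.object_id
    GROUP BY t.name
    ORDER BY reserved_mb DESC;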

    I am sure there will be some tables that are really heavy on space usage, holding historical data.

    Move those tables to one or more ndf files, and then you can also configure the backups to take either a full backup or a particular filegroup backup.
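    As a rough sketch (database, filegroup, file and path names are all placeholders):

    -- New filegroup with an ndf on its own disk
    ALTER DATABASE MyDb ADD FILEGROUP HistoryFG;
    ALTER DATABASE MyDb ADD FILE
        (NAME = MyDb_History1, FILENAME = 'E:\Data\MyDb_History1.ndf', SIZE = 50GB)
        TO FILEGROUP HistoryFG;
    -- Once the tables live there, you can back up just that filegroup
    BACKUP DATABASE MyDb FILEGROUP = 'HistoryFG' TO DISK = 'F:\Backups\MyDb_HistoryFG.bak';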

    Hope this is what you were looking for.

    ----------
    Ashish

    Sorry, I should have said the disks for the data partition (which the mdf is sitting on at present) are in a RAID 10 configuration, not just 4 joined disks... the usable part of the config is 4 disks, however.

    We have 4 quad-core processors. Does this theoretically mean I can split the tempdb across 16 drives if I wanted to, or is the performance improvement limited to physical processors (i.e. 4)?

    Ashish, yeah, I have the list of largest tables (the largest is the 200GB one mentioned in the OP). There is a lot of historical transactional data in this table, but there isn't a simple way of partitioning it due to the nature of the data, and I am also not able to change the structure of the data.

    Katy, my intention was to split the user db (mdf, new ndf, new ndf, new ndf) onto multiple files and spread it across 4 disks. It does not solve the main problem, which is the cursors being thrown at it from the application, but I suspect it would help with the disk I/O to some extent if I choose the right tables to go in the right files?

    Roy, is there a big improvement from splitting up the tempdb to match the processors?

    I'd like to trial them anyway to see what improvements can be added to the environment.

    Thanks 🙂

  • I have had my own issues with tables that are disproportionately larger than the rest of the data tables so I sympathize. The tack I took to mitigate this (translated to your data file):

    1) You want four physically separate disks. One will be for the log, one will be for the main .mdf with everything BUT the large table, and the other two will be for the large table.

    2) Create two new filegroups for the large table, and put one of the two disks into each of them.

    3) Create the large table's clustered index on one of the two new filegroups. This will move it from the original .mdf onto its own volume.

    4) Drop and recreate all non-clustered indexes on the other filegroup. This way you have the table on one of the two filegroups/disks and its indexes on the other. Make sure that each new disk is, at a minimum, about double the size of what it will hold (the table being moved, or the space reserved for the indexes, depending upon which drive we are talking about), plus about 10 percent. A rough T-SQL version of steps 2 to 4 follows below.
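    All database, filegroup, drive, column and index names here are placeholders, and this assumes the clustered index isn't backing a PRIMARY KEY constraint:

    -- Step 2: two new filegroups, one disk in each
    ALTER DATABASE MyDb ADD FILEGROUP BigTableData;
    ALTER DATABASE MyDb ADD FILE (NAME = BigTableData1, FILENAME = 'G:\BigTableData1.ndf', SIZE = 250GB) TO FILEGROUP BigTableData;
    ALTER DATABASE MyDb ADD FILEGROUP BigTableIdx;
    ALTER DATABASE MyDb ADD FILE (NAME = BigTableIdx1, FILENAME = 'H:\BigTableIdx1.ndf', SIZE = 100GB) TO FILEGROUP BigTableIdx;

    -- Step 3: rebuilding the clustered index ON the new filegroup moves the table there
    CREATE CLUSTERED INDEX CIX_BigTable ON dbo.BigTable (SomeKeyColumn)
    WITH (DROP_EXISTING = ON) ON BigTableData;

    -- Step 4: recreate each non-clustered index on the other filegroup
    CREATE NONCLUSTERED INDEX IX_BigTable_Col1 ON dbo.BigTable (Col1)
    WITH (DROP_EXISTING = ON) ON BigTableIdx;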

    What this does for you is give you parallel I/O for hitting the table versus the rest of the system. And since bookmark lookups are inevitable, separating the indexes from the table speeds up the access to the underlying rows.

    Don't know if you have the physical disks to do this but that's what we did and it worked great for performance.

    Ian, it does help with performance. What you gain by doing this is reduced contention in tempdb. But make sure that all the files are of equal size. As a general rule of thumb, create one file for each CPU.

    I would really recommend trying to get an external drive along with an IO controller.

    Log files, as a general rule, should be kept on a separate drive, basically because log access is sequential read and write while user data file access is random.

    -Roy

  • Excellent, thanks guys.

    Sorry, I really didn't give enough info earlier.

    I do have separate physical drives for the log files (ldf's) and other separate drives for the operating system, backups and the system swap file.

    The disks I mentioned were only for managing the data portion of the single user database (or for doing something with the system drives if it can improve performance).

    Many thanks for the tips, Jeff, they should come in very handy for trialling some new configurations in the test system. And thanks again, Roy, I will look at splitting the tempdb onto separate files to see if that gives some success 🙂

    Does anyone know whether the quad-core processors in my system should mean that I go with 4 separate files per physical processor or just 1 per physical processor?

    A quad-core processor is counted as 4 processors.
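    If you want to double-check what the server itself reports, a quick query (the DMV exists in 2005 and up):

    SELECT cpu_count, hyperthread_ratio FROM sys.dm_os_sys_info;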

    Check this link for more tips on tempDb performance.

    -Roy

  • Ian,

    A couple of points here:

    First, before jumping out and creating lots of tempdb files, it would be good to read up on what this solves and to understand what it is doing. These links will help:

    http://www.sqlskills.com/BLOGS/PAUL/post/Search-Engine-QA-12-Should-you-create-multiple-files-for-a-user-DB-on-a-multi-core-box.aspx

    http://www.sqlskills.com/BLOGS/PAUL/post/A-SQL-Server-DBA-myth-a-day-(1230)-tempdb-should-always-have-one-data-file-per-processor-core.aspx

    http://blogs.msdn.com/b/sqlserverstorageengine/archive/2009/01/11/tempdb-monitoring-and-troubleshooting-allocation-bottleneck.aspx?wa=wsignin1.0

    Regarding the file split, are you going to have more disk to throw at this or are you only going to partition the same # of physical disks differently so that you see it as multiple drives? Curious.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

    OK, one more link that will be helpful with both the tempdb and file-splitting information:

    http://blogs.msdn.com/b/psssql/archive/2007/02/21/sql-server-urban-legends-discussed.aspx

    Hope this helps.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Thanks David,

    Yeah, it will be the same number of physical disks for the user db, but by splitting particular tables or parts onto allotted disks, I'm assuming greater efficiency simply through better management of the files/data?

    For the tempdb, I may be able to stretch to an additional disk if I can verify that it gives a big improvement.

    I have read up on splitting the tempdb onto multiple files, but it is still good to get other people's experiences... I'll check out the links you posted, thanks! 🙂

    Some very nice links there, thanks 🙂

    I have read a few things contradicting the sqlskills article too, just as the author suggests, so the sqlskills link clears it up nicely...

  • My experience with this kind of thing is that allowing RAID to handle the load balancing across multiple spindles is USUALLY better than trying to manually allocate the data extents to specific file systems.

    That said, if you are copying data from one large table into several smaller ones, then it might be worth splitting it into 2 logical disks (RAID 10 across the 4 physical disks) and splitting the database into 2 filegroups, one for each logical drive, dedicating one of the filegroups to the large table.

    Make sure that your log file is on a different physical spindle from the data files, and that needs to be RAID 1. As log files are generally written sequentially, you may get performance degradation using striping.

    All in all, this is an area where you need to get a test platform and try a number of configurations. You may be surprised by some of the results!

    Edit: change the can to may to emphasise that this is only a potential risk

  • Ian Ritchie (8/12/2010)


    Some very nice links there, thanks 🙂

    I have read a few things contradicting the sqlskills article too, just as the author suggests, so the sqlskills link clears it up nicely...

    Given that Paul wrote the storage engine code, I tend to agree with him over any of the other stuff out there. 😛

    I will say too that splitting tempdb is not always going to help. Running some of the queries specified in the links above will help you analyze that a bit.
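    For example, something along these lines shows whether sessions are waiting on tempdb pages right now (a sketch; tempdb is database_id 2, and resource_description is db_id:file_id:page_id):

    -- Live check for PAGELATCH waits on tempdb pages
    SELECT session_id, wait_type, wait_duration_ms, resource_description
    FROM sys.dm_os_waiting_tasks
    WHERE wait_type LIKE 'PAGELATCH%'
      AND resource_description LIKE '2:%';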

    It would be good if you could post back as you progress through this. The experience you gain will be good to share. Thanks!

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
