SQL Server stripe and allocation size

  • I've read many articles recommending a 64K stripe size along with a 64K allocation unit size when formatting the volume in Windows. I have a couple of questions:

    1. The document below recommends a 256K stripe size in some scenarios. How do I determine whether that pertains to us? Our databases are around 500GB in size, and the data only stays in the DB for about 2 weeks before it is gone. We have high-IO tables on their own RAID10 sets and the tempdb and transaction logs on RAID1 sets. The OS and pagefile are on a RAID1 set. These are all 6 Gbps internal SCSI drives behind a battery-backed Dell RAID controller with 512 or 1024MB of cache.

    http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/PhysDBStor.doc

    2. The 64K and 64K recommendation almost always seems to be talking about the database volume. Is the stripe and allocation size recommendation the same for:

    a. transaction log volume

    b. tempdb volume

    c. OS/page file volume?

    3. Our servers receive data from clients that averages around 150K in size (and won't exceed 300K). Our database stores info about that data, and then we store the data itself on an iSCSI SAN volume we present to the server. How should the stripe size and allocation unit size differ for that volume? Typically it's a RAID10 volume on 14 drives with 2 hot spares.

    Our environment is Windows Server 2008 R2 x64 Enterprise and SQL Server 2008 R2 Enterprise.

  • We have the same questions as listed above. We are planning to implement the recommended 256K stripe size for SQL Server 2008 R2 -- definitely for our SQLUserData drive E: (RAID 5) and our SQLUserLog drive F: (RAID 10).

    --> Our primary question is for tempdb: we are setting our tempdb data and log files on separate RAID 10 drives (L: and M: respectively). Should these two drives be carved out with a 256K stripe as well?

    BT
  • Turns out we did some I/O analysis and our tempdb is barely touched compared to our other indexes, so we actually ended up not giving it its own spindles. I was told on another site to use 64K for everything except the OS/pagefile drive (use the default 4K allocation unit size for that volume).

    What are you using for drive hardware? If you have separate tempdb and log files on RAID10, then that's 8 drives used up, and not much left for other stuff. I'd recommend you run some stats, monitor Avg. Disk sec/Transfer (or sec/Read and sec/Write respectively), and see whether it's really worth giving those files their own spindles (a file-stats query is sketched below the quoted post).

    Express12 (5/31/2011)


    We have the same questions as listed above. We are planning to implement the recommended 256K stripe size for SQL Server 2008 R2 -- definitely for our SQLUserData drive E: (RAID 5) and our SQLUserLog drive F: (RAID 10).

    --> Our primary question is for tempdb: we are setting our tempdb data and log files on separate RAID 10 drives (L: and M: respectively). Should these two drives be carved out with a 256K stripe as well?
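    A minimal sketch of the kind of stats check suggested above, assuming SQL Server 2005 or later (sys.dm_io_virtual_file_stats has been available since 2005). It reports cumulative I/O counts and stall time per database file since the last restart; if tempdb's numbers are small next to the user databases, dedicating spindles to it probably isn't worth it.

    -- Cumulative I/O and stall time per database file since the last restart.
    -- High average stall (ms) suggests the file could benefit from its own spindles.
    SELECT DB_NAME(vfs.database_id)                             AS database_name,
           mf.name                                              AS logical_file_name,
           mf.physical_name,
           vfs.num_of_reads,
           vfs.num_of_writes,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;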

  • I can only answer part of this, but you are basically right: everything but the OS/pagefile volume(s) should use 64K allocation units. The reason is that SQL Server reads data one extent at a time, and one extent is 64K (eight 8K pages), so instead of 16 x 4K reads it reads the entire extent in a single read. As far as the SAN striping goes, I can't take a position.. Sorry. (A quick query for mapping your database files to their volumes follows below.)

    CEWII
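    A minimal sketch for seeing which volume each database file sits on, assuming SQL Server 2008 R2 SP1 or later (sys.dm_os_volume_stats is not available before that). The NTFS allocation unit size itself isn't exposed through SQL Server, so it still has to be confirmed at the OS level, e.g. with fsutil fsinfo ntfsinfo against each volume.

    -- Which volume does each database file live on? A starting point before
    -- checking each volume's stripe and allocation unit size at the OS level.
    SELECT DISTINCT
           vs.volume_mount_point,
           vs.file_system_type,
           DB_NAME(mf.database_id) AS database_name,
           mf.name                 AS logical_file_name
    FROM sys.master_files AS mf
    CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
    ORDER BY vs.volume_mount_point, database_name;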

    Thanks for this feedback.. we are using EMC VNX. We've been granted a ton of space.. 10 TB for a 4 TB need.. and we have not implemented SQL data compression (2008 R2) on any objects yet. I typically see between a 4-to-1 and 7-to-1 space reclamation after compressing my large tables and indexes (e.g. 2 TB of data would compress to 500 GB). Granted, this uses a lot of CPU when the data needs to be decompressed on retrieval.

    I am planning on putting SQL on a Cluster:

    - O/S on C: (RAID 5)

    - SQL Server install on D: drive (RAID 5)

    - SQL User Data on drives E: F: G: (RAID 5)

    - SQL Log Data on drives H: I: J: (RAID 10)

    - TEMPDB data on drive L: (RAID 10)

    - TEMPDB log on drive M: (RAID 10)

    - Distribution DB on drive N: (RAID 5)

    - Distribution Log on drive O: (RAID 10)

    - Snapshot folder on drive P: (RAID 5)

    - Quorum on Q: drive (RAID 5)

    - System DB and Logs on drive S: (RAID 5)

    Do you see any serious flaws w/ this implementation?

    BT
  • If you have the space and the IO, then there's potentially no need to be frugal. But to get the best out of the system, I'd set up a daily PerfMon collection and watch the disk counters at the physical level to see whether that level of granularity is necessary and beneficial.

    Express12 (5/31/2011)


    Thanks for this feedback.. we are using EMC VNX. [...] Do you see any serious flaws w/ this implementation?

  • I tend not to consider compression unless it is really needed; if you have the space available, why spend the cycles compressing and expanding?

    CEWII

  • The performance gained in READ I/O is substantial.. many fewer pages to read through during SELECT/UPDATE/DELETE processing.. and from what I understand, only qualifying pages are uncompressed at READ time... we have 24 CPUs on this box w/ 128 GB RAM (x64).

    We have 2 weeks to test in parallel -- so I will be running our batch & reporting cycle for 3 or 4 days against uncompressed data -- then compressing the data and running batch & reporting against compressed data. Hopefully, we'll get a good benchmark during this testing... Worst case scenario, we can always uncompress the data if need be.. (during an outage, of course). A quick way to estimate the savings up front is sketched at the end of this post.

    BT
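    A minimal sketch of estimating the savings before committing to compression, assuming SQL Server 2008 or later Enterprise Edition; dbo.BigReportingTable is a hypothetical table name used only for illustration.

    -- Estimate PAGE compression savings for one table (all indexes, all partitions)
    EXEC sp_estimate_data_compression_savings
         @schema_name      = 'dbo',
         @object_name      = 'BigReportingTable',   -- hypothetical table name
         @index_id         = NULL,
         @partition_number = NULL,
         @data_compression = 'PAGE';

    -- If the estimate looks worthwhile, apply it:
    ALTER TABLE dbo.BigReportingTable REBUILD WITH (DATA_COMPRESSION = PAGE);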
  • Hm, I could see that.. I'd want to performance test it, but I could see it helping..

    CEWII

  • Hi

    Both 64K and 256K are commonly recommended stripe sizes for SQL Server. The theory behind the 256K stripe size is that a larger stripe generally incurs fewer physical I/Os for 64K extent operations. Common SQL Server I/O sizes of 8K, 64K and 256K all fit neatly into a 256K stripe. It all depends on your system and what it is used for.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • A key factor regarding optimal disk striping for the new Minerva SQL Server SAN disk: the general recommendation for SQL Server 2008 R2 is a 256K stripe size. This can be a tremendous performance enhancement as opposed to the default Windows block size of 4K or 8K.

    http://download.microsoft.com/download/0/7/7/0778c0bb-5281-4390-92cd-ec138a18f2f9/ws08_r2_vhd_performance_whitepaper.docx

    (Currently VHD support is made as part of Windows Server 2008 R2 and high-end client ..... A 256K stripe unit size was chosen based on internal performance .... That also aligns with SQL Server performance best practices when a large ...)

    http://qa.sqlservercentral.com/blogs/sqlmanofmystery/archive/2010/01/04/fundamentals-of-storage-systems-stripe-size-block-size-and-io-patterns.aspx

    SQL Server IO Patterns and Array Performance

    SQL Server works with two specific IO request sizes, 8K and 64K, in general. If you did your due diligence earlier you could also add any other request size that you saw come through. Focusing on the page size and extent size is a good place to start. Using the RAID calculator tool I chose a Seagate Savvio 15K.2 drive as my base. One of the things my calculator can't take into consideration is your system and RAID HBA. This is where testing is essential. You will find there are anomalies in every card, physical limits on throughput and IOs. Since my RAID card won't do a stripe bigger than 256K, that is my cap for size. Reading through several IO white papers on SQL Server, the general recommendation is for 2000/2005 a 64K or 128K stripe size and for SQL Server 2008 a 256K stripe size. I've found, as general guidance, this is a good place to start as well.

    The calculator tells me for a RAID 10 array with 24 drives at a 256k stripe size and 8k IO request I should get 9825 IOs/Sec and 76.75 MB/Sec on average, across reads, writes, sequential and random IO requests. That’s right, 76 MB/Sec throughput for 24 drives rated at 122 MB/sec minimum. That is 2.5 MB/Sec per drive.

    The same array at a 64k IO request size yields 8102 IOs/Sec and 506 MB/Sec. A huge difference in throughput just based on the IO request size. Still, not anywhere near 122 MB/Sec. As an estimate, I find that these numbers are “good enough” to start sizing my arrays. If I needed to figure out how big the array needs to be to support say 150 MB/sec throughput or 10000 IOs/Sec you can do that with the calculator as well. Armed with our estimates it’s time to actually test our new RAID arrays. I use SQLIO to do synthetic benchmarking before running any actual data loads.

    After doing a round of testing I found that in some cases the numbers were a little high or a little low. Other factors that are hard to calculate are cache hit ratios. Enterprise RAID HBAs usually disable the write cache on the local disk controller and just use their own battery-backed cache for all write operations. This is safer, but with more and more disks on a single controller the amount of cache per disk can get pretty low. The HBA will also want you to split that between read and write operations. On my HP RAID HBAs the default is 25% read and 75% write. An older study I found on disk caches and cache size saw diminishing returns above 2 MB, gaining between 1 and 2 percent additional cache hits per megabyte of cache. I expect that to flatten out even more as the caches get larger; you simply can't get 100% cache hit ratios, since that would mean the whole drive fit in the RAM cache or your IO requests are the same over and over. Generally, if that is the case, you will find SQL Server won't have to go to disk; it will have what it needs in the buffer pool for reads. I find that if you have less than 20 percent write activity, leaving the defaults is fine. If I do have a write-heavy load I will set the cache to 100% writes.

    BT
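    For what it's worth, the throughput figures quoted above follow directly from IOPS multiplied by the request size; a quick sanity check using the quoted numbers:

    -- Throughput (MB/s) = IOPS * request size (KB) / 1024, using the quoted calculator figures
    SELECT 9825 * 8.0  / 1024 AS mb_per_sec_at_8k_requests,   -- ~76.8 MB/s
           8102 * 64.0 / 1024 AS mb_per_sec_at_64k_requests;  -- ~506 MB/s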
  • Express12 (5/31/2011)


    The performance gained in READ I/O is substantial.. many fewer pages to read through during SELECT/UPDATE/DELETE processing.. and from what I understand, only qualifying pages are uncompressed at READ time... we have 24 CPUs on this box w/ 128 GB RAM (x64).

    We have 2 weeks to test in parallel -- so I will be running our batch & reporting cycle for 3 or 4 days against uncompressed data -- then compressing the data and running batch & reporting against compressed data. Hopefully, we'll get a good benchmark during this testing... Worst case scenario, we can always uncompress the data if need be.. (during an outage, of course).

    Express, when you do this comparison, please either post your results here or at least a link to a write-up of your findings. I'm very curious about this.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Craig - here is an interesting post on a SQL 2008 Data Compression benchmark. http://qa.sqlservercentral.com/blogs/glennberry/archive/2010/02/22/experiments-with-data-compression-in-sql-server-2008.aspx

    Due to extreme time limitations on our project, I may not be able to perform a before/after data compression benchmark comparison. I have 18 databases (primarily used for backend reporting) ranging from 2 GB to 800 GB that I'll be migrating from SQL 2005 to SQL 2008 R2 SP1 in the next couple of weeks. In addition, I'll be migrating all the SSIS, SSRS, jobs, maintenance plans, logins, linked servers -- you name it, it's migrating!

    Using Idera's Diagnostic Manager, I quickly identified all tables and indexes using more than 50 MB of space, and I've pre-scripted ALTER statements to compress these objects (tables, clustered indexes, and non-clustered indexes); a sketch of that scripting approach follows at the end of this post.

    We will have daily INSERT processes that will incur overhead compressing data on its way into the DB, BUT we're looking forward to significantly less READ I/O gathering data via SELECTs on the way out of the DB.

    If we encounter any major gotchas here, I will try to post them accordingly.

    BT
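    A minimal sketch of that pre-scripting approach, assuming SQL Server 2008 or later Enterprise Edition: generate ALTER ... REBUILD statements with PAGE compression for every heap, clustered index, and non-clustered index using more than 50 MB (6,400 pages of 8 KB). The generated script should be reviewed before it is run.

    -- Generate PAGE-compression rebuild statements for objects over 50 MB
    SELECT CASE WHEN i.index_id IN (0, 1)   -- heap or clustered index: rebuild the table
                THEN 'ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
                     + ' REBUILD WITH (DATA_COMPRESSION = PAGE);'
                ELSE 'ALTER INDEX ' + QUOTENAME(i.name) + ' ON '
                     + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
                     + ' REBUILD WITH (DATA_COMPRESSION = PAGE);'
           END AS compression_script
    FROM sys.tables  AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    JOIN sys.indexes AS i ON i.object_id = t.object_id
    JOIN (SELECT object_id, index_id, SUM(used_page_count) AS used_pages
          FROM sys.dm_db_partition_stats
          GROUP BY object_id, index_id) AS ps
      ON ps.object_id = i.object_id
     AND ps.index_id  = i.index_id
    WHERE ps.used_pages > 6400;   -- 6400 pages * 8 KB = 50 MB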
