Data File Sizes - Best Practice?

  • Anyone have any info regarding Best Practices relating to max data file size for large DBs?

    The scenario is a database of 1 TB in size. We normally keep data files limited to 80GB each before we create a new one. I'm curious about the thoughts of others on this topic.
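
    A sketch of what that practice looks like in T-SQL, in case it helps (the database, file, and path names below are hypothetical, not our real ones):

    -- Sketch only: cap the current file at 80 GB, then add a new secondary file.
    -- All names and paths below are made up for illustration.
    ALTER DATABASE BigDB
        MODIFY FILE (NAME = BigDB_Data03, MAXSIZE = 80GB);

    ALTER DATABASE BigDB
        ADD FILE (
            NAME = BigDB_Data04,
            FILENAME = N'E:\SQLData\BigDB_Data04.ndf',
            SIZE = 10GB,
            FILEGROWTH = 5GB,
            MAXSIZE = 80GB
        );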

    Thanks

  • Reply deleted.

    Tara Kizer
    Microsoft MVP for Windows Server System - SQL Server
    Ramblings of a DBA (My SQL Server Blog)
    Subscribe to my blog

  • Hi Tara,

    Do you have a document from MS that says less than 64 GB?

    Thank you

  • I removed my reply as it was a bad recommendation on my part. I misunderstood the MS engineer.

    Tara Kizer
    Microsoft MVP for Windows Server System - SQL Server
    Ramblings of a DBA (My SQL Server Blog)
    Subscribe to my blog

  • SouthernConfusion (7/19/2010)


    Anyone have any info regarding Best Practices relating to max data file size for large DBs?

    The scenario is a database of 1 TB in size. We normally keep data files limited to 80GB each before we create a new one. I'm curious about the thoughts of others on this topic.

    Thanks

    Well, a data file (.mdf) can grow up to 16 TB, and a database can grow to a maximum of 524,272 TB. So 80 GB per data file is not big at all. However, if you are looking at it performance-wise, it is recommended to create multiple secondary (.ndf) files. I am not really sure what the best practice is; I will see what others have to say about this topic.
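
    If you want to see how close each file is to its cap, something like this works (a sketch; note that size is stored in 8 KB pages, hence the arithmetic):

    -- Sketch: current size and cap per data file of the current database.
    SELECT name,
           size * 8.0 / 1024 / 1024 AS size_gb,
           CASE max_size
                WHEN -1 THEN 'unlimited'
                WHEN 0  THEN 'no growth'
                ELSE CAST(max_size * 8.0 / 1024 / 1024 AS varchar(20))
           END AS max_size_gb
    FROM sys.database_files
    WHERE type_desc = 'ROWS';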

    Thanks,

    TA

    Regards,
    SQLisAwe5oMe.

  • SQLCrazyCertified (7/23/2012)


    SouthernConfusion (7/19/2010)


    Anyone have any info regarding Best Practices relating to max data file size for large DBs?

    The scenario is a database of 1 TB in size. We normally keep data files limited to 80GB each before we create a new one. I'm curious about the thoughts of others on this topic.

    Thanks

    Well, a data file (.mdf) can grow up to 16 TB, and a database can grow to a maximum of 524,272 TB. So 80 GB per data file is not big at all. However, if you are looking at it performance-wise, it is recommended to create multiple secondary (.ndf) files. I am not really sure what the best practice is; I will see what others have to say about this topic.

    Thanks,

    TA

    Do you have a reference for that NDF thing?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I guess the recommendation would depend on the RAID configuration and what other operations you intend to perform on it. The primary reason I say this is because the more data files you can spread across disks, the better IO you achieve. Also, if you have compression and partitioning involved, then these numbers are likely to be very different.
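
    One way to check whether IO is actually spreading across your files is sys.dm_io_virtual_file_stats (a sketch; run it in the database in question):

    -- Sketch: cumulative IO per file of the current database since startup,
    -- to see whether reads/writes actually spread across the files.
    SELECT mf.name AS file_name,
           vfs.num_of_reads,
           vfs.num_of_writes,
           vfs.num_of_bytes_read / 1048576 AS mb_read,
           vfs.num_of_bytes_written / 1048576 AS mb_written
    FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id = vfs.file_id
    ORDER BY vfs.num_of_bytes_read + vfs.num_of_bytes_written DESC;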

    Jayanth Kurup

  • Jeff Moden (7/23/2012)


    SQLCrazyCertified (7/23/2012)


    SouthernConfusion (7/19/2010)


    Anyone have any info regarding Best Practices relating to max data file size for large DBs?

    The scenario is a database of 1 TB in size. We normally keep data files limited to 80GB each before we create a new one. I'm curious about the thoughts of others on this topic.

    Thanks

    Well, a data file (.mdf) can grow up to 16 TB, and a database can grow to a maximum of 524,272 TB. So 80 GB per data file is not big at all. However, if you are looking at it performance-wise, it is recommended to create multiple secondary (.ndf) files. I am not really sure what the best practice is; I will see what others have to say about this topic.

    Thanks,

    TA

    Do you have a reference for that NDF thing?

    Jeff, I do not have any reference for that. Basically, as Jayanth mentioned, the more secondary files, the better performance you get. Let me know your thoughts if you disagree.

    Thanks,

    TA

    Regards,
    SQLisAwe5oMe.

  • Here is a good reference on the multiple data files thingy...

    http://www.sqlskills.com/BLOGS/PAUL/post/Benchmarking-do-multiple-data-files-make-a-difference.aspx

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • SQLRNNR (7/23/2012)


    Here is a good reference on the multiple data files thingy...

    http://www.sqlskills.com/BLOGS/PAUL/post/Benchmarking-do-multiple-data-files-make-a-difference.aspx

    As Paul stated, "Now, this is a very contrived test, with a single operation in my workload - it's definitely NOT representative of a mixed-operation OLTP workload." Paul also stated, "In the best case, the eight-file case on two arrays was just over 6% faster than the single-file case on the single array." Note also that going from 1 file to 2 files on the single 8-drive DATA file array, and all 16 file configurations, resulted in a performance decrease. It's clearly not as easy as "more is better".

    I would suggest that unless you're doing fairly serious levels of benchmarking across all levels of your hardware and operating system, a maximum of 6% (and quite possibly less for an arbitrarily chosen configuration) either way may well be lost in the noise of other configuration choices (RAID stripe size, choice of hardware RAID controller, RAID level variations, number of drives per RAIDSET, how much buffer you allocated to SQL Server, Cost Threshold of Parallelism and default MAXDOP choices, VLF counts, which data and log files are on which spindle sets, IO depth settings on your SAN HBA drivers, etc.). If you're running enough tests to optimize all of these values, then yes, run tests on different data file configurations as well.
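
    For reference, the instance-level settings in that list are plain sp_configure options; a sketch, with purely illustrative values:

    -- Sketch: the instance-wide settings mentioned above. The values here
    -- are illustrative only, not recommendations.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'cost threshold for parallelism', 50;
    EXEC sp_configure 'max degree of parallelism', 8;
    RECONFIGURE;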

    If you're going with a "best practice" approach, I suspect right-sizing files (i.e. capacity planning), defragmentation within SQL Server and at the OS filesystem level, estimating your IOPS and throughput requirements, separating data and log files on different spindle sets, and other basic steps will be more valuable uses of your time.
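
    For example, right-sizing in practice mostly means pre-growing a file once to its planned size so autogrow becomes the exception rather than the rule (a sketch; the database and file names are hypothetical):

    -- Sketch: pre-grow a data file to its planned size and set a fixed
    -- growth increment. Names below are made up for illustration.
    ALTER DATABASE BigDB
        MODIFY FILE (NAME = BigDB_Data01, SIZE = 500GB, FILEGROWTH = 10GB);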

    If I'm going to have more than one data file, in general I want them to be on separate, dedicated spindle sets, so sequential reads and writes stay that way. Note that I haven't worked with data files or tables/clustered indexes over the half-terabyte mark, either.

  • SQLCrazyCertified (7/23/2012)


    if you are looking at it performance-wise, it is recommended to create multiple secondary (.ndf) files. I am not really sure what the best practice is; I will see what others have to say about this topic.

    I believe that information is a bit outdated and refers to Direct Attached Storage, as does the article from Paul.

    It's a realistic approach and it works, but you can get even more benefit by intelligently looking at your logical database structure for "hotspot" objects and moving those onto different arrays. This can apply to indexes too.
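
    A sketch of what that move looks like (the filegroup, file, path, table, and index names below are all hypothetical):

    -- Sketch: give a "hotspot" index its own filegroup on a separate array.
    -- All names and paths are made up for illustration.
    ALTER DATABASE BigDB ADD FILEGROUP HotFG;

    ALTER DATABASE BigDB
        ADD FILE (
            NAME = BigDB_Hot01,
            FILENAME = N'F:\FastArray\BigDB_Hot01.ndf',
            SIZE = 20GB
        ) TO FILEGROUP HotFG;

    -- Rebuild the busy index onto the new filegroup.
    CREATE INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        WITH (DROP_EXISTING = ON)
        ON HotFG;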

    SQLRNNR (7/23/2012)


    Here is a good reference on the multiple data files thingy...

    http://www.sqlskills.com/BLOGS/PAUL/post/Benchmarking-do-multiple-data-files-make-a-difference.aspx

    What kind of hardware are you running on, and does it include a SAN? I don't bother splitting any files until I reach drive capacity, and I have 4 TB files knocking around.

  • SQLCrazyCertified (7/23/2012)


    Basically as Jayanath mentioned, the more secondary files the better performance you get. Let me know your thoughts, if you disagree.

    I would disagree.

    It's far from as simple as "more data files = more performance". It's perfectly possible to split into multiple data files and have no improvement. It's possible to split and have a degradation in performance.

    Whether or not a DB would be split for performance reasons requires investigating things like the current primary bottleneck, the underlying IO subsystem design and layout, the IOPS the drives can manage and their latency, the access pattern of the tables and indexes, etc.

    That's not even considering splitting for recoverability purposes.
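
    As a starting point for that kind of investigation, per-file latency is cheap to check (a sketch; high stalls point at the IO subsystem rather than at the file count):

    -- Sketch: average IO latency per file across all databases.
    SELECT DB_NAME(database_id) AS database_name,
           file_id,
           io_stall_read_ms / NULLIF(num_of_reads, 0) AS avg_read_ms,
           io_stall_write_ms / NULLIF(num_of_writes, 0) AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL)
    ORDER BY io_stall DESC;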

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
