getting ready for a terabyte of data

  • We have been told to get ready for a database containing a terabyte of data. I have not seen the design yet, but it's only 3 tables. That raises concerns right away: at a minimum, one of those tables could be 333+ GB. What kind of performance can we expect from a table that large? I would think partitioning is a must in this situation. The data will be static except for a monthly load process. We will be using a SAN for storage.

    Some questions we have...

    - Index maintenance on something this large - how long could it take? (Luckily we will only have to do it once a month.)

    - Any rules of thumb for how many partitions make sense?

    - Is there anything in particular to do on the SAN side to optimize performance? You really don't know where the partitions are physically being stored with a SAN, right? It's just one big storage device?

  • Index maintenance - this depends on how fragmented your indexes are; only run rebuilds/reorganizes on the indexes that actually need it.
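
    A minimal sketch of that check against sys.dm_db_index_physical_stats (the 30%/5% thresholds are common rules of thumb, not requirements):

        SELECT OBJECT_NAME(ips.object_id) AS table_name,
               i.name AS index_name,
               ips.avg_fragmentation_in_percent,
               CASE WHEN ips.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
                    WHEN ips.avg_fragmentation_in_percent > 5  THEN 'REORGANIZE'
                    ELSE 'leave alone'
               END AS suggested_action
        FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
        JOIN sys.indexes AS i
          ON i.object_id = ips.object_id
         AND i.index_id  = ips.index_id
        WHERE ips.index_id > 0  -- skip heaps
        ORDER BY ips.avg_fragmentation_in_percent DESC;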

    Make sure that your data files are spread across drives for load sharing.

    As for the partitioning, I'm not too sure.

  • If the data is static except for a monthly load process, you can just drop and rebuild the indexes as part of the load process.
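
    For example (a sketch; the table and index names are hypothetical), disabling the nonclustered indexes before the load and rebuilding them afterwards avoids maintaining them row by row:

        -- Before the monthly load: stop maintaining the nonclustered indexes
        ALTER INDEX IX_MonthlyFact_LoadDate ON dbo.MonthlyFact DISABLE;

        -- ... bulk load the month's data here ...

        -- After the load: REBUILD re-enables the indexes and defragments them
        ALTER INDEX ALL ON dbo.MonthlyFact REBUILD;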

    In this day of terabyte drives it isn't mandatory to partition a table that is statically loaded once a month. The best thing you can do is pre-size the filegroup files to 350 GB (or more) when you create the database on the new system, so the data on disk will be contiguous. This helps not only query performance but index rebuilds as well. Additional filegroups on other drives can increase performance; how much depends on the type of queries being run and the number of batch requests the server needs to handle.
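
    A sketch of what that pre-sizing might look like at creation time (database name, paths, and sizes are illustrative only):

        CREATE DATABASE BigDB
        ON PRIMARY
            (NAME = BigDB_sys,   FILENAME = 'E:\Data\BigDB_sys.mdf',  SIZE = 1GB),
        FILEGROUP FactData
            (NAME = BigDB_fact1, FILENAME = 'F:\Data\BigDB_fact1.ndf',
             SIZE = 350GB, FILEGROWTH = 10GB)
        LOG ON
            (NAME = BigDB_log,   FILENAME = 'G:\Log\BigDB_log.ldf',   SIZE = 20GB);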

    It would also help to have a large tempdb on a separate set of spindles or array; that way you can use the SORT_IN_TEMPDB option for index rebuilds. tempdb will probably be used heavily by the various queries run against this large database, so tempdb performance matters for you in any case.
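
    The option in question is SORT_IN_TEMPDB on the rebuild (table name hypothetical, as above):

        -- Sort runs go to tempdb instead of the user database's filegroup,
        -- so a fast, separate tempdb array speeds up the rebuild
        ALTER INDEX ALL ON dbo.MonthlyFact
        REBUILD WITH (SORT_IN_TEMPDB = ON);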

  • You could split the data files into different filegroups and drives, then configure your SAN and switches for multipathing. This could potentially double, or at least increase, the I/O bandwidth from your SAN to your server. Brent Ozar gives an excellent explanation here -> http://www.brentozar.com/archive/2009/05/san-multipathing-part-2-what-multipathing-does/ and the SQLPass 24 Hours of PASS session recordings have a session on it somewhere, too.
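
    A sketch of the filegroup side of that (drive letters, names, and sizes are assumptions; the multipathing itself is configured at the OS/HBA level, not in T-SQL):

        -- Add a filegroup whose file lives on a different LUN
        ALTER DATABASE BigDB ADD FILEGROUP FactData2;

        ALTER DATABASE BigDB
        ADD FILE (NAME = BigDB_fact2,
                  FILENAME = 'H:\Data\BigDB_fact2.ndf',
                  SIZE = 100GB)
        TO FILEGROUP FactData2;

        -- New objects can then be placed on that filegroup
        CREATE TABLE dbo.MonthlyFact_Archive
            (FactDate date NOT NULL, Amount money NOT NULL)
        ON FactData2;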

