SQL Server 2000 and MSA 1000 performance problems

  • Hi All

    We have set up a SQL Server 2000 cluster running on Windows 2003, using an HP MSA 1000 (SAN) to host our databases. Performance has been horrible since we migrated to the SAN; in particular, Avg. Disk Queue Length and Avg. Disk Read Queue Length are very high.

    I collected the following perf counters (listing the average values):

    Memory\Available MBytes : 3622

    Memory\Pages/sec : 92.183

    Buffer cache hit ratio : 98.718

    General Statistics\User Connections : 280

    SQL Statistics\Batch Requests/sec : 81.766

    PhysicalDisk\Avg. Disk Queue Length : 51

    PhysicalDisk\Avg. Disk Read Queue Length : 44

    PhysicalDisk\Avg. Disk Write Queue Length: 6

    Processor(_Total)\% Processor Time :20

    System\Processor Queue Length : 0

    We run reindexing on selected tables as part of our maintenance plan.

    The SQL box is running two instances of SQL Server; AWE is enabled and each instance has 3 GB of RAM. There is no other application running on this box. The server hardware configuration is as follows:

    1. RAM: 8 GB, plus an 8 GB page file

    2. 8 processors

    3. A total of 14 physical drives:

    Array 1: 2 x 146 GB drives in RAID 1+0

    Array 2: 12 x 146 GB drives in RAID 6 (note: RAID 6 consumes two drives' worth of capacity for parity, so data is actually written across 10 drives)

    4. For the SQL databases I have created 4 logical drives:

    i. Data_drive_Defaultinstance: hosts the DB data files of the default instance; configured with a 16 KB stripe size on RAID 6.

    ii. Log_drive_Defaultinstance: hosts the DB log of the default instance; RAID 1+0 with a 128 KB stripe size.

    iii. Data_drive_NamedInstance: RAID 6, 16 KB stripe size.

    iv. Log_drive_NamedInstance: RAID 1+0, 128 KB stripe size.

    Any help will be greatly appreciated.

    Regards

  • According to Microsoft, if you use a SAN you need to watch some different counters:

    PhysicalDisk\Average Disk Queue Length

    Indicates the average number of both read and write requests that were queued for the selected disk during the sample interval.

    The average should be less than the number of spindles of the disk. If a SAN is being used, ignore this counter and concentrate on the latency counters: PhysicalDisk\Average Disk sec/Read and PhysicalDisk\Average Disk sec/Write.
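    To put that rule of thumb against the counters posted above, here is a rough back-of-the-envelope check (a sketch only: the 10-data-spindle count is taken from the 12-drive RAID 6 array described earlier, and the ~20 ms latency threshold and the sample latency value are common guidelines/placeholders, not numbers from this thread):

```python
# Back-of-the-envelope check of the posted counters against the
# "queue length should stay below the spindle count" guideline.
avg_queue_length = 51   # PhysicalDisk\Avg. Disk Queue Length (posted above)
data_spindles = 10      # 12-drive RAID 6 array minus 2 drives of parity

per_spindle = avg_queue_length / data_spindles
print(f"Outstanding I/Os per data spindle: {per_spindle:.1f}")  # 5.1 - well above ~2

# On a SAN, the latency counters are the better signal (values in seconds).
# Sustained values above ~0.020 s (20 ms) are commonly treated as trouble.
avg_disk_sec_per_read = 0.045   # hypothetical Avg. Disk sec/Read sample
if avg_disk_sec_per_read > 0.020:
    print("Read latency looks high - check array layout, cache, and paths")
```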

    Below is a link about disk problems with Exchange (the mail server):

    http://www.microsoft.com/technet/prodtechnol/exchange/guides/TrblshtE2k3Perf/5bcdd349-dcc6-43eb-9dc3-54175f7061ad.mspx

     

    What is the current patch level of your SQL Servers?

  • I use a similar set-up and have no problems at all. I do envy your CPU count.

    Did you perform any IO testing before you installed SQL Server? I do know that RAID 6 is a "pig" when it comes to write performance. I don't remember exactly how many, but every logical write requires some physical reads, followed by at least 3 physical writes (the data plus 2 parity blocks).

    Usually striped parity is okay for a database whose write load is low enough to be absorbed by the SAN's cache, as long as you have a fast sink for the log files (you = 1+0, check) BUT!

    If you are writing LOTS of data, striped parity will hurt you. It's also bad when you have to restore a DB from backup.
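    To make the write-penalty point above concrete, here is a rough sketch. The per-disk IOPS figure and the penalty factors are textbook rules of thumb, not measurements from this MSA:

```python
# Rough effective-IOPS sketch for the RAID write penalty described above.
# Common penalty factors: RAID 1+0 costs 2 physical writes per logical
# write; RAID 6 costs ~6 I/Os (read data + both parity blocks, then
# write data + both parity blocks).
def effective_write_iops(disks, iops_per_disk, write_penalty):
    """Crude upper bound on sustainable random-write IOPS for an array."""
    return disks * iops_per_disk / write_penalty

IOPS_PER_DISK = 150  # assumed figure for a 15K SCSI spindle of that era

raid10 = effective_write_iops(12, IOPS_PER_DISK, write_penalty=2)
raid6 = effective_write_iops(12, IOPS_PER_DISK, write_penalty=6)
print(f"12-disk RAID 1+0: ~{raid10:.0f} write IOPS")
print(f"12-disk RAID 6:   ~{raid6:.0f} write IOPS")
```

    Under these assumptions the same 12 spindles deliver roughly three times the random-write throughput in 1+0 as in RAID 6, which is why heavy write loads expose striped parity so quickly.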

    Also check your cache read/write ratio.  I use 100% write for my SQL MSA1000s.

    Can you post back some disk throughput counters? Bytes per second from the physical disk counters, along with which RAID level each disk maps to?

     

  • Aw geez. Now that I've read a little more carefully, I can see what the problem might be....

    You've got a two-drive mirror for your log files!  That's not so good.

    What I would do is start over and create one array with 12 drives and a spare. Use 1+0 at least for the log files, and 1+0 for data too if you can afford the disk. (At $20K per CPU for SQL EE, that would be a drop in the ocean.)

    The concept of putting log files on different "spindles" changes a little once you have a SAN. In fact, with a decent-sized cache, you can almost always use the same "array" for log and data, especially when you are using 1+0. Under any but the most extreme circumstances, the worst you can do in that situation is break even compared with making two arrays that will each have half the throughput of the single large array.
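    The break-even argument above can be sketched with simple arithmetic. This assumes throughput scales roughly with spindle count and uses an assumed per-spindle IOPS figure; it ignores cache effects and seek patterns:

```python
# Sketch of the shared-array argument: one 12-drive 1+0 array lets either
# workload (log or data) use the full bandwidth when the other is idle,
# while two dedicated 6-drive arrays cap each workload at half of it.
IOPS_PER_DISK = 150  # assumed per-spindle figure, not a measured value

def array_iops(disks, per_disk=IOPS_PER_DISK):
    """Crude throughput estimate: spindle count times per-spindle IOPS."""
    return disks * per_disk

shared = array_iops(12)     # log + data share one big 1+0 array
dedicated = array_iops(6)   # each workload gets its own small array
print(f"Shared 12-drive array:   up to {shared} IOPS for either workload")
print(f"Dedicated 6-drive array: at most {dedicated} IOPS each")
```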

    jg

     

     

  • Jo: Thanks for the reply. I have the SP3 patch installed for SQL Server.

    The disk writes are OK but the disk reads are horrible. I have tried Microsoft Support (I have been on hold for more than an hour now....)

    I have added the two counters you mentioned to my performance logs.

    Will gathering LogicalDisk\% Idle Time help identify the problem?

     

  • No clue, I'm totally new to SANs.

    Does HP perhaps have new firmware/software updates?

  • Have you come up with any physical disk IO statistics? 

  • I observed similar problems with an MSA1000 and 2 Compaq servers (both 4 GB RAM; one had 2 CPUs, the other 4). Performance degraded tenfold when both servers were accessing databases on the same array. If you have both instances running (and they are both active production, not devel + prod instances), then SQL may be competing with itself for access to the arrays. I never got around to taking the 2 TB off and creating 2 separate arrays, one for each server/instance, so I don't know if it would help, but it might be worth considering.

  • Hi JB: I only have one server accessing the MSA 1000, but two SQL instances running. I upgraded the MSA firmware (to the latest), updated the HBA drivers for QLogic, and installed the Storport patch for Windows 2003, but performance stands at the same place.

    The system is in production and I can't make any config changes now.

    We are looking into the application queries and trying to tune them.

    Are there any other options I can try?

     

  • I'm not familiar with HP SANs, but I've worked with EMC, CLARiiON, and Hitachi quite a bit (we currently use EMC). I was wondering about a few things that might affect performance:

    • How much cache does your SAN have? Can you increase it?
    • Are you using dual (2) QLogic HBAs in the server, or just one?
    • If you have dual HBAs, are they dual-channeled and dual-pathed?

    Regards,
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • The cache on the SAN is 256 MB, with a 60/40 read/write ratio.

    We are using dual QLogic HBA cards for the two controllers on the SAN, but they are not used for load balancing.

    Can you elaborate on how I can check whether the HBAs are dual-channeled and dual-pathed?

    Thanks for your reply

     

  • Cache on the SAN is 256 MB ????

    What can you upgrade it to? If the max supported cache is 8 GB, I'd put in at least 4 GB; 256 MB seems infinitesimally small. As for the HBAs, they are dual channel; whether dual pathing is enabled depends on your fabric, fabric switch, SAN SP (storage processor), and SAN LUN configurations.

    Regards,
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

