Urgent Question re. SQL 2005 Performance: Replication vs Clustering

  • Hi,

    I have inherited the following scenario:

    A website that is running on IIS6, ASP.Net system with backend on SQL 2005.

    IIS is on one server, SQL on another, both 64-bit servers with 16GB Ram. The IIS is 2 dual-core processors, the SQL box is 2 quad-core processors.

    The problem is that both the application AND the database were not designed, ever. There was no DBA, and no project manager. Developers just did whatever they liked to get things working. AGGHHH!

    The CEO recognizes that the entire application needs to be re-written from the ground-up, and that is goind to take at least a year. In the meantime, I have to improve performance.

    Based on my initial analysis, it would take months of work to tune queries, re-write Stored Procs, tune indexes etc. so management want a QUICK FIX. So they've already decided to add another IIS Server to help the front end cope, but the database is really struggling.

    As I see it there are only 3 options:

    1) Increase the DB hardware spec (ie. add more RAM to 32GB, and add 2 more processors)

    2) Transactional Replication and split the application so that half of the traffic goes to one database server and half to the other (both servers would have to be publisher/subscribers - ie. peer-to-peer replicatin).

    3) Clustering

    What do people think I should do?. Currently (2) is the favourite option since we have a 2nd server available but I've never done replication before so don't really know what kind of performance improvement to expect (obviously there is an overhead). I know it's normally used for High Availability (which they also want!) but as I understand it, performance will be improved because we would be splitting the front-end traffic across 2 boxes.

    Thanks,

    Richard

  • Hi

    Can you give more details around the performance issues, have you identified that there are too many transactions per second, causing other queries / sessions to wait?

    Have you identified that IIS is also a bottleneck that you want to create a second IIS server.

    Is the Database performance - maybe related to long running queries maxing out the CPU?

    What sort of Disk Drives are used? and what configuration.

    Is your Database, Log and Temp DBs split across different spindels?

    Let us know

    Thanks

    Kevin

  • Firstly, there is no quick fix for performance. If the DB and queries are badly designed, adding hardware won't help much, if at all.

    Replication may help, but note that replication (even transactional) is not instantaneous and as such the two databases may get out of sync. If you're considering 2-way replication, it can get very messy. You'll have to consider what happens if two different user connections to the two servers change the same row at the same time. Which one wins?

    Plus there's the load that the replication will add to the system to consider. It you want minimal impact, you'll need a third server to be the distributor.

    Clustering is not a solution to performance. It's a high-availability solution. If you cluster the server you will still only have one active instance of SQL. It's not RAC.

    My suggestion to you is to run some profile traces and look for the 3-5 worst performing stored procedures. Go through them, rewrite, add indexes, etc. Fixing just the top 3 worst performing procs often has a much higher impact than expected.

    Once you've fixed those, take traces again, find the top 3, fix, repeat.

    It's the methodology I've used for perf tuning for the last few years. Tackling the entire DB in one go is often not possible and often too much work. An iterative process is much better.

    Lastly, if the boss is screaming, ask him if you can get someone in to help out.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Agreed with Gail, you can spread the load with replication effort but that has to be done very carefully and well architected. I have only seen a few environments where that is done correctly.

    A quick way to get some of the worst performing queries is by using the Performance Dashboard reports that MS put out with 2005. You should download and install them. They're not perfect and you still have to do some work around them but they will help you pick out some of the worst offending queries and some potential missing indexes. Be careful in just taking what is there and making changes. Review the information and start digging a bit deeper. The link is - http://www.microsoft.com/downloads/details.aspx?familyid=1d3a4a0d-7e0c-4730-8204-e419218c1efc&displaylang=en

    You can also do some searches on the MS site as there is a great white paper on troubleshooting performance issues with 2005 that provides some great queries for the sys.dm views. Another great place to start.

    As already stated, this is going to be an effort but chances are you will be able to find 10 or more quick hits that will bring things down quickly if it really is application issues. You'll be surprised. 🙂

    I'll try to remember to post the link to the white paper when I get into work later. Try.....

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Thanks for the replies. They already had a performance guru in for 2 weeks and he did some of the things you mentioned, but it was not enough.

    I suspected that Clustering would not help, so thanks for confirming that.

    The application is very easy to split. The idea of using replication was to basically provide two database servers and to split the users 50/50 - once a user logs in, all their requests get sent to only one of the servers, and always the same server. So in fact we could probably use merge replication to minimize the overhead.

    There simply isn't time to tune any further. We need a solution implemented within the next 5-10 days, because the user base is going to increase about 30-40% in the next month.

    The solution has got to be throwing more hardware at the problem asap. After that we will have time to go back to tuning further.

    So, do you think option (1) or (2) is better. Since we have an identical server sitting here doing nothing, we came up with the idea to replicate to it.

    Everything I read about replication says it is useful for scaling out an application to increase performance. I don't see why it won't work (yes I know it's complicated but I am confident in my ability to learn it and implement it).

  • Richard McSharry (9/11/2008)


    Hi,

    I have inherited the following scenario:

    A website that is running on IIS6, ASP.Net system with backend on SQL 2005.

    IIS is on one server, SQL on another, both 64-bit servers with 16GB Ram. The IIS is 2 dual-core processors, the SQL box is 2 quad-core processors.

    The problem is that both the application AND the database were not designed, ever. There was no DBA, and no project manager. Developers just did whatever they liked to get things working. AGGHHH!

    The CEO recognizes that the entire application needs to be re-written from the ground-up, and that is goind to take at least a year. In the meantime, I have to improve performance.

    Based on my initial analysis, it would take months of work to tune queries, re-write Stored Procs, tune indexes etc. so management want a QUICK FIX. So they've already decided to add another IIS Server to help the front end cope, but the database is really struggling.

    As I see it there are only 3 options:

    1) Increase the DB hardware spec (ie. add more RAM to 32GB, and add 2 more processors)

    2) Transactional Replication and split the application so that half of the traffic goes to one database server and half to the other (both servers would have to be publisher/subscribers - ie. peer-to-peer replicatin).

    3) Clustering

    What do people think I should do?. Currently (2) is the favourite option since we have a 2nd server available but I've never done replication before so don't really know what kind of performance improvement to expect (obviously there is an overhead). I know it's normally used for High Availability (which they also want!) but as I understand it, performance will be improved because we would be splitting the front-end traffic across 2 boxes.

    Thanks,

    Richard

    Hi Richard,

    Ok, as Gila Monster has already mentioned, there is no quick fix to performance as such. Throwing hardware is not the best strategy, as it will only "hide" the underlaying problem(s). You may want to approach in a modular fashion, focusing on your key performance bottlenecks one at a time for database tier to application tier to presentation tier, as this provides you with a focused and measured effort.

    I would plan using the standard 3-tier model and work from DB backend to:

    1. Identify performance bottleneck(s)

    2. Re-code and optimize queries into storage procedures, where possible

    3. Split db activity, where appropriate.

    What are you current performance challenges?

    Thanks,

    Phillip Cox

  • I have not had a chance to investigate too much, but do know that long running queries are maxing out the CPU.

    Someone else identified that IIS was under pressure, hence the 2nd IIS server.

    Disk I/O is less of an issue:

    Log and Temp are on same disk (RAID 1).

    Data on separate disk (RAID 5).

    I will probably move Tempdb to its own RAID 1 array also.

  • Richard McSharry (9/11/2008)


    I have not had a chance to investigate too much, but do know that long running queries are maxing out the CPU.

    Someone else identified that IIS was under pressure, hence the 2nd IIS server.

    Disk I/O is less of an issue:

    Log and Temp are on same disk (RAID 1).

    Data on separate disk (RAID 5).

    I will probably move Tempdb to its own RAID 1 array also.

    Hi Richard,

    Ok, first thing is to identify things such as long running or CPU intensive queries using sql profiler or query below.

    --This script will help to identify CPU intensive queries---

    SELECT TOP 10 SUBSTRING(text,

    (statement_start_offset/2) + 1,

    ((CASE statement_end_offset

    WHEN -1

    THEN DATALENGTH(text)

    ELSE statement_end_offset

    END - statement_start_offset)/2) + 1)

    AS query_text, *

    FROM sys.dm_exec_query_stats

    CROSS APPLY sys.dm_exec_sql_text(sql_handle)

    CROSS APPLY sys.dm_exec_text_query_plan

    (plan_handle, statement_start_offset,

    statement_end_offset)

    ORDER BY total_worker_time/execution_count DESC;

    --This script will help to identify long running queries--

    -- get connection information if last_batch time is older that 60 minutes

    select spid, ecid, loginame, last_batch, cpu, physical_io,

    DATEDIFF(mi, last_batch, GETDATE()) as minSinceLastComand

    into #pass1

    from master.dbo.sysprocesses with (nolock)

    where loginame <> 'sa'

    and DATEDIFF(mi, last_batch, GETDATE()) > 60

    -- pause for 10 seconds to detect any cpu or IO usage

    WAITFOR DELAY '000:00:10'

    -- get connection information if last_batch time is older that 60 minutes

    select spid, ecid, loginame, last_batch, cpu, physical_io,

    DATEDIFF(mi, last_batch, GETDATE()) as minSinceLastComand

    into #pass2

    from master.dbo.sysprocesses with (nolock)

    where loginame <> 'sa'

    and DATEDIFF(mi, last_batch, GETDATE()) > 60

    -- display long running connections

    select p1.*

    from #pass1 p1

    inner join #pass2 p2

    on p1.spid = p2.spid AND

    p1.ecid = p2.ecid AND

    p1.loginame = p2.loginame AND

    (p1.cpu <> p2.cpu OR

    p1.physical_io <> p2.physical_io)

    -- display idle connections

    select p1.*

    from #pass1 p1

    inner join #pass2 p2

    on p1.spid = p2.spid AND

    p1.ecid = p2.ecid AND

    p1.loginame = p2.loginame AND

    p1.cpu = p2.cpu AND

    p1.physical_io = p2.physical_io

    drop table #pass1

    drop table #pass2

    I would suggest you run profiler and place a 30 second filter on duration column to identify all you long running queries to establish why they run so long. As you'll know, a long running query on an OLTP system is not ideal, but is fine on a decision support system. To that note, you may want to look at spliting your reporting requirements out from core oltp activity using either replication or log shipping.

    If you can split tempdb onto its own raid volumn, this is a good start. In addition, you'll need to understand benefit of your data volumn running on RAID 5 and whether this is providing what you need in terms of performance.

    In regards to IIS, can you provide more details as to why its considered to be under stress?

    Again, break your challenge into smaller bits.

    Thanks,

    Phillip Cox

  • You are asking the forum’s readers to do the impossible thing – to help improve performance of a database that we know nothing about. If tuning the database and improving its performance was that easy, most of us would have to find a new profession. Both approaches might help and might not. For example if the database’s size is less then 16GB and this is the only database on the server, then adding memory to the server will not help at all. On the other hand if the database is much larger and most of the queries end up doing physical reads then adding memory would help. The same argument can be done with your second approach. If the database has about 98% of read operations and about 2% of write operations then your replication approach might help, but if the database has about 98% of write operations that there is a good chance that you’ll only heart performance because the 2 databases will be busy synchronizing themselves (not to mention that the synchronization takes some time and at each giving time you’ll have different data at both databases). The point is that no one here knows anything about the database so I don’t think that anyone here can tell you which one of the 2 options will be more beneficial for you. The only advice that I can give you is the one that you already got – Instead of trying to find a way around the problem, investigate what causes most of the problems and start fixing it by creating correct indexes and by rewriting the code.

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • To everyone has replied, I really do appreciate the info.

    I have only been here a week so really have not had time to investigate properly. I totally understand that proper investigation and tuning is required, and I will get to that.

    I will post more info. about the database when I have time, hopefully later today.

    I have no idea why/who decided about the IIS server and what they used to decide that.

  • Merge replication != minimal overhead. In my experience, it's the type of replication that puts the most load onto the servers. It sounds like what you want is peer to peer transactional replication. It should be adequate, unless you're expecting to need complex conflict resolution then you may need to use merge.

    Don't underestimate replication. It's not that easy to set up if you've never done it before. If you can, try it out somewhere before implementing it on the prod servers.

    Here's a good article on peer-peer replication

    http://msdn.microsoft.com/en-us/library/ms151196.aspx

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi - I agree with Gila.

    Find the top 3 queries in the next few hours / days. Then run them through DB tuning wizard on your DEV or offline DB and see if there is any suggested improvement.

    If an index can improve a query by 60% or even 90% - then it is worth investigating (it is not a silver bullet but in your situation it may be a quick fix that helps out the current crisis).

    I too feel that IIS or hardware will not help the amount you are hoping it will (with the info I have).

    You have mentioned that long running queries are maxing out the CPU - maybe investigate the following after indexes:

    * Number of CPU's & what setting is your degree of parallelism and cost threshold set to?

    * Is there a plan cache for this long running query?

    * Can you test the plan that is in the cache and is it making use of the best plan for performance?

    Let us know

    Thanks

    Kevin

  • Adi Cohn (9/11/2008)


    You are asking the forum’s readers to do the impossible thing – to help improve performance of a database that we know nothing about. If tuning the database and improving its performance was that easy, most of us would have to find a new profession. Both approaches might help and might not. For example if the database’s size is less then 16GB and this is the only database on the server, then adding memory to the server will not help at all. On the other hand if the database is much larger and most of the queries end up doing physical reads then adding memory would help. The same argument can be done with your second approach. If the database has about 98% of read operations and about 2% of write operations then your replication approach might help, but if the database has about 98% of write operations that there is a good chance that you’ll only heart performance because the 2 databases will be busy synchronizing themselves (not to mention that the synchronization takes some time and at each giving time you’ll have different data at both databases). The point is that no one here knows anything about the database so I don’t think that anyone here can tell you which one of the 2 options will be more beneficial for you. The only advice that I can give you is the one that you already got – Instead of trying to find a way around the problem, investigate what causes most of the problems and start fixing it by creating correct indexes and by rewriting the code.

    Adi

    I think you are missing the point of my question. I am not trying to avoid the problem. I am trying to buy me the time to fix the application database properly in all the ways suggested by everyone.

    Surely throwing more hardware at the system will buy me that time....that's what I'm trying to ask. And which is the best way to do that? Increase the server capacity, or add another server by using replication.

    However, Adi, you have helped by pointing out a few things (eg. size of DB related to RAM and % of reads vs. writes). Thanks again.

  • Thanks Gail, I read that article too and it was the one that made me think peer-to-peer replication is what I need to do.

    Please everyone understand that I am not looking for info. about performance tuning (otherwise I would have posted in that forum!).

    I am simply trying to buy some time so I can do some tuning. I can do that, I know how to do that I just don't have the time to do it!

  • Richard - If you can really split easily then that is going to be your quickest gain. Additionally, you will immediately split the IO which again is a bonus. The scenario you described is possible especially if you can guarantee that they are going to go back to the same server then that makes it even easier.

    So, start planning if that is the route that you have to go.

    I stated that I would provide a link to a tuning guide earlier. You can find it here - http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx

    Just a thought, you might want to contact MS. The support cost is minimal and they will be able to analyze log and trace information for you to provide you some recommendations. The cost is pretty cheap when you consider it.

    Ultimately at this point it seems like you are pretty focused on going the scale out road which is fine, just make sure that you do the tuning as well. That 30 - 40% that you are adding next week will be another 30 - 40% in another couple of months and will throttle both servers if you don't get things tuned up. Know what I mean?

    I know that there are some on this site that consult so, that may be an option as well. Just a thought. It's a lot easier to help when eyes are on things in situations like yours.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply