Server Crashing

  • We have a SQL database behind a couple of heavily hit websites. Despite rewriting 90% of the SPs and 100% of the data refresh processes, the DB server still fails daily, grinding to an absolute halt. The trouble is you can't get onto the server to look at activity etc. - you have to stop the SQL services from another box just to get logged on (we have to use Terminal Services as the boxes are remote). We have two such setups, and both suffer from the same issues. It is as if there is a build-up of something that eventually tips the box over the edge.

    We have had locking issues in the past, but we have cleaned up and simplified the workings to get round them - the sites are much faster now. All pages are ASP, using ADO and Command objects to exec the SPs that provide the data for the pages. All pages have been rewritten to pull the data into arrays (GetRows) so the connections can be closed off ASAP.

    We use a lot of randomly generated data, produced using #temp tables in the SPs to make the random selections - the pattern is sketched below. Do temp tables cause a lot of SQL overhead?
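
    For illustration only - the table and column names here are placeholders, not our actual schema:

      -- Stage the candidate keys in a #temp table, then pull a random
      -- handful by sorting on NEWID()
      CREATE TABLE #candidates (ItemID int NOT NULL)

      INSERT INTO #candidates (ItemID)
      SELECT ItemID FROM Items WHERE Active = 1

      SELECT TOP 5 ItemID
      FROM #candidates
      ORDER BY NEWID()

      DROP TABLE #candidates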

    Anyone had similar problems, or any ideas?

  • Depends. Sorry, it really does depend on how many temp tables, how big (column size), how many rows get added, whether you're indexing, etc. I'd recommend running a fairly lightweight server-side trace for a couple of hours, along the lines of the sketch below, to get a sampling of what is costing you the most. What version of SQL and what kind of hardware?
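
    As a rough sketch of that kind of trace (SQL 2000 syntax - the file path and 50MB size are just examples), capturing completed batches and RPCs with their cost columns:

      -- Create a server-side trace writing to a file (2 = TRACE_FILE_ROLLOVER)
      DECLARE @TraceID int, @maxsize bigint, @on bit
      SET @maxsize = 50   -- MB per rollover file
      SET @on = 1
      EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\Traces\sample', @maxsize

      -- Events: 10 = RPC:Completed, 12 = SQL:BatchCompleted
      -- Columns: 1 = TextData, 13 = Duration, 16 = Reads, 17 = Writes, 18 = CPU
      EXEC sp_trace_setevent @TraceID, 10, 1, @on
      EXEC sp_trace_setevent @TraceID, 10, 13, @on
      EXEC sp_trace_setevent @TraceID, 10, 16, @on
      EXEC sp_trace_setevent @TraceID, 10, 17, @on
      EXEC sp_trace_setevent @TraceID, 10, 18, @on
      EXEC sp_trace_setevent @TraceID, 12, 1, @on
      EXEC sp_trace_setevent @TraceID, 12, 13, @on
      EXEC sp_trace_setevent @TraceID, 12, 16, @on
      EXEC sp_trace_setevent @TraceID, 12, 17, @on
      EXEC sp_trace_setevent @TraceID, 12, 18, @on

      -- 1 = start; later use status 0 to stop and 2 to close the definition
      EXEC sp_trace_setstatus @TraceID, 1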

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • Sounds like a resource issue. Usually, even when the box grinds to a halt, QA with sp_who will be about the only thing that still runs, unless you have really maxed it out.

    However, to check for the specific events that lead up to this, you should use a blackbox trace (yes, like a flight data recorder).

    SQL 2000

    quote:


    Use sp_trace_create with the TRACE_PRODUCE_BLACKBOX option to define a trace that appends trace information to a blackbox.trc file in the \Data directory. Once the trace is started, trace information is recorded in the blackbox.trc file until the size of the file reaches 5 megabytes (MB). The trace then creates another trace file, blackbox_1.trc, and trace information is written to the new file. When the size of blackbox_1.trc reaches 5 MB, the trace reverts to blackbox.trc. Thus, up to 5 MB of trace information is always available.
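
    Starting it is just a couple of lines (8 is the value of the TRACE_PRODUCE_BLACKBOX option):

      -- Create and start the blackbox trace; no file name is given because
      -- the server writes \Data\blackbox.trc itself
      DECLARE @TraceID int
      EXEC sp_trace_create @TraceID OUTPUT, 8
      EXEC sp_trace_setstatus @TraceID, 1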


    SQL 7

    quote:


    xp_trace_setqueryhistory (T-SQL)

    Specifies the enabling or disabling of query history using the SQL Server Profiler extended stored procedures. Query history consists of a trace of the last 100 events that occurred before a high-severity SQL Server exception. Query history traces SQL:BatchStarting, RPC:Starting and Exception events and captures all data columns. When an exception of severity 17 or greater occurs, the query history is written to the file Blackbox.trc in \Mssql7\Log.

    Syntax

    xp_trace_setqueryhistory [[@local_variable =] value]

    Arguments

    @local_variable

    Is a local variable of bit data type.

    value

    Is whether query history is enabled (1) or disabled (0). When query history is enabled, SQL Server captures the last 100 SQL Server events. If value is not specified, SQL Server returns the current value for xp_trace_setqueryhistory.

    Return Code Values

    0 (success) or >1 (failure)

    Result Sets

    If value is specified, xp_trace_setqueryhistory returns this message:

    The command(s) completed successfully.

    If value is not specified, xp_trace_setqueryhistory returns this result set.

    Column name              Description
    ----------------------   --------------------------------------------------
    QueryHistoryConfigValue  Whether the query history trace is configured to
                             start on startup: 1 = True, 0 = False
    QueryHistoryRunValue     Whether the query history trace is running:
                             1 = True, 0 = False

    Permissions

    Execute permissions for xp_trace_setqueryhistory default to members of the sysadmin fixed server role but can be granted to other users.
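
    In practice it's a one-liner to turn on from Query Analyzer on the SQL 7 box:

      -- Enable the query history (blackbox) trace
      EXEC xp_trace_setqueryhistory 1

      -- Call it with no argument to check the current config/run values
      EXEC xp_trace_setqueryhistory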


    The only thing to note is that you will usually see exceptions, and you will want to investigate what happened right before and immediately after them.

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • Sorry - I know that was a bit vague!!

    We are using SQL 2000 Standard on Win 2K Advanced - all patched to date. It all runs on a box with dual PIII 1GHz processors and about 4GB of RAM, set up on a RAID 5 array. The actual sites run on a separate box.

    The temp tables are all very small - used to sort the data randomly, and then dropped again fairly quickly. I don't apply any indexes to them or anything like that.

    What do you mean by a resource issue? Is it building up locks or something?

  • It could be memory issues with the number of transactions taking place. It could be hard drive contention (rarer these days, but depending on configuration it still happens). It could be a CPU bottleneck, or even the NIC causing the lockups.

    Memory and CPU should be fairly easy to look at by going to the server through TS and viewing them in Perfmon. Check out http://www.sql-server-performance.com/performance_monitor_tips.asp and http://www.sql-server-performance.com/performance_monitor_counters.asp for some great info on what to look for in Perfmon. You can even set it to record a log until the problem happens again.

    As for the NIC, you could have a large backlog of packets going one way or the other, and the CPU can bottleneck from that. If the card is set to auto-detect parameters such as duplexing, change it to the actual duplex setting of the network and see if that relieves anything (this may not help, but it is a spot to check anyway).

    You may also find that some tweaking of the OS helps. Right-click the My Computer icon, choose Properties, go to the Advanced tab and click the Performance Options button; make sure it is set to optimize for background services. Also, make sure the swap file is at least 2.5x the physical memory size and sits on your fastest drive - not on the C: partition, and not on the drives where the SQL log or data files reside.

    Now the whole temp table thing. If the tables are fairly small, they may reside in memory rather than being flushed out to tempdb or the page file. If there are a lot of them, though, they will add up to memory overhead. You may simply have too many transactions for your current configuration and methods to work properly.

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • Thanks for this - I will start to monitor more closely. It's difficult because the server is several hundred miles away and we access it via Terminal Services. Also, it always seems to crash in the evenings when we are at home, so it's difficult to keep watch at all times.

    When the server locks up, the processor is maxed out at 100% usage and doesn't fluctuate. Stopping and starting the SQL Server services usually frees it back up, but we always reboot to make sure everything is cleared out.

  • Don't know if you read the thread on service packs, but I had a very similar problem. The server would run at 100% for no apparent reason. Our server is used to run a Business Objects data warehouse, with some of the users accessing it via a web client. At the time, the only thing I could think of as a possible cause was that we had recently applied Service Pack 2. What service pack are you running?

    I've since monitored the server as suggested but have been unable to find any problems; equally, the server hasn't crashed either. Just as another thought: are your users logging out of the database properly? If not, their connections might remain open - you can check for that with something like the sketch below.
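
    One quick check from Query Analyzer is to look at sysprocesses and see whether connections from the web box are piling up (the database name here is just a placeholder):

      -- Connections into the web database, oldest activity first
      SELECT spid, loginame, hostname, program_name, last_batch, status
      FROM master..sysprocesses
      WHERE dbid = DB_ID('YourWebDB')   -- placeholder name
      ORDER BY last_batch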

  • If it occurs around a specific time, then look at the processes/tasks that run at that time or shortly before. Even something like a bad driver for a backup device can be the root cause (that happened to us - the device would lock the whole machine, but it was easy to pinpoint since the backup was all that ran at the time it happened). You just have to pinpoint what should be running at that point and logically test and eliminate each candidate.

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • If you have an idea of what time period you usually get hammered, you could always create a trace of what's going on. You can schedule a trace via a SQL Server job to begin before the time you typically have issues and write the information to a trace file, as in the sketch below. Perhaps you can capture the statement(s) which are processing at the time processor utilization shoots up to 100%.
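
    As a sketch (the path, times and sizes are just examples), the job step could create the trace with a stop time so it shuts itself down:

      -- Job step scheduled for early evening: trace until 23:00 tonight
      DECLARE @TraceID int, @maxsize bigint, @stop datetime, @on bit
      SET @maxsize = 100
      SET @stop = CONVERT(datetime, CONVERT(char(10), GETDATE(), 120) + ' 23:00')
      SET @on = 1
      EXEC sp_trace_create @TraceID OUTPUT, 2, N'E:\Traces\evening', @maxsize, @stop

      -- 12 = SQL:BatchCompleted; columns 1 = TextData, 13 = Duration, 18 = CPU
      EXEC sp_trace_setevent @TraceID, 12, 1, @on
      EXEC sp_trace_setevent @TraceID, 12, 13, @on
      EXEC sp_trace_setevent @TraceID, 12, 18, @on
      EXEC sp_trace_setstatus @TraceID, 1

    Afterwards you can pull the file back into a table to see what was running when the processor pegged:

      SELECT StartTime, Duration, CPU, TextData
      FROM ::fn_trace_gettable(N'E:\Traces\evening.trc', DEFAULT)
      ORDER BY Duration DESC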

    You can get a time correlation if you're running counter logs on your server. Check things like available memory, processor utilization, the various processor queues, pages/sec, the amount of memory SQL Server is using, etc.
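
    If Perfmon is awkward to get at remotely, note that SQL Server's own counters are also exposed in T-SQL - for example:

      -- SQL Server memory counters, readable from Query Analyzer
      SELECT counter_name, cntr_value
      FROM master..sysperfinfo
      WHERE object_name LIKE '%Memory Manager%'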

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1

