701 Error: Insufficient System Memory

  • We are hitting a crippling 701 "insufficient System Memory" error intermittently in out production environment. I haven’t gotten anywhere with PSS in two weeks. The error has occurred 4 times over the past two weeks, crippling our SQL server and application each time. When the error occurs it lasts for 5 to 20 minutes, causing the app to time out, refusing new connections, and a massive slow-down of anything that is running. SQL has recovered on its own two of these times. It recovered following a Kill of hundreds of threads reporting “SEMAPHORE WAIT”. The most recent occurrence nailed all 16 processors at 100%. We were forced to issue shutdown with nowait. I have been monitoring Perfmon very closely; there are no symptoms that precede the error. Each occurrence captures a different query. Any of the queries, when run from Management Studio, complete in under a second.  DBCC MEMORY STATUS reports all memory as being in an unstressed state. The first time the error occurred there were 10 GB still available on the server.

     

    Has anyone else experienced this problem or anything similar? We don’t use linked servers or table valued functions (there are known memory bugs related to each of these items)

     

    The following server and configuration has been running in production for 6 weeks with no issues:

     

    .SQL 2005 EE SP1 Post SP1 Hotfix kb918222

    .Win 2003 SP1 (dedicated box)

    .Quad Dual Core 3GHz

    .32 GB memory

    .AWE enabled

    .No memory related flags in boot.ini

    ."Lock Pages in memory" set for SQL Startup account

    .1 Instance (default)

    .1,994 OLTP databases avg less than 100MB each

    .1,200 active user threads on average (from connection pool of avg 4,000 concurent users)

     

    Any comments would be appraciated

  • Yes, with SQL Server 2000 where a single instance has over 4,000 user databases.

    Solution was to add "-g392" to the startup parameters and then restart SQL Server.

    See http://support.microsoft.com/kb/316749

    "There may not be enough virtual memory when you have a large number of databases in SQL Server". The article's "applies to" section includes SQL Server 2005.

    P.S. On the server in question, for auditing purposes, a trace is always running that writes to a file to capture connection events, security events ( e.g. add login) and schema changes (create, alters and drops)

    SQL = Scarcely Qualifies as a Language

  • Thanks for the respones. I ran that idea past PSS the first time the error occured. They said it's not necessary in 2005. I'm not convinced that they are right

  • There are conflicting KBs for the non-buffer pool memory which results in confusion for DBAs and PSS.

    In my case, as the SQL Server was hanging every few day anyway, the risk small, so I went ahead and included the -g and on the next restart, the problem stopped occuring. This will not be the first time that PSS is wrong 😉

    The real diffuculty was I did not know about the problem and only discovered it when Windows was being restarted during the database maintenance window. The OS guy had scheduled a restart every night at 7 PM! Great fun trying to support a server that is 500 miles away.

    SQL = Scarcely Qualifies as a Language

  • Thanks again, We have plenty of memory, and agree any risk is very low. I am going to apply the parameter.

  • Not sure if it's relevant but have you looked at the underlying storage subsystems?  "SEMAPHORE WAIT" could very well indicate that you've got an disk I/O problem or other hardware problem as the system is waiting for some request/thread to complete.  Last time I saw that particular error in SQL 2000 the RAID controller in the affected server was in the process of failing and was not answering I/O requests in a timely fashion - processors were all pegged while awaiting their turn at the single busted I/O channel as well (no, I did not build that one, just got to clean up the mess).

    Joe

     

     

  • Hi Steve Goodwin ,

    I'm facing the same problem with you...

    have you found a solution for this case?

    All the specification (SQL, OS, Mem, and processor) have the same capacity like you.

    please let me know, how did you solve it..

    I'm still ask microsoft people to fix it.... but they still figure it out.. without any reasonable explanation.

  • Yes, we were running out of memory in the visible region (MemToLeave). In short the amount of space allocated to this region of memory in Win32 is limited (~1.2GB/instance). Having a very large number of active databases (1500 to 2000) consumed the majority of this memory, leaving to little to support larger multi-page allocations that we periodically are encounter (large plan optimization, backup, etc). The quick fix is to distribute the databases over multiple instance and or servers. The ideal solution is to migrate to 64 bit where this are of memory is limited only by the physical memory limitations of the server.

    As a temporray fix we redunced the number of databases on the sever from ~1900 to ~1500 and have been running error free for two weeks. We also dissabled index accounting with DBCC TraceOn(2330, -1), which effectively reduced processor utilization by 20% or better.

  • I know it's much later - but I was searching for the error - you mention 16 procs on a 4 way dual core , you should really disable HT at the bios as this can give some problems. I only have 3 databases so my error is more confusing - sp2 3161 32 bit I'll look into the -g, run into this one before but couldn't get the -g through the change board!!

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • We only have 20 databases on our server and we are getting the memory and timeout errors.

    SQL Server 2005 64-bit

    64gb ram

    We started applying step 1 in http://support.microsoft.com/kb/918483 How to reduce paging of buffer pool memory in the 64-bit version of SQL Server

    which applies hotfix 905865

    We are still getting the errors and the DBAs are insisting it is an application issue even though a number of applications experience the problem all at the same time.

    If anyone has found how to fix this I am interested. If we find a fix I will post.

    Thanks!

  • did you configure lock pages in memory?

  • No not yet, the max memory is not set either. The DBAs say they are contacting MS to try and figure out the issue. One of the other posts on SQL Server Central talks about the memory errors and how setting the max memory resolved the issues. Also some of MS KB stuff kind of says the same thing especially for 64 bit systems.

    Before that DBAs kept pointed at the app owners and told them to increase their timeouts which I find very funny :hehe: since it really has nothing to do with that at all - broken SQL Server.

    So now I get to wait and see what the DBAs and MS come up with.

  • we had this when we first went to 64bit. it wasn't in any documentation, but a blog posting by one of the db engine developers. the max memory didn't work very well, but the lock pages in memory worked

  • Excellent - thank for the info !

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply