Production Server Random Crashing - Need help!

  • We have a production SQL Server 2000 SP3a install that has hung and forced us to reboot 9 times since July 2006.

    Typically the first and second error generated on the server in the Application Log are:

    Error: 17882, Severity: 18, State: 1

    Error accepting connection request via Net-Library 'SSNETLIB'. Execution continuing..

    Error: 17059, Severity: 18, State: 0

    Operating system error 10055: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full...

    After those messages occur, we are unable to connect to SQL Server with a new connection, but typically the live connections are left unharmed.

    Usually about 30 seconds after this message we get the following System Warning regarding our dual HBA connection to the SAN:

    Event Type: Warning

    Event Source: rdacdisk

    Event Category: None

    Event ID: 801

    Computer: [Server]

    Description: Failover succeeded to Server-D280-Production:1:0:0.

    Data:

    0000: 00 00 00 00 05 00 56 00   ......V.

    0008: 00 00 00 00 21 03 04 80   ....!..?

    0010: 21 03 00 00 00 00 00 00   !.......

    0018: 00 00 00 00 00 00 00 00   ........

    0020: 00 00 00 00 00 00 00 00   ........

    When the server comes back up, we get a warning from Quest Foglight saying that the max configured memory is greater than the max physical memory on the server, our minimum is set to 0.  Sure enough, if I run sp_configure it's showing that the max config_value (and max run_value) of memory is 5887, whereas the physical memory is only 3839.  Now, honestly I don't know why this would matter since the OS doesn't have the memory available to give to it, but perhaps it's causing the problem?  BTW, when I go into Enterprise Manager in the Memory settings it is showing a max of 3839, so I don't know how or where that 5887 is coming from.

    Since July the server has had the SSNETLIB error 8 times.  The server has also had the rdacdisk failover on all of those instances plus one time where we didn't have the 'SSNETLIB' error, even though we did lose connectivity to the server.

    There are two general times that this error is occuring, though two times it has been random.  Again, it isn't consistently happening at these times, but there does apper to be a couple of patterns:

    1) (3 Times) 2:30AM - which is when I do a nightly (M-F) 100% stats update

    2) (4 Times) 6AM before DST and 7AM after DST on a Sunday (Not sure if DST had anything to do with it.) - Which doesn't have anything scheduled on the SQL side

    3) (2 Times) Random (one of which didn't generate the SSNETLIB error)

    We are looking to start testing this application on SQL 2005 in a few weeks with a rollout to production in around May.  They would prefer not to upgrade to SP4 since that would require the same testing cycle as upgrading to SQL 2005.  We also do not want to remove the /3GB flag in the boot.ini file because we would lose a GIG of usable RAM.  BTW, this is running on a 2003 Enterprise Edition of Windows Server.

    We are about to open a call with Microsoft since this is a mission critical application, but I wanted to see if anyone out here has any ideas.  We can't bring the system down during the day to make any changes and since the problem doesn't occur with any regularity, troubleshooting is one of those, "Try this and wait to see if it crashes again.  If it does, try the next thing."

     Please let me know if you have any thoughts or ideas that we should explore.

    Thanks!

    Mike

  • We don't use sp_OACreate, so the first article you mentioned doesn't look like it applies.  I tried going to your second link but it was blocked by our Firewall.  Trying to get the information another way, though...

  • -- exec sp_configure 'show advanced options',1

    -- reconfigure

    -- go

    - Can you give us the result of exec sp_configure ?

    - are the windows eventlogs showing any anomalities ?

    - maybe http://support.microsoft.com/kb/815209 can guide a bit ..

    - http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=562047&SiteID=1 proposes the unwanted removel of /3gb.

      IMO you should limit the max memory usage for sqlserver to 3Gb. This way you get 300Mb extra as without the 3gb switch, but gowing over th 3Gb may get you short on windows system resources when you only have 3.8Gb. You should at least leave 500Mb to windows !

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I'm interested in how you have memory configured ?  if you run sp_configure what does it return for min and max memory. I used to be ok with the 3gb switch but i've learned that it can give problems.  How much memory does your box have in total ?

     

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • From your second link:

    "...Since you are using PAE, it is not recommended you still turn on /3GB switch.

    As we know, using /3GB will cause the OS to have only 1GB memory for system use. The paged pool and non-paged pool size will be half the default size. These memory pools are used by OS to store network buffers, etc. Thus, when /3GB is turned on, it is more likely that OS will run out of non-paged pool and be unable to serve socket calls.

    OleDB Provider relies on socket calls to communicate with the remote server. In this case, you may see the error.

    As you are using /PAE, the user application already can use the additional 4GB memory. It is recommended to turn off /3GB so that OS could run more smoothly."

    Excellent stuff.  If we have 4Gb on the server but don't use the /3GB flag, won't we only have 2GB of ram available to the DB since the OS will use the other 2GB?

     

    Here's our sp_configure:

    name minimum maximum config_value run_value

    affinity mask -2147483648 2147483647 0 0

    allow updates 0 1 0 0

    awe enabled 0 1 0 0

    c2 audit mode 0 1 0 0

    cost threshold for parallelism 0 32767 5 5

    Cross DB Ownership Chaining 0 1 0 0

    cursor threshold -1 2147483647 -1 -1

    default full-text language 0 2147483647 1033 1033

    default language 0 9999 0 0

    fill factor (%) 0 100 0 0

    index create memory (KB) 704 2147483647 0 0

    lightweight pooling 0 1 0 0

    locks 5000 2147483647 0 0

    max degree of parallelism 0 32 1 1

    max server memory (MB) 4 2147483647 5887 5887

    max text repl size (B) 0 2147483647 65536 65536

    max worker threads 32 32767 255 255

    media retention 0 365 0 0

    min memory per query (KB) 512 2147483647 1024 1024

    min server memory (MB) 0 2147483647 0 0

    nested triggers 0 1 1 1

    network packet size (B) 512 65536 4096 4096

    open objects 0 2147483647 0 0

    priority boost 0 1 0 0

    query governor cost limit 0 2147483647 0 0

    query wait (s) -1 2147483647 -1 -1

    recovery interval (min) 0 32767 1 1

    remote access 0 1 1 1

    remote login timeout (s) 0 2147483647 20 20

    remote proc trans 0 1 0 0

    remote query timeout (s) 0 2147483647 600 600

    scan for startup procs 0 1 0 0

    set working set size 0 1 0 0

    show advanced options 0 1 1 1

    two digit year cutoff 1753 9999 2049 2049

    user connections 0 32767 0 0

    user options 0 32767 0 0

  • sorry closed the original window as I was posting, as you've not got much memory you should have memory settings to dynamic with the 3gb switch, please don't try to use fixed memory settings. I'd be careful with foglight too, nice as these products are they consume resource and quest products are known for being quite hungry. I think the 3gb switch is your likely problem, I'd suggest you invest in awe and leave the lower memory alone.

    Did you get any error log messages about mem to leave ? or dbcc memorystatus dumps in the error log?

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • We have 4GB on the server.  I've been hesitant to recommend going higher than 4GB and using the /PAE flag since I've read that it can have adverse affects to performance and unfortunately I'm not in a situation where I can accurately test how this change would affect performance without Testing in Production.    The box itself can hold more RAM.

  • We do have the memory set to be dynamic currently...I don't have a configured min or max.

     

    If I had enough assurances that AWE/PAE wouldn't have a diverse affect, I'd give it a try...But...

     

    We are not getting any other messages with regards to dbcc memorystatus, etc.

  • "...Since you are using PAE, it is not recommended you still turn on /3GB switch.

    As we know, using /3GB will cause the OS to have only 1GB memory for system use. The paged pool and non-paged pool size will be half the default size. These memory pools are used by OS to store network buffers, etc. Thus, when /3GB is turned on, it is more likely that OS will run out of non-paged pool and be unable to serve socket calls.

    OleDB Provider relies on socket calls to communicate with the remote server. In this case, you may see the error.

    As you are using /PAE, the user application already can use the additional 4GB memory. It is recommended to turn off /3GB so that OS could run more smoothly."

     

    Really interesting stuff. We were recently told to do the same; put the /3gb switch onto our production database server. Our DB server also has 4gb of memory. After reading this, I am starting to wonder if I should change this because of the possible effects it may cause on our production database.

    I have room for more memory and im wondering if I should start looking into that.

    Excellent information on the /3gb PAE switch.

    -Jason

  • As always, I think each situation is going to be different.  If you aren't having any issues but need the performance boost from having more RAM available to SQL Server, then the 3GB flag should be "OK" to use.  Since I have the option of going to move RAM I have less heartburn about turning if off since I can go to 6GB or 8GB of RAM on the server...Keep in mind that with SQL 2000 you have to have Enterprise Edition running to use more than 2GB anyway...So, if you are using SQL 2000 Standard Edition your memory use is probably at around 1.8GB and removing that flag will only make the OS run better, which would make that a smart change to make.

     

    Another thing this is making me consider for my system is going to 64-bit...Then I wouldn't have to worry about PAE/AWE.

  • I run servers with up to 32gb ram currently, with no worries. With 4gb ram you have two options, set the 3gb switch and leave memory dynamic. set pae switch enable awe and set max memory to 3gb. Yes both work but in different ways. You can use the 3gb switch and pae with awe up to 16gb of installed memory, above 16gb you must turn off the 3gb switch.

    The issue with the 3gb switch isn't so much o/s memory ( which probably needs no more than around 128mb ) but out of sql server processes, dts, xml stuff, sqlmaint.exe , etc. these eat into the other memory and can cause problems - but not very often. I have run a 4gb server with sql taking 3.5gb of ram without any issues - but as always it just depends.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • We are switching the test environment over tonight to disable the /3GB flag, enable PAE/AWE and will have 6GB installed.

    If things go well we'll do it to production tomorrow night or this weekend, depending on how things go.

  • - check out AWE Memory SQL Server Performance Tuning Tips how to handle the /3Gb /pae switches !

    - max server memory (MB) 4 2147483647 5887 5887

    Someone did set the max servermemory to 5.8Gb, so that's why you are getting the warning.

    If you want to avoid the warning, set it back to a lower figure (easiest with EM) and then set it to back to whatever is your max

    or just execute from QA

    exec SP_CONFIGURE 'max server memory', 2147483647

    RECONFIGURE

    GO

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • note that using awe disables dynamic memory, YOU MUST set the max memory option, DO NOT use EM,  script it

    e.g.

    exec dbo.sp_configure 'min server memory',3750

    exec dbo.sp_configure 'max server memory',3750

    exec dbo.sp_configure 'awe enabled',1

    reconfigure with override

    go

    The setting is in MB remember the 1024 multiplier.

    run this script afterwards to check actual memory being used.

    select counter_name,cntr_value from dbo.sysperfinfo with (nolock) where counter_name like '%server memory%'  

    Note that BOL is incorrect in its description of memory changes with awe, although you can make the settings change awe ignores it/them. I sometimes wish EM did not have the ability to make server config changes, the number of issues I've fixed beacuse of incorrect config - still it keeps me in work so I shouldn't really complain!

     

     

     

     

     

     

     

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply