[jboss-jira] [JBoss JIRA] (JGRP-1432) OutOfMemoryError in GMS

Wed Feb 29 09:46:37 EST 2012

    [ https://issues.jboss.org/browse/JGRP-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672138#comment-12672138 ] 

Peter Nerg commented on JGRP-1432:
----------------------------------

Sorry for not replying to your post sooner.
By accident we triggered a new error message about a failure to create a native thread, after which we started looking into file descriptors.
You seem to have reached the same conclusion.
It seems we are close to the FD limit, which may be due to the upgrade of the application server (it's eating FDs as if they were cookies).
The unfortunate part is that it was always JGroups that reported the OOM, so we didn't immediately make the connection to file descriptors.
That is largely because this had not been a problem before.
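
For anyone seeing the same symptom, here is a minimal sketch of how thread and file descriptor usage could be read from inside the JVM. It assumes a HotSpot/OpenJDK-style JVM on a Unix-like system, where the operating system bean implements com.sun.management.UnixOperatingSystemMXBean. The class name ResourceProbe and its methods are made up for illustration and are not part of JGroups.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of JGroups or the application server.
class ResourceProbe {

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /**
     * Returns {open, max} file descriptor counts for this process,
     * or null when the platform bean does not expose them
     * (assumption: only Unix-like JVMs implement the Unix bean).
     */
    static long[] fileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreads());
        long[] fds = fileDescriptors();
        if (fds == null) {
            System.out.println("file descriptor counts not available on this JVM");
        } else {
            System.out.println("open FDs: " + fds[0] + " of max " + fds[1]);
        }
    }
}
```

Logging these numbers periodically, or just before the point where the error shows up, should make it easier to tell whether the process is approaching an OS limit rather than running out of heap.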

Sorry for the inconvenience but thanks for the tips on the configuration.

> OutOfMemoryError in GMS
> -----------------------
>
>                 Key: JGRP-1432
>                 URL: https://issues.jboss.org/browse/JGRP-1432
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.12.2
>         Environment: Modified SLES
>            Reporter: Peter Nerg
>            Assignee: Bela Ban
>         Attachments: tcp-fileping.xml
>
>
> When running in a cluster with only two nodes, we see every now and then that JGroups fails to start a thread due to an OOM.
> The stack trace always points to the same place, which should rule out any other part of the application.
> Also, a heap dump taken immediately after the OOM shows no obvious cause for the OOM.
> It makes me wonder if there is a scenario where JGroups goes wild and starts to create lots of threads.
> The stack trace looks like this (often several OOM exceptions in a row):
> 2012-02-21 08:56:52,679 [     OOB-1,null] ERROR [org.jgroups.protocols.TCP] failed handling incoming message
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:640)
>         at org.jgroups.protocols.pbcast.GMS$ViewHandler.start(GMS.java:1297)
>         at org.jgroups.protocols.pbcast.GMS$ViewHandler.add(GMS.java:1260)
>         at org.jgroups.protocols.pbcast.GMS.up(GMS.java:801)
>         at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:170)
>         at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
>         at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)
>         at org.jgroups.protocols.BARRIER.up(BARRIER.java:101)
>         at org.jgroups.protocols.FD.up(FD.java:275)
>         at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
>         at org.jgroups.protocols.Discovery.up(Discovery.java:294)
>         at org.jgroups.stack.Protocol.up(Protocol.java:413)
>         at org.jgroups.protocols.TP.passMessageUp(TP.java:1109)
>         at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1665)
>         at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1647)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> The above stack trace is often preceded by the following printout:
> 2012-02-21 04:39:28,949 [ Timer-2,<ADDR>] WARN  [org.jgroups.protocols.FILE_PING] failed reading 9875802e-272a-0bcc-d1db-466d80f188b2.node: removing it
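
One way to check the suspicion that something is creating lots of threads is to count live threads grouped by name. The sketch below is only an illustration: it assumes the bracketed field in the log lines above (for example "OOB-1,null") is the thread name, and the class ThreadCensus and its grouping rule are hypothetical, not a JGroups facility.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper; the grouping rule is an assumption
// based on thread names such as "OOB-1,null" seen in the log above.
class ThreadCensus {

    /** Drops everything after the first comma, then a trailing "-<digits>". */
    static String group(String threadName) {
        int comma = threadName.indexOf(',');
        String base = comma >= 0 ? threadName.substring(0, comma) : threadName;
        return base.replaceAll("-\\d+$", "");
    }

    /** Counts the live threads of this JVM per name group, sorted by group. */
    static Map<String, Integer> countByGroup() {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String key = group(t.getName());
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : countByGroup().entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

If one group dominates the output, that points at the component creating the threads. If the counts look modest, a per-process or per-user OS limit is the more likely cause.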
