[ https://issues.jboss.org/browse/JGRP-1432?page=com.atlassian.jira.plugin.... ]
Peter Nerg commented on JGRP-1432:
----------------------------------
Sorry for not answering your post sooner.
By accident we triggered a new error message about failing to create a native thread, after which we started looking into file descriptors, just as you seem to have concluded as well.
It appears we are bordering on the FD limit, which may be due to an upgrade of the application server (it eats FDs as if they were cookies).
The unfortunate part of the situation is that it was always JGroups that reported the OOM, so we did not immediately make the connection to file descriptors, largely because this had not been a problem earlier.
Sorry for the inconvenience, and thanks for the tips on the configuration.
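For anyone hitting the same thing, here is a minimal stand-alone Java sketch (not JGroups code; the class name ResourceUsageLogger and the 10-second interval are made up for illustration) that periodically logs the live thread count and, on Unix-like JVMs, the open vs. maximum file descriptor counts, which makes it easier to tell FD or thread exhaustion apart from heap exhaustion when "unable to create new native thread" shows up:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

// Diagnostic sketch, not part of JGroups; the class name is made up.
// Periodically logs the live thread count and (on Unix-like JVMs) the
// open vs. maximum file descriptor counts of this process.
public class ResourceUsageLogger {

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        while (true) {
            long openFds = -1, maxFds = -1;
            // getOpenFileDescriptorCount() is only exposed by the
            // com.sun.management extension on Unix-like platforms
            if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
                com.sun.management.UnixOperatingSystemMXBean unix =
                        (com.sun.management.UnixOperatingSystemMXBean) os;
                openFds = unix.getOpenFileDescriptorCount();
                maxFds = unix.getMaxFileDescriptorCount();
            }
            System.out.printf("live threads=%d, open FDs=%d (max %d)%n",
                    threads.getThreadCount(), openFds, maxFds);
            Thread.sleep(10000); // sample every 10 seconds
        }
    }
}

Correlating those numbers with the time of the OOM should show whether the process really is bordering its FD limit.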
OutOfMemoryError in GMS
-----------------------
Key: JGRP-1432
URL:
https://issues.jboss.org/browse/JGRP-1432
Project: JGroups
Issue Type: Bug
Affects Versions: 2.12.2
Environment: Modified SLES
Reporter: Peter Nerg
Assignee: Bela Ban
Attachments: tcp-fileping.xml
When running in a cluster with only two nodes, we occasionally see JGroups fail to start a thread due to an OOM.
The stack trace always points to the same place, which should rule out any other part of the application.
Also, taking a heap dump immediately after the OOM yields no obvious cause for it.
It makes me wonder if there is a scenario where JGroups goes wild and starts creating lots of threads.
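As background on why the heap dump looks clean: "unable to create new native thread" is thrown by Thread.start() when the operating system refuses to give the JVM another native thread, not when the Java heap is full. The following stand-alone sketch (nothing to do with JGroups; run it only in a throwaway environment, ideally with a lowered thread limit) reproduces that error:

import java.util.concurrent.CountDownLatch;

// Stand-alone sketch, unrelated to JGroups: keeps starting threads that park
// forever until Thread.start() fails with
// "java.lang.OutOfMemoryError: unable to create new native thread".
// The failure happens during native thread creation, so the Java heap can be
// nearly empty when it occurs, which is why a heap dump shows nothing.
public class NativeThreadOomDemo {

    public static void main(String[] args) {
        final CountDownLatch never = new CountDownLatch(1);
        long started = 0;
        try {
            while (true) {
                Thread t = new Thread(new Runnable() {
                    public void run() {
                        try {
                            never.await(); // block forever, keeping the native thread alive
                        } catch (InterruptedException ignored) {
                        }
                    }
                });
                t.start();
                started++;
            }
        } catch (OutOfMemoryError e) {
            System.err.println("OOM after starting " + started + " threads: " + e.getMessage());
        }
    }
}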
The stack trace looks like this (often a number of OOM exceptions in a row)
2012-02-21 08:56:52,679 [ OOB-1,null] ERROR [org.jgroups.protocols.TCP] failed handling incoming message
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at org.jgroups.protocols.pbcast.GMS$ViewHandler.start(GMS.java:1297)
at org.jgroups.protocols.pbcast.GMS$ViewHandler.add(GMS.java:1260)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:801)
at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:170)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:101)
at org.jgroups.protocols.FD.up(FD.java:275)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
at org.jgroups.protocols.Discovery.up(Discovery.java:294)
at org.jgroups.stack.Protocol.up(Protocol.java:413)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1109)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1665)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1647)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
The above stack trace is often preceded by the following printout:
2012-02-21 04:39:28,949 [ Timer-2,<ADDR>] WARN [org.jgroups.protocols.FILE_PING] failed reading 9875802e-272a-0bcc-d1db-466d80f188b2.node: removing it
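That FILE_PING warning would be consistent with running out of file descriptors: once the per-process limit is hit, even opening a small file fails, so a discovery protocol reading .node files from disk would report them as unreadable. A hedged stand-alone illustration (class and file names are made up and unrelated to JGroups internals):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration, not JGroups code: hold file descriptors open
// without closing them until the per-process limit is reached, at which
// point even a trivial open fails (typically "Too many open files").
public class FdExhaustionDemo {

    public static void main(String[] args) throws IOException {
        File probe = File.createTempFile("probe", ".node");
        List<FileInputStream> held = new ArrayList<FileInputStream>();
        try {
            while (true) {
                held.add(new FileInputStream(probe)); // leak one FD per iteration
            }
        } catch (IOException e) {
            System.err.println("open failed after " + held.size() + " FDs: " + e.getMessage());
        } finally {
            for (FileInputStream in : held) {
                in.close();
            }
            probe.delete();
        }
    }
}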