[jboss-jira] [JBoss JIRA] (JGRP-1432) OutOfMemoryError in GMS

Peter Nerg (JIRA) jira-events at lists.jboss.org
Wed Feb 29 06:49:37 EST 2012


    [ https://issues.jboss.org/browse/JGRP-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672010#comment-12672010 ] 

Peter Nerg commented on JGRP-1432:
----------------------------------

I have performed a heap dump just after the OOM and there is nothing to be found in the dump.
The issue always occurs at the same spot (same stack trace), so it cannot be a coincidence.
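
For what it's worth, "unable to create new native thread" is an OS-level thread/resource limit rather than heap exhaustion, which would explain why the heap dump shows nothing. Below is a minimal diagnostic sketch of my own (class name made up, nothing JGroups-specific) that periodically logs the JVM's thread counts; a climbing live count before the OOM would point at runaway thread creation:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    // Hedged diagnostic sketch: log live/peak/total-started thread counts
    // every few seconds. "unable to create new native thread" is an OS limit,
    // not a heap problem, so the interesting signal is a steadily climbing
    // live count rather than anything in a heap dump.
    public class ThreadCountLogger {
        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            while (true) {
                System.out.printf("live=%d peak=%d totalStarted=%d%n",
                                  threads.getThreadCount(),
                                  threads.getPeakThreadCount(),
                                  threads.getTotalStartedThreadCount());
                Thread.sleep(5000);
            }
        }
    }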

Since the stack trace always seems to be at the same spot, i.e. where GMS receives an up event, I'm wondering if this possibly has something to do with the intermittent printouts we see:
2012-02-21 04:39:28,949 [ Timer-2,<ADDR>] WARN [org.jgroups.protocols.FILE_PING] failed reading 9875802e-272a-0bcc-d1db-466d80f188b2.node: removing it
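
That warning suggests FILE_PING hit a .node file in the shared discovery directory that it couldn't read (a partial write, perhaps) and deleted it. A quick sketch of my own to inspect those files (the default directory path below is an assumption; the real one is the location property of FILE_PING in the attached tcp-fileping.xml):

    import java.io.File;

    // Hedged sketch: list the FILE_PING discovery files with size and
    // readability. A zero-length or unreadable .node file would match the
    // "failed reading ...: removing it" warning. The default path below is
    // an assumption; pass the real FILE_PING "location" as the argument.
    public class InspectPingDir {
        public static void main(String[] args) {
            File dir = new File(args.length > 0 ? args[0] : "/tmp/jgroups");
            File[] files = dir.listFiles();
            if (files == null) {
                System.out.println("not a readable directory: " + dir);
                return;
            }
            for (File f : files) {
                if (f.getName().endsWith(".node"))
                    System.out.printf("%s  %d bytes  readable=%b%n",
                                      f.getName(), f.length(), f.canRead());
            }
        }
    }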

It seems the members lose contact and then re-join/merge, and it can take quite some time to stabilize.
I can't figure out why the nodes lose contact every now and then. We're running blades in the same chassis with Gb Ethernet, so the network connection should not be the cause of these warnings.

Also, we have a number (5-6) of JGroups stacks that share the same TCP transport using the singleton config on TCP.
I've attached the config file we use to set up JGroups.
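
For reference, the channels are created roughly like this (simplified sketch; the cluster names are made up). Even though singleton_name shares the TCP transport and its thread pools, each channel still runs its own protocol stack above TCP, including the GMS ViewHandler thread visible in the stack trace, so 5-6 stacks multiply the per-channel threads:

    import org.jgroups.JChannel;

    // Simplified sketch of our usage (cluster names made up): several
    // channels created from the same config whose TCP transport carries a
    // singleton_name, so they share the transport. Each channel still has
    // its own stack above TCP, including GMS and its ViewHandler thread.
    public class SharedTransportExample {
        public static void main(String[] args) throws Exception {
            String[] clusters = {"cluster-a", "cluster-b", "cluster-c"};
            for (String name : clusters) {
                JChannel ch = new JChannel("tcp-fileping.xml");
                ch.connect(name);
            }
        }
    }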

                
> OutOfMemoryError in GMS
> -----------------------
>
>                 Key: JGRP-1432
>                 URL: https://issues.jboss.org/browse/JGRP-1432
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.12.2
>         Environment: Modified SLES
>            Reporter: Peter Nerg
>            Assignee: Bela Ban
>         Attachments: tcp-fileping.xml
>
>
> When running in a cluster with only two nodes, we every now and then see JGroups fail to start a thread due to an OOM.
> The stack trace always points to the same place, which should rule out any other part of the application.
> Also, taking a heap dump immediately after the OOM yields no obvious cause of the OOM.
> It makes me wonder if there is a scenario where JGroups goes wild and starts to create lots of threads (a quick probe for this is sketched after this quoted description).
> The stack trace looks like this (often a number of OOM exceptions in a row):
> 2012-02-21 08:56:52,679 [     OOB-1,null] ERROR [org.jgroups.protocols.TCP] failed handling incoming message
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:640)
>         at org.jgroups.protocols.pbcast.GMS$ViewHandler.start(GMS.java:1297)
>         at org.jgroups.protocols.pbcast.GMS$ViewHandler.add(GMS.java:1260)
>         at org.jgroups.protocols.pbcast.GMS.up(GMS.java:801)
>         at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:170)
>         at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
>         at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)
>         at org.jgroups.protocols.BARRIER.up(BARRIER.java:101)
>         at org.jgroups.protocols.FD.up(FD.java:275)
>         at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
>         at org.jgroups.protocols.Discovery.up(Discovery.java:294)
>         at org.jgroups.stack.Protocol.up(Protocol.java:413)
>         at org.jgroups.protocols.TP.passMessageUp(TP.java:1109)
>         at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1665)
>         at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1647)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> The above stack trace is often preceded by the following printout:
> 2012-02-21 04:39:28,949 [ Timer-2,<ADDR>] WARN  [org.jgroups.protocols.FILE_PING] failed reading 9875802e-272a-0bcc-d1db-466d80f188b2.node: removing it
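
To test the "goes wild creating threads" theory above against a plain OS limit, here is a throwaway probe of my own (nothing JGroups-specific; run it on a disposable machine only, since it exhausts the thread limit on purpose):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.locks.LockSupport;

    // Hedged probe: start parked daemon threads until the JVM throws
    // "unable to create new native thread", then report how many fit. A low
    // count would mean a tight ulimit/OS setting rather than JGroups itself.
    public class ThreadLimitProbe {
        public static void main(String[] args) {
            List<Thread> started = new ArrayList<Thread>();
            try {
                while (true) {
                    Thread t = new Thread(new Runnable() {
                        public void run() { LockSupport.park(); }
                    });
                    t.setDaemon(true);
                    t.start();
                    started.add(t);
                }
            } catch (OutOfMemoryError e) {
                System.out.println("limit reached after " + started.size() + " threads");
            }
        }
    }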
