[jboss-jira] [JBoss JIRA] (JGRP-1667) OutOfMemoryError - messages are piling up

Wed Jul 24 08:24:26 EDT 2013

Matthew Lowe created JGRP-1667:
----------------------------------

             Summary: OutOfMemoryError - messages are piling up
                 Key: JGRP-1667
                 URL: https://issues.jboss.org/browse/JGRP-1667
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.12.3
         Environment: Windows 7
            Reporter: Matthew Lowe
            Assignee: Bela Ban
            Priority: Critical

One of our customers encountered an OOME in production while running a 5-node Infinispan cluster. The crash happened after roughly a month of runtime.

Stacktrace:

Timer-19,_threadNameOmmitted_32726 tid=188 [RUNNABLE] [DAEMON] <--- OutOfMemoryError happened in this thread
java.lang.OutOfMemoryError.<init>()
org.jgroups.blocks.TCPConnectionMap$TCPConnection.send(byte[], int, int)
org.jgroups.blocks.TCPConnectionMap$TCPConnection.access$100(TCPConnectionMap$TCPConnection, byte[], int, int)
org.jgroups.blocks.TCPConnectionMap.send(Address, byte[], int, int)
org.jgroups.protocols.TCP.send(Address, byte[], int, int)
org.jgroups.protocols.BasicTCP.sendUnicast(PhysicalAddress, byte[], int, int)
org.jgroups.protocols.TP.sendToSingleMember(Address, byte[], int, int)
org.jgroups.protocols.TP.doSend(Buffer, Address, boolean)
org.jgroups.protocols.TP.send(Message, Address, boolean)
org.jgroups.protocols.TP.down(Event)
org.jgroups.protocols.Discovery.down(Event)
org.jgroups.protocols.TCPPING.down(Event)
org.jgroups.protocols.MERGE2.down(Event)
org.jgroups.protocols.FD_SOCK.down(Event)
org.jgroups.protocols.FD.down(Event)
org.jgroups.protocols.VERIFY_SUSPECT.down(Event)
org.jgroups.protocols.pbcast.NAKACK.down(Event)
org.jgroups.protocols.UNICAST.retransmit(long, Message)
org.jgroups.stack.AckSenderWindow.retransmit(long, long, Address)
org.jgroups.stack.DefaultRetransmitter$SeqnoTask.callRetransmissionCommand()
org.jgroups.stack.Retransmitter$Task.run()
org.jgroups.util.TimeScheduler2$MyTask.run()
org.jgroups.util.TimeScheduler2$Entry.execute()
org.jgroups.util.TimeScheduler2$1.run()
java.lang.Thread.run()
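The trace shows the OOME surfacing on a timer thread. A Retransmitter task fires, AckSenderWindow asks UNICAST to resend an unacknowledged message, and the resend travels down the stack into TCPConnectionMap. The following is a hypothetical, simplified model of that path. It assumes a fixed retransmission interval with no backoff. The names RetransmitSketch, retransmitTick and transportQueue are illustrative only and do not reproduce JGroups' AckSenderWindow. The model shows why a peer that stops acknowledging turns each timer tick into another copy of every pending message handed to the transport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of an ack-based send window; NOT JGroups' AckSenderWindow.
public class RetransmitSketch {

    // Unacknowledged messages keyed by sequence number.
    private final SortedMap<Long, byte[]> unacked = new TreeMap<>();
    // Stands in for the transport: everything "sent" ends up here.
    final List<byte[]> transportQueue = new ArrayList<>();
    private long nextSeqno = 1;

    // Original send: keep the message until the peer acknowledges it.
    long send(byte[] payload) {
        long seqno = nextSeqno++;
        unacked.put(seqno, payload);
        transportQueue.add(payload);
        return seqno;
    }

    // Peer acknowledged everything up to and including seqno.
    void ack(long seqno) {
        unacked.headMap(seqno + 1).clear();
    }

    // One timer tick (assumed fixed interval): resend every unacked message.
    void retransmitTick() {
        for (byte[] payload : unacked.values()) {
            transportQueue.add(payload);
        }
    }

    int pending() {
        return unacked.size();
    }

    public static void main(String[] args) {
        RetransmitSketch window = new RetransmitSketch();
        for (int i = 0; i < 5; i++) {
            window.send(new byte[1024]);
        }
        // The peer never acks: each tick re-enqueues all 5 pending messages.
        for (int tick = 0; tick < 3; tick++) {
            window.retransmitTick();
        }
        System.out.println("queued=" + window.transportQueue.size()
                + " pending=" + window.pending());
        // prints "queued=20 pending=5"
    }
}
```

With 5 pending messages and no acks, three ticks add 15 resends on top of the 5 original sends. The memory dump alone does not establish that this amplification happened in the customer's cluster; it is one mechanism consistent with the trace.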

When I look at the memory dump, I can see ~470 MB of retained heap held by an instance of "org.jgroups.blocks.TCPConnectionMap$TCPConnection$Sender". Most of that memory is retained by "byte[]" instances. I therefore assume that messages are somehow piling up and eventually causing the OOME, but I don't know what conditions might trigger such behavior.
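The hypothesis that byte[] payloads accumulate behind the Sender can be illustrated with a minimal sketch. The sketch assumes the sender keeps outgoing messages in a queue drained by a single thread that writes to the socket. If that thread stalls, an unbounded queue retains every message, while a bounded one caps retained memory. SendQueueSketch, enqueue and retainedBytes are hypothetical names. The drop-on-full policy is also an assumption and does not describe how JGroups' TCPConnection$Sender actually behaves.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of a per-connection send queue; NOT JGroups' TCPConnection.Sender.
public class SendQueueSketch {

    private final BlockingQueue<byte[]> queue;
    private long dropped = 0;

    SendQueueSketch(BlockingQueue<byte[]> queue) {
        this.queue = queue;
    }

    // Non-blocking enqueue (assumed policy): if a bounded queue is full, the
    // message is dropped and left to the reliability layer to resend later.
    boolean enqueue(byte[] msg) {
        boolean accepted = queue.offer(msg);
        if (!accepted) {
            dropped++;
        }
        return accepted;
    }

    // Total payload bytes currently held by the queue.
    long retainedBytes() {
        long total = 0;
        for (byte[] msg : queue) {
            total += msg.length;
        }
        return total;
    }

    long dropped() {
        return dropped;
    }

    public static void main(String[] args) {
        // No drain thread runs here, simulating a sender stuck in a socket write.
        SendQueueSketch unbounded = new SendQueueSketch(new LinkedBlockingQueue<>());
        SendQueueSketch bounded = new SendQueueSketch(new ArrayBlockingQueue<>(100));
        for (int i = 0; i < 10_000; i++) {
            unbounded.enqueue(new byte[1024]);
            bounded.enqueue(new byte[1024]);
        }
        // prints "unbounded retains 10240000 bytes"
        System.out.println("unbounded retains " + unbounded.retainedBytes() + " bytes");
        // prints "bounded retains 102400 bytes, dropped 9900"
        System.out.println("bounded retains " + bounded.retainedBytes()
                + " bytes, dropped " + bounded.dropped());
    }
}
```

Dropping is only one option. A blocking put() on the bounded queue would instead apply backpressure to the caller. Either way, bounding the queue trades unbounded memory growth for dropped messages or blocked senders.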

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira