It's hard to tell cause from effect in this kind of situation. Your log shows node1
being suspected by node2 and properly starting the process of closing down its channel in
order to rejoin. Then, a few ms later, the VM runs out of memory.
Most likely whatever was going on that eventually led to the OOME was also making node1
unresponsive enough that node2 suspected it.
The question is why the OOME occurred. First, it's *extremely* unlikely that the process
of handling the suspicion and closing the channel is itself what caused the OOME. Second,
the fact that UDP is what threw the OOME doesn't mean UDP or JGroups was the underlying
cause -- it just means UDP happened to be the code trying to allocate an object when the
heap finally ran out of space.
JGroups 2.4.1.SP3 includes an improvement to the FC (flow control) protocol that prevents
an OOME condition that could occur when the channel is running under sustained overload.
That may help. But IMHO the odds are pretty low that this was the cause of your OOME.
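For reference, FC is configured as one element of the JGroups protocol stack; a minimal
sketch of the relevant fragment is below. The credit values are illustrative assumptions,
not recommendations -- tune them for your message sizes and throughput:

```xml
<!-- FC throttles senders by handing out byte "credits"; a sender blocks when
     its credits are exhausted until receivers replenish them. Values below
     are illustrative only. -->
<FC max_credits="2000000"
    min_threshold="0.10"/>
```

Placing FC above the transport in the stack is what lets it bound the amount of
unconsumed data buffered on receivers, which is the overload scenario the 2.4.1.SP3
fix addresses.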
You're better off profiling your application to confirm you have no memory leaks.
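As a starting point before reaching for a full profiler, the JVM itself can capture a
heap dump at the moment of the OOME. These are standard Sun JDK options (the application
class name and dump path below are placeholders):

```
# Write a heap dump automatically when an OutOfMemoryError is thrown:
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps MyApp

# Or inspect a running VM's object histogram to spot a growing class:
jmap -histo <pid>
```

Comparing two histograms taken some minutes apart will usually show which class is
accumulating instances, which tells you whether the leak is in your code or in the
messaging layer.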