Bela Ban created JGRP-1464:
------------------------------
Summary: TCPConnectionMap: message from different JGroups version may cause OOME
Key: JGRP-1464
URL: https://issues.jboss.org/browse/JGRP-1464
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.0.10, 3.1
If discard_incompatible_packets is enabled at the transport level, we discard packets from
other JGroups versions. However, when TCP is used, this check is not applied at the
TCPConnectionMap level.
When we read a packet from a different version, we first read the version, then the
length. If the length is garbage, and we interpret it as a long, it can be huge, and
allocating a buffer of that size leads to an OOME.
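The failure mode can be sketched as follows. This is only an illustration, not the JGroups code: it assumes a 4-byte length prefix read with DataInputStream.readInt() (the value logged in the report below fits in an int), and the class name BoundedFrameReader and the MAX_FRAME_LENGTH cap are hypothetical. The four bytes 0x47 0x11 0x8A 0x04, read as a big-endian int, give exactly the len=1192331780 seen in the log.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative sketch only; names and the size cap are assumptions, not JGroups API.
public class BoundedFrameReader {
    // Hypothetical upper bound on a single frame (8 MB); not a JGroups setting.
    static final int MAX_FRAME_LENGTH = 8 * 1024 * 1024;

    // Reads a length-prefixed frame, refusing implausible lengths
    // before any buffer is allocated.
    static byte[] readFrame(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > MAX_FRAME_LENGTH)
            throw new IOException("implausible frame length " + len);
        byte[] buf = new byte[len];
        in.readFully(buf);
        return buf;
    }

    public static void main(String[] args) {
        // Bytes from an incompatible wire format, misread as a length prefix.
        byte[] garbage = {0x47, 0x11, (byte) 0x8A, 0x04};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(garbage));
        try {
            readFrame(in);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // implausible frame length 1192331780
        }
    }
}
```

Without the bounds check, new byte[len] would try to allocate roughly 1.1 GB for a single packet. A size cap alone does not make the stream readable again, since the framing of the two versions still differs.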
[email from
http://old.nabble.com/ArrayIndexOutOfBoundsException-in-3.09-ts33781725.h...]
I'd like to add some additional information to this post since I am seeing this crash
the entire JVM. I've added some debugging statements to the TCPConnectionMap -
ConnectionPeerReceiver.run method to try to understand the issue and recompiled the 3.0.9
jar (I am printing the len read from the DataInputStream). It appears that when an
incompatible message is received, JG discards some messages appropriately but not others.
When some of these messages make it through, they end up being of enormous size, 1192331780
(or ~1.1 GB), which is enough to crash the JVM due to OOM (catching the OutOfMemoryError is
not sufficient in our configuration). Some are of size 0 as well. I'm curious whether there
is some signed/unsigned conversion going on here. At any rate, this is prohibiting us from
moving to the latest version of JGroups in hopes of receiving better support.
2012-05-14 09:27:05,623 [Connection.Receiver [135.9.96.63:59953 -
135.9.128.31:7800],135.9.148.15_InterCluster-2.12,asmblade23-47979]
jgroups.blocks.TCPConnectionMap$TCPConnection WARN - TG: DEBUG len=70
2012-05-14 09:27:05,624 [Connection.Receiver [135.9.96.63:59953 -
135.9.128.31:7800],135.9.148.15_InterCluster-2.12,asmblade23-47979]
jgroups.blocks.TCPConnectionMap$TCPConnection WARN - TG: DEBUG len=70
2012-05-14 09:27:05,624 [OOB-1,135.9.148.15_InterCluster-2.12,asmblade23-47979]
jgroups.protocols.TCP WARN - packet from 135.9.128.31:7800 has different version (2.6.10)
from ours (3.0.10). Packet is discarded
2012-05-14 09:27:05,625 [OOB-2,135.9.148.15_InterCluster-2.12,asmblade23-47979]
jgroups.protocols.TCP WARN - packet from 135.9.128.31:7800 has different version (2.6.10)
from ours (3.0.10). Packet is discarded
2012-05-14 09:27:12,028 [ConnectionMap.Acceptor,null,null]
jgroups.blocks.TCPConnectionMap$TCPConnection WARN - packet from /135.9.96.59:56077 has
different version (2.6.10) from ours (3.0.10). This may cause problems
2012-05-14 09:27:12,030 [Connection.Receiver [135.9.96.63:7800 -
135.9.96.59:56077],135.9.148.15_InterCluster-2.12,asmblade23-47979]
jgroups.blocks.TCPConnectionMap$TCPConnection WARN - TG: DEBUG len=0
2012-05-14 09:27:12,032 [Connection.Receiver [135.9.96.63:7800 -
135.9.96.59:56077],135.9.148.15_InterCluster-2.12,asmblade23-47979] jgroups.protocols.TCP
ERROR - failed handling data from 135.9.96.59:7800
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 2
at org.jgroups.protocols.TP.receive(TP.java:1200)
at org.jgroups.protocols.BasicTCP.receive(BasicTCP.java:104)
at
org.jgroups.blocks.TCPConnectionMap$TCPConnection$ConnectionPeerReceiver.run(TCPConnectionMap.java:603)
at java.lang.Thread.run(Thread.java:769)
2012-05-14 09:27:12,034 [Connection.Receiver [135.9.96.63:7800 -
135.9.96.59:56077],135.9.148.15_InterCluster-2.12,asmblade23-47979]
jgroups.blocks.TCPConnectionMap$TCPConnection WARN - TG: DEBUG len=1192331780
SOLUTION: discard packets from different JGroups versions in TCPConnectionMap.
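A minimal sketch of what such a check could look like, under stated assumptions: the peer sends a 2-byte version first, and the receiver rejects the connection on a mismatch instead of continuing to parse. The class VersionGate, the method acceptPeer and the bit layout in encode are hypothetical and chosen for illustration; they are not taken from the actual fix.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative sketch only; not the JGroups implementation.
public class VersionGate {
    // Our own version (3.0.10), packed into a short for comparison.
    static final short OUR_VERSION = encode(3, 0, 10);

    // Hypothetical packing of major.minor.micro into 16 bits (5/5/6 bits).
    static short encode(int major, int minor, int micro) {
        return (short) ((major << 11) | (minor << 6) | micro);
    }

    // Reads the peer's version. Returns false on a mismatch, in which case
    // the caller should drop the connection rather than read a length field.
    static boolean acceptPeer(DataInputStream in) throws IOException {
        short peerVersion = in.readShort();
        return peerVersion == OUR_VERSION;
    }

    public static void main(String[] args) throws IOException {
        short peer = encode(2, 6, 10);
        byte[] wire = {(byte) (peer >> 8), (byte) peer};
        boolean ok = acceptPeer(new DataInputStream(new ByteArrayInputStream(wire)));
        System.out.println(ok ? "accepted" : "version mismatch, discarding");
    }
}
```

Rejecting before the length is read matters because, once the versions differ, nothing after the version field can be trusted to have the expected layout.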