Bela Ban commented on JGRP-812:
-------------------------------
NULL_MSGs have been removed in 2.7; when a seqno greater than the highest received is
received, we do *not* add NULL_MSGs any longer, but simply update highest_received. We
still add elements to the Retransmitter though...
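A rough, self-contained sketch of the behaviour described in this comment (illustrative only, not the actual 2.7 implementation; the Range list merely stands in for org.jgroups.stack.Retransmitter, and the field names are assumed):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only -- not the actual 2.7 code. Models the behaviour
// described above: no NULL_MSG padding, the gap is handed to the retransmitter
// as a range, and highest_received is bumped in O(1).
class HighestReceivedSketch {
    static class Range { final long low, high; Range(long l, long h) {low=l; high=h;} }

    final ConcurrentHashMap<Long,Object> xmit_table=new ConcurrentHashMap<Long,Object>();
    final List<Range> retransmitter=new ArrayList<Range>();  // stand-in for org.jgroups.stack.Retransmitter
    long highest_received=0;

    synchronized boolean add(long seqno, Object msg) {
        if(seqno <= highest_received)
            return false;                                                // old or duplicate seqno
        xmit_table.put(seqno, msg);                                      // store only the real message
        if(seqno > highest_received +1)
            retransmitter.add(new Range(highest_received +1, seqno -1)); // elements still go to the Retransmitter
        highest_received=seqno;                                          // O(1) instead of O(gap) puts
        return true;
    }
}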
NakReceiverWindow: possible race condition
------------------------------------------
Key: JGRP-812
URL: https://jira.jboss.org/jira/browse/JGRP-812
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 2.7
[Randy Watler]
I just ran into a really rare race condition that appears to be in
NakReceiverWindow; it occurred on a cluster member when another member
was shut down. I believe what happened here is that the NakReceiverWindow
for the node that left was reset, but then a late message was received
from that node with an old sequence number greater than 200000000.
The window then attempted to create that many NULL_MSG entries in the
xmit_table and blew out the heap.
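For scale, here is a minimal standalone sketch (not JGroups code; next_to_add=1 is an assumed post-reset value, and 214531351 is the last seqno seen from the departed node in the logs below) of what that single add() call effectively does:

import java.util.concurrent.ConcurrentHashMap;

// Standalone illustration only -- not JGroups code. Shows why a single add()
// call with a ~214M seqno gap is fatal: one put() per missing seqno, each
// allocating a boxed Long key plus a ConcurrentHashMap entry.
public class GapBlowupSketch {
    private static final Object NULL_MSG=new Object();

    public static void main(String[] args) {
        ConcurrentHashMap<Long,Object> xmit_table=new ConcurrentHashMap<Long,Object>();
        long next_to_add=1;                       // assumed value right after reset()
        long seqno=214531351;                     // stale seqno from the member that left
        for(long i=next_to_add; i < seqno; i++)   // ~214 million iterations
            xmit_table.put(i, NULL_MSG);          // heap is exhausted long before the loop finishes
        System.out.println("table size: " + xmit_table.size());
    }
}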
Here is my somewhat circumstantial evidence for this claim. I managed to
catch a thread adding a NULL_MSG to the xmit_table when the cluster
should have been generally quiescent:
"Incoming,cache,10.129.1.202:9931" prio=10 tid=0x000000005dac5800 nid=0x790d
runnable [0x00000000412e5000..0x00000000412e5a80]
java.lang.Thread.State: RUNNABLE
at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:883)
at org.jgroups.stack.NakReceiverWindow.add(NakReceiverWindow.java:294)
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:802)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:672)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:122)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
at org.jgroups.protocols.FD.up(FD.java:322)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:300)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:145)
at org.jgroups.protocols.Discovery.up(Discovery.java:246)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1535)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1479)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
Here is the related NakReceiverWindow code:
287     // Case #4: we received a seqno higher than expected: add NULL_MSG values for missing messages, add to Retransmitter
288     if(seqno > next_to_add) {
289         if(listener != null) {
290             try {listener.messageGapDetected(next_to_add, seqno, msg.getSrc());} catch(Throwable t) {}
291         }
292
293         for(long i=next_to_add; i < seqno; i++) { // add all msgs (missing or not) into table
294             xmit_table.put(i, NULL_MSG);
295         }
296         xmit_table.put(seqno, msg);
297         retransmitter.add(old_next, seqno -1);    // BUT: add only null messages to xmitter
298         return true;
299     }
By itself this certainly could be circumstantial, but this thread was
captured while the JVM heap was being actively consumed and GC was
taking over the machine resources. The following heap profile was
captured along with the thread above:
num #instances #bytes class name
----------------------------------------------
1: 57613383 2765442384 java.util.concurrent.ConcurrentHashMap$HashEntry
2: 57578762 1381890288 java.lang.Long
3: 1824 1117553680 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
4: 104423 11514120 [C
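As a quick sanity check on those numbers (plain arithmetic over the two biggest rows above; nothing JGroups-specific is assumed):

public class HeapMathSketch {
    public static void main(String[] args) {
        long hashEntries=57613383L, hashEntryBytes=2765442384L;   // row 1 above
        long longs=57578762L, longBytes=1381890288L;              // row 2 above
        System.out.println("bytes per HashEntry: " + hashEntryBytes / hashEntries);       // 48
        System.out.println("bytes per Long:      " + longBytes / longs);                  // 24
        System.out.println("combined GB:         " + (hashEntryBytes + longBytes) / 1e9); // ~4.15
    }
}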
Obviously, that is a big and growing ConcurrentHashMap keyed by Long; it
seems like it could be the xmit_table above. We have logging turned on
for the JGroups protocols. Here is the NAKACK msgs log line from BEFORE
the heap started to be consumed:
cache NAKACK msgs: 10.129.1.241:9931: [214528761 : 214531351 (214531351)]; 10.129.1.131:9931: [487104 : 487111 (487111)]; 10.129.1.203:9931: [130428 : 130429 (130429)]; 10.129.1.202:9931: [123991 : 123994 (123994) (size=3, missing=0, highest stability=123991)]; 10.129.1.132:9931: [487597 : 487601 (487601)]; 10.129.1.201:9931: [147919 : 147923 (147923)]; 10.129.1.133:9931: [487203 : 487209 (487209)]; 10.129.1.134:9931: [487666 : 487668 (487668)]
and after:
cache NAKACK msgs: 10.129.1.131:9931: [487126 : 487138 (487138)]; 10.129.1.203:9931: [130433 : 130434 (130434)]; 10.129.1.202:9931: [123997 : 123998 (123998) (size=1, missing=0, highest stability=123997)]; 10.129.1.132:9931: [487619 : 487636 (487636)]; 10.129.1.201:9931: [147936 : 147940 (147940)]; 10.129.1.133:9931: [487222 : 487244 (487244)]; 10.129.1.134:9931: [487678 : 487696 (487696)]
So, it's pretty clear that the 10.129.1.241 node left the view and that
its last sequence number was approximately 214531351. That number is
large enough to explain why I was able to catch this behavior, why the
heap was being consumed, and why some ConcurrentHashMap had already
accumulated 57M entries. Logging in GMS and NAKACK also verified the
view change:
2008-08-12 03:52:51,088 DEBUG [FD_SOCK pinger,cache,10.129.1.202:9931]: org.jgroups.protocols.FD_SOCK - peer closed socket normally
2008-08-12 03:52:51,184 DEBUG [FD_SOCK pinger,cache,10.129.1.202:9931]: org.jgroups.protocols.FD_SOCK - pinger thread terminated
2008-08-12 03:52:51,189 DEBUG [Incoming,cache,10.129.1.202:9931]: org.jgroups.protocols.pbcast.GMS - view=[10.129.1.131:9931|8] [10.129.1.131:9931, 10.129.1.132:9931, 10.129.1.133:9931, 10.129.1.134:9931, 10.129.1.203:9931, 10.129.1.201:9931, 10.129.1.202:9931]
2008-08-12 03:52:51,193 DEBUG [Incoming,cache,10.129.1.202:9931]: org.jgroups.protocols.pbcast.GMS - [local_addr=10.129.1.202:9931] view is [10.129.1.131:9931|8] [10.129.1.131:9931, 10.129.1.132:9931, 10.129.1.133:9931, 10.129.1.134:9931, 10.129.1.203:9931, 10.129.1.201:9931, 10.129.1.202:9931]
2008-08-12 03:52:51,194 WARN [Incoming,cache,10.129.1.202:9931]: org.jgroups.protocols.pbcast.NAKACK - 10.129.1.202:9931] discarded message from non-member 10.129.1.241:9931, my view is [10.129.1.131:9931|7] [10.129.1.131:9931, 10.129.1.132:9931, 10.129.1.133:9931, 10.129.1.134:9931, 10.129.1.203:9931, 10.129.1.201:9931, 10.129.1.202:9931, 10.129.1.241:9931]
2008-08-12 03:52:51,195 INFO [Incoming,cache,10.129.1.202:9931]: com.getmedium.medium.cache.lateral.endpoint.LateralCacheEndpoint - Current membership for cache group: [10.129.1.131:9931, 10.129.1.132:9931, 10.129.1.133:9931, 10.129.1.134:9931, 10.129.1.203:9931, 10.129.1.201:9931, 10.129.1.202:9931]
2008-08-12 03:52:51,201 DEBUG [Incoming,cache,10.129.1.202:9931]: org.jgroups.protocols.pbcast.GMS - sending VIEW_ACK to 10.129.1.131:9931
2008-08-12 03:52:51,201 DEBUG [Timer,null,null]: org.jgroups.protocols.FD_SOCK - VIEW_CHANGE received: [10.129.1.131:9931, 10.129.1.132:9931, 10.129.1.133:9931, 10.129.1.134:9931, 10.129.1.203:9931, 10.129.1.201:9931, 10.129.1.202:9931]
2008-08-12 03:52:51,201 WARN [Incoming,cache,10.129.1.202:9931]: org.jgroups.protocols.pbcast.NAKACK - 10.129.1.202:9931] discarded message from non-member 10.129.1.241:9931, my view is [10.129.1.131:9931|8] [10.129.1.131:9931, 10.129.1.132:9931, 10.129.1.133:9931, 10.129.1.134:9931, 10.129.1.203:9931, 10.129.1.201:9931, 10.129.1.202:9931]
2008-08-12 03:52:51,201 WARN [Incoming,cache,10.129.1.202:9931]: org.jgroups.protocols.pbcast.NAKACK - 10.129.1.202:9931] discarded message from non-member 10.129.1.241:9931, my view is [10.129.1.131:9931|8] [10.129.1.131:9931, 10.129.1.132:9931, 10.129.1.133:9931, 10.129.1.134:9931, 10.129.1.203:9931, 10.129.1.201:9931, 10.129.1.202:9931]
2008-08-12 03:52:51,201 DEBUG [FD_SOCK pinger,cache,10.129.1.202:9931]: org.jgroups.protocols.FD_SOCK - pinger_thread started
Upon inspecting the code in NakReceiverWindow, it seems plausible to me
that a message arrived in add() just after the view was adjusted and
reset() was called. Given how high the sequence numbers are in this
case, it's easy to see why that tight loop in add() attempts what it
does here. Does this scenario seem possible as I present it? If so, is
there a way to bound "case #4" and/or shut down the add() after a reset?
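For what it is worth, here is the kind of bound I have in mind (purely a sketch, not the actual NakReceiverWindow code or API; MAX_GAP and the open flag are invented names):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard, not actual JGroups API: bound the gap that case #4 is
// willing to fill, and reject add() calls that race with reset().
class BoundedWindowSketch {
    static final Object NULL_MSG=new Object();
    static final long MAX_GAP=100000;                 // illustrative bound, not a real JGroups property

    final ConcurrentHashMap<Long,Object> xmit_table=new ConcurrentHashMap<Long,Object>();
    final AtomicBoolean open=new AtomicBoolean(true); // cleared by reset(), checked by add()
    long next_to_add=1;

    void reset() {
        open.set(false);                      // tell racing add() calls to bail out
        xmit_table.clear();
    }

    synchronized boolean add(long seqno, Object msg) {
        if(!open.get())                       // window was reset: drop late messages
            return false;
        if(seqno - next_to_add > MAX_GAP)     // implausible gap: discard instead of padding
            return false;
        for(long i=next_to_add; i < seqno; i++)
            xmit_table.put(i, NULL_MSG);
        xmit_table.put(seqno, msg);
        next_to_add=seqno +1;
        return true;
    }
}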
Unfortunately, this is the ONLY time I have seen this behavior in over 2
years of working with JGroups, so I do not expect to be able to
reproduce it again for another year. I just happened to catch it this
time around in a state where I was able to diagnose it to some degree
before the JVM went dark. Please let me know if I can help diagnose this
further.
Randy Watler