Bela Ban commented on JGRP-664:
-------------------------------
Investigate all occurrences of CopyOnWriteArrayList and CopyOnWriteArraySet
CopyOnWrite collections should be synchronized
----------------------------------------------
Key: JGRP-664
URL: http://jira.jboss.com/jira/browse/JGRP-664
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assigned To: Bela Ban
Fix For: 2.7, 2.4.2, 2.6.2
[email from Rick Pike]
Sometimes when we have nodes start suspecting each other we see intermittent:
2008-01-14 06:49:37,464 [ERROR] UDP failed handling incoming message
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.concurrent.CopyOnWriteArrayList.rangeCheck(CopyOnWriteArrayList.java:708)
at java.util.concurrent.CopyOnWriteArrayList.get(CopyOnWriteArrayList.java:328)
at org.jgroups.protocols.FD.getPingDest(FD.java:151)
at org.jgroups.protocols.FD.up(FD.java:305)
at org.jgroups.protocols.FD_ICMP.up(FD_ICMP.java:108)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:126)
at org.jgroups.protocols.Discovery.up(Discovery.java:246)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1535)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1484)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
It looks like there are 4 places in FD that modify the pingable_mbrs list:
1. up(HEARTBEAT_ACK)
2. up(SUSPECT)
3. down(VIEW_CHANGE)
4. down(UNSUSPECT)
Only #1 and #3 appear to use synchronized blocks around the edits, and I
believe that a concurrent (UN)SUSPECT message is emptying that list while the
HEARTBEAT_ACK handler is looping and repeatedly reading it inside getPingDest().
Every so often, it calls get(0) on an empty list.
We just started seeing this when we started testing with FD_ICMP, but I
imagine this would happen with any flavor, and we're seeing it due to the
high volume of SUSPECT messages happening in our tests.
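The race described above can be sketched in isolation. The class below is a hypothetical illustration, not the FD source: PingDestSketch, firstMemberRacy(), firstMemberSnapshot(), firstMemberLocked() and suspect() are invented names, and members are plain strings. It assumes the failing read is a check followed by get(0), which is what the stack trace suggests, and shows two possible ways to make the read safe.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative sketch only: all names are hypothetical, this is not the FD code.
public class PingDestSketch {
    // Stand-in for FD's pingable_mbrs; members are plain strings here.
    static final List<String> pingableMbrs = new CopyOnWriteArrayList<String>();
    // One lock that every reader and writer below agrees on.
    static final Object lock = new Object();

    // Racy check-then-act: each call on the list is thread-safe by itself,
    // but another thread may empty the list between isEmpty() and get(0).
    static String firstMemberRacy() {
        if (!pingableMbrs.isEmpty()) {
            return pingableMbrs.get(0); // can throw IndexOutOfBoundsException
        }
        return null;
    }

    // Option 1: read through the iterator, which works on a snapshot of the
    // backing array and is unaffected by concurrent removals.
    static String firstMemberSnapshot() {
        Iterator<String> it = pingableMbrs.iterator();
        return it.hasNext() ? it.next() : null;
    }

    // Option 2: hold the same lock around the compound read and every edit.
    static String firstMemberLocked() {
        synchronized (lock) {
            return pingableMbrs.isEmpty() ? null : pingableMbrs.get(0);
        }
    }

    // Hypothetical counterpart of the SUSPECT edit, taking the same lock.
    static void suspect(String member) {
        synchronized (lock) {
            pingableMbrs.remove(member);
        }
    }

    public static void main(String[] args) {
        pingableMbrs.add("A");
        pingableMbrs.add("B");
        System.out.println(firstMemberSnapshot()); // A
        suspect("A");
        System.out.println(firstMemberLocked());   // B
        suspect("B");
        System.out.println(firstMemberSnapshot()); // null
    }
}
```

Option 2 only helps if all four edit paths listed above take the same lock; option 1 needs no lock but may return a member that was removed a moment earlier.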
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: