Race condition in TP.handleDownEvent
------------------------------------
Key: JGRP-1190
URL: https://jira.jboss.org/jira/browse/JGRP-1190
Project: JGroups
Issue Type: Bug
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Fix For: 2.10
There is a race in TP.handleDownEvent's handling of address caches when there is a
view change:
    case Event.VIEW_CHANGE:
        synchronized(members) {
            View view=(View)evt.getArg();
            members.clear();
            if(!isSingleton()) {
                Vector<Address> tmpvec=view.getMembers();
                members.addAll(tmpvec);
            }
            else {
                // add all members from all clusters
                for(Protocol prot: up_prots.values()) {
                    if(prot instanceof ProtocolAdapter) {
                        ProtocolAdapter ad=(ProtocolAdapter)prot;
                        Set<Address> tmp=ad.getMembers();
                        members.addAll(tmp);
                    }
                }
            }
        }
        // fix for https://jira.jboss.org/jira/browse/JGRP-918
        logical_addr_cache.retainAll(members);
        UUID.retainAll(members);
        break;
The two retainAll calls at the end need to be inside the synchronized block. Otherwise, if
TP is a shared transport, two threads can be carrying view changes down at the same time.
T1 completes the synchronized block. Then, while T1 is updating the address caches, T2
enters the synchronized block and begins modifying 'members' (clearing and repopulating
it), with the result that the data passed to retainAll is incomplete.
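
A minimal sketch of the proposed fix follows. It is not the actual TP code: the class
ViewChangeSketch, its method names, and the use of plain String addresses and HashSets
are assumptions made for illustration, and the singleton branch that merges members from
all clusters is left out. The only point it shows is that the cache is pruned while the
lock on 'members' is still held, so a second thread cannot clear 'members' partway
through retainAll.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for TP. Only 'members' and the idea of an address
// cache come from the issue; everything else is assumed for this sketch.
class ViewChangeSketch {
    private final Set<String> members = new HashSet<>();
    // Stand-in for logical_addr_cache, guarded here by the 'members' lock.
    private final Set<String> logicalAddrCache = new HashSet<>();

    // Records an address in the cache.
    void cache(String addr) {
        synchronized (members) {
            logicalAddrCache.add(addr);
        }
    }

    // Replaces the membership and prunes the cache as one atomic step.
    void handleViewChange(List<String> newMembers) {
        synchronized (members) {
            members.clear();
            members.addAll(newMembers);
            // Pruning happens before the lock is released, so retainAll
            // never sees a half-rebuilt 'members' set from another thread.
            logicalAddrCache.retainAll(members);
        }
    }

    // Returns a snapshot copy of the cached addresses.
    Set<String> cachedAddresses() {
        synchronized (members) {
            return new HashSet<>(logicalAddrCache);
        }
    }

    public static void main(String[] args) {
        ViewChangeSketch tp = new ViewChangeSketch();
        tp.cache("A");
        tp.cache("B");
        tp.cache("C");
        tp.handleViewChange(Arrays.asList("A", "B"));
        System.out.println(tp.cachedAddresses().contains("A")); // prints "true"
        System.out.println(tp.cachedAddresses().contains("C")); // prints "false"
    }
}
```

In the real code the equivalent change would be to move the logical_addr_cache.retainAll
and UUID.retainAll calls above the closing brace of synchronized(members).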
JBoss AS's startup simultaneously connects two channels on a shared transport. With
2.10.0-Alpha3 we're seeing addresses of still-healthy members being removed from
logical_addr_cache. Looking at the code, the above mechanism is the only way I can see
that this could happen.