[jboss-jira] [JBoss JIRA] Updated: (JGRP-1190) Race conditions in logical address caching with shared transport

Wed Apr 21 08:45:11 EDT 2010

     [ https://jira.jboss.org/jira/browse/JGRP-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-1190:
---------------------------

    Attachment: ns.patch


> Race conditions in logical address caching with shared transport
> ----------------------------------------------------------------
>
>                 Key: JGRP-1190
>                 URL: https://jira.jboss.org/jira/browse/JGRP-1190
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Brian Stansberry
>            Assignee: Bela Ban
>             Fix For: 2.10
>
>         Attachments: JGRP-1190.patch, ns.patch
>
>
> The logical address caching (i.e. TP.logical_addr_cache) is prone to races when the shared transport is used. JBoss AS's startup simultaneously connects 2 channels on a shared transport. With 2.10.0-Alpha3 we're seeing addresses of still-healthy members being removed from logical_addr_cache.
> An entry is added to the cache when:
> 1) An event comes down the stack, i.e. from Channel.setAddress().
> 2) A Discovery GET_MBRS_REQ comes in, either from a remote node or from receipt of the node's own message.
> The primary mechanism for marking an entry for removal is a VIEW_CHANGE event coming down the stack, which results in a retainAll() invocation on the cache; only addresses that are part of the views of the channels associated with the shared TP are retained.
> This can lead to the following kind of race, where C1 and C2 are 2 channels sharing the TP, ADD is one of the events described above that adds to the cache, and VIEW is a VIEW_CHANGE event coming down:
> 1) C1:ADD
> 2) C2:ADD
> 3) C1:VIEW --- oops -- whatever C2:ADD added is marked as removable
> 4) C2:VIEW
> Besides this larger issue, there is also a minor race in TP.handleDownEvent's handling of address caches when there is a view change:
>             case Event.VIEW_CHANGE:
>                 synchronized(members) {
>                     View view=(View)evt.getArg();
>                     members.clear();
>                     if(!isSingleton()) {
>                         Vector<Address> tmpvec=view.getMembers();
>                         members.addAll(tmpvec);
>                     }
>                     else {
>                         // add all members from all clusters
>                         for(Protocol prot: up_prots.values()) {
>                             if(prot instanceof ProtocolAdapter) {
>                                 ProtocolAdapter ad=(ProtocolAdapter)prot;
>                                 Set<Address> tmp=ad.getMembers();
>                                 members.addAll(tmp);
>                             }
>                         }
>                     }
>                 }
>                 // fix for https://jira.jboss.org/jira/browse/JGRP-918
>                 logical_addr_cache.retainAll(members);
>                 UUID.retainAll(members);
>                 break;
> The two retainAll calls at the end need to be inside the synchronized block. Otherwise, if TP is a shared transport, two threads can simultaneously be carrying view changes down the stack. T1 proceeds through the synchronized block. Then, while it is updating the address caches, T2 enters the synchronized block and begins manipulating 'members', with the result that the data passed by T1 to retainAll is incomplete.
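
The fix proposed above for the minor race can be sketched in isolation. What follows is a minimal, self-contained Java sketch, not the JGroups code and not the attached patch: the class SharedTransportCacheSketch, its String-based addresses, and the channel names are hypothetical stand-ins for TP, Address, and the per-channel ProtocolAdapter lookup. It only illustrates rebuilding 'members' and pruning the cache under the same lock.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a shared transport's address bookkeeping; not JGroups code.
class SharedTransportCacheSketch {
    // Current view of each channel, keyed by a hypothetical channel name.
    private final Map<String, Set<String>> viewsByChannel = new HashMap<>();
    // Union of the members of all channels (plays the role of TP.members).
    private final Set<String> members = new HashSet<>();
    // Plays the role of logical_addr_cache: logical address -> physical address.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Corresponds to an ADD event in the description above.
    void add(String logicalAddr, String physicalAddr) {
        cache.put(logicalAddr, physicalAddr);
    }

    // Corresponds to a VIEW_CHANGE event coming down from one channel.
    void viewChange(String channel, Collection<String> view) {
        synchronized (members) {
            viewsByChannel.put(channel, new HashSet<>(view));
            members.clear();
            for (Set<String> v : viewsByChannel.values()) {
                members.addAll(v);
            }
            // Prune while the lock is still held, so another thread cannot
            // clear or refill 'members' while it is being read here.
            cache.keySet().retainAll(members);
        }
    }

    // Snapshot of the logical addresses currently cached.
    Set<String> cachedAddresses() {
        return new HashSet<>(cache.keySet());
    }

    public static void main(String[] args) {
        SharedTransportCacheSketch tp = new SharedTransportCacheSketch();
        tp.viewChange("c1", Arrays.asList("A"));
        tp.viewChange("c2", Arrays.asList("B"));
        tp.add("A", "10.0.0.1:7800");
        tp.add("B", "10.0.0.2:7800");
        tp.add("X", "10.0.0.9:7800"); // not a member of any view
        tp.viewChange("c1", Arrays.asList("A", "C"));
        System.out.println(new TreeSet<>(tp.cachedAddresses()));
    }
}
```

Note that this sketch addresses only the minor race. An address added on behalf of C2 before C2's first view has been installed would still be pruned by a view change from C1, which is the larger ADD/VIEW ordering problem described in the issue.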
