[infinispan-dev] DIST.retrieveFromRemoteSource

Mircea Markus mircea.markus at jboss.com
Wed Jan 25 06:48:28 EST 2012


On 25 Jan 2012, at 08:51, Dan Berindei wrote:

> Hi Sanne
> 
> On Wed, Jan 25, 2012 at 1:22 AM, Sanne Grinovero <sanne at infinispan.org> wrote:
>> Hello,
>> in the method:
>> org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(Object,
>> InvocationContext, boolean)
>> 
>> we have:
>> 
>>      List<Address> targets = locate(key);
>>      // if any of the recipients has left the cluster since the
>> command was issued, just don't wait for its response
>>      targets.retainAll(rpcManager.getTransport().getMembers());
>> 
>> But then then we use ResponseMode.WAIT_FOR_VALID_RESPONSE, which means
>> we're not going to wait for all responses anyway, and I think we might
>> assume to get a reply by a node which actually is in the cluster.
>> 
>> So the retainAll method is unneeded and can be removed? I'm wondering,
>> because it's not safe anyway, actually it seems very unlikely to me
>> that just between a locate(key) and the retainAll the view is being
>> changed, so not something we should be relying on anyway.
>> I'd rather assume that such a get method might be checked and
>> eventually dropped by the receiver.
>> 
> 
> The locate method will return a list of owners based on the
> "committed" cache view, so there is a non-zero probability that one of
> the owners has already left.
> 
> If I remember correctly, I added the retainAll call because otherwise
> ClusteredGetResponseFilter.needMoreResponses() would keep returning
> true if one of the targets left the cluster. Coupled with the fact
> that null responses are not considered valid (unless *all* responses
> are null), this meant that a remote get for a key that doesn't exist
> would throw a TimeoutException after 15 seconds instead of returning
> null immediately.
> 
> We could revisit the decision to make null responses invalid, and then
> as long as there is still one of the old owners left in the cluster
> you'll get the null result immediately. 
can't we just directly to a single node for getting a remote value? The main data owner perhaps. If the node is down we can retry to another node. Going to multiple nodes seems like a waist of resources - network in this case.


More information about the infinispan-dev mailing list