[infinispan-dev] DIST.retrieveFromRemoteSource

Sanne Grinovero sanne at infinispan.org
Wed Jan 25 12:56:27 EST 2012


On 25 January 2012 17:09, Dan Berindei <dan.berindei at gmail.com> wrote:
> On Wed, Jan 25, 2012 at 4:22 PM, Mircea Markus <mircea.markus at jboss.com> wrote:
>>
>>> One node might be busy doing GC and stay unresponsive for a whole
>>> second or longer, another one might be actually crashed and you didn't
>>> know that yet; these are unlikely but possible.
>>
>> All these are possible, but I would rather consider them exceptional
>> situations, possibly handled by retry logic. We should *not* optimise for
>> these situations IMO.
>>
>
> As Sanne pointed out, an exceptional situation on a node becomes
> ordinary with 100s or 1000s of nodes.
> So the default policy should scale the initial number of requests with
> numOwners.
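
(To put illustrative numbers on it: say each node spends 0.5% of its
time in a GC pause; then a single-target get hits a paused node 0.5% of
the time - dozens of stalled reads per second on a busy cluster - while
two parallel requests only stall when both targets pause at once,
roughly 0.0025% of the time.)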
>
>>
>>> More likely, a rehash is in progress, you could then be asking a node
>>> which doesn't yet (or anymore) have the value.
>>
>> this is a consistency issue and I think we can find a way to handle it some
>> other way.
>>
>
> With the current state transfer we always send ClusteredGetCommands to
> the old owners (and only the old owners). If a node didn't receive the
> entire state, it means that state transfer hasn't finished yet and the
> CH will not return it as an owner. But the CH could also return owners
> that are no longer members of the cluster, so we have to check for
> that before picking one owner to send the command to.
>
> In Sanne's non-blocking state transfer proposal I think a new owner
> may have to ask the old owner for the key value, so it would still
> never return null. But it might be less expensive to ask the old owner
> directly (assuming it's safe from a consistency POV).
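
A minimal sketch of that liveness check, with invented names standing in
for the consistent hash lookup and the cluster view (this is not the
actual Infinispan API):

import java.util.ArrayList;
import java.util.List;

// Pick a live owner for the remote get: the CH may still list nodes
// that already left the cluster, so intersect with the current view.
static <A> A pickLiveOwner(List<A> ownersFromCH, List<A> members, Object key) {
    List<A> live = new ArrayList<>(ownersFromCH);
    live.retainAll(members);             // drop owners no longer in the cluster
    if (live.isEmpty())
        throw new IllegalStateException("no live owner for key " + key);
    return live.get(0);                  // target for the ClusteredGetCommand
}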
>
>>
>>> All good reasons for which imho it makes sense to send out "a couple"
>>> of requests in parallel, but I'd be unlikely to want to send more than 2,
>>> and I agree often 1 might be enough.
>>>
>>> Maybe it should even optimize for the most common case: send out just
>>> one, have a more aggressive timeout and in case of trouble ask the
>>> next node.
>>
>> +1
>>
>
> -1 for aggressive timeouts... you're going to do the same work as you
> do now, except you're going to wait a bit between sending requests. If
> you're really unlucky the first target will return first but you'll
> ignore its response because the timeout already expired.


Agreed. What I meant by "more aggressive timeouts" is not the overall
timeout that fails the get, but a second, more aggressive one: we start
sending the next GET as soon as the first one is starting to not look
good. So we would have one timeout for the whole operation, and another
that decides how long a single GET RPC may stay unanswered before we ask
another node.
Even if the global timeout is something high like "10 seconds", if after
40 ms I still didn't get a reply from the first node we can send the
request to the next one.. while still waiting for an eventual answer
from the first.
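
A rough sketch of that two-timeout scheme, with placeholder names
(sendGet stands for the async RPC to a single owner; nothing here is
real Infinispan code):

import java.util.List;
import java.util.concurrent.*;
import java.util.function.Function;

// One global timeout fails the whole operation; a much shorter stagger
// delay only decides when to try the next owner, while replies from
// earlier requests are still accepted.
static <A, V> V staggeredGet(List<A> owners,
                             Function<A, CompletableFuture<V>> sendGet)
        throws Exception {
    CompletableFuture<V> result = new CompletableFuture<>();
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    try {
        for (int i = 0; i < owners.size(); i++) {
            A owner = owners.get(i);
            // Fire the i-th request after i * 40 ms, unless we already
            // have an answer by then.
            timer.schedule(() -> {
                if (!result.isDone())
                    sendGet.apply(owner).thenAccept(result::complete);
            }, i * 40L, TimeUnit.MILLISECONDS);
        }
        return result.get(10, TimeUnit.SECONDS);   // the global timeout
    } finally {
        timer.shutdownNow();
    }
}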

>
>>
>>> In addition, sending a single request might spare us some Future,
>>> await+notify messing in terms of CPU cost of sending the request.
>>
>> it's the remote OOB thread that's the most costly resource imo.
>>
>
> I don't think the OOB thread is that costly: it doesn't block on
> anything (not even on state transfer!), so the most expensive part is
> reading the key and writing the value. BTW Sanne, we may want to run
> Transactional with a smaller payload size ;)
>
> We could implement our own GroupRequest that sends the requests in
> parallel instead of implementing FutureCollator on top of UnicastRequest,
> and save some of that overhead on the caller.
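
A sketch of that fully parallel variant (same placeholder sendGet as in
the staggered sketch above; this is not the real JGroups GroupRequest):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// All RPCs go out at once; the first completed reply (or first failure)
// wins, with no per-future await/notify work on the caller thread.
static <A, V> Object parallelGet(List<A> owners,
                                 Function<A, CompletableFuture<V>> sendGet)
        throws Exception {
    CompletableFuture<?>[] replies = new CompletableFuture<?>[owners.size()];
    for (int i = 0; i < owners.size(); i++)
        replies[i] = sendGet.apply(owners.get(i));
    return CompletableFuture.anyOf(replies).get(10, TimeUnit.SECONDS);
}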
>
> I think we already have a JIRA to make PutKeyValueCommands return the
> previous value; that would eliminate lots of GetKeyValueCommands and
> it would actually improve the performance of puts - we should probably
> make this a priority.

+1 !!
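
To spell out the saving (a sketch of the internal flow, with rpc() as a
made-up synchronous dispatch helper - not actual Infinispan code):

// Today a put from a non-owner that must honour the Map contract
// needs two round trips:
Object prev = rpc(new GetKeyValueCommand(key));    // RPC #1: fetch old value
rpc(new PutKeyValueCommand(key, value));           // RPC #2: do the write

// If the PutKeyValueCommand shipped the old value back, one RPC does both:
Object prev2 = rpc(new PutKeyValueCommand(key, value));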

>
>>
>>> I think I agree on all points, it makes more sense.
>>> Just that in a large cluster, let's say
>>> 1000 nodes, maybe I want 20 owners as a sweet spot for the read/write
>>> performance tradeoff, and with such high numbers I guess doing 2-3
>>> gets in parallel might make sense, as those "unlikely" events suddenly
>>> become almost certain.. especially a rehash in progress.
>>>
>>> So I'd propose a separate configuration option for # parallel get
>>> events, and one to define a "try next node" policy. Or this policy
>>> should be the whole strategy, and the #gets one of the options for the
>>> default implementation.
>>
>> Agreed that having a configurable remote get policy makes sense.
>> We already have a JIRA for this[1], I'll start working on it as the
>> performance results are haunting me.
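
A minimal sketch of what such a pluggable policy could look like (all
names invented for illustration; this is not ISPN-825's actual design):

// Invented names, for illustration only.
interface RemoteGetPolicy {
    // How many owners to contact up front; may scale with numOwners.
    int initialParallelGets(int numOwners);

    // How long to wait on an outstanding GET before trying the next owner.
    long staggerDelayMillis();
}

// A possible default: one request, try the next owner after 40 ms.
final class StaggeredGetPolicy implements RemoteGetPolicy {
    public int initialParallelGets(int numOwners) { return 1; }
    public long staggerDelayMillis() { return 40; }
}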
>
> I'd rather focus on implementing one remote get policy that works
> instead of making it configurable - even if we make it configurable
> we'll have to focus our optimizations on the default policy.
>
> Keep in mind that we also want to introduce eventual consistency - I
> think that's going to eliminate our optimization opportunity here
> because we'll need to get the values from a majority of owners (if not
> all the owners).
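
For reference, the shape of a quorum read that eventual consistency
would force on us (a toy sketch, nothing Infinispan-specific; a real
version would reconcile versions instead of taking the first non-null):

import java.util.List;
import java.util.concurrent.*;
import java.util.function.Function;

// Query every owner, block until a majority has answered, then pick a
// winner (naively, the first non-null reply).
static <A, V> V majorityGet(List<A> owners, Function<A, V> syncGet)
        throws InterruptedException, ExecutionException {
    int quorum = owners.size() / 2 + 1;
    ExecutorService pool = Executors.newFixedThreadPool(owners.size());
    try {
        CompletionService<V> replies = new ExecutorCompletionService<>(pool);
        for (A owner : owners)
            replies.submit(() -> syncGet.apply(owner));
        V result = null;
        for (int i = 0; i < quorum; i++) {
            V v = replies.take().get();      // blocks until one more reply
            if (result == null)
                result = v;
        }
        return result;
    } finally {
        pool.shutdownNow();
    }
}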
>
>> I'd like to have Dan's input on this as well first, as he has worked with
>> remote gets and I still don't know why null results are not considered valid
>> :)
>
> Pre-5.0 during state transfer an owner could return null to mean "I'm
> not sure", so the caller would ignore it unless every target returned
> null.
> That's no longer necessary, but it wasn't broken so I didn't fix it...
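
That old collation rule in a nutshell (reconstructed from the
description above, not the actual pre-5.0 code):

import java.util.List;

// null meant "I'm not sure", so it was only trusted when unanimous.
static Object collate(List<Object> responses) {
    for (Object r : responses)
        if (r != null)
            return r;        // any concrete value wins
    return null;             // every target said null -> key really absent
}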
>
> Cheers
> Dan
>
>>
>> [1] https://issues.jboss.org/browse/ISPN-825
>>