On 25 January 2012 17:09, Dan Berindei <dan.berindei(a)gmail.com> wrote:
> On Wed, Jan 25, 2012 at 4:22 PM, Mircea Markus
> <mircea.markus(a)jboss.com> wrote:
>>> One node might be busy doing GC and stay unresponsive for a whole
>>> second or longer, another one might be actually crashed and you didn't
>>> know that yet; these are unlikely but possible.
>>
>> All these are possible, but I would rather consider them as exceptional
>> situations, possibly handled by retry logic. We should *not* optimise
>> for these situations IMO.
>
> As Sanne pointed out, an exceptional situation on a node becomes
> ordinary with 100s or 1000s of nodes. So the default policy should
> scale the initial number of requests with numOwners.
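
Just to make that concrete, a sketch of such a policy (the thresholds
are invented purely to illustrate scaling with numOwners):

    class RemoteGetPolicy {
        // Invented sketch: scale the number of initial parallel GETs with
        // numOwners; the thresholds are made up for illustration only.
        static int initialParallelGets(int numOwners) {
            if (numOwners <= 2) return 1;  // small clusters: one GET usually suffices
            if (numOwners <= 10) return 2; // hedge against one slow or GC-ing owner
            return 3;                      // at this scale a slow owner is near certain
        }
    }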
>>> More likely, a rehash is in progress; you could then be asking a node
>>> which doesn't yet (or anymore) have the value.
>>
>> This is a consistency issue and I think we can find a way to handle it
>> some other way.
>
> With the current state transfer we always send ClusteredGetCommands to
> the old owners (and only the old owners). If a node didn't receive the
> entire state, it means that state transfer hasn't finished yet and the
> CH will not return it as an owner. But the CH could also return owners
> that are no longer members of the cluster, so we have to check for
> that before picking one owner to send the command to.
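
As a sketch of that check, with String standing in for the real Address
type (this helper is hypothetical, not the actual Infinispan API):

    import java.util.List;

    class OwnerSelection {
        // Hypothetical helper: pick the first CH owner that is still a
        // cluster member, since the CH may list nodes that already left.
        static String pickLiveOwner(List<String> chOwners, List<String> members) {
            for (String owner : chOwners) {
                if (members.contains(owner)) { // skip departed nodes
                    return owner;
                }
            }
            throw new IllegalStateException("no live owner for this key");
        }
    }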
> In Sanne's non-blocking state transfer proposal I think a new owner
> may have to ask the old owner for the key's value, so it would still
> never return null. But it might be less expensive to ask the old owner
> directly (assuming it's safe from a consistency POV).
>
>>> All good reasons for which imho it makes sense to send out "a couple"
>>> of requests in parallel, but I'd be unlikely to want to send more
>>> than 2, and I agree often 1 might be enough.
>>>
>>> Maybe it should even optimize for the most common case: send out just
>>> one, have a more aggressive timeout and in case of trouble ask the
>>> next node.
>>
>> +1
>
> -1 for aggressive timeouts... you're going to do the same work as you
> do now, except you're going to wait a bit between sending requests. If
> you're really unlucky the first target will return first but you'll
> ignore its response because the timeout already expired.

Agreed. What I meant by "more aggressive timeouts" is not the overall
timeout that fails the get, but a second, more aggressive one that
triggers sending the next GET when the first one is "starting to not
look good". So we would have one timeout for the whole operation, and
another that decides how long a single GET RPC may stay unanswered
before we start asking another node.
So even if the global timeout is something high like "10 seconds", if
after 40 ms I still didn't get a reply from the first node I think we
can start sending to the next one... but still wait to eventually get
an answer from the first.
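
Roughly this shape, as a sketch - sendGet is an invented stand-in for
the async transport call, and both timeout values are just examples:

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.function.BiFunction;

    class StaggeredGet {
        static final long STAGGER_MS = 40;            // per-RPC "not looking good" delay
        static final long GLOBAL_TIMEOUT_MS = 10_000; // overall operation timeout

        // sendGet is a hypothetical async transport call: (owner, key) -> future reply.
        static Object get(BiFunction<String, Object, CompletableFuture<Object>> sendGet,
                          List<String> owners, Object key) throws Exception {
            CompletableFuture<Object> result = new CompletableFuture<>();
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
            try {
                for (int i = 0; i < owners.size(); i++) {
                    String owner = owners.get(i);
                    // Fire each GET STAGGER_MS after the previous one, unless a
                    // reply already arrived; all requests stay in flight and
                    // the first reply wins.
                    timer.schedule(() -> {
                        if (!result.isDone()) {
                            sendGet.apply(owner, key).thenAccept(result::complete);
                        }
                    }, (long) i * STAGGER_MS, TimeUnit.MILLISECONDS);
                }
                // The global timeout still bounds the whole operation.
                return result.get(GLOBAL_TIMEOUT_MS, TimeUnit.MILLISECONDS);
            } finally {
                timer.shutdownNow();
            }
        }
    }

Note that the first request is never cancelled, so a late reply from
the first owner still completes the operation.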
>>> In addition, sending a single request might spare us some Future,
>>> await+notify messing in terms of CPU cost of sending the request.
>>
>> It's the remote OOB thread that's the most costly resource imo.
>
> I don't think the OOB thread is that costly: it doesn't block on
> anything (not even on state transfer!), so the most expensive part is
> reading the key and writing the value. BTW Sanne, we may want to run
> Transactional with a smaller payload size ;)
> We could implement our own GroupRequest that sends the requests in
> parallel, instead of implementing FutureCollator on top of
> UnicastRequest, and save some of that overhead on the caller.
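
As a sketch of the behaviour such a GroupRequest would give us (first
reply wins; sendGet is again an invented stand-in for the transport
call, not JGroups API):

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;
    import java.util.function.BiFunction;

    class ParallelGet {
        // Fire a GET at every owner at once and return the fastest answer.
        static Object get(BiFunction<String, Object, CompletableFuture<Object>> sendGet,
                          List<String> owners, Object key, long timeoutMs) throws Exception {
            CompletableFuture<?>[] requests = owners.stream()
                    .map(owner -> sendGet.apply(owner, key))
                    .toArray(CompletableFuture[]::new);
            // anyOf completes as soon as the fastest owner responds.
            return CompletableFuture.anyOf(requests).get(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }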
> I think we already have a JIRA to make PutKeyValueCommands return the
> previous value; that would eliminate lots of GetKeyValueCommands and
> it would actually improve the performance of puts - we should probably
> make this a priority.
+1 !!
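
For the record, the win is collapsing today's read-then-write into a
single remote command - illustrative only, using plain Map semantics:

    import java.util.Map;

    class PutReturnsPrevious {
        // Map.put (and Cache.put) already promise the previous value; if
        // the remote PutKeyValueCommand shipped it back, the separate get
        // would become unnecessary.
        static Object updateReturningOld(Map<Object, Object> cache, Object key, Object value) {
            // Today (conceptually) this costs two remote commands:
            //   Object old = cache.get(key);   // GetKeyValueCommand
            //   cache.put(key, value);         // PutKeyValueCommand
            // With put returning the previous value it is a single command:
            return cache.put(key, value);
        }
    }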

>>> I think I agree on all points, it makes more sense.
>>> Just that in a large cluster, let's say 1000 nodes, maybe I want 20
>>> owners as a sweet spot for the read/write performance tradeoff, and
>>> with such high numbers I guess doing 2-3 gets in parallel might make
>>> sense, as those "unlikely" events suddenly are almost a certainty...
>>> especially the rehash in progress.
>>>
>>> So I'd propose a separate configuration option for the # of parallel
>>> get events, and one to define a "try next node" policy. Or this
>>> policy should be the whole strategy, and the # of gets one of the
>>> options for the default implementation.
>>
>> Agreed that having a configurable remote get policy makes sense.
>> We already have a JIRA for this [1]; I'll start working on it as the
>> performance results are haunting me.
>
> I'd rather focus on implementing one remote get policy that works
> instead of making it configurable - even if we make it configurable
> we'll have to focus our optimizations on the default policy.
> Keep in mind that we also want to introduce eventual consistency - I
> think that's going to eliminate our optimization opportunity here,
> because we'll need to get the values from a majority of owners (if
> not all the owners).
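
To make that concrete, a sketch of what a majority read would look
like (VersionedValue and sendGet are invented for illustration):

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.function.BiFunction;

    class QuorumGet {
        record VersionedValue(Object value, long version) {}

        // Wait for replies from a majority of owners and keep the
        // highest-versioned value; no returning after the first reply.
        static VersionedValue get(
                BiFunction<String, Object, CompletableFuture<VersionedValue>> sendGet,
                List<String> owners, Object key) throws InterruptedException {
            int quorum = owners.size() / 2 + 1;
            BlockingQueue<VersionedValue> replies = new LinkedBlockingQueue<>();
            owners.forEach(owner -> sendGet.apply(owner, key).thenAccept(replies::add));
            VersionedValue newest = null;
            for (int i = 0; i < quorum; i++) {
                VersionedValue reply = replies.take(); // block until a majority answered
                if (newest == null || reply.version() > newest.version()) {
                    newest = reply; // highest version wins
                }
            }
            return newest;
        }
    }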
>
>> I'd like to have Dan's input on this as well first, as he has worked
>> with remote gets and I still don't know why null results are not
>> considered valid :)
>
> Pre-5.0, during state transfer an owner could return null to mean "I'm
> not sure", so the caller would ignore it unless every target returned
> null. That's no longer necessary, but it wasn't broken so I didn't fix
> it...
>
> Cheers
> Dan
>
>> [1] https://issues.jboss.org/browse/ISPN-825
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev