[infinispan-dev] DIST-SYNC, put(), a problem and a solution

Bela Ban bban at redhat.com
Wed Jul 30 06:01:23 EDT 2014



On 29/07/14 16:39, Dan Berindei wrote:

>     Investigation:
>     ------------
>     This mitigated the problem somewhat, but when I increased the requester
>     threads to 100, I had the same problem again. Apparently, the Infinispan
>     internal thread pool uses a rejection policy of "run" and thus uses the
>     JGroups (OOB) thread when exhausted.
>
>
> We can't use another rejection policy in the remote executor because the
> message won't be re-delivered by JGroups, and we can't use a queue either.

Yes, I'm aware of that, and "run" is our only option for the Infinispan 
internal thread pool.
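Assuming Infinispan's "run" rejection policy behaves like the JDK's ThreadPoolExecutor.CallerRunsPolicy, the sketch below shows how an exhausted pool ends up running the task on the thread that submitted it (in our case the JGroups OOB thread). This is plain JDK code, not Infinispan's or JGroups' configuration; the pool sizes and the thread name "oob-thread" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CallerRunsDemo {
    /** Submits three tasks to a tiny pool and returns the names of the threads that ran them. */
    static List<String> run() throws InterruptedException {
        // 1 worker thread and room for 1 queued task; once both are taken,
        // CallerRunsPolicy runs the task on the thread that called execute().
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        List<String> ranOn = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch firstStarted = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Stand-in for a JGroups OOB thread handing messages to the pool (hypothetical name).
        Thread oob = new Thread(() -> {
            pool.execute(() -> {                    // task 1: occupies the only worker
                firstStarted.countDown();
                try { release.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                ranOn.add(Thread.currentThread().getName());
            });
            try { firstStarted.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
            pool.execute(() -> ranOn.add(Thread.currentThread().getName())); // task 2: queued
            pool.execute(() -> ranOn.add(Thread.currentThread().getName())); // task 3: rejected, runs on "oob-thread"
            release.countDown();                    // let the worker finish tasks 1 and 2
        }, "oob-thread");
        oob.start();
        oob.join();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return new ArrayList<>(ranOn);
    }
}
```

While task 3 runs, the OOB thread is busy and cannot deliver further messages, which is exactly how the exhausted internal pool ends up tying up the JGroups threads.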


>     Suggested solution
>     ----------------
>     The modification RPC needs to be invoked *outside of the lock scope*:
>     - lock K
>     - modify K
>     - unlock K
>     - send modification to backup owner(s) // outside the lock scope
>
>     The primary owner puts the modification of K into a queue from where a
>     separate thread/task removes it. The thread then invokes the PUT(K) on
>     the backup owner(s).
>
>
> Does the replication thread execute the PUT(k) synchronously, or
> asynchronously? I assume asynchronously, otherwise the replication
> thread wouldn't be able to keep up with the writers.


Async would be preferred, but the order of the updates needs to be 
guaranteed. This could be done with the sequence numbers suggested by 
Sanne, or by using total order (TOA).
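The lock/modify/unlock/send steps of the suggested solution could be sketched roughly as follows. This is a minimal single-node sketch, not Infinispan code: PrimaryOwner and the sendToBackups callback (standing in for the RPC to the backup owner(s)) are hypothetical. One choice worth pointing out: the modification is added to the queue just before the unlock. The add is a cheap, non-blocking call on an unbounded queue, and doing it under the lock keeps the queue order identical to the order in which K was modified. Only the RPC runs outside the lock scope, on a single replicator thread that drains the queue in FIFO order.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BiConsumer;

/** Hypothetical primary owner: the key lock covers the modification, not the RPC. */
class PrimaryOwner {
    /** A modification of key K, as replicated to the backup owner(s). */
    static final class Modification {
        final String key;
        final String value;
        Modification(String key, String value) { this.key = key; this.value = value; }
    }

    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final BlockingQueue<Modification> replQueue = new LinkedBlockingQueue<>();

    /** sendToBackups stands in for the (possibly blocking) RPC to the backup owner(s). */
    PrimaryOwner(BiConsumer<String, String> sendToBackups) {
        Thread replicator = new Thread(() -> {
            try {
                while (true) {
                    Modification m = replQueue.take();      // FIFO: oldest modification first
                    sendToBackups.accept(m.key, m.value);   // runs with no key lock held
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();         // treat interruption as shutdown
            }
        }, "replicator");
        replicator.setDaemon(true);
        replicator.start();
    }

    void put(String key, String value) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();                                        // lock K
        try {
            data.put(key, value);                           // modify K
            // Non-blocking add before unlock: queue order == modification order of K.
            replQueue.add(new Modification(key, value));
        } finally {
            lock.unlock();                                  // unlock K
        }
        // The send to the backup owner(s) happens later, on the replicator thread.
    }

    String get(String key) { return data.get(key); }
}
```

Writers never wait for the backup owner(s) here. Whether sendToBackups itself waits for a reply is left open, but with a single replicator thread a slow send only delays replication, not the writers holding key locks.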


>     The queue has the modified keys in FIFO order, so the modifications
>     arrive at the backup owner(s) in the right order.
>
>
> Sending the RPC to the backup owners asynchronously, while holding the
> key lock, would do the same thing.

Yes, but if there's a chance that the send() system call blocks (e.g. 
on TCP when the send window is full), the async replication should 
happen outside the lock scope.

If those update messages to the backup owner(s) are regular (not OOB) 
messages, FIFO would ensure that they're processed in the order in which 
they were sent. If they're OOB messages, we'd have to somehow guarantee 
ordering, e.g. using seq numbers.
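For the OOB case, the backup owner could restore the sender's order with sequence numbers roughly as sketched below. The assumptions are that the primary owner stamps each update to a given backup with a consecutive seqno starting at 1 (one counter per sender) and that OrderedApplier is a hypothetical helper, not a JGroups or Infinispan class. Updates that arrive early are parked until the gap before them is filled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical backup-side buffer that applies OOB updates from one sender in seqno order. */
class OrderedApplier {
    private long nextExpected = 1;                    // assumption: seqnos start at 1
    private final Map<Long, Runnable> pending = new HashMap<>();

    /** May be called from any OOB thread; applies an update only after all its predecessors. */
    synchronized void deliver(long seqno, Runnable update) {
        if (seqno < nextExpected) return;             // duplicate of an already applied update
        pending.put(seqno, update);
        Runnable next;
        // Apply every update that is now contiguous with what has been applied so far.
        while ((next = pending.remove(nextExpected)) != null) {
            next.run();
            nextExpected++;
        }
    }
}
```

Applying updates under the monitor serializes them per sender, which is what we need for correctness. It also means that a missing seqno stalls everything behind it, so retransmission (or a timeout) would still be needed in a real implementation.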


>     This requires that the way GET is implemented changes slightly: instead
>     of invoking a GET on all owners of K, we only invoke it on the primary
>     owner, then the next-in-line etc.
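A staged GET could look roughly like the sketch below. It sends one request to the primary owner and falls back to the next owner in line only if that one can't answer. Presumably the primary goes first because, once replication is asynchronous, a backup may not have applied the latest update yet. The owner list, the remoteGet callback and the use of an exception to signal "this owner could not answer" are assumptions for illustration. A null result is treated as a valid "key not found" answer.

```java
import java.util.List;
import java.util.function.Function;

class StagedGet {
    /**
     * Asks the owners one at a time in ownership order (owners.get(0) is the primary)
     * and returns the first answer. remoteGet is a hypothetical stand-in for the GET RPC
     * and throws a RuntimeException if the owner is unreachable or times out.
     */
    static <V> V get(List<String> owners, Function<String, V> remoteGet) {
        RuntimeException last = null;
        for (String owner : owners) {
            try {
                return remoteGet.apply(owner);      // first reply wins; null means "not found"
            } catch (RuntimeException e) {
                last = e;                           // owner could not answer: try next-in-line
            }
        }
        throw new IllegalStateException("no owner answered", last);
    }
}
```

In the common case this is a single request per GET, instead of one request to each owner.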

> I have a WIP branch for this and it seemed to work fine. Test suite
> speed seemed about the same, but I didn't get to do a real performance test.

Hmm, it should reduce overall traffic (a GET normally goes to one owner 
instead of all of them), which should indirectly lead to better 
performance. I hope you wrap your work in a JIRA and commit it!

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

