Re: [infinispan-dev] DIST-SYNC, put(), a problem and a solution

Wednesday, 30 July 2014

On 29/07/14 16:39, Dan Berindei wrote:

...
     Investigation:
     ------------
     This mitigated the problem somewhat, but when I increased the requester
     threads to 100, I had the same problem again. Apparently, the Infinispan
     internal thread pool uses a rejection policy of "run" and thus uses the
     JGroups (OOB) thread when exhausted.

 We can't use another rejection policy in the remote executor because the
 message won't be re-delivered by JGroups, and we can't use a queue either.

Yes, I'm aware of that, and "run" is our only option for the Infinispan
internal thread pool.
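
To illustrate what a "run" rejection policy means in practice, here is a minimal sketch using the JDK's ThreadPoolExecutor with CallerRunsPolicy. It is not the Infinispan remote executor itself; the class and method names are made up for the example, and the pool size of 1 is only there to force exhaustion quickly. When the pool is exhausted, the rejected task runs on the submitting thread, which in the scenario above would be the JGroups OOB thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of Infinispan or JGroups.
public class CallerRunsDemo {

    /** Returns the name of the thread that ended up running the overflow task. */
    static String threadThatRunsOverflowTask() {
        // One worker and no queue capacity: a task submitted while the
        // worker is busy is rejected and handed to the rejection policy.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch workerBusy = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<String> ranOn = new AtomicReference<>();
        try {
            // Occupy the only worker until we release it.
            pool.execute(() -> {
                workerBusy.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workerBusy.await();
            // The pool is exhausted, so this task runs on the calling thread.
            pool.execute(() -> ranOn.set(Thread.currentThread().getName()));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return ranOn.get();
    }

    public static void main(String[] args) {
        System.out.println("caller thread:        " + Thread.currentThread().getName());
        System.out.println("overflow task ran on: " + threadThatRunsOverflowTask());
    }
}
```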

...
     Suggested solution
     ----------------
     The modification RPC needs to be invoked *outside of the lock scope*:
     - lock K
     - modify K
     - unlock K
     - send modification to backup owner(s) // outside the lock scope

     The primary owner puts the modification of K into a queue from where a
     separate thread/task removes it. The thread then invokes the PUT(K) on
     the backup owner(s).

 Does the replication thread execute the PUT(k) synchronously, or
 asynchronously? I assume asynchronously, otherwise the replication
 thread wouldn't be able to keep up with the writers. 

Async would be preferred, but the order of the updates needs to be 
guaranteed. This could be done with the sequence numbers suggested by 
Sanne, or using total order/TOA.

...
     The queue has the modified keys in FIFO order, so the modifications
     arrive at the backup owner(s) in the right order.

 Sending the RPC to the backup owners asynchronously, while holding the
 key lock, would do the same thing.

Yes - but if there's a chance that the send() system call blocks, e.g. 
on TCP when the send window is full, the async repl should be outside 
the lock scope.
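
A rough sketch of the idea, under a few assumptions of mine: the class and method names are hypothetical, the "send" is a stand-in list rather than a real RPC, and the enqueue happens while the key lock is still held (it never blocks on an unbounded queue) so that queue order matches modification order per key. Only the potentially blocking send happens outside the lock scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a primary owner; not Infinispan code.
public class PrimaryOwnerSketch {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    // FIFO queue of {key, value} pairs waiting to be sent to the backup owner(s).
    private final BlockingQueue<String[]> pending = new LinkedBlockingQueue<>();
    // Stand-in for the RPC to the backup owner(s): records what would be sent.
    final List<String> sentToBackup = new ArrayList<>();

    void put(String key, String value) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            data.put(key, value);                    // modify K
            pending.add(new String[] {key, value});  // enqueue only, never blocks
        } finally {
            lock.unlock();                           // unlock K before any send
        }
    }

    /**
     * Body of the replication task. It drains whatever is queued and "sends"
     * it in FIFO order. A long-running version would loop on pending.take()
     * in a dedicated thread instead of polling once.
     */
    void drainAndReplicate() {
        String[] mod;
        while ((mod = pending.poll()) != null) {
            sentToBackup.add(mod[0] + "=" + mod[1]); // the send, outside the lock
        }
    }

    public static void main(String[] args) {
        PrimaryOwnerSketch primary = new PrimaryOwnerSketch();
        primary.put("k", "1");
        primary.put("k", "2");
        primary.drainAndReplicate();
        System.out.println(primary.sentToBackup);
    }
}
```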

If those update messages to the backup owner(s) are regular (not OOB) 
messages, FIFO would ensure that they're processed in the order in which 
they were sent. If they're OOB messages, we'd have to somehow guarantee 
ordering, e.g. using seq numbers.
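
For the OOB case, ordering by sequence numbers could look roughly like this on the backup owner. This is only a sketch with made-up names: it assumes the primary owner stamps each update with a seqno starting at 1, and it ignores retransmission of lost messages and per-sender state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical receiver-side buffer; not a JGroups or Infinispan class.
public class SeqnoReorderBuffer {
    private long nextExpected = 1;                    // assumed first seqno
    private final Map<Long, String> outOfOrder = new HashMap<>();
    private final List<String> applied = new ArrayList<>();

    /** Called for each update, in whatever order the messages arrive. */
    synchronized void receive(long seqno, String update) {
        if (seqno < nextExpected) {
            return;                                   // already applied, drop duplicate
        }
        outOfOrder.put(seqno, update);
        // Apply every update that is now contiguous with what was applied so far.
        String next;
        while ((next = outOfOrder.remove(nextExpected)) != null) {
            applied.add(next);
            nextExpected++;
        }
    }

    /** Snapshot of the updates applied so far, in application order. */
    synchronized List<String> applied() {
        return new ArrayList<>(applied);
    }

    public static void main(String[] args) {
        SeqnoReorderBuffer backup = new SeqnoReorderBuffer();
        backup.receive(2, "K=b");  // arrives early, is buffered
        backup.receive(1, "K=a");  // fills the gap, both are applied
        System.out.println(backup.applied());
    }
}
```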

...
     This requires that the way GET is implemented changes slightly: instead
     of invoking a GET on all owners of K, we only invoke it on the primary
     owner, then the next-in-line etc.
...
 I have a WIP branch for this and it seemed to work fine. Test suite
 speed seemed about the same, but I didn't get to do a real performance test.

Hmm, it should reduce overall traffic, which indirectly should lead to 
better performance. I hope you wrap your work with a JIRA and commit it!
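
For reference, the GET change discussed above (primary owner first, then the next in line) might be sketched as follows. The names are hypothetical and this is not the WIP branch; in particular, when to fall through to the next owner is my assumption here: an empty reply stands in for "no answer from that owner", e.g. a timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical sketch of a staggered GET; not Infinispan code.
public class StaggeredGet {

    /**
     * Asks the owners of a key one at a time, primary owner first, and stops
     * at the first reply. remoteGet(owner, key) stands in for the remote call.
     */
    static Optional<String> get(String key, List<String> ownersPrimaryFirst,
                                BiFunction<String, String, Optional<String>> remoteGet) {
        for (String owner : ownersPrimaryFirst) {
            Optional<String> reply = remoteGet.apply(owner, key);
            if (reply.isPresent()) {
                return reply;          // no need to contact the remaining owners
            }
        }
        return Optional.empty();       // nobody answered
    }

    public static void main(String[] args) {
        List<String> contacted = new ArrayList<>();
        Optional<String> value = get("K", List.of("A", "B", "C"), (owner, key) -> {
            contacted.add(owner);
            // Pretend the primary owner A does not answer and B does.
            return owner.equals("B") ? Optional.of("v") : Optional.empty();
        });
        System.out.println(value + " after contacting " + contacted);
    }
}
```

With all owners healthy, only the primary is contacted, which is where the traffic reduction comes from.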

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)