[infinispan-dev] [ISPN-731] - make rehashing state and tx log draining serial

Mircea Markus mircea.markus at jboss.com
Mon Nov 8 07:01:53 EST 2010


On 5 Nov 2010, at 09:39, Manik Surtani wrote:

> 
> On 4 Nov 2010, at 20:26, Vladimir Blagojevic wrote:
> 
>> Mircea and I concluded that it is worth keeping the current pull state 
>> approach and bolting proper tx log draining unless this solution becomes 
>> more complex than the original push state approach that already 
>> serialized state sending and tx log draining.
>> 
>> To summarize, during leave rehash, we need state senders to drain tx log 
>> (InvertedLeaveTask#processAndDrainTxLog) *after* all state receivers 
>> have transferred the state. As things stand right now 
>> (InvertedLeaveTask#performRehash) tx log draining is interleaved with 
>> state transfer leading to problems described in the above mentioned JIRA.
>> 
>> The solution I have in mind is to introduce a 
>> Map<Integer,CountDownLatch> in DistributedManagerImpl. The keys in this 
>> map will be view ids for the leave rehash while CountDownLatch will be 
>> initialized to the number of state receivers. As state receivers pick up 
>> state we countDown on the latch.
> 
> Shouldn't you count down after receivers have applied the state, and have then ack'd that this state has been applied?  Rather than when the state has been picked up?
> 
>> State provider awaits on a latch for a 
>> given view id and a timeout. When await returns it drains the tx log.
> 
> Rather than a countdown latch, wouldn't you rather maintain a synchronized set or something with the addresses of the recipients, and you only proceed when this set is empty?  Tighter control rather than a simple countdown.
> 
> Also, how do you plan on dealing with exceptional circumstances, e.g., a receiver node crashing before sending this ack?  Would that not block the entire rehash process?  Or would the node crashing cause a new view change, which will abort the LeaveTask (interrupt) and cause it to start all over again?
+1, that's something we should be prepared for.
That makes me think of a more interesting problem as well: what happens when a node crashes in the middle of tx log draining? I'm not sure the tx log can revert itself to its initial state, can it? Again, something we should look into.
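
To make the "set of recipients" idea concrete, here is a rough sketch of how the state provider could track outstanding acks per view id with a set of addresses instead of a bare latch. It is only an illustration: PendingAcks and its methods are made-up names, not existing Infinispan code, and String stands in for the real Address type. A view change would call abort() so that the waiting task returns instead of blocking until the timeout.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not Infinispan API): tracks which receivers still owe an ack, per view id. */
final class PendingAcks {
    // All access goes through synchronized methods, so plain collections are sufficient.
    private final Map<Integer, Set<String>> pendingByView = new HashMap<>();

    /** State provider registers the receivers it expects acks from for this view. */
    synchronized void expect(int viewId, Collection<String> receivers) {
        pendingByView.put(viewId, new HashSet<>(receivers));
    }

    /** Receiver confirms it has applied (not merely fetched) the state. */
    synchronized void ack(int viewId, String receiver) {
        Set<String> pending = pendingByView.get(viewId);
        if (pending != null && pending.remove(receiver)) {
            notifyAll();
        }
    }

    /** Called on a view change: abandons the wait for the old view. */
    synchronized void abort(int viewId) {
        if (pendingByView.remove(viewId) != null) {
            notifyAll();
        }
    }

    /** Returns true only if every receiver acked; false on timeout, abort, or unknown view. */
    synchronized boolean awaitAll(int viewId, long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            Set<String> pending = pendingByView.get(viewId);
            if (pending == null) {
                return false; // aborted by a view change, or never registered
            }
            if (pending.isEmpty()) {
                pendingByView.remove(viewId);
                return true; // safe to start draining the tx log
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false; // timed out; the entry is kept so late acks are still recorded
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
    }
}
```

The state provider would drain the tx log only when awaitAll() returns true. What it should do on false (timeout or abort) is exactly the open question above.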
> 
> Cheers
> Manik
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
> 
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev