On 5 Nov 2010, at 09:39, Manik Surtani wrote:
On 4 Nov 2010, at 20:26, Vladimir Blagojevic wrote:
> Mircea and I concluded that it is worth keeping the current pull state
> approach and bolting on proper tx log draining, unless this solution becomes
> more complex than the original push state approach that already
> serialized state sending and tx log draining.
>
> To summarize, during leave rehash, we need state senders to drain tx log
> (InvertedLeaveTask#processAndDrainTxLog) *after* all state receivers
> have transferred the state. As things stand right now
> (InvertedLeaveTask#performRehash) tx log draining is interleaved with
> state transfer leading to problems described in the above mentioned JIRA.
>
> The solution I have in mind is to introduce a
> Map<Integer,CountDownLatch> in DistributedManagerImpl. The keys in this
> map will be view ids for the leave rehash while CountDownLatch will be
> initialized to the number of state receivers. As state receivers pick up
> state we countDown on the latch.
Shouldn't you count down after receivers have applied the state, and have then
ack'd that this state has been applied? Rather than when the state has been picked
up?
> State provider awaits on a latch for a
> given view id and a timeout. When await returns it drains the tx log.
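For reference, a rough sketch of how the latch-per-view-id idea could look. The class and method names (LeaveRehashLatches, register, receiverDone, awaitReceivers) are made up for illustration and are not existing Infinispan code; the actual draining of the tx log is left to the caller.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not Infinispan API): one latch per leave-rehash view id.
class LeaveRehashLatches {
    private final ConcurrentMap<Integer, CountDownLatch> latches = new ConcurrentHashMap<>();

    // State provider side: register the number of expected state receivers for this view.
    void register(int viewId, int receiverCount) {
        latches.putIfAbsent(viewId, new CountDownLatch(receiverCount));
    }

    // Called once per state receiver when it is done with the state for this view.
    void receiverDone(int viewId) {
        CountDownLatch latch = latches.get(viewId);
        if (latch != null) {
            latch.countDown();
        }
    }

    // Returns true if every receiver finished within the timeout, false on timeout
    // or if no latch was registered for the view. The caller drains the tx log afterwards.
    boolean awaitReceivers(int viewId, long timeout, TimeUnit unit) throws InterruptedException {
        CountDownLatch latch = latches.get(viewId);
        if (latch == null) {
            return false;
        }
        try {
            return latch.await(timeout, unit);
        } finally {
            latches.remove(viewId); // the view id is not reused, so drop the entry
        }
    }
}
```

The state provider would call awaitReceivers(viewId, timeout, unit) and only then drain the tx log; what to do when it returns false (timeout) is an open design choice.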
Rather than a countdown latch, wouldn't you rather maintain a synchronized set or
something with the addresses of the recipients, and you only proceed when this set is
empty? Tighter control rather than a simple countdown.
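Something along these lines, as a minimal sketch only. PendingAcks and its methods are hypothetical names, and the address type is left generic rather than tied to any real Address class.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: track which recipients have not yet ack'd, by address.
class PendingAcks<A> {
    private final Set<A> pending = new HashSet<>();

    PendingAcks(Collection<A> recipients) {
        pending.addAll(recipients);
    }

    // Returns false for an unknown or duplicate ack, which a plain countdown
    // would have silently counted.
    synchronized boolean ack(A recipient) {
        boolean removed = pending.remove(recipient);
        if (pending.isEmpty()) {
            notifyAll();
        }
        return removed;
    }

    // Blocks until the set is empty or the timeout elapses; true means all ack'd.
    synchronized boolean awaitEmpty(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!pending.isEmpty()) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return true;
    }

    // Snapshot of who is still outstanding, e.g. for logging on timeout.
    synchronized Set<A> remaining() {
        return new HashSet<>(pending);
    }
}
```

The advantage over a bare count is that you know exactly who is still outstanding, and a duplicate ack from the same node cannot release the sender early.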
Also, how do you plan on dealing with exceptional circumstances, e.g., a receiver node
crashing before sending this ack? Would that not block the entire rehash process? Or
would the node crashing cause a new view change, which will abort the LeaveTask
(interrupt) and cause it to start all over again?
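To illustrate the second option (a new view interrupts the running task and a fresh one starts), here is a toy sketch. RehashRunner and onViewChange are invented names, and the "task" simply waits on a latch standing in for the acks; this is not how LeaveTask is actually wired up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a new view cancels the in-flight leave task and starts over.
class RehashRunner {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private Future<String> current;

    // acks stands in for "all receivers have ack'd" for the given view.
    synchronized Future<String> onViewChange(int viewId, CountDownLatch acks) {
        if (current != null) {
            current.cancel(true); // interrupts a task blocked in await()
        }
        current = executor.submit(() -> {
            acks.await(); // throws InterruptedException if the task is cancelled
            return "drained for view " + viewId;
        });
        return current;
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

With this shape a crashed receiver does not block forever, provided the crash is detected and actually produces a new view; a timeout on the wait would still be needed as a safety net.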
+1, that's something we should be prepared for.
That makes me think about a more interesting problem as well: what happens when a node
crashes in the middle of tx log draining? Not sure the tx log can revert itself to the
initial state, can it? Again something we should look into.
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org