On 4 Nov 2010, at 20:26, Vladimir Blagojevic wrote:
Mircea and I concluded that it is worth keeping the current pull-state
approach and bolting on proper tx log draining, unless this solution becomes
more complex than the original push-state approach, which already
serialized state sending and tx log draining.
To summarize, during a leave rehash we need state senders to drain the tx log
(InvertedLeaveTask#processAndDrainTxLog) *after* all state receivers
have transferred the state. As things stand right now
(InvertedLeaveTask#performRehash), tx log draining is interleaved with
state transfer, leading to the problems described in the above-mentioned JIRA.
The solution I have in mind is to introduce a
Map<Integer, CountDownLatch> in DistributedManagerImpl. The keys in this
map will be the view ids for the leave rehash, while each CountDownLatch will be
initialized to the number of state receivers. As state receivers pick up
state, we countDown on the latch.
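A minimal sketch of that bookkeeping, assuming a hypothetical holder class
(only CountDownLatch and the map shape come from the proposal; the class and
method names are illustrative, not the actual implementation):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LeaveRehashLatches {
   // one latch per leave-rehash view id, counted down as receivers pick up state
   private final Map<Integer, CountDownLatch> latches =
         new ConcurrentHashMap<Integer, CountDownLatch>();

   // called when the leave rehash for viewId starts
   public void registerRehash(int viewId, int numStateReceivers) {
      latches.put(viewId, new CountDownLatch(numStateReceivers));
   }

   // called as each state receiver picks up (or acks) its state
   public void stateReceived(int viewId) {
      CountDownLatch latch = latches.get(viewId);
      if (latch != null) latch.countDown();
   }

   // state provider blocks here before draining the tx log; false on timeout
   public boolean awaitReceivers(int viewId, long timeout, TimeUnit unit)
         throws InterruptedException {
      CountDownLatch latch = latches.get(viewId);
      return latch == null || latch.await(timeout, unit);
   }
}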
Shouldn't you count down after receivers have applied the state, and have then
ack'd that the state has been applied, rather than when the state has merely
been picked up?
The state provider awaits on the latch for a
given view id, with a timeout. When await returns it drains the tx log.
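On the provider side, using the sketch above, the flow would be roughly as
follows (the timeout value, log field and latches reference are placeholders):

// after the provider has pushed its state for the leave rehash
if (!latches.awaitReceivers(viewId, 60, TimeUnit.SECONDS)) {
   // receivers did not all check in; see the failure-handling question below
   log.warn("Timed out waiting for state receivers in view " + viewId);
}
processAndDrainTxLog(); // i.e. InvertedLeaveTask#processAndDrainTxLog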
Rather than a countdown latch, wouldn't you rather maintain a synchronized set or
something with the addresses of the recipients, and only proceed when this set is
empty? That gives tighter control than a simple countdown.
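Something along these lines, say (a sketch only; the class name, method names
and the generic address type are illustrative):

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class PendingReceivers<A> {
   // addresses of receivers that have not yet ack'd applying their state
   private final Set<A> pending = new HashSet<A>();

   public synchronized void register(Collection<A> receivers) {
      pending.addAll(receivers);
   }

   // called when a receiver acks that it has applied its state
   public synchronized void ack(A receiver) {
      pending.remove(receiver);
      if (pending.isEmpty()) notifyAll();
   }

   // provider blocks here before draining the tx log; false on timeout
   public synchronized boolean awaitEmpty(long timeoutMillis)
         throws InterruptedException {
      long deadline = System.currentTimeMillis() + timeoutMillis;
      while (!pending.isEmpty()) {
         long remaining = deadline - System.currentTimeMillis();
         if (remaining <= 0) return false;
         wait(remaining);
      }
      return true;
   }
}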
Also, how do you plan on dealing with exceptional circumstances, e.g., a receiver node
crashing before sending this ack? Would that not block the entire rehash process? Or
would the node crashing cause a new view change, which would abort the LeaveTask
(interrupt) and cause it to start all over again?
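If you go the set route, the view-change path could also prune crashed members
so the provider isn't blocked forever, e.g. by adding a method like this to the
sketch above (the wiring to an actual view-change callback is left out, since it
depends on where view changes are observed):

// called from the view-change handler with the membership of the new view;
// drops pending receivers that crashed or left before ack'ing
public synchronized void retainMembers(Collection<A> currentView) {
   pending.retainAll(currentView);
   if (pending.isEmpty()) notifyAll();
}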
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org