On 10 Nov 2010, at 13:09, Vladimir Blagojevic wrote:
On 2010-11-08, at 9:01 AM, Mircea Markus wrote:
>>>
>>
>> Shouldn't you count down after receivers have applied the state, and have
>> then ack'd that this state has been applied? Rather than when the state has
>> been picked up?
>>
>>> State provider awaits on a latch for a
>>> given view id and a timeout. When await returns it drains the tx log.
>>
>> Rather than a countdown latch, wouldn't you rather maintain a synchronized
>> set or something with the addresses of the recipients, and you only proceed
>> when this set is empty? Tighter control rather than a simple countdown.
>>
>> Also, how do you plan on dealing with exceptional circumstances, e.g., a
>> receiver node crashing before sending this ack? Would that not block the
>> entire rehash process? Or would the node crashing cause a new view change,
>> which will abort the LeaveTask (interrupt) and cause it to start all over
>> again?
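
As a rough illustration of the two points above (tracking recipients in a set rather than a bare count, and not blocking forever on a receiver that crashed before acking), here is a minimal sketch. It is not Infinispan code: the class PendingAcks and its methods ack, viewChanged, await and outstanding are hypothetical names made up for this example, and the address type is left generic.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not part of Infinispan. */
final class PendingAcks<A> {
    // Addresses of receivers that have not yet confirmed applying the state.
    private final Set<A> pending = new HashSet<>();

    PendingAcks(Collection<A> recipients) {
        pending.addAll(recipients);
    }

    /** Called when a receiver reports that it has applied the state. */
    synchronized void ack(A sender) {
        if (pending.remove(sender) && pending.isEmpty()) {
            notifyAll();
        }
    }

    /** Called on a new view: stop waiting for members that are no longer present. */
    synchronized void viewChanged(Collection<A> members) {
        if (pending.retainAll(members) && pending.isEmpty()) {
            notifyAll();
        }
    }

    /** Returns true if every outstanding ack arrived (or was pruned) before the timeout. */
    synchronized boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!pending.isEmpty()) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return true;
    }

    /** Snapshot of who is still outstanding, e.g. for logging after a timeout. */
    synchronized Set<A> outstanding() {
        return new HashSet<>(pending);
    }
}
```

Compared with a countdown latch, a duplicate ack from the same node cannot release the waiter early, and after a timeout the provider can see exactly which nodes are still missing. Whether pruning on a view change is enough, or whether the whole task should be restarted instead, is the open question raised above.
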
> +1 that's something we should be prepared for.
> That makes me think about a more interesting problem as well: what happens
> when a node crashes in the middle of tx log draining? Not sure the tx log can
> revert itself to the initial state, can it? Again something we should look
> into.
OK, both of you convinced me. I have a working solution for ISPN-731. There are so many
what-if scenarios with rehashing that they should be dealt with separately and thoroughly.
What do you say I amend ISPN-493 with your comments above and we make the fix version
4.2 CR2 as well?
We can wait for this for CR1, even. How long do you reckon you need?
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org