On 2010-11-08, at 9:01 AM, Mircea Markus wrote:
>>
>
> Shouldn't you count down after receivers have applied the state, and have then
> ack'd that this state has been applied? Rather than when the state has been picked
> up?
>
>> State provider awaits on a latch for a given view id and a timeout. When await
>> returns it drains the tx log.
>
> Rather than a countdown latch, wouldn't you rather maintain a synchronized set or
> something with the addresses of the recipients, and you only proceed when this set is
> empty? Tighter control rather than a simple countdown.
>
> Also, how do you plan on dealing with exceptional circumstances, e.g., a receiver
> node crashing before sending this ack? Would that not block the entire rehash process?
> Or would the node crashing cause a new view change, which will abort the LeaveTask
> (interrupt) and cause it to start all over again?
+1, that's something we should be prepared for.
That makes me think about a more interesting problem as well: what happens when a node
crashes in the middle of tx log draining? Not sure the tx log can revert itself to the
initial state, can it? Again something we should look into.

OK, you have both convinced me. I have a working solution for ISPN-731. There are so many
what-if scenarios with rehashing that they should be dealt with separately and thoroughly.
What do you say I amend ISPN-493 with your comments above and we set the fix version to
4.2 CR2 as well?
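
For reference, the "set of recipient addresses" idea quoted above could be sketched roughly as below. This is only an illustration, not the ISPN-731 fix and not existing Infinispan code: the class name PendingAcks, its methods, and the use of plain strings for node addresses are all made up for the example. The state provider would wait until every receiver has ack'd that it applied the state, or until the timeout expires. The retainMembers method shows one possible answer to the crash question (stop waiting for nodes that left the view); aborting and restarting the task on a view change, as suggested in the thread, is the other option and is not shown.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: tracks which receivers still owe an "state applied" ack. */
class PendingAcks {
    // Addresses are plain strings here; a real implementation would use its own address type.
    private final Set<String> pending = new HashSet<String>();

    PendingAcks(Collection<String> receivers) {
        pending.addAll(receivers);
    }

    /** Called when a receiver confirms it has applied the state. */
    synchronized void ack(String receiver) {
        if (pending.remove(receiver) && pending.isEmpty()) {
            notifyAll();
        }
    }

    /** Called on a view change: nodes that left the view can never ack, so stop waiting for them. */
    synchronized void retainMembers(Collection<String> currentMembers) {
        pending.retainAll(currentMembers);
        if (pending.isEmpty()) {
            notifyAll();
        }
    }

    /** Waits until the set is empty; returns false if the timeout expires first. */
    synchronized boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!pending.isEmpty()) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return true;
    }

    /** Snapshot of the receivers that have not ack'd yet, e.g. for logging after a timeout. */
    synchronized Set<String> stillPending() {
        return new HashSet<String>(pending);
    }
}
```

Compared with a bare countdown, the set lets the provider see exactly who has not answered after a timeout, and a duplicate ack from the same node cannot be counted twice.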