Discussion of conceptual/design issues related to dealing with shunned nodes rejoining a
cluster.
A fundamental thing to understand is that when your channel is shunned and disconnects,
you don't get a new view. Experiments with HAPartition showed me that. You do get one
when it reconnects. So, any features whose design depends on getting views may break.
Mainly the issue is to think through the implications of that. Some initial thoughts:
1) While you're disconnected you can't properly replicate. What's the proper
behavior -- hold calls in ReplicationInterceptor until you reconnect, which will happen
automatically? Or throw exceptions when you try to send messages on the channel (which
I'd expect is what happens now)? If the latter, it becomes important to have a way to
communicate to the application that the cache is disconnected, so the application can
decide whether or not to hold calls.
2) When the new view comes in after reconnect, does the node need to reassign its
buddy and transfer state? During the period of broken communication leading up to the
shunning, the buddy probably wasn't getting replication traffic. If REPL_ASYNC, the
sender would not know this. If we do need to reassign buddies, when the new view comes in
will the BR code recognize it needs to do this?
3) The channel is always AUTO_GET_STATE. So, when it reconnects getState() will be called
on the coordinator and setState() will be called on the node. This is inappropriate in
many configurations; basically most of those that don't do an initial state transfer.
See http://jira.jboss.com/jira/browse/JBCACHE-805. If it is appropriate, do we handle it
properly?
4) Non-BR case with region-based marshalling. We definitely don't want a single
monolithic state transfer; we won't be able to deserialize it. But, we do need to
re-transfer the state, as we're now out of sync with the cache. Right now we
don't do this.
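To make the two options in the first point (hold calls vs. throw) a bit more concrete, here's a rough sketch of a gate that something like ReplicationInterceptor could consult before sending. Everything in it is hypothetical: ConnectionGate and its method names are made up for illustration and are not JBoss Cache or JGroups API, and I'm assuming some hook exists that can tell us the channel was shunned, which is exactly the part that needs design since no view arrives at that point.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper; not part of JBoss Cache or JGroups. */
class ConnectionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition reconnected = lock.newCondition();
    private boolean connected = true;

    /** Assumed to be called by whatever hook reports the shun/disconnect. */
    void markDisconnected() {
        lock.lock();
        try {
            connected = false;
        } finally {
            lock.unlock();
        }
    }

    /** Assumed to be called when the first view after reconnect arrives. */
    void markReconnected() {
        lock.lock();
        try {
            connected = true;
            reconnected.signalAll(); // release every held caller
        } finally {
            lock.unlock();
        }
    }

    /** Lets the application ask whether the cache is currently disconnected. */
    boolean isConnected() {
        lock.lock();
        try {
            return connected;
        } finally {
            lock.unlock();
        }
    }

    /**
     * Option A: hold the call until we reconnect.
     * Returns true once connected, false if the timeout expires first.
     */
    boolean awaitConnected(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!connected) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = reconnected.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Option B: fail fast so the application can decide what to do. */
    void checkConnected() {
        if (!isConnected()) {
            throw new IllegalStateException("cache is disconnected from the cluster");
        }
    }
}
```

With option A a replicating call would do awaitConnected(...) and then send; with option B it would call checkConnected() and let the exception propagate, while isConnected() gives the application the means to hold calls itself.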
A lot of similar issues apply in the merge case.
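The buddy question and the two state-transfer points boil down to one decision: what should the node do when the first view after a reconnect (or a merge) arrives? Below is one possible way to write that decision down so the cases can be argued about. It's only a sketch of my reading of the points above, not what the code does today; RejoinResyncPolicy, Action and the three flags are hypothetical names, and treating the BR case as "always reassign" is an assumption, since that's still an open question.

```java
/** Hypothetical policy sketch; none of these names exist in JBoss Cache. */
final class RejoinResyncPolicy {

    enum Action {
        NONE,                      // no state transfer is appropriate
        FULL_STATE_TRANSFER,       // single monolithic getState()/setState()
        PER_REGION_STATE_TRANSFER, // re-transfer region by region
        REASSIGN_BUDDY             // pick a buddy again and push state to it
    }

    private RejoinResyncPolicy() {
    }

    /** Decide what to do when the first view after a reconnect comes in. */
    static Action onViewAfterReconnect(boolean initialStateTransferConfigured,
                                       boolean buddyReplication,
                                       boolean regionBasedMarshalling) {
        if (buddyReplication) {
            // Assumption: the buddy probably missed replication traffic
            // before the shun, so reassign and transfer state.
            return Action.REASSIGN_BUDDY;
        }
        if (regionBasedMarshalling) {
            // A monolithic transfer could not be deserialized, but the node
            // is out of sync and still needs its state again.
            return Action.PER_REGION_STATE_TRANSFER;
        }
        if (initialStateTransferConfigured) {
            // Only here does the AUTO_GET_STATE behaviour look appropriate.
            return Action.FULL_STATE_TRANSFER;
        }
        // Configurations without an initial state transfer: leave state alone.
        return Action.NONE;
    }
}
```

For example, onViewAfterReconnect(false, false, false) comes out as NONE, which is the case where the automatic getState()/setState() on reconnect would be wrong.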