[jboss-dev-forums] [Design of Clustering on JBoss (Clusters/JBoss)] - Re: Handling cluster state when network partitions occur

Thu Sep 13 14:29:49 EDT 2007

I think the primary partition approach is best.  Having caches outside the primary partition purge their in-memory state is probably the wrong path though, since a generic solution cannot assume that every installation is backed by a shared database.

Caches shutting down would be my preferred option.  Perhaps block for a short period, hoping the network heals, and then throw an exception after a timeout.  Perhaps a specific exception - SplitBrainException or something - so that cache users such as HTTP Replication can react by forcing an HTTP response like 410 (don't know if this is possible - Brian?) such that the load balancer will treat the node as unavailable.  Once the partition heals, the cache performs a state transfer to come up to speed with the primary partition and is then made available to requests again.
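
To make the block-then-fail idea concrete, here is a rough sketch in plain Java.  Everything in it is hypothetical: PartitionGate, setInPrimary and awaitPrimary are names made up for illustration, not existing JBoss Cache or JGroups API, and SplitBrainException is just the placeholder name floated above.  It assumes some view-change listener tells the gate whether this node is currently in the primary partition.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate that cache operations pass through before touching state.
class PartitionGate {

    // Placeholder exception name from the discussion; not an existing class.
    static class SplitBrainException extends RuntimeException {
        SplitBrainException(String message) {
            super(message);
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition healed = lock.newCondition();
    private boolean inPrimary = true;

    // Assumed to be called by whatever component watches membership changes.
    void setInPrimary(boolean value) {
        lock.lock();
        try {
            inPrimary = value;
            if (value) {
                healed.signalAll(); // wake up callers blocked in awaitPrimary
            }
        } finally {
            lock.unlock();
        }
    }

    // Blocks up to timeoutMillis hoping the partition heals, then gives up.
    void awaitPrimary(long timeoutMillis) {
        long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (!inPrimary) {
                if (remaining <= 0L) {
                    throw new SplitBrainException(
                        "not in primary partition after " + timeoutMillis + " ms");
                }
                remaining = healed.awaitNanos(remaining);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new SplitBrainException("interrupted while waiting for partition to heal");
        } finally {
            lock.unlock();
        }
    }
}
```

A caller such as the HTTP session replication layer would call awaitPrimary before each cache access and translate a SplitBrainException into whatever error response the load balancer understands.  The state transfer after healing is deliberately left out; in this sketch it would have to finish before setInPrimary(true) is called.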

Even the impact of incorrectly identifying a primary partition is low, since in the worst case the larger partition is unresponsive while the smaller one keeps serving requests.  I guess the real problem is more than one partition thinking it is primary.  :-)
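
One common way to guarantee that two partitions never both consider themselves primary is a strict-majority rule.  The sketch below is only an illustration of that rule, not something this thread has settled on: PrimaryPartitionRule and isPrimary are hypothetical names, and it assumes every node knows the expected full cluster size up front.

```java
// Hypothetical helper: a partition is primary only if it holds a strict majority.
final class PrimaryPartitionRule {

    private PrimaryPartitionRule() {
    }

    // visibleMembers: nodes in this partition's current view, including self.
    // expectedClusterSize: configured size of the full cluster (an assumption).
    static boolean isPrimary(int visibleMembers, int expectedClusterSize) {
        if (expectedClusterSize <= 0 || visibleMembers < 0) {
            throw new IllegalArgumentException("sizes must be non-negative, cluster size positive");
        }
        // Strict majority: two disjoint partitions cannot both satisfy this.
        return 2L * visibleMembers > expectedClusterSize;
    }
}
```

The trade-off is that an even split (2 of 4, say) leaves no primary at all, so the whole cluster becomes unavailable until the network heals.  That is the price of never having two primaries.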

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4084139#4084139