I'm hoping that JBoss Cache can recover cache consistency following a
transient network outage. From my reading on both JBoss Cache and JGroups,
my understanding is that JGroups detects the start and end of a network
partition and then notifies dependent classes, so that they know they
need to deal with merging any higher-level state changes made in each
sub-cluster during the partition.
This seems consistent with a recent response from Bela Ban (2007.0914):
But from my reading of JBoss Cache (the dependent class I'm considering) via:
"JBoss Cache TreeCache: A Structured, Replicated, Transactional Cache"
I can't find anything that directly addresses how JBoss Cache heals its "application data" after a network partition...
Here's a test scenario that exercises the situation I'm thinking of:
A JBoss Cache cluster with ClusterName="A-B" and two members ("A" and "B") is started.
"A" and "B" both view "A" as "coordinator".
To keep things simple, let's have only a single region.
No Cache Loaders are used -- thus all Cache state exists only in-memory (no back-end data store).
Time passes peacefully and "A-B" builds up some amount of Cache state, all of it consistent across A and B.
Then, a network partition occurs (perhaps a backbone router is powered down for 10-15 minutes to add an additional card).
After some amount of time, A and B each end up marking the other "dead" and themselves
as JGroups coordinator for their own sub-cluster.
During this partition, both A and B can continue to accept updates to their Cache state, right?
And this would result in the now-independent Caches becoming distinct from each other.
So, once the network partition heals, JGroups will detect this and notify JBoss Cache
via a ViewChange.
My question is:
does JBoss Cache itself somehow merge the during-partition state changes together?
Since neither member was ever down and both may have made state changes during the partition,
having either one simply dump its in-memory state and ask for a full state transfer would
result in a loss of information.
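
To make the concern concrete, here is a minimal plain-Java sketch of the scenario. It does not use the JBoss Cache or JGroups APIs at all: the two HashMaps stand in for the in-memory state of A and B, and stateTransfer() and merge() are hypothetical helpers I made up for illustration. The merge policy shown (union of keys, A wins on conflicts, removals ignored) is only an assumption, not something I know JBoss Cache to do.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionMergeSketch {

    // Hypothetical "full state transfer": the receiving member discards its
    // own in-memory state and takes a copy of the sender's state.
    public static Map<String, String> stateTransfer(Map<String, String> sender) {
        return new HashMap<>(sender);
    }

    // Hypothetical merge: union of both sides. When both sides hold the same
    // key, A's value wins. Removals made during the partition are not tracked.
    public static Map<String, String> merge(Map<String, String> a, Map<String, String> b) {
        Map<String, String> merged = new HashMap<>(b);
        merged.putAll(a); // A's entries overwrite B's on conflicting keys
        return merged;
    }

    public static void main(String[] args) {
        // Before the partition: A and B hold identical state.
        Map<String, String> a = new HashMap<>();
        a.put("/shared/x", "1");
        Map<String, String> b = new HashMap<>(a);

        // During the partition: each side accepts its own updates.
        a.put("/a/only", "written-on-A");
        b.put("/b/only", "written-on-B");

        // After the partition heals: B either takes A's state, or the two are merged.
        Map<String, String> afterTransfer = stateTransfer(a);
        Map<String, String> afterMerge = merge(a, b);

        System.out.println("state transfer keeps B's update: " + afterTransfer.containsKey("/b/only"));
        System.out.println("merge keeps B's update: " + afterMerge.containsKey("/b/only"));
    }
}
```

With the state-transfer path, B's during-partition update is gone; with the merge path it survives, which is the behavior I'm asking about.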