Manik Surtani commented on ISPN-658:
Looking at the details, this is extremely complex to implement.
From a series of email discussions:
Essentially the problem is that we don't currently support asymmetric clusters, such as:
NodeA [c1, c2]
NodeB [c1, c2, c3]
NodeC [c2, c3]
where the caches (cX) on a node are all created by a single CacheManager - which in turn starts a single
JGroups channel and is represented in JGroups as a single member.
Note that this only pertains to cache modes DIST and REPL. Other cache modes such as
LOCAL and INVALIDATION are unaffected, the former for obvious reasons and the latter since
invalidation doesn't perform any logic at startup. Invalidation messages are simply
discarded by the receiving transport if the cache in question does not exist on the receiving node.
So the issue really only has to do with joining and leaving. Normal runtime operation for
both REPL and DIST will work fine (simply discard RPCs targeted to caches that don't exist locally).
There are 3 specific areas which are challenging:
1) REPL uses a coordinator to determine where to pull state from.
2) DIST relies on a coordinator to coordinate a JOIN process, since JOINs are serialised
(concurrent JOINs not currently supported)
3) DIST relies on view changes to determine whether nodes have joined or left
To get around these, and to have minimal impact on the existing codebase, here is what I
think we can do (very early designs, so may well be invalid!)
* Create a new event: VirtualViewChange. A VVC is specific to a named cache.
* It may be delivered by the CacheManager, which upon receiving a real ViewChange,
would deliver a VVC to named caches to which this may be relevant.
* Note that this is only useful for leaves. Joins are handled separately.
* Cache.start() and Cache.stop() should emit a broadcast RPC to the cluster informing
the cluster that it has started, including the cache name.
* CacheManagers, on seeing remote caches start, may deliver VVCs to matching named caches.
* RpcManager.getMembers() should be specific to the named cache. Transport.getMembers()
returns the entire view, as per the JGroups channel; however, RpcManager.getMembers() -
which is named-cache specific - should prune the list from the transport to remove members
that aren't relevant to the named cache.
* Based on a cache-specific member list, we'd have a cache-specific coordinator as well.
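The pruning idea above can be sketched as follows. This is a minimal illustration, not Infinispan code: plain Strings stand in for JGroups Addresses, and the names `cacheStarted`, `cacheStopped`, `getMembers` and `getCoordinator` are hypothetical.

```java
import java.util.*;

// Illustrative sketch: RpcManager.getMembers() prunes the full transport
// view down to the members known to run a given named cache.
class VirtualViewSketch {
    // Populated from CACHE_STARTED / CACHE_STOPPED broadcasts.
    private final Map<String, Set<String>> cachesByMember = new HashMap<>();

    void cacheStarted(String member, String cacheName) {
        cachesByMember.computeIfAbsent(member, m -> new HashSet<>()).add(cacheName);
    }

    void cacheStopped(String member, String cacheName) {
        Set<String> caches = cachesByMember.get(member);
        if (caches != null) caches.remove(cacheName);
    }

    // Transport.getMembers() returns the full JGroups view; this prunes it
    // to members running the named cache, preserving the view's order.
    List<String> getMembers(List<String> transportView, String cacheName) {
        List<String> pruned = new ArrayList<>();
        for (String member : transportView) {
            Set<String> caches = cachesByMember.get(member);
            if (caches != null && caches.contains(cacheName)) pruned.add(member);
        }
        return pruned;
    }

    // The cache-specific coordinator is simply the first pruned member.
    String getCoordinator(List<String> transportView, String cacheName) {
        List<String> members = getMembers(transportView, cacheName);
        return members.isEmpty() ? null : members.get(0);
    }
}
```

Because the pruned list preserves the order of the real view, every node that has seen the same CACHE_STARTED/STOPPED broadcasts derives the same member list and the same coordinator.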
In general, I think virtual views should work. However, here are some special cases we
need to handle, too:
#1 Views and virtual views
- There needs to be a map of associations between views and virtual views, not sure what
the keys will be. Maybe a BiMap...
Yes. The correlation could also be maintained on ViewID (and possibly a corresponding virtual ViewID).
#2 Broadcasts of CACHE_STARTED / CACHE_STOPPED
- These (*instead* of view changes) are the notifications which trigger rebalancing
- Essentially what Manik discussed below
Yes, but these could be triggered by REAL ViewChanges. E.g., nodes crash, and existing
nodes get a ViewChange. The CacheManager should then dispatch a VirtualViewChange to all
relevant caches (which are affected by this ViewChange; not others)
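That dispatch logic might look roughly like this. Again a sketch with illustrative names (`register`, `onViewChange`) and Strings in place of Addresses: on a real view change, the CacheManager computes who left and notifies only the named caches whose virtual views contained a leaver.

```java
import java.util.*;

// Sketch: on a real ViewChange, the CacheManager dispatches a
// VirtualViewChange only to the named caches affected by the leavers.
class VirtualViewDispatcher {
    // cacheName -> members believed to run that cache (its virtual view)
    private final Map<String, List<String>> virtualViews = new HashMap<>();

    void register(String cacheName, List<String> members) {
        virtualViews.put(cacheName, new ArrayList<>(members));
    }

    // Returns, per affected cache, the new virtual view to deliver as a VVC.
    Map<String, List<String>> onViewChange(List<String> oldView, List<String> newView) {
        Set<String> leavers = new HashSet<>(oldView);
        leavers.removeAll(newView);
        Map<String, List<String>> notifications = new HashMap<>();
        for (Map.Entry<String, List<String>> e : virtualViews.entrySet()) {
            List<String> members = e.getValue();
            if (members.removeAll(leavers)) {  // this cache lost a member
                notifications.put(e.getKey(), new ArrayList<>(members));
            }
        }
        return notifications;
    }
}
```

Caches untouched by the leave receive nothing, which is the point: a real ViewChange fans out only to the caches it actually affects.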
#3 Where are the consistent hash functions located?
- We probably cannot have CHs per DistributionManager anymore, but per cache (?) The CHs
are not based on views, but on virtual views
DistributionManager is per-cache anyway, so the ConsistentHash can still be owned by the DistributionManager.
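To make the point concrete: a per-cache ConsistentHash would be built from the cache's virtual view rather than the full JGroups view. The hashing scheme below (modular primary owner plus successors) is purely illustrative, not Infinispan's actual algorithm.

```java
import java.util.*;

// Sketch: a per-cache ConsistentHash built on the virtual view of one
// named cache, not on the real (full) JGroups view.
class VirtualViewHash {
    private final List<String> virtualView;  // members running this named cache
    private final int numOwners;

    VirtualViewHash(List<String> virtualView, int numOwners) {
        this.virtualView = new ArrayList<>(virtualView);
        this.numOwners = numOwners;
    }

    // Primary owner plus (numOwners - 1) backups, walking the virtual view.
    List<String> locate(Object key) {
        int size = virtualView.size();
        int primary = Math.floorMod(key.hashCode(), size);
        List<String> owners = new ArrayList<>();
        for (int i = 0; i < Math.min(numOwners, size); i++) {
            owners.add(virtualView.get((primary + i) % size));
        }
        return owners;
    }
}
```

Nodes that run the CacheManager but not this named cache simply never appear in the virtual view, so they can never be chosen as owners.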
#4 Handling of left members
- When we get a view change excluding node P, everybody emits *local* CACHE_STOPPED
notifications for all caches associated with P. This is where we need the hashmap of #1.
Yes. See my comment on #2 above.
- As an alternative to #4, we could have only the *coordinator* broadcast CACHE_STOPPED
events for P. This might eliminate the need for #5...
Not really, since the coordinator may not know of certain cache instances. I think the
CACHE_STOPPED notification should be delivered locally, once the CacheManager detects a leave in the view.
#5 Handling of new members
- When a new member starts, will it need the state (of #1) copied to it ? If we do #4,
then probably yes. Not nice though...
This becomes tricky. We have a few options (none of which are particularly palatable):
A. On Cache.start(), a cache broadcasts CACHE_START and in *response*, gets the Addresses
of nodes which have the same named cache running (or null if not). And use this + a real
View to build a VirtualView.
B. On Cache.start(), broadcast CACHE_START and the virtual coordinator for that cache
sends a VirtualView back
One problem I foresee with both of these is maintaining consistent order of this Virtual
View (esp with option A.). Any other potential solutions?
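One possible answer to the ordering worry (a sketch, not a committed design): since the real JGroups view is totally ordered and identical on every node, the virtual view could always be *derived* by filtering the real view rather than accumulated from responses, so the arrival order of CACHE_START replies becomes irrelevant.

```java
import java.util.*;

// Sketch: derive a deterministically ordered virtual view by filtering the
// (totally ordered, cluster-wide consistent) real view. Whatever order the
// CACHE_START responses arrive in, every node computes the same sequence.
class VirtualViewOrdering {
    static List<String> virtualView(List<String> realView, Set<String> respondents) {
        List<String> ordered = new ArrayList<>();
        for (String member : realView) {
            if (respondents.contains(member)) ordered.add(member);
        }
        return ordered;
    }
}
```

This would make option A's response collection a pure membership test, with ordering delegated entirely to JGroups.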
#6 Handling of MergeViews
- We may need to reconcile (merge) the state of #1, in order to trigger the correct rebalancing.
- Maybe the new coordinator could broadcast (all or just diffs) CACHE_STARTED/STOPPED
notifications again... The recipients would not do anything if they already know about the cache in question.
On merge, the way we'd reconcile would depend on what we do for #5 above.
- As an alternative to #6, the new coordinator after a merge could broadcast a
CURRENT_STATE message, which contains all caches and their associated views...
Right, but again any given coordinator may not know about all named caches.
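A union-style reconciliation would sidestep that objection: each partition contributes the cache-to-members map it knows, and every node merges them, so no single coordinator needs the complete picture. A minimal sketch (illustrative only, Strings for Addresses):

```java
import java.util.*;

// Sketch of CURRENT_STATE reconciliation after a MergeView: each partition
// contributes its cache -> members map, and every node takes the union.
// No single coordinator needs to know about all named caches.
class MergeReconciliation {
    static Map<String, Set<String>> merge(List<Map<String, Set<String>>> partitionStates) {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map<String, Set<String>> state : partitionStates) {
            for (Map.Entry<String, Set<String>> e : state.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }
}
```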
Asymmetric clusters should be supported
Issue Type: Feature Request
Components: Distributed Cache, RPC, State transfer
Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2
Reporter: Paul Ferraro
Assignee: Manik Surtani
Fix For: 5.0.0.BETA1, 5.0.0.Final
Note that this would affect both distributed and replicated cache modes.
Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan
assumes that the same set of named caches (e.g., C1 ~ C5) is deployed on each
node, and is designed accordingly. This causes problems for applications where caches are
defined and started lazily on each node. For example:
Consider a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes
(N1 and N2).
Currently, the DistributionManager does not properly handle the following scenarios:
1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing
is only triggered via view change. Failure to rehash on stopping of a cache can
inadvertently cause data loss, if all backups of a given cache entry have stopped.
2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join
request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does
not contain a C3 cache.
3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache,
though its cache manager is started. This prematurely triggers a rehash of C1 and
C2, even though there are no new cache instances to consider.
To solve this, one proposal would involve:
1. Providing a "named cache coordinator" for each distributed named cache,
which would coordinate rehashes. This may or may not be the JGroups coordinator, and
named caches may or may not share the same named cache coordinator.
2. The DistManager would maintain a list of available members, which would be a subset of
all of the members available in the RpcManager.
3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the
same effect as a view change with a member removed, with the exception of affecting only a
single named cache.
With the above 3 in place, a proper solution could be devised to handle asymmetric clusters.
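The three proposal points could be sketched together as follows. This is a hypothetical outline, not Infinispan code: `onLeave` is an illustrative handler name, and Strings stand in for node Addresses.

```java
import java.util.*;

// Sketch of proposal points 2 and 3: the DistributionManager tracks a
// per-cache subset of the cluster members, and a cache-scoped LEAVE
// message shrinks only that subset (triggering a rehash for that cache),
// unlike a real view change, which affects every cache.
class DistManagerSketch {
    private final String cacheName;
    private final List<String> cacheMembers = new ArrayList<>();
    private boolean rehashNeeded;

    DistManagerSketch(String cacheName, List<String> initialMembers) {
        this.cacheName = cacheName;
        this.cacheMembers.addAll(initialMembers);
    }

    // Handles a LEAVE broadcast by a node stopping a named cache (Cache.stop()).
    void onLeave(String cacheNameInMessage, String member) {
        if (!cacheName.equals(cacheNameInMessage)) return;  // scoped to one cache only
        if (cacheMembers.remove(member)) rehashNeeded = true;
    }

    List<String> getMembers() { return cacheMembers; }
    boolean isRehashNeeded() { return rehashNeeded; }
}
```

The key behaviour is the scoping: a LEAVE for C2 leaves C1's member list, and therefore C1's rehash state, untouched.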