[JBoss JIRA] Created: (ISPN-658) DistributionManager not considerate of cache state changes

[JBoss JIRA] (ISPN-1462)...

[JBoss JIRA] (ISPN-1456) Improve...

Paul Ferraro (JIRA)

Monday, 20 September 2010 Mon, 20 Sep '10

12:48 p.m.

DistributionManager not considerate of cache state changes ---------------------------------------------------------- Key: ISPN-658 URL: https://jira.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Bug Components: Distributed Cache Affects Versions: 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira

Show replies by date

Manik Surtani (JIRA)

Monday, 27 September Mon, 27 Sep

11:08 a.m.

New subject: [JBoss JIRA] Updated: (ISPN-658) DistributionManager not considerate of cache state changes

[ https://jira.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.sys... ] Manik Surtani updated ISPN-658: ------------------------------- Fix Version/s: 4.2.0.BETA1 4.2.0.Final Affects Version/s: 4.1.0.Final 4.0.0.Final Complexity: High

...

DistributionManager not considerate of cache state changes ---------------------------------------------------------- Key: ISPN-658 URL: https://jira.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Bug Components: Distributed Cache Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Fix For: 4.2.0.BETA1, 4.2.0.Final Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider.

-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira

Manik Surtani (JIRA)

Wednesday, 29 September Wed, 29 Sep

12:18 p.m.

New subject: [JBoss JIRA] Updated: (ISPN-658) Asymmetric clusters should be supported in DIST mode

[ https://jira.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.sys... ] Manik Surtani updated ISPN-658: ------------------------------- Summary: Asymmetric clusters should be supported in DIST mode (was: DistributionManager not considerate of cache state changes) Issue Type: Feature Request (was: Bug) Fix Version/s: 5.0.0.BETA1 5.0.0.Final (was: 4.2.0.BETA1) (was: 4.2.0.Final) Priority: Blocker (was: Major) Description: Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters. was: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider.

...

Asymmetric clusters should be supported in DIST mode ---------------------------------------------------- Key: ISPN-658 URL: https://jira.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Blocker Fix For: 5.0.0.BETA1, 5.0.0.Final Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

Manik Surtani (JIRA)

12:21 p.m.

New subject: [JBoss JIRA] Updated: (ISPN-658) Asymmetric clusters should be supported in DIST mode

[ https://jira.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.sys... ] Manik Surtani updated ISPN-658: ------------------------------- JBoss Forum Reference: http://lists.jboss.org/pipermail/infinispan-dev/2010-September/006414.html

...

Manik Surtani (JIRA)

1:29 p.m.

New subject: [JBoss JIRA] Updated: (ISPN-658) Asymmetric clusters should be supported

[ https://jira.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.sys... ] Manik Surtani updated ISPN-658: ------------------------------- Summary: Asymmetric clusters should be supported (was: Asymmetric clusters should be supported in DIST mode) Description: Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters. was: Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters. Component/s: RPC State transfer

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://jira.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Blocker Fix For: 5.0.0.BETA1, 5.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

Mircea Markus (JIRA)

Monday, 11 October Mon, 11 Oct

12:03 p.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://jira.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.sys... ] Mircea Markus commented on ISPN-658: ------------------------------------ re: problem 1 Stop C1 on N1. Guess we would also need a new notification event, something like TopologyChanged perhaps? Currently we have ViewChanged and MergeView which extends ViewChanged. Perhaps ViewChanged should extends TopologyChange?

...

Mircea Markus (JIRA)

12:07 p.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://jira.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.sys... ] Mircea Markus commented on ISPN-658: ------------------------------------ I would also add problem 4 (relevant for unit testing). if we have C1 running on N1 and N2, but not on N3 then distribution would spread keys over N3 as well, so data might simply get lost (or exception thrown, not for async repl though).

...

Mircea Markus (JIRA)

12:18 p.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://jira.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.sys... ] Mircea Markus commented on ISPN-658: ------------------------------------ Re:solution, might be I am talking about the same thing a you. The solution I have in mind is to add a VirtualView which is a subset of the current view, it is one/cache/cluster and is managed as follows: - N1 leaves => VirtualView(C1) -= {N1} - this can be handled by listening on view changes. It might be an no-op if C1 was not started on C1. - cache C1 started on N3 => VirtualView(C1) =+ {N3} - we would need a cache started RPC - cache C1 stopped on N3 => VirtualView(C1) -+ {N3} - we would need a cache stopped RPC If we replace the View with VirtualView, the all existing code should function correctly.

...

Galder Zamarreño (JIRA)

Tuesday, 12 October Tue, 12 Oct

5:39 a.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://jira.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.sys... ] Galder Zamarreño commented on ISPN-658: --------------------------------------- p.s. I'd like to comment on it :). Just reply to the original thread by Manik in http://lists.jboss.org/pipermail/infinispan-dev/2010-September/006414.html

...

Galder Zamarreño (JIRA)

5:39 a.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://jira.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.sys... ] Galder Zamarreño commented on ISPN-658: --------------------------------------- MIrcea, I think it's better if the solution is discussed in the dev forum where people can comment more easily.

...

Manik Surtani (JIRA)

Saturday, 19 March Sat, 19 Mar

4:15 a.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Manik Surtani commented on ISPN-658: ------------------------------------ As such, a decision has been made to defer this to a later date, as a more experimental feature since it does threaten to destabilise Infinispan's core.

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Blocker Fix For: 5.0.0.BETA1, 5.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Manik Surtani (JIRA)

4:15 a.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Manik Surtani commented on ISPN-658: ------------------------------------ Looking at the details, this is extremely complex to implement as it would involve:

...

From a series of email discussions:

{quote} Essentially the problem is that we don't currently support asymmetric clusters, such as: NodeA [c1, c2] NodeB [c1, c2, c3] NodeC [c2, c3] where each cache (cX) is created on a single CacheManager - which in turn starts a single JGroups channel and is represented in JGroups as a single member. Note that this only pertains to cache modes DIST and REPL. Other cache modes such as LOCAL and INVALIDATION are unaffected, the former for obvious reasons and the latter since invalidation doesn't perform any logic at startup. Invalidation messages are simply discarded by the receiving transport if the cache in question does not exist on the receiving node. So the issue really only has to do with joining and leaving. Normal runtime operation for both REPL and DIST will work fine (simply discard RPCs targeted to caches that don't exist). There are 3 specific areas which are challenging: 1) REPL uses a coordinator to determine where to pull state from. 2) DIST relies on a coordinator to coordinate a JOIN process, since JOINs are serialised (concurrent JOINs not currently supported) 3) DIST relies on view changes to determine whether nodes have joined or left To get around these, and to have minimal impact on the existing codebase, here is what I think we can do (very early designs, so may well be invalid!) * Create a new event: VirtualViewChange. A VVC is specific to a named cache. * It may be delivered by the CacheManager, which upon receiving a real ViewChange, would deliver a VVC to named caches to whom this may be relevant. * Note that this is only useful for leaves. Joins handled separately. * Cache.start() and Cache.stop() should emit a broadcast RPC to the cluster informing the cluster that it has started, including the cache name. * CacheManagers, on seeing remote caches start, may deliver VVCs to matching named caches. * RpcManager.getMembers() should be specific to the named cache. Transport.getMembers() returns the entire view, as per the JGroups channel, however RpcManager.getMembers() - which is named-cache specific - should prune the list from the transport to remove members that aren't relevant to the named cache. * Based on a cache-specific member list, we'd have a cache-specific coordinator as well. {quote} and {quote} In general, I think virtual views should work. However, here are some special cases we need to handle, too: #1 Views and virtual views - There needs to be a map of associations between views and virtual views, not sure what the keys will be. Maybe a BiMap... Yes. The correlation could also me maintained on ViewID (and possibly a corresponding VVID) #2 Broadcasts of CACHE_STARTED / CACHE_STOPPED - These (*instead* of view changes) are the notifications which trigger rebalancing - Essentially what Manik discussed below Yes, but these could be triggered by REAL ViewChanges. E.g., nodes crash, and existing nodes get a ViewChange. The CacheManager should then dispatch a VirtualViewChange to all relevant caches (which are affected by this ViewChange; not others) #3 Where are the consisten hash functions located ? - We probably cannot have CHs per DistributionManager anymore, but per cache (?) The CHs are not based on views, but on virtual views DistributionManager is per-cache anyway so the ConsistentHash can still be owned by the DistributionManager. #4 Handling of left members - When we get a view change excluding node P, everybody emits *local* notifications CACHE_STOPPED for all caches associated with P. This is where we need the hashmap of #1 Yes. See my comment on #2 above. #4.1 - As an alternative to #4, we could have only the *coordinator* broadcast CACHE_STOPPED events for P. This might eliminate the need for #5... Not really, since the coordinator may not know of certain cache instances. I think the CACHE_STOPPED notification should be delivered locally, once the CacheManager detects a ViewChange. #5 Handling of new members - When a new member starts, will it need the state (of #1) copied to it ? If we do #4, then probably yes. Not nice though... This becomes tricky. We have a few options (none of which are particularly palatable ...) A. On Cache.start(), a cache broadcasts CACHE_START and in *response*, gets the Addresses of nodes which have the same named cache running (or null if not). And use this + a real View to build a VirtualView. B. On Cache.start(), broadcast CACHE_START and the virtual coordinator for that cache sends a VirtualView back One problem I foresee with both of these is maintaining consistent order of this Virtual View (esp with option A.). Any other potential solutions? #6 Handling of MergeViews - We may need to reconcile (merge) the state of #1, in order to trigger the correct rebalancing task(s) - Maybe the new coordinator could broadcast (all or just diffs) CACHE_STARTED/STOPPED notifications again... The recipients would not do anything if they already have a cache On merge, the way we'd reconcile would depend on what we do for #5 above. #6.1 - As an alternative to #6, the new coordinator after a merge could broadcast a CURRENT_STATE message, which contains a caches and associated views... Right, but again any given coordinator may not know about all named caches. {quote}

...

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Manik Surtani (JIRA)

4:17 a.m.

New subject: [JBoss JIRA] Updated: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Manik Surtani updated ISPN-658: ------------------------------- Labels: experimental (was: ) Fix Version/s: 6.0.0.Final (was: 5.0.0.BETA1) (was: 5.0.0.Final) Priority: Optional (was: Blocker) Complexity: NP Complete (was: High)

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Optional Labels: experimental Fix For: 6.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

yangju (JIRA)

Tuesday, 26 April Tue, 26 Apr

11:27 a.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] yangju commented on ISPN-658: ----------------------------- Does this mean that programmatically creation of cache won't work, because the new cache can be created from any node? If this is true, I guess this is a show stopper for us, as we rely on dynamically creating many distributed caches at runtime. Predefined named cache has little value for us. Is there any work around for programmatically creation of caches (in jboss AS 6)?

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Optional Labels: experimental Fix For: 6.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Prashanth ga (JIRA)

Thursday, 2 June Thu, 2 Jun

7:23 a.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Prashanth ga commented on ISPN-658: ----------------------------------- Agree with yangju as we are very much keen on programmatically creating the cache definitions. Infact, we dont keep anything on the Infinispan configuration files and all the named caches would be created during the startup of the cache also at the same on on all the nodes N1, N2 and N3. I guess the startup is working for us. Now coming to the runtime changes to the configurations on the different nodes, we are planning to do the same programmatic config changes simulataneously on all the nodes to avoid any assymmetry. For e.g., assume N1, N2, N3 initially had C1, C2 and during runtime C3 is added simulataneouly (almost) on all the nodes. Whats you opinion on this approach. But there can be some small differnce in the timing of this parallel config change. Does this cause any impact and resulting in the same issues as mentioned above in the thread ?

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Optional Labels: experimental Fix For: 6.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Galder Zamarreño (JIRA)

Monday, 6 June Mon, 6 Jun

5:53 a.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Galder Zamarreño commented on ISPN-658: --------------------------------------- Creating C1,C2 and C3 at runtime doesn't cause issues. What causes issues is if you start allowing invocations for C1 before C1 has been created on all nodes. Same for C2 and C3, assuming that these are clustered caches (repl, dist or invalidation). If you can somehow guarantee that requests won't come into C1 until all nodes have C1 created, then you should be fine.

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Optional Labels: experimental Fix For: 6.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Galder Zamarreño (JIRA)

5:55 a.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Optional Labels: experimental Fix For: 6.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

SBS JIRA Integration (JIRA)

Friday, 10 June Fri, 10 Jun

5:52 a.m.

New subject: [JBoss JIRA] Updated: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] SBS JIRA Integration updated ISPN-658: -------------------------------------- JBoss Forum Reference: http://lists.jboss.org/pipermail/infinispan-dev/2010-September/006414.html, http://community.jboss.org/message/602051#602051 (was: http://lists.jboss.org/pipermail/infinispan-dev/2010-September/006414.html)

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Optional Labels: experimental Fix For: 6.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Sanne Grinovero (JIRA)

Wednesday, 6 July Wed, 6 Jul

9:55 a.m.

New subject: [JBoss JIRA] Commented: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Sanne Grinovero commented on ISPN-658: -------------------------------------- There's an additional use case for this: currently people needing custom components (like Externalizers, TwoWayKey2StringMapper, etc..) have to deploy these jars and all possible dependencies in the commons lib classpath of the application server, or they can't be used in the CacheManager started by the appserver. Especially painfull as people can't add new configurations to the running CacheManager because of this issue, so currently the only viable alternative to use these extensions is to start your own new CacheManager in each application. see also http://community.jboss.org/message/613731#613731

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Optional Labels: experimental Fix For: 6.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

SBS JIRA Integration (JIRA)

Thursday, 7 July Thu, 7 Jul

2:44 a.m.

New subject: [JBoss JIRA] Updated: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] SBS JIRA Integration updated ISPN-658: -------------------------------------- Forum Reference: http://lists.jboss.org/pipermail/infinispan-dev/2010-September/006414.html, http://community.jboss.org/message/612561#612561, http://community.jboss.org/message/602051#602051 (was: http://lists.jboss.org/pipermail/infinispan-dev/2010-September/006414.html, http://community.jboss.org/message/602051#602051)

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Manik Surtani Priority: Optional Labels: experimental Fix For: 6.0.0.Final Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Dan Berindei (JIRA)

Monday, 26 September Mon, 26 Sep

7:34 a.m.

New subject: [JBoss JIRA] Assigned: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Dan Berindei reassigned ISPN-658: --------------------------------- Assignee: Dan Berindei (was: Manik Surtani)

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Dan Berindei Priority: Optional Labels: experimental Fix For: 6.0.0.FINAL Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Dan Berindei (JIRA)

9:10 a.m.

New subject: [JBoss JIRA] Updated: (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Dan Berindei updated ISPN-658: ------------------------------ Fix Version/s: 5.1.0.BETA1 (was: 6.0.0.FINAL) Forum Reference: http://lists.jboss.org/pipermail/infinispan-dev/2010-September/006414.html, http://community.jboss.org/message/612561#612561, http://community.jboss.org/message/602051#602051 (was: http://lists.jboss.org/pipermail/infinispan-dev/2010-September/006414.html, http://community.jboss.org/message/612561#612561, http://community.jboss.org/message/602051#602051) I started looking into implementing asymmetric caches for 5.1, here's the design document: http://community.jboss.org/wiki/AsymmetricCachesAndManualRehashingDesign

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Dan Berindei Priority: Optional Labels: experimental Fix For: 5.1.0.BETA1 Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Galder Zamarreño (Updated) (JIRA)

Tuesday, 4 October Tue, 4 Oct

7:34 a.m.

New subject: [JBoss JIRA] (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Galder Zamarreño updated ISPN-658: ---------------------------------- Fix Version/s: 5.1.0.BETA2 (was: 5.1.0.BETA1)

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Dan Berindei Priority: Optional Labels: experimental Fix For: 5.1.0.BETA2 Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jboss.org/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira

Dan Berindei (Updated) (JIRA)

Wednesday, 12 October Wed, 12 Oct

11:18 a.m.

New subject: [JBoss JIRA] (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Dan Berindei updated ISPN-658: ------------------------------ Status: Pull Request Sent (was: Open) Git Pull Request: https://github.com/infinispan/infinispan/pull/568 I implemented a cache view abstraction on top of the JGroups cluster view to support asymmetric clusters. The cache view design is described at http://community.jboss.org/wiki/AsymmetricCachesAndManualRehashingDesign As asymmetric clusters are now supported I removed the warning message in DefaultCacheManager.getCache()

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Dan Berindei Priority: Optional Labels: experimental Fix For: 5.1.0.BETA2 Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

Galder Zamarreño (Updated) (JIRA)

Tuesday, 18 October Tue, 18 Oct

4:13 a.m.

New subject: [JBoss JIRA] (ISPN-658) Asymmetric clusters should be supported

[ https://issues.jboss.org/browse/ISPN-658?page=com.atlassian.jira.plugin.s... ] Galder Zamarreño updated ISPN-658: ---------------------------------- Status: Resolved (was: Pull Request Sent) Labels: (was: experimental) Resolution: Done

...

Asymmetric clusters should be supported --------------------------------------- Key: ISPN-658 URL: https://issues.jboss.org/browse/ISPN-658 Project: Infinispan Issue Type: Feature Request Components: Distributed Cache, RPC, State transfer Affects Versions: 4.0.0.Final, 4.1.0.Final, 4.2.0.ALPHA2 Reporter: Paul Ferraro Assignee: Dan Berindei Priority: Optional Fix For: 5.1.0.BETA2 Note that this would affect both distributed and replicated cache modes. Currently clusters are always symmetric. E.g., assume 5 nodes, N1 ~ N5. Infinispan assumes that each node has the same set of named caches (e.g., C1 ~ C5) deployed on each node, and is designed accordingly. This causes problems for applications where caches are defined and started lazily on each node. For example: Considering a cache manager with 2 caches in DIST mode (C1 and C2) deployed on 2 nodes (N1 and N2). Currently, the DistributionManager does not properly handle the following scenarios: 1. Stop C1 on N1. This ought to trigger a rehash for the C1 cache. Currently, rehashing is only triggered via view change. Failure to rehash on stopping of a cache can inadvertently cause data loss, if all backups of a given cache entry have stopped. 2. A new DIST mode cache, C3, is started on N2. If N1 is the coordinator, the join request sent to N1 will get stuck in an infinite loop, since the cache manager on N1 does not contain a C3 cache. 3. Less critically, a new node, N3 is started. It does not yet have a C1 or C2 cache, though it's cache manager is started. This prematurely triggers a rehash of C1 and C2, even though there are no new caches instances to consider. To solve this, one proposal would involve: 1. Providing a "named cache coordinator" for each distributed named cache, which would coordinate rehashes. This may or may not be the JGroups coordinator, and named caches may or may not share the same named cache coordinator. 2. The DistManager would maintain a list of available members, which would be a subset of all of the members available in the RpcManager. 3. A concept of a LEAVE message, broadcast when a cache stops. This would serve the same effect as a view change with a member removed, with the exception of affecting only a single named cache. With the above 3 in place, a proper solution could be devised to handle asymmetric distributed clusters.

5098

days inactive

5491

days old

infinispan-issues@lists.jboss.org

Manage subscription

24 comments

11 participants

tags (0)

participants (11)

Dan Berindei (JIRA)
Dan Berindei (Updated) (JIRA)
Galder Zamarreño (JIRA)
Galder Zamarreño (Updated) (JIRA)
Manik Surtani (JIRA)
Mircea Markus (JIRA)
Paul Ferraro (JIRA)
Prashanth ga (JIRA)
Sanne Grinovero (JIRA)
SBS JIRA Integration (JIRA)
yangju (JIRA)

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

[JBoss JIRA] Created: (ISPN-658) DistributionManager not considerate of cache state changes