[JBoss JIRA] (ISPN-2566) TopologyAwareConsistentHashFactory rebalance doesn't redistribute data properly
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-2566?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-2566:
-----------------------------------------------
Martin Gencur <mgencur(a)redhat.com> made a comment on [bug 868832|https://bugzilla.redhat.com/show_bug.cgi?id=868832]
Closing the bug as this was a configuration issue.
> TopologyAwareConsistentHashFactory rebalance doesn't redistribute data properly
> -------------------------------------------------------------------------------
>
> Key: ISPN-2566
> URL: https://issues.jboss.org/browse/ISPN-2566
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.0.Beta4
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 5.2.0.Beta6, 5.2.0.Final
>
>
> Say we have a topology-aware cache with numOwners = 2 and two nodes: A(machine=m1) and B(machine=m1). When node C(machine=m2) joins, it should own every key, either as a primary or as a backup owner. This doesn't happen: node C owns just as many segments as nodes A and B.
> Example:
> {noformat}
> 19:21:17,295 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (undefined) Updating cache topology topology for rebalance:
> CacheTopology{id=3, currentCH=DefaultConsistentHash{numSegments=80, numOwners=2,
> members=[node0/default(primary), node1/default(primary)],
> owners={0: 0 1, 1: 0 1, 2: 0 1, 3: 0 1, 4: 0 1, 5: 0 1, 6: 0 1, 7: 0 1,
> 8: 0 1, 9: 0 1, 10: 0 1, 11: 0 1, 12: 0 1, 13: 0 1, 14: 0 1, 15: 0 1,
> 16: 0 1, 17: 0 1, 18: 0 1, 19: 0 1, 20: 0 1, 21: 0 1, 22: 0 1, 23: 0 1,
> 24: 0 1, 25: 0 1, 26: 0 1, 27: 0 1, 28: 0 1, 29: 0 1, 30: 0 1, 31: 0 1,
> 32: 0 1, 33: 0 1, 34: 0 1, 35: 0 1, 36: 0 1, 37: 0 1, 38: 0 1, 39: 0 1,
> 40: 1 0, 41: 1 0, 42: 1 0, 43: 1 0, 44: 1 0, 45: 1 0, 46: 1 0, 47: 1 0,
> 48: 1 0, 49: 1 0, 50: 1 0, 51: 1 0, 52: 1 0, 53: 1 0, 54: 1 0, 55: 1 0,
> 56: 1 0, 57: 1 0, 58: 1 0, 59: 1 0, 60: 1 0, 61: 1 0, 62: 1 0, 63: 1 0,
> 64: 1 0, 65: 1 0, 66: 1 0, 67: 1 0, 68: 1 0, 69: 1 0, 70: 1 0, 71: 1 0,
> 72: 1 0, 73: 1 0, 74: 1 0, 75: 1 0, 76: 1 0, 77: 1 0, 78: 1 0, 79: 1 0},
> pendingCH=DefaultConsistentHash{numSegments=80, numOwners=2,
> members=[node0/default(primary), node1/default(primary), node2/default(secondary)],
> owners={0: 0 1, 1: 0 1, 2: 0 1, 3: 0 1, 4: 0 1, 5: 0 1, 6: 0 1, 7: 0 1,
> 8: 0 1, 9: 0 1, 10: 0 1, 11: 0 1, 12: 0 1, 13: 0 1, 14: 0 1, 15: 0 1,
> 16: 0 1, 17: 0 1, 18: 0 1, 19: 0 1, 20: 0 1, 21: 0 1, 22: 0 1, 23: 0 1,
> 24: 0 1, 25: 0 1, 26: 0 1, 27: 2 0, 28: 2 0, 29: 2 0, 30: 2 0, 31: 2 0,
> 32: 2 0, 33: 2 0, 34: 2 0, 35: 2 0, 36: 2 0, 37: 2 0, 38: 2 0, 39: 2 0,
> 40: 1 0, 41: 1 0, 42: 1 0, 43: 1 0, 44: 1 0, 45: 1 0, 46: 1 0, 47: 1 0,
> 48: 1 0, 49: 1 0, 50: 1 0, 51: 1 0, 52: 1 0, 53: 1 0, 54: 1 0, 55: 1 0,
> 56: 1 0, 57: 1 0, 58: 1 0, 59: 1 0, 60: 1 0, 61: 1 0, 62: 1 0, 63: 1 0,
> 64: 1 0, 65: 1 0, 66: 1 0, 67: 2 1, 68: 2 1, 69: 2 1, 70: 2 1, 71: 2 1,
> 72: 2 1, 73: 2 1, 74: 2 1, 75: 2 1, 76: 2 1, 77: 2 1, 78: 2 1, 79: 2 1}}
> {noformat}
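The expectation in the description can be checked against the logged pendingCH with a small sketch. Everything here is hypothetical and not an Infinispan API: the class `TopologyCheck`, its methods, and the node-to-machine mapping (nodes 0 and 1 on m1, node 2 on m2) are assumptions taken from the issue text, not from the log labels. The owner table is transcribed from the pendingCH above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Infinispan: checks machine diversity of owner lists.
final class TopologyCheck {

    /** Returns the segments whose owners span fewer machines than they could. */
    static List<Integer> segmentsWithoutMachineDiversity(int[][] owners, String[] machineOfNode) {
        Set<String> machines = new HashSet<>(Arrays.asList(machineOfNode));
        List<Integer> bad = new ArrayList<>();
        for (int seg = 0; seg < owners.length; seg++) {
            Set<String> used = new HashSet<>();
            for (int node : owners[seg]) {
                used.add(machineOfNode[node]);
            }
            // With numOwners = 2 and two machines, both machines should appear.
            int expected = Math.min(owners[seg].length, machines.size());
            if (used.size() < expected) {
                bad.add(seg);
            }
        }
        return bad;
    }

    /** Owner table transcribed from the pendingCH in the log (80 segments). */
    static int[][] pendingOwnersFromLog() {
        int[][] owners = new int[80][];
        for (int seg = 0; seg < 80; seg++) {
            if (seg <= 26) {
                owners[seg] = new int[] {0, 1};
            } else if (seg <= 39) {
                owners[seg] = new int[] {2, 0};
            } else if (seg <= 66) {
                owners[seg] = new int[] {1, 0};
            } else {
                owners[seg] = new int[] {2, 1};
            }
        }
        return owners;
    }
}
```

Under that assumed mapping, calling `TopologyCheck.segmentsWithoutMachineDiversity(TopologyCheck.pendingOwnersFromLog(), new String[] {"m1", "m1", "m2"})` flags segments 0-26 and 40-66, i.e. 54 of the 80 segments have both owners on m1.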
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2566) TopologyAwareConsistentHashFactory rebalance doesn't redistribute data properly
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-2566?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-2566:
-----------------------------------------------
Martin Gencur <mgencur(a)redhat.com> changed the Status of [bug 868832|https://bugzilla.redhat.com/show_bug.cgi?id=868832] from ASSIGNED to CLOSED
[JBoss JIRA] (ISPN-2738) Joining node ignored by hotrod clients in REPL clustering mode
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-2738?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño commented on ISPN-2738:
----------------------------------------
The problem does indeed look related to ISPN-2632, and I think it is linked to the removal of coordination between the address cache update and the topology id update. The Hot Rod server appears to send a new topology id before the address cache has been updated. So when a new node is added, the server tells clients "here's the new topology id" while the cache still holds the old members. The client now has a new id but the same member list. When the cache is eventually updated with the new node, the topology id is not increased again, so clients will never talk to it. Here's a snippet from node01.log that shows this:
{code}12:43:03,137 TRACE [org.infinispan.server.hotrod.HotRodDecoder] (HotRodServerWorker-119) Decoded header HotRodHeader{op=GetRequest, version=12,
messageId=1974, cacheName=testCache, flag=0, clientIntelligence=3, topologyId=8}
...
12:43:03,229 TRACE [org.infinispan.server.hotrod.HotRodDecoder] (HotRodServerWorker-107) Decoded header HotRodHeader{op=GetRequest, version=12,
messageId=2626, cacheName=testCache, flag=0, clientIntelligence=3, topologyId=9}
...
12:43:03,753 TRACE [org.infinispan.container.entries.ReadCommittedEntry] (OOB-197,null) Updating entry (key=node02/default removed=false valid=true
changed=true created=true loaded=false value=172.18.1.3:11222]
...
node01.log:86873:12:43:03,780 TRACE [org.infinispan.server.hotrod.HotRodDecoder] (HotRodServerWorker-119) Decoded header HotRodHeader{op=PutRequest,
version=12, messageId=1992, cacheName=testCache, flag=6, clientIntelligence=3, topologyId=9}{code}
@Dan, this is precisely the reason why the interceptor in HotRodServer was created: to coordinate and make sure that the new topology ID is not sent before the cache has been updated. This is a crucial part of the code I added to deal with resilience testing in the previous testing round.
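The ordering problem described above can be illustrated with a toy model. This is only a sketch: `TopologyRaceSketch`, `Server`, `Client`, and `replayJoin` are hypothetical names and not Hot Rod or Infinispan classes. It also assumes that a client refreshes its member list only when the server reports a topology id newer than the one it already holds, which is inferred from the comment rather than taken from the client code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the race; none of these classes are Infinispan APIs.
final class TopologyRaceSketch {

    /** Server-side view: a topology id plus the member list held in the address cache. */
    static final class Server {
        int topologyId;
        final List<String> addressCache = new ArrayList<>();
    }

    /** Client that refreshes its members only when the server's topology id is newer. */
    static final class Client {
        int topologyId;
        List<String> members = new ArrayList<>();

        void onResponse(Server server) {
            if (server.topologyId > topologyId) {
                topologyId = server.topologyId;
                members = new ArrayList<>(server.addressCache);
            }
        }
    }

    /** Replays a node join in either order and returns the members the client ends up with. */
    static List<String> replayJoin(boolean idBumpedBeforeCacheUpdate) {
        Server server = new Server();
        server.topologyId = 8;
        server.addressCache.add("node01");

        Client client = new Client();
        client.onResponse(server); // client starts at id 8 with [node01]

        if (idBumpedBeforeCacheUpdate) {
            server.topologyId = 9;
            client.onResponse(server);         // client takes id 9 with the old members
            server.addressCache.add("node02"); // cache updated later, id stays at 9
        } else {
            server.addressCache.add("node02"); // cache updated first
            server.topologyId = 9;             // id bumped only afterwards
        }
        client.onResponse(server);
        return client.members;
    }
}
```

In this model `replayJoin(true)` leaves the client with only node01, because the later response still carries id 9, while `replayJoin(false)` yields both nodes. That matches the behaviour the coordination in the interceptor is meant to guarantee.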
> Joining node ignored by hotrod clients in REPL clustering mode
> --------------------------------------------------------------
>
> Key: ISPN-2738
> URL: https://issues.jboss.org/browse/ISPN-2738
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.0.CR2
> Reporter: Michal Linhard
> Assignee: Galder Zamarreño
> Fix For: 5.2.0.Final
>
>
> resilience 4-3-4 REPL mode for JDG 6.1.0.ER9 (infinispan 5.2.0.CR2):
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/EDG-REPOR...
> after rejoin of killed node the load is not redistributed to all three nodes again