[JBoss JIRA] (ISPN-9014) Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
by M S (JIRA)
[ https://issues.jboss.org/browse/ISPN-9014?page=com.atlassian.jira.plugin.... ]
M S edited comment on ISPN-9014 at 7/17/18 6:37 AM:
----------------------------------------------------
Hi.
In one of our environments based on infinispan version 9.3.0 where we have 15 nodes in cloud we got to the point where similar issue occurs on node 22 (Cache zones encountered exception whilst trying to resolve conflicts on merge: java.util.concurrent.CompletionException: org.infinispan.commons.CacheException).
We reproduced it while having 15 nodes in cloud, and then unplugging and plugging node11 back.
I'm attaching infinispan logs from the failed controllers and our cluster config.
Please have a look if the issue is really the same one and the fix from beta version is not sufficient, or new issue must be created.
Thx
was (Author: staho):
Hi.
In one of our environments based on infinispan version 9.1.3 where we have 15 nodes in cloud we got to the point where similar issue occurs on node 22 (Cache zones encountered exception whilst trying to resolve conflicts on merge: java.util.concurrent.CompletionException: org.infinispan.commons.CacheException).
We reproduced it while having 15 nodes in cloud, and then unplugging and plugging node11 back.
I'm attaching infinispan logs from the failed controllers and our cluster config.
Please have a look if the issue is really the same one and the fix from beta version is not sufficient, or new issue must be created.
Thx
> Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
> ----------------------------------------------------------------------------------------------------
>
> Key: ISPN-9014
> URL: https://issues.jboss.org/browse/ISPN-9014
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.1.Final
> Reporter: Dan Berindei
> Assignee: Ryan Emerson
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
>
> Attachments: 15nodes-merge-issue.zip
>
>
> Conflict resolution fails when trying to read entries from nodes that are not in the JGroups cluster view, and this causes random failures in {{ClusterListenerDistTest.testClusterListenerNodeGoesDown random}}.
> # NodeA leaves the cluster, but still manages to start a rebalance with [NodeB, NodeC] (topology id 11)
> # One node doesn't receive topology 11, so NodeB becomes coordinator and starts conflict resolution with all 3 nodes in the pending CH (topology 12)
> # Conflict resolution fails because NodeB and NodeC can't read the entries from NodeA
> # {{onPartitionMerge}} also queued a rebalance, so NodeB starts a new rebalance without canceling the previous rebalance first (topology 13)
> # Because there is no reset topology, NodeB thinks it already requested all the segments for NodeC's in topology 11, so it doesn't add any new inbound transfer
> # NodeC's state response arrives on NodeB with topology 11, NodeB discards it, and state transfer hangs.
> {noformat}
> 14:52:52,426 INFO (testng-Test:[cluster-listener]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,479 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterCacheStatus] Recovered 2 partition(s) for cache cluster-listener: [CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-57087: 78+79, Test-NodeB-45145: 90+88, Test-NodeC-20831: 88+89]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeA-57087, Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[48e3ddc7-ee97-42d8-a57d-283e8d28ec25, 301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}]
> 14:52:52,484 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cluster-listener, topology = CacheTopology{id=12, phase=CONFLICT_RESOLUTION, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, unionCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, actualMembers=[Test-NodeB-45145, Test-NodeA-57087, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, 48e3ddc7-ee97-42d8-a57d-283e8d28ec25, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, availability mode = null
> 14:52:52,488 ERROR (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [DefaultConflictManager] Cache cluster-listener encountered exception whilst trying to resolve conflicts on merge: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node Test-NodeA-57087 was suspected
> 14:52:52,532 INFO (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=13, phase=READ_OLD_WRITE_ALL, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,577 TRACE (stateTransferExecutor-thread-Test-NodeB-p23774-t3:[StateRequest-cluster-listener]) [StateConsumerImpl] Waiting for inbound transfer to finish: InboundTransferTask{segments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, finishedSegments={}, unfinishedSegments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, source=Test-NodeC-20831, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@110952e8[Not completed], topologyId=11, timeout=240000, cacheName=cluster-listener}
> 14:52:52,584 DEBUG (remote-thread-Test-NodeB-p23771-t4:[cluster-listener]) [StateConsumerImpl] Discarding state response with old topology id 11 for cache cluster-listener, state transfer request topology was true
> 14:52:52,584 TRACE (remote-thread-Test-NodeB-p23771-t4:[]) [JGroupsTransport] Test-NodeB-45145 sending response for request 13 to Test-NodeC-20831: null
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 8 months
[JBoss JIRA] (ISPN-9014) Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
by Ryan Emerson (JIRA)
[ https://issues.jboss.org/browse/ISPN-9014?page=com.atlassian.jira.plugin.... ]
Ryan Emerson commented on ISPN-9014:
------------------------------------
[~staho] I recommend upgrading to 9.3.1.Final, there were a lot of changes/fixes related to partition handling that went into Infinispan 9.3.
> Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
> ----------------------------------------------------------------------------------------------------
>
> Key: ISPN-9014
> URL: https://issues.jboss.org/browse/ISPN-9014
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.1.Final
> Reporter: Dan Berindei
> Assignee: Ryan Emerson
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
>
> Attachments: 15nodes-merge-issue.zip
>
>
> Conflict resolution fails when trying to read entries from nodes that are not in the JGroups cluster view, and this causes random failures in {{ClusterListenerDistTest.testClusterListenerNodeGoesDown random}}.
> # NodeA leaves the cluster, but still manages to start a rebalance with [NodeB, NodeC] (topology id 11)
> # One node doesn't receive topology 11, so NodeB becomes coordinator and starts conflict resolution with all 3 nodes in the pending CH (topology 12)
> # Conflict resolution fails because NodeB and NodeC can't read the entries from NodeA
> # {{onPartitionMerge}} also queued a rebalance, so NodeB starts a new rebalance without canceling the previous rebalance first (topology 13)
> # Because there is no reset topology, NodeB thinks it already requested all the segments for NodeC's in topology 11, so it doesn't add any new inbound transfer
> # NodeC's state response arrives on NodeB with topology 11, NodeB discards it, and state transfer hangs.
> {noformat}
> 14:52:52,426 INFO (testng-Test:[cluster-listener]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,479 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterCacheStatus] Recovered 2 partition(s) for cache cluster-listener: [CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-57087: 78+79, Test-NodeB-45145: 90+88, Test-NodeC-20831: 88+89]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeA-57087, Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[48e3ddc7-ee97-42d8-a57d-283e8d28ec25, 301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}]
> 14:52:52,484 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cluster-listener, topology = CacheTopology{id=12, phase=CONFLICT_RESOLUTION, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, unionCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, actualMembers=[Test-NodeB-45145, Test-NodeA-57087, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, 48e3ddc7-ee97-42d8-a57d-283e8d28ec25, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, availability mode = null
> 14:52:52,488 ERROR (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [DefaultConflictManager] Cache cluster-listener encountered exception whilst trying to resolve conflicts on merge: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node Test-NodeA-57087 was suspected
> 14:52:52,532 INFO (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=13, phase=READ_OLD_WRITE_ALL, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,577 TRACE (stateTransferExecutor-thread-Test-NodeB-p23774-t3:[StateRequest-cluster-listener]) [StateConsumerImpl] Waiting for inbound transfer to finish: InboundTransferTask{segments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, finishedSegments={}, unfinishedSegments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, source=Test-NodeC-20831, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@110952e8[Not completed], topologyId=11, timeout=240000, cacheName=cluster-listener}
> 14:52:52,584 DEBUG (remote-thread-Test-NodeB-p23771-t4:[cluster-listener]) [StateConsumerImpl] Discarding state response with old topology id 11 for cache cluster-listener, state transfer request topology was true
> 14:52:52,584 TRACE (remote-thread-Test-NodeB-p23771-t4:[]) [JGroupsTransport] Test-NodeB-45145 sending response for request 13 to Test-NodeC-20831: null
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 8 months
[JBoss JIRA] (ISPN-9014) Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
by M S (JIRA)
[ https://issues.jboss.org/browse/ISPN-9014?page=com.atlassian.jira.plugin.... ]
M S commented on ISPN-9014:
---------------------------
Hi.
In one of our environments based on infinispan version 9.1.3 where we have 15 nodes in cloud we got to the point where similar issue occurs on node 22 (Cache zones encountered exception whilst trying to resolve conflicts on merge: java.util.concurrent.CompletionException: org.infinispan.commons.CacheException).
We reproduced it while having 15 nodes in cloud, and then unplugging and plugging node11 back.
I'm attaching infinispan logs from the failed controllers and our cluster config.
Please have a look if the issue is really the same one and the fix from beta version is not sufficient, or new issue must be created.
Thx
> Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
> ----------------------------------------------------------------------------------------------------
>
> Key: ISPN-9014
> URL: https://issues.jboss.org/browse/ISPN-9014
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.1.Final
> Reporter: Dan Berindei
> Assignee: Ryan Emerson
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
>
> Attachments: 15nodes-merge-issue.zip
>
>
> Conflict resolution fails when trying to read entries from nodes that are not in the JGroups cluster view, and this causes random failures in {{ClusterListenerDistTest.testClusterListenerNodeGoesDown random}}.
> # NodeA leaves the cluster, but still manages to start a rebalance with [NodeB, NodeC] (topology id 11)
> # One node doesn't receive topology 11, so NodeB becomes coordinator and starts conflict resolution with all 3 nodes in the pending CH (topology 12)
> # Conflict resolution fails because NodeB and NodeC can't read the entries from NodeA
> # {{onPartitionMerge}} also queued a rebalance, so NodeB starts a new rebalance without canceling the previous rebalance first (topology 13)
> # Because there is no reset topology, NodeB thinks it already requested all the segments for NodeC's in topology 11, so it doesn't add any new inbound transfer
> # NodeC's state response arrives on NodeB with topology 11, NodeB discards it, and state transfer hangs.
> {noformat}
> 14:52:52,426 INFO (testng-Test:[cluster-listener]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,479 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterCacheStatus] Recovered 2 partition(s) for cache cluster-listener: [CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-57087: 78+79, Test-NodeB-45145: 90+88, Test-NodeC-20831: 88+89]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeA-57087, Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[48e3ddc7-ee97-42d8-a57d-283e8d28ec25, 301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}]
> 14:52:52,484 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cluster-listener, topology = CacheTopology{id=12, phase=CONFLICT_RESOLUTION, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, unionCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, actualMembers=[Test-NodeB-45145, Test-NodeA-57087, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, 48e3ddc7-ee97-42d8-a57d-283e8d28ec25, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, availability mode = null
> 14:52:52,488 ERROR (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [DefaultConflictManager] Cache cluster-listener encountered exception whilst trying to resolve conflicts on merge: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node Test-NodeA-57087 was suspected
> 14:52:52,532 INFO (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=13, phase=READ_OLD_WRITE_ALL, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,577 TRACE (stateTransferExecutor-thread-Test-NodeB-p23774-t3:[StateRequest-cluster-listener]) [StateConsumerImpl] Waiting for inbound transfer to finish: InboundTransferTask{segments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, finishedSegments={}, unfinishedSegments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, source=Test-NodeC-20831, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@110952e8[Not completed], topologyId=11, timeout=240000, cacheName=cluster-listener}
> 14:52:52,584 DEBUG (remote-thread-Test-NodeB-p23771-t4:[cluster-listener]) [StateConsumerImpl] Discarding state response with old topology id 11 for cache cluster-listener, state transfer request topology was true
> 14:52:52,584 TRACE (remote-thread-Test-NodeB-p23771-t4:[]) [JGroupsTransport] Test-NodeB-45145 sending response for request 13 to Test-NodeC-20831: null
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 8 months
[JBoss JIRA] (ISPN-9014) Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
by M S (JIRA)
[ https://issues.jboss.org/browse/ISPN-9014?page=com.atlassian.jira.plugin.... ]
M S updated ISPN-9014:
----------------------
Attachment: 15nodes-merge-issue.zip
> Conflict resolution consistent hash should not include nodes that are not in the merged cluster view
> ----------------------------------------------------------------------------------------------------
>
> Key: ISPN-9014
> URL: https://issues.jboss.org/browse/ISPN-9014
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.2.1.Final
> Reporter: Dan Berindei
> Assignee: Ryan Emerson
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
>
> Attachments: 15nodes-merge-issue.zip
>
>
> Conflict resolution fails when trying to read entries from nodes that are not in the JGroups cluster view, and this causes random failures in {{ClusterListenerDistTest.testClusterListenerNodeGoesDown random}}.
> # NodeA leaves the cluster, but still manages to start a rebalance with [NodeB, NodeC] (topology id 11)
> # One node doesn't receive topology 11, so NodeB becomes coordinator and starts conflict resolution with all 3 nodes in the pending CH (topology 12)
> # Conflict resolution fails because NodeB and NodeC can't read the entries from NodeA
> # {{onPartitionMerge}} also queued a rebalance, so NodeB starts a new rebalance without canceling the previous rebalance first (topology 13)
> # Because there is no reset topology, NodeB thinks it already requested all the segments for NodeC's in topology 11, so it doesn't add any new inbound transfer
> # NodeC's state response arrives on NodeB with topology 11, NodeB discards it, and state transfer hangs.
> {noformat}
> 14:52:52,426 INFO (testng-Test:[cluster-listener]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,479 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterCacheStatus] Recovered 2 partition(s) for cache cluster-listener: [CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-57087: 78+79, Test-NodeB-45145: 90+88, Test-NodeC-20831: 88+89]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeA-57087, Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[48e3ddc7-ee97-42d8-a57d-283e8d28ec25, 301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}]
> 14:52:52,484 DEBUG (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cluster-listener, topology = CacheTopology{id=12, phase=CONFLICT_RESOLUTION, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, unionCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49, Test-NodeA-57087: 0+157]}, actualMembers=[Test-NodeB-45145, Test-NodeA-57087, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, 48e3ddc7-ee97-42d8-a57d-283e8d28ec25, ae95a681-2ba1-4e04-bfe5-05aa59425149]}, availability mode = null
> 14:52:52,488 ERROR (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [DefaultConflictManager] Cache cluster-listener encountered exception whilst trying to resolve conflicts on merge: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node Test-NodeA-57087 was suspected
> 14:52:52,532 INFO (stateTransferExecutor-thread-Test-NodeB-p23774-t4:[Merge-3]) [CLUSTER] ISPN000310: Starting cluster-wide rebalance for cache cluster-listener, topology CacheTopology{id=13, phase=READ_OLD_WRITE_ALL, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 128+50, Test-NodeC-20831: 128+49]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeB-45145: 131+125, Test-NodeC-20831: 125+131]}, unionCH=null, actualMembers=[Test-NodeB-45145, Test-NodeC-20831], persistentUUIDs=[301597c4-a4e4-46a6-8983-53e698ef70f7, ae95a681-2ba1-4e04-bfe5-05aa59425149]}
> 14:52:52,577 TRACE (stateTransferExecutor-thread-Test-NodeB-p23774-t3:[StateRequest-cluster-listener]) [StateConsumerImpl] Waiting for inbound transfer to finish: InboundTransferTask{segments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, finishedSegments={}, unfinishedSegments={19-21 28-33 38-44 50-55 60-62 72 77-79 86-91 101 104-107 113 116-126 168-169 172 181-182 188-189 195-197 200-202 223-226 235 242 245 249-254}, source=Test-NodeC-20831, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@110952e8[Not completed], topologyId=11, timeout=240000, cacheName=cluster-listener}
> 14:52:52,584 DEBUG (remote-thread-Test-NodeB-p23771-t4:[cluster-listener]) [StateConsumerImpl] Discarding state response with old topology id 11 for cache cluster-listener, state transfer request topology was true
> 14:52:52,584 TRACE (remote-thread-Test-NodeB-p23771-t4:[]) [JGroupsTransport] Test-NodeB-45145 sending response for request 13 to Test-NodeC-20831: null
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 8 months
[JBoss JIRA] (ISPN-5037) Do not replicate value from backup owner
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5037?page=com.atlassian.jira.plugin.... ]
Dan Berindei reopened ISPN-5037:
--------------------------------
Assignee: (was: Dan Berindei)
I managed to reproduce it in the server, but only with {{remote.withFlags(Flag.FORCE_RETURN_VALUE).put(k, v)}}. The default put operation never sends the previous value back because it uses {{IGNORE_RETURN_VALUES}}, so I'm reducing the priority.
> Do not replicate value from backup owner
> ----------------------------------------
>
> Key: ISPN-5037
> URL: https://issues.jboss.org/browse/ISPN-5037
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Reporter: Radim Vansa
> Labels: 7.0
>
> When a write is replicated from primary owner to backup owner, the backup owner returns the old value. This is unnecessary as the primary owner ignores this value, and introduces networking and marshalling overhead.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 8 months
[JBoss JIRA] (ISPN-3663) Extend compatibility mode to store the binary value instead of unmarshalling to java object
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-3663?page=com.atlassian.jira.plugin.... ]
Adrian Nistor commented on ISPN-3663:
-------------------------------------
I'm so glad!!!!!
> Extend compatibility mode to store the binary value instead of unmarshalling to java object
> -------------------------------------------------------------------------------------------
>
> Key: ISPN-3663
> URL: https://issues.jboss.org/browse/ISPN-3663
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 6.0.0.Beta2
> Reporter: Adrian Nistor
> Assignee: Adrian Nistor
>
> Currently if compat mode is active we unmarshall the remote binary value into a java object and put this into the cache so embedded clients will be able to access a meaningful object. But this is not optimal as it will have to be serialized back to a byte array when remote clients get it and will also be to re-serialized when the put is replicated to another node.
> It's better to store the byte array that comes from the remote client directly and use the storeAsBinary / MarshalledValue mechanism to provide a java object to embedded clients.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 8 months
[JBoss JIRA] (ISPN-3663) Extend compatibility mode to store the binary value instead of unmarshalling to java object
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-3663?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes resolved ISPN-3663.
-------------------------------------
Resolution: Out of Date
There's no need to store stuff unmarshalled in the server anymore for interoperability between REST, embedded and Hot Rod.
> Extend compatibility mode to store the binary value instead of unmarshalling to java object
> -------------------------------------------------------------------------------------------
>
> Key: ISPN-3663
> URL: https://issues.jboss.org/browse/ISPN-3663
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 6.0.0.Beta2
> Reporter: Adrian Nistor
> Assignee: Adrian Nistor
>
> Currently if compat mode is active we unmarshall the remote binary value into a java object and put this into the cache so embedded clients will be able to access a meaningful object. But this is not optimal as it will have to be serialized back to a byte array when remote clients get it and will also be to re-serialized when the put is replicated to another node.
> It's better to store the byte array that comes from the remote client directly and use the storeAsBinary / MarshalledValue mechanism to provide a java object to embedded clients.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 8 months