[JBoss JIRA] (ISPN-9106) Readiness probe fails
by Sebastian Łaskawiec (JIRA)
Sebastian Łaskawiec created ISPN-9106:
-----------------------------------------
Summary: Readiness probe fails
Key: ISPN-9106
URL: https://issues.jboss.org/browse/ISPN-9106
Project: Infinispan
Issue Type: Bug
Affects Versions: 9.3.0.Alpha1
Reporter: Sebastian Łaskawiec
Assignee: Ryan Emerson
The readiness check returns an error on the current master build:
{code}
opt/jboss/infinispan-server/bin/ispn-cli.sh -c --controller=172.16.26.143:9990 '/subsystem=datagrid-infinispan/cache-container=clustered/health=HEALTH:read-attribute(name=cluster-health)'
Failed to link org/jboss/as/patching/cli/PatchCommand (Module "org.jboss.as.patching" version 4.0.0.Final from local module loader @555590 (finder: local module finder @6d1e7682 (roots: /opt/jboss/infinispan-server/modules,/opt/jboss/infinispan-server/modules/system/layers/base))): org/aesh/command/Command
{code}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-6921) Partition does not become available after merge
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-6921?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-6921:
-----------------------------------
OK, I've changed the title of the JIRA. Nevertheless, the situation described is clearly undesirable: when a majority of the nodes reunite, the partition should become available.
> Partition does not become available after merge
> -----------------------------------------------
>
> Key: ISPN-6921
> URL: https://issues.jboss.org/browse/ISPN-6921
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.Alpha3
> Reporter: Radim Vansa
> Assignee: Dan Berindei
>
> During the split and merge sequence (A, B, C, D) -> (A), (B, C, D) -> (A), (B), (C, D) -> (A, C, D), (B),
> there was a topology update with BCD in the CH, but the stable topology was still ABCD.
> When the merge happens, the merged topology has only CD as newMembers when computing the new availability mode. Therefore, ACD does not become available, even though, according to the stable topology, it has enough data.
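A minimal Java sketch of the availability arithmetic in the description, assuming a simple strict-majority rule over the stable topology members. The class and method names are hypothetical and this is not Infinispan's actual availability strategy. Counting only the newMembers CD gives 2 of the 4 stable members, while counting the whole merged partition ACD gives 3 of 4:

```java
import java.util.Set;

// Hypothetical sketch, not Infinispan code: a partition is treated as available
// only if it contains a strict majority of the stable topology's members.
public class MergeAvailability {

    static boolean hasMajority(Set<String> stableMembers, Set<String> candidateMembers) {
        // Count only the candidates that were members of the stable topology.
        long present = candidateMembers.stream().filter(stableMembers::contains).count();
        return present * 2 > stableMembers.size();
    }

    public static void main(String[] args) {
        Set<String> stable = Set.of("A", "B", "C", "D");
        Set<String> newMembers = Set.of("C", "D");           // what the merge computation used
        Set<String> mergedPartition = Set.of("A", "C", "D"); // the partition after the merge

        System.out.println(hasMajority(stable, newMembers));      // false: stays unavailable
        System.out.println(hasMajority(stable, mergedPartition)); // true: has a majority of ABCD
    }
}
```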
--
[JBoss JIRA] (ISPN-6921) Partition does not become available after merge
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-6921?page=com.atlassian.jira.plugin.... ]
Radim Vansa updated ISPN-6921:
------------------------------
Summary: Partition does not become available after merge (was: Use intersection of expected members and stable topology during merge)
--
[JBoss JIRA] (ISPN-6921) Use intersection of expected members and stable topology during merge
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6921?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-6921:
------------------------------------
[~rvansa] Unfortunately, being an owner in the stable topology doesn't guarantee that the node still has the data. Because the stable topology update is sent asynchronously after the rebalance ends, B could finish the rebalance for (B, C, D) and send the NO_REBALANCE topology update to C and D, but fail to send the stable topology update before it gets isolated. The rebalance process doesn't guarantee that when a node is removed, only that node's segments move, so segment 0 could be owned by CD in the stable topology and by DB in the rebalanced topology. In that case C has already deleted its copy of the segment by the time we try to use the stable topology for the (A, C, D) partition.
We could probably turn the stable topology from a completely separate topology into just a flag or a topology phase. I'm not yet sure whether that would be enough, or whether we would also have to change the way we update the topology after a node leaves, so that the node stays in the CH until the rebalance ends.
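A minimal Java sketch of the segment 0 scenario in the comment, under the assumption that C and D installed the NO_REBALANCE topology (owners D, B) and dropped the segments they no longer own, while the stable topology still lists C and D as owners. The class and helper are hypothetical:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of segment 0 in the (A, C, D) partition; not Infinispan code.
// Assumption: a node keeps a segment's data only while it owns the segment in the
// latest topology it installed.
public class StableTopologyOwnership {

    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a); // sorted, for stable output
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> partition = Set.of("A", "C", "D");
        Set<String> stableOwnersSeg0 = Set.of("C", "D");     // stable topology, never updated
        Set<String> rebalancedOwnersSeg0 = Set.of("D", "B"); // NO_REBALANCE topology C and D installed

        Set<String> claimed = intersect(stableOwnersSeg0, partition);   // owners per stable topology
        Set<String> actual = intersect(rebalancedOwnersSeg0, partition); // nodes that still hold data
        Set<String> stale = new TreeSet<>(claimed);
        stale.removeAll(actual); // claimed owners that already deleted the segment

        System.out.println("claimed=" + claimed + " actual=" + actual + " stale=" + stale);
        // claimed=[C, D] actual=[D] stale=[C]
    }
}
```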
> Use intersection of expected members and stable topology during merge
> ---------------------------------------------------------------------
>
> Key: ISPN-6921
> URL: https://issues.jboss.org/browse/ISPN-6921
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.Alpha3
> Reporter: Radim Vansa
> Assignee: Dan Berindei
>
> During the split and merge sequence (A, B, C, D) -> (A), (B, C, D) -> (A), (B), (C, D) -> (A, C, D), (B),
> there was a topology update with BCD in the CH, but the stable topology was still ABCD.
> When the merge happens, the merged topology has only CD as newMembers when computing the new availability mode. Therefore, ACD does not become available, even though, according to the stable topology, it has enough data.
--
[JBoss JIRA] (ISPN-6921) Use intersection of expected members and stable topology during merge
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6921?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6921:
-------------------------------
Status: Open (was: New)
--
[JBoss JIRA] (ISPN-9105) Majority partition nodes can process minority topology updates after merge
by Dan Berindei (JIRA)
Dan Berindei created ISPN-9105:
----------------------------------
Summary: Majority partition nodes can process minority topology updates after merge
Key: ISPN-9105
URL: https://issues.jboss.org/browse/ISPN-9105
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.2.1.Final, 9.3.0.Alpha1
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.3.0.Beta1
After a merge, NAKACK2 resends some broadcast messages that were originally sent in a partition to the members of the merged cluster that weren't in that partition.
We have a check in LocalTopologyManagerImpl to ignore topology updates from the wrong coordinator, but unfortunately it only runs after resetLocalTopologyBeforeRebalance() has been called. If the update's topology id is higher than the current topology id, that call can install a "reset" topology to prepare for the rebalance.
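The ordering problem can be sketched in Java as follows. This is a simplified, hypothetical model: the class, fields and methods only stand in for the coordinator check and for the reset step performed by resetLocalTopologyBeforeRebalance(), and the topology ids are made up. It does not claim to reproduce the actual code or the fix in the pull request; running the coordinator check first is simply one ordering that avoids the stray reset:

```java
// Hypothetical, simplified model; not LocalTopologyManagerImpl's actual code.
public class TopologyUpdateOrdering {

    final String coordinator;       // coordinator of the merged cluster, as seen by this node
    final int currentTopologyId;    // topology id currently installed on this node
    boolean resetTopologyInstalled; // set when the stand-in reset step runs

    TopologyUpdateOrdering(String coordinator, int currentTopologyId) {
        this.coordinator = coordinator;
        this.currentTopologyId = currentTopologyId;
    }

    // Stand-in for the reset step: a higher topology id triggers a "reset" topology.
    void maybeInstallResetTopology(int topologyId) {
        if (topologyId > currentTopologyId) {
            resetTopologyInstalled = true;
        }
    }

    // Order described in the issue: reset step first, coordinator check afterwards.
    boolean handleCheckAfterReset(String sender, int topologyId) {
        maybeInstallResetTopology(topologyId);
        return sender.equals(coordinator); // too late: the reset has already happened
    }

    // Alternative order: drop updates from the wrong coordinator before any side effect.
    boolean handleCheckBeforeReset(String sender, int topologyId) {
        if (!sender.equals(coordinator)) {
            return false;
        }
        maybeInstallResetTopology(topologyId);
        return true;
    }

    public static void main(String[] args) {
        // Node A (coordinator "A") receives an update resent from "D", the minority coordinator.
        TopologyUpdateOrdering afterReset = new TopologyUpdateOrdering("A", 19);
        afterReset.handleCheckAfterReset("D", 21);
        System.out.println(afterReset.resetTopologyInstalled); // true

        TopologyUpdateOrdering beforeReset = new TopologyUpdateOrdering("A", 19);
        beforeReset.handleCheckBeforeReset("D", 21);
        System.out.println(beforeReset.resetTopologyInstalled); // false
    }
}
```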
In the reset topology, all the segments are owned by the minority partition nodes, so the majority partition nodes that install it will invalidate all their entries before conflict resolution even starts.
This causes random failures in the conflict resolution tests, e.g. {{MergePolicyPreferredAlwaysTest}}:
{noformat}
19:19:28,448 DEBUG (testng-Test:[]) [GMS] Test-NodeA-50368: installing view MergeView::[Test-NodeA-50368|10] (5) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368, Test-NodeD-49504, Test-NodeE-55304], 2 subgroups: [Test-NodeA-50368|8] (3) [Test-NodeA-50368, Test-NodeB-27290, Test-NodeC-9368], [Test-NodeD-49504|9] (2) [Test-NodeD-49504, Test-NodeE-55304]
19:19:28,740 TRACE (jgroups-10,Test-NodeA-50368:[]) [GlobalInboundInvocationHandler] Attempting to execute non-CacheRpcCommand: CacheTopologyControlCommand{cache=___defaultcache, type=CH_UPDATE, sender=Test-NodeD-49504, joinInfo=null, topologyId=21, rebalanceId=7, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 134, Test-NodeE-55304: 122]}, availabilityMode=null, phase=READ_ALL_WRITE_ALL, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], throwable=null, viewId=7} [sender=Test-NodeD-49504]
19:19:28,741 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [LocalTopologyManagerImpl] Installing fake cache topology CacheTopology{id=20, phase=NO_REBALANCE, rebalanceId=6, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeD-49504: 132, Test-NodeE-55304: 124]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD-49504, Test-NodeE-55304], persistentUUIDs=[6f22a4be-bf94-42a7-9ea1-4128944351a2, 59c315d5-c7d2-4121-b939-01d62ba9af4f]} for cache ___defaultcache
19:19:28,742 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: new segments: []; old segments: RangeSet(256)
19:19:28,744 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] On cache ___defaultcache we have: added segments: {}; removed segments: {0-255}
19:19:28,745 DEBUG (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [StateConsumerImpl] Removing no longer owned entries for cache ___defaultcache
19:19:28,745 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [InvocationContextInterceptor] Invoked with command InvalidateCommand{keys=[MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}]} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@48a2a9d5]
19:19:30,152 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [JGroupsTransport] Test-NodeA-50368 sending request 232 to all: SingleRpcCommand{cacheName='___defaultcache', command=RemoveCommand{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=null, metadata=null, flags=[SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER, IGNORE_RETURN_VALUES], commandInvocationId=CommandInvocation:Test-NodeA-50368:109014, valueMatcher=MATCH_ALWAYS, topologyId=24}}
19:19:28,748 TRACE (transport-thread-Test-NodeA-p66802-t6:[Topology-___defaultcache]) [DefaultDataContainer] Removed ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=DURING SPLIT} from container
19:19:30,096 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache conflict detected {Test-NodeA-50368=NullCacheEntry{}, Test-NodeE-55304=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeC-9368=NullCacheEntry{}, Test-NodeD-49504=ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, Test-NodeB-27290=NullCacheEntry{}}
19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache applying EntryMergePolicy org.infinispan.conflict.MergePolicy to PreferredEntry NullCacheEntry{}, otherEntries [ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}, ImmortalCacheEntry{key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}, value=BEFORE SPLIT}, NullCacheEntry{}]
19:19:30,132 TRACE (stateTransferExecutor-thread-Test-NodeA-p66803-t4:[]) [DefaultConflictManager] Cache ___defaultcache executing remove on conflict: key MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}
19:19:35,274 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.MergePolicyPreferredAlwaysTest.testPartitionMergePolicy[REPL_SYNC, 5N]
java.lang.AssertionError: Key=MagicKey{1AD3/F92B3173/51@Test-NodeA-50368}. VersionMap: {Test-NodeA-50368=null, Test-NodeE-55304=null, Test-NodeC-9368=null, Test-NodeD-49504=null, Test-NodeB-27290=null}
at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.9.9.jar:?]
at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.9.9.jar:?]
at org.testng.AssertJUnit.assertNotNull(AssertJUnit.java:267) ~[testng-6.9.9.jar:?]
at org.infinispan.conflict.impl.BaseMergePolicyTest.afterConflictResolutionAndMerge(BaseMergePolicyTest.java:113) ~[test-classes/:?]
at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:138) ~[test-classes/:?]
{noformat}
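The failure chain in the log can be modeled with a small Java sketch. It is hypothetical and rests on three assumptions: in a replicated CH every CH member owns every segment, a node removes the entries of segments it no longer owns, and the "preferred always" policy returns the preferred entry (Node A's, from the majority partition) even when that entry is missing, which leads to a remove:

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of the failure chain in the log; not Infinispan code.
public class PreferredAlwaysLoss {

    static final int NUM_SEGMENTS = 256; // "ns = 256" in the log

    // Assumption: with a replicated CH, every CH member owns every segment.
    static BitSet ownedSegments(String node, List<String> chMembers) {
        BitSet owned = new BitSet(NUM_SEGMENTS);
        if (chMembers.contains(node)) {
            owned.set(0, NUM_SEGMENTS);
        }
        return owned;
    }

    // Assumed "preferred always" behaviour: the preferred entry wins, and a missing
    // (null) preferred entry means the key is removed everywhere.
    static String preferredAlways(String preferred, List<String> otherEntries) {
        return preferred;
    }

    public static void main(String[] args) {
        // Node A's segments before and after installing the fake reset topology (owners D, E).
        BitSet removed = ownedSegments("A", List.of("A", "B", "C"));
        removed.andNot(ownedSegments("A", List.of("D", "E")));
        System.out.println("segments removed on A: " + removed.cardinality()); // 256

        // A's "DURING SPLIT" entry was invalidated, so the preferred entry is null.
        List<String> others = Arrays.asList("BEFORE SPLIT", null, "BEFORE SPLIT", null);
        System.out.println(preferredAlways(null, others));           // null: the key is removed
        System.out.println(preferredAlways("DURING SPLIT", others)); // DURING SPLIT
    }
}
```

Had Node A's "DURING SPLIT" entry survived, the same policy would have kept it, which is consistent with the description above: the data loss comes from the premature invalidation rather than from the merge policy itself.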
--
[JBoss JIRA] (ISPN-9104) Majority partition nodes can process minority topology updates after merge
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9104?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9104:
-------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/5941
> Majority partition nodes can process minority topology updates after merge
> --------------------------------------------------------------------------
>
> Key: ISPN-9104
> URL: https://issues.jboss.org/browse/ISPN-9104
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.1.Final, 9.3.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 9.3.0.Beta1
--