[JBoss JIRA] (ISPN-9319) Cache.lock(key) should throw AvailabilityException in degraded mode
by Dan Berindei (JIRA)
Dan Berindei created ISPN-9319:
----------------------------------
Summary: Cache.lock(key) should throw AvailabilityException in degraded mode
Key: ISPN-9319
URL: https://issues.jboss.org/browse/ISPN-9319
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite - Core
Affects Versions: 9.3.0.CR1
Reporter: Dan Berindei
Fix For: 9.4.0.Final
We need a test for {{cache.lock(key)}} when one of the owners is not an actual member of the cluster and the cache is in degraded mode. {{PartitionHandlingInterceptor}} doesn't handle {{LockControlCommand}} explicitly, so it might end up throwing {{TimeoutException}} after waiting for a new topology. We should make sure it throws an {{AvailabilityException}} instead.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 6 months
[JBoss JIRA] (ISPN-5016) Specify and document cache consistency guarantees
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5016?page=com.atlassian.jira.plugin.... ]
Work on ISPN-5016 stopped by Dan Berindei.
------------------------------------------
> Specify and document cache consistency guarantees
> -------------------------------------------------
>
> Key: ISPN-5016
> URL: https://issues.jboss.org/browse/ISPN-5016
> Project: Infinispan
> Issue Type: Task
> Components: Documentation-Core
> Affects Versions: 7.0.2.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
>
> We can't simply use the consistency model defined by Java Specification and broaden it for whole cache (maybe the expression "can't" is too strong, but we definitely don't want to do that in some cases).
> By consistency guarantees/model I mean mostly in which order are
> writes allowed to be observed: and we can't boil it down to simply
> causal, PRAM or any other consistency model as writes can be observed as non-atomic in Infinispan.
> Infinispan documentation is quite scarce about that, the only trace I've
> found is in Glossarry [2] "Infinispan has traditionally followed ACID
> principles as far as possible, however an eventually consistent mode
> embracing BASE is on the roadmap."
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 6 months
[JBoss JIRA] (ISPN-9270) ClusteredLockWith2NodesTest.testTryLockAndKillLocking random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9270?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9270:
-------------------------------
Status: Open (was: Pull Request Sent)
> ClusteredLockWith2NodesTest.testTryLockAndKillLocking random failures
> ---------------------------------------------------------------------
>
> Key: ISPN-9270
> URL: https://issues.jboss.org/browse/ISPN-9270
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.3.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 9.3.0.Final
>
>
> The test stops the coordinator A of a 2-node cluster \[A, B\], and then verifies that B stays available.
> Unfortunately that doesn't always happen: A tries to install topology \[B\] before shutting down, but B might receive it too late and reject it. If B expects members \[A, B\] and sees only B, the cache will enter degraded mode:
> {noformat}
> 19:04:05,863 INFO (jgroups-8,Test-NodeB-21374:[org.infinispan.LOCKS]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN100008: Updating cache members list [Test-NodeD-51051], topology id 7
> 19:04:05,863 TRACE (jgroups-4,Test-NodeD-51051:[]) [JGroupsTransport] Test-NodeD-51051 received command from Test-NodeB-21374: CacheTopologyControlCommand{cache=org.infinispan.LOCKS, type=CH_UPDATE, sender=Test-NodeB-21374, joinInfo=null, topologyId=7, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[Test-NodeD-51051: 256]}, pendingCH=null, availabilityMode=AVAILABLE, phase=NO_REBALANCE, actualMembers=[Test-NodeD-51051], throwable=null, viewId=1}
> 19:04:05,880 INFO (jgroups-4,Test-NodeD-51051:[]) [CLUSTER] ISPN000094: Received new cluster view for channel ISPN: [Test-NodeD-51051|2] (1) [Test-NodeD-51051]
> 19:04:05,884 DEBUG (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [ClusterCacheStatus] Recovered 1 partition(s) for cache org.infinispan.LOCKS: [CacheTopology{id=6, phase=NO_REBALANCE, rebalanceId=2, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeB-21374: 124, Test-NodeD-51051: 132]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD-51051], persistentUUIDs=[6ac3a341-5af4-4aa2-ae7b-a4183d02d959]}]
> 19:04:05,884 WARN (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN000320: After merge (or coordinator change), cache still hasn't recovered a majority of members and must stay in degraded mode. Current members are [Test-NodeD-51051], lost members are [Test-NodeB-21374], stable members are [Test-NodeB-21374, Test-NodeD-51051]
> 19:04:05,885 INFO (stateTransferExecutor-thread-Test-NodeD-p101-t1:[Merge-2]) [CLUSTER] [Context=org.infinispan.LOCKS] ISPN100011: Entering availability mode DEGRADED_MODE, topology id 8
> 19:04:05,941 DEBUG (transport-thread-Test-NodeD-p100-t2:[Topology-org.infinispan.LOCKS]) [LocalTopologyManagerImpl] Ignoring topology 7 for cache org.infinispan.LOCKS from old coordinator Test-NodeB-21374
> 19:04:06,024 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.lock.ClusteredLockWith2NodesTest.testTryLockAndKillLocking
> java.lang.Error: java.util.concurrent.ExecutionException: java.lang.Error: java.util.concurrent.ExecutionException: org.infinispan.lock.exception.ClusteredLockException: org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'ClusteredLockKey{name=Test}' is not available. Not all owners are in this partition
> at org.infinispan.functional.FunctionalTestUtils.await(FunctionalTestUtils.java:47) ~[infinispan-core-9.3.0-SNAPSHOT-tests.jar:9.3.0-SNAPSHOT]
> at org.infinispan.lock.ClusteredLockWith2NodesTest.testTryLockAndKillLocking(ClusteredLockWith2NodesTest.java:49) ~[test-classes/:?]
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 6 months
[JBoss JIRA] (ISPN-9291) BasePartitionHandlingTest.Partition.installMergeView() doesn't compute the merge digest
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9291?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9291:
-------------------------------
Fix Version/s: 9.4.0.Alpha1
(was: 9.3.0.Final)
Sprint: Sprint 9.4.0.Alpha1 (was: Sprint 9.3.0.Final)
> BasePartitionHandlingTest.Partition.installMergeView() doesn't compute the merge digest
> ---------------------------------------------------------------------------------------
>
> Key: ISPN-9291
> URL: https://issues.jboss.org/browse/ISPN-9291
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.3.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 9.4.0.Alpha1
>
>
> The partition handling tests use {{BasePartitionHandlingTest.Partition.installMergeView(view1, view2)}} to install the merge view without waiting for {{MERGE3}} to run, making them much faster. Unfortunately, the implementation is incorrect: {{GMS.installView(view)}} only works for regular views, merge views need to be installed with {{GMS.installView(mergeView, digest)}}.
> The result is that the nodes that got isolated from the coordinator request the retransmission of all the {{NAKACK2}} messages (including view updates) since the cluster first started. The isolated nodes cannot install the merge view until they deliver all the older messages (even without knowing whether they're OOB or not). But if {{STABLE}} ran and cleared a range of messages already, the retransmission request cannot be satisfied, so the view updates will never be delivered.
> This is easily reproducible in {{CrashedNodeDuringConflictResolutionTest}} if we add a delay before updating the topology in {{StateConsumerImpl}}. The test installs the merge view manually, but then kills NodeC and expects the cluster to install the new view automatically. NodeD can't install the new view because it's waiting for earlier messages from NodeA:
> {noformat}
> 18:27:13,054 INFO (testng-test:[]) [TestSuiteProgress] Test starting: org.infinispan.conflict.impl.CrashedNodeDuringConflictResolutionTest.testPartitionMergePolicy[DIST_SYNC]
> 18:27:13,640 DEBUG (testng-test:[]) [GMS] test-NodeA-39513: installing view MergeView::[test-NodeA-39513|10] (4) [test-NodeA-39513, test-NodeB-9439, test-NodeC-43706, test-NodeD-59078], 2 subgroups: [test-NodeA-39513|8] (2) [test-NodeA-39513, test-NodeB-9439], [test-NodeC-43706|9] (2) [test-NodeC-43706, test-NodeD-59078]
> 18:27:13,674 DEBUG (testng-test:[]) [GMS] test-NodeD-59078: installing view MergeView::[test-NodeA-39513|10] (4) [test-NodeA-39513, test-NodeB-9439, test-NodeC-43706, test-NodeD-59078], 2 subgroups: [test-NodeA-39513|8] (2) [test-NodeA-39513, test-NodeB-9439], [test-NodeC-43706|9] (2) [test-NodeC-43706, test-NodeD-59078]
> 18:27:13,828 TRACE (jgroups-7,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((1): {50}) to test-NodeA-39513
> 18:27:13,966 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((49): {1-49}) to test-NodeA-39513
> 18:27:14,067 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:27:14,504 DEBUG (testng-test:[]) [DefaultCacheManager] Stopping cache manager ISPN on test-NodeC-43706
> 18:27:18,642 TRACE (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: joiners=[], suspected=[test-NodeC-43706], leaving=[], new view: [test-NodeA-39513|11] (3) [test-NodeA-39513, test-NodeB-9439, test-NodeD-59078]
> 18:27:18,643 TRACE (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: mcasting view [test-NodeA-39513|11] (3) [test-NodeA-39513, test-NodeB-9439, test-NodeD-59078]
> 18:27:18,646 DEBUG (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: installing view [test-NodeA-39513|11] (3) [test-NodeA-39513, test-NodeB-9439, test-NodeD-59078]
> 18:27:18,652 TRACE (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [TCP_NIO2] test-NodeA-39513: sending msg to null, src=test-NodeA-39513, headers are GMS: GmsHeader[VIEW], NAKACK2: [MSG, seqno=63], TP: [cluster_name=ISPN]
> 18:27:18,656 TRACE (jgroups-20,test-NodeA-39513:[]) [TCP_NIO2] test-NodeA-39513: received [dst: test-NodeA-39513, src: test-NodeB-9439 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=100, TP: [cluster_name=ISPN]
> 18:27:20,554 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:27:20,653 WARN (VERIFY_SUSPECT.TimerThread-89,test-NodeA-39513:[]) [GMS] test-NodeA-39513: failed to collect all ACKs (expected=2) for view [test-NodeA-39513|11] after 2000ms, missing 1 ACKs from (1) test-NodeD-59078
> 18:27:20,656 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:27:20,756 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> ...
> 18:28:14,412 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:28:14,513 TRACE (Timer runner-1,test-NodeD-59078:[]) [NAKACK2] test-NodeD-59078: sending XMIT_REQ ((45): {1-45}) to test-NodeA-39513
> 18:28:14,589 ERROR (testng-test:[]) [TestSuiteProgress] Test failed: org.infinispan.conflict.impl.CrashedNodeDuringConflictResolutionTest.testPartitionMergePolicy[DIST_SYNC]
> java.lang.RuntimeException: Cache ___defaultcache timed out waiting for rebalancing to complete on node test-NodeA-39513, current topology is CacheTopology{id=21, phase=CONFLICT_RESOLUTION, rebalanceId=7, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (3)[test-NodeD-59078: 256+0, test-NodeA-39513: 0+256, test-NodeB-9439: 0+256]}, pendingCH=null, unionCH=null, actualMembers=[test-NodeD-59078, test-NodeA-39513, test-NodeB-9439], persistentUUIDs=[828108c4-4251-49fc-9481-ff6392bea9fb, 1d4b6f07-b71b-41a1-adfb-abbe68944a9f, 3a1ece05-c282-433e-9eb5-7b3e0f1932aa]}. rebalanceInProgress=true, currentChIsBalanced=true
> at org.infinispan.test.TestingUtil.waitForNoRebalance(TestingUtil.java:392) ~[test-classes/:?]
> at org.infinispan.conflict.impl.CrashedNodeDuringConflictResolutionTest.performMerge(CrashedNodeDuringConflictResolutionTest.java:113) ~[test-classes/:?]
> at org.infinispan.conflict.impl.BaseMergePolicyTest.testPartitionMergePolicy(BaseMergePolicyTest.java:137) ~[test-classes/:?]
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 6 months
[JBoss JIRA] (ISPN-9127) Remote commands can access components before they are started
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-9127?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9127:
----------------------------------
Sprint: Sprint 9.4.0.Alpha1
> Remote commands can access components before they are started
> -------------------------------------------------------------
>
> Key: ISPN-9127
> URL: https://issues.jboss.org/browse/ISPN-9127
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.2.Final, 9.3.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
>
> {{PerCacheInboundInvocationHandler.handle()}} may be called before the component was started, because {{GlobalInboundInvocationHandler}} fetches it from the component registry without any checks. {{CommandsFactoryImpl.initializeReplicableCommand()}} doesn't wait for the components that it injects into remote commands to be started, either.
> This started causing random test failures in {{ConcurrentStartForkChannelTest}} after ISPN-8515, which moved most initialization work from {{init()}} methods to {{start()}} methods. Because {{StateProviderImpl}} starts after {{StateTransferManagerImpl}}, it's possible for a node to receive a {{StateRequestCommand}} before {{StateProviderImpl}} has initialized:
> {noformat}
> 16:15:09,549 TRACE (remote-thread-Test-NodeB-p51957-t2:[org.infinispan.CONFIG]) [StateProviderImpl] Starting outbound transfer to node Test-NodeA for cache null, topology id 2, segments {0-255}
> 16:15:09,551 WARN (remote-thread-Test-NodeB-p51957-t2:[]) [NonTotalOrderPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command StateRequestCommand{cache=org.infinispan.CONFIG, origin=Test-NodeA, type=START_STATE_TRANSFER, topologyId=2, segments={0-255}}
> java.lang.IllegalArgumentException: chunkSize must be greater than 0
> at org.infinispan.statetransfer.OutboundTransferTask.<init>(OutboundTransferTask.java:114) ~[classes/:?]
> at org.infinispan.statetransfer.StateProviderImpl.startOutboundTransfer(StateProviderImpl.java:273) ~[classes/:?]
> at org.infinispan.statetransfer.StateRequestCommand.invokeAsync(StateRequestCommand.java:101) ~[classes/:?]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[classes/:?]
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 6 months
[JBoss JIRA] (ISPN-9127) Remote commands can access components before they are started
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9127?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9127:
-------------------------------
Status: Open (was: Pull Request Sent)
> Remote commands can access components before they are started
> -------------------------------------------------------------
>
> Key: ISPN-9127
> URL: https://issues.jboss.org/browse/ISPN-9127
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.2.Final, 9.3.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
>
> {{PerCacheInboundInvocationHandler.handle()}} may be called before the component was started, because {{GlobalInboundInvocationHandler}} fetches it from the component registry without any checks. {{CommandsFactoryImpl.initializeReplicableCommand()}} doesn't wait for the components that it injects into remote commands to be started, either.
> This started causing random test failures in {{ConcurrentStartForkChannelTest}} after ISPN-8515, which moved most initialization work from {{init()}} methods to {{start()}} methods. Because {{StateProviderImpl}} starts after {{StateTransferManagerImpl}}, it's possible for a node to receive a {{StateRequestCommand}} before {{StateProviderImpl}} has initialized:
> {noformat}
> 16:15:09,549 TRACE (remote-thread-Test-NodeB-p51957-t2:[org.infinispan.CONFIG]) [StateProviderImpl] Starting outbound transfer to node Test-NodeA for cache null, topology id 2, segments {0-255}
> 16:15:09,551 WARN (remote-thread-Test-NodeB-p51957-t2:[]) [NonTotalOrderPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command StateRequestCommand{cache=org.infinispan.CONFIG, origin=Test-NodeA, type=START_STATE_TRANSFER, topologyId=2, segments={0-255}}
> java.lang.IllegalArgumentException: chunkSize must be greater than 0
> at org.infinispan.statetransfer.OutboundTransferTask.<init>(OutboundTransferTask.java:114) ~[classes/:?]
> at org.infinispan.statetransfer.StateProviderImpl.startOutboundTransfer(StateProviderImpl.java:273) ~[classes/:?]
> at org.infinispan.statetransfer.StateRequestCommand.invokeAsync(StateRequestCommand.java:101) ~[classes/:?]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[classes/:?]
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 6 months
[JBoss JIRA] (ISPN-9127) Remote commands can access components before they are started
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-9127?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-9127:
----------------------------------
Fix Version/s: (was: 9.3.0.Final)
> Remote commands can access components before they are started
> -------------------------------------------------------------
>
> Key: ISPN-9127
> URL: https://issues.jboss.org/browse/ISPN-9127
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.2.Final, 9.3.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
>
> {{PerCacheInboundInvocationHandler.handle()}} may be called before the component was started, because {{GlobalInboundInvocationHandler}} fetches it from the component registry without any checks. {{CommandsFactoryImpl.initializeReplicableCommand()}} doesn't wait for the components that it injects into remote commands to be started, either.
> This started causing random test failures in {{ConcurrentStartForkChannelTest}} after ISPN-8515, which moved most initialization work from {{init()}} methods to {{start()}} methods. Because {{StateProviderImpl}} starts after {{StateTransferManagerImpl}}, it's possible for a node to receive a {{StateRequestCommand}} before {{StateProviderImpl}} has initialized:
> {noformat}
> 16:15:09,549 TRACE (remote-thread-Test-NodeB-p51957-t2:[org.infinispan.CONFIG]) [StateProviderImpl] Starting outbound transfer to node Test-NodeA for cache null, topology id 2, segments {0-255}
> 16:15:09,551 WARN (remote-thread-Test-NodeB-p51957-t2:[]) [NonTotalOrderPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command StateRequestCommand{cache=org.infinispan.CONFIG, origin=Test-NodeA, type=START_STATE_TRANSFER, topologyId=2, segments={0-255}}
> java.lang.IllegalArgumentException: chunkSize must be greater than 0
> at org.infinispan.statetransfer.OutboundTransferTask.<init>(OutboundTransferTask.java:114) ~[classes/:?]
> at org.infinispan.statetransfer.StateProviderImpl.startOutboundTransfer(StateProviderImpl.java:273) ~[classes/:?]
> at org.infinispan.statetransfer.StateRequestCommand.invokeAsync(StateRequestCommand.java:101) ~[classes/:?]
> at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) ~[classes/:?]
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 6 months