[JBoss JIRA] (ISPN-10985) Liveness/readiness scripts don't work with custom configuration
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-10985?page=com.atlassian.jira.plugi... ]
Pedro Zapata Fernandez updated ISPN-10985:
------------------------------------------
Sprint: DataGrid Sprint #40
> Liveness/readiness scripts don't work with custom configuration
> ---------------------------------------------------------------
>
> Key: ISPN-10985
> URL: https://issues.redhat.com/browse/ISPN-10985
> Project: Infinispan
> Issue Type: Bug
> Components: OpenShift
> Affects Versions: 10.0.1.Final
> Environment: OpenShift, custom configuration
> Reporter: Jens Reimann
> Assignee: Ryan Emerson
> Priority: Blocker
>
> Using a custom configuration, the liveness/readiness scripts (`/opt/infinispan/bin/readinessProbe.sh`) no longer work. They use the in-image config files to determine whether HTTPS is enabled; however, since the in-image config file is not used with a custom configuration, this may result in the following error:
> ~~~
> sh-4.4$ /opt/infinispan/bin/readinessProbe.sh
> curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
> ~~~
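The curl error above (`wrong version number`) is what OpenSSL typically reports when a TLS client gets a plain-text reply, i.e. the probe speaks HTTPS to an endpoint that serves plain HTTP. A minimal sketch of the idea behind a fix, assuming the probe can read the configuration file the server was actually started with: derive the scheme from that file instead of the in-image one. `ProbeScheme` and `schemeFor` are hypothetical names, and treating any `<ssl>` element as "TLS enabled" is an assumption about the server XML, not the logic of `readinessProbe.sh`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of Infinispan): picks the probe scheme from the
// configuration the server is really running with, not the in-image default.
// Assumption: any <ssl> element in that file means the endpoint serves TLS.
final class ProbeScheme {
    private ProbeScheme() {}

    static String schemeFor(String serverXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(serverXml.getBytes(StandardCharsets.UTF_8)));
            // The factory is namespace-unaware by default, so a default-namespace
            // <ssl> element is found under the plain tag name "ssl"
            boolean tls = doc.getElementsByTagName("ssl").getLength() > 0;
            return tls ? "https" : "http";
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse server configuration", e);
        }
    }
}
```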
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
6 years, 2 months
[JBoss JIRA] (ISPN-10985) Liveness/readiness scripts don't work with custom configuration
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-10985?page=com.atlassian.jira.plugi... ]
Pedro Zapata Fernandez updated ISPN-10985:
------------------------------------------
Priority: Critical (was: Blocker)
[JBoss JIRA] (ISPN-5530) AtomicObjectFactoryTest.distributedCacheTest random failures
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-5530?page=com.atlassian.jira.plugin... ]
Pedro Zapata Fernandez closed ISPN-5530.
----------------------------------------
Resolution: Out of Date
> AtomicObjectFactoryTest.distributedCacheTest random failures
> ------------------------------------------------------------
>
> Key: ISPN-5530
> URL: https://issues.redhat.com/browse/ISPN-5530
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite
> Affects Versions: 7.2.2.Final, 8.0.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Tristan Tarrant
> Priority: Blocker
> Labels: testsuite_stability
>
> {noformat}
> java.lang.AssertionError: obtained = 999; espected = 1000 expected:<1000> but was:<999>
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
> at org.infinispan.atomic.AtomicObjectFactoryTest.distributedCacheTest(AtomicObjectFactoryTest.java:114)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:348)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:38)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:382)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
[JBoss JIRA] (ISPN-4568) DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-4568?page=com.atlassian.jira.plugin... ]
Pedro Zapata Fernandez updated ISPN-4568:
-----------------------------------------
Priority: Major (was: Blocker)
> DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4568
> URL: https://issues.redhat.com/browse/ISPN-4568
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
>
> This is very likely related to ISPN-4564: there seem to be two unjustified pauses of ~3 s, and some log messages also appear to be delayed:
> {noformat}
> 08:23:48,443 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@e9a3538]
> 08:23:48,470 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAN-p28720-t1:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAN-7764, DistSyncL1RepeatableReadFuncTest-NodeAM-739], command=SingleRpcCommand{cacheName='dist', command=PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}}, mode=SYNCHRONOUS, timeout=60000
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [InvocationContextInterceptor] Invoked with command PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true} and InvocationContext [org.infinispan.context.impl.NonTxInvocationContext@62801f8c]
> 08:23:50,953 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1ManagerImpl] Invalidating keys [key-to-the-cache] on nodes [DistSyncL1RepeatableReadFuncTest-NodeAK-9309]. Use multicast? false
> 08:23:51,060 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [JGroupsTransport] dests=[DistSyncL1RepeatableReadFuncTest-NodeAK-9309], command=SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=60000
> 08:23:51,062 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [BaseRpcInvokingCommand] Invoking command InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}, with originLocal flag set to false
> 08:23:50,972 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [CallInterceptor] Executing command: PutKeyValueCommand{key=key-to-the-cache, value=second-put, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}.
> 08:23:51,786 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAK-p28661-t5:) [InboundInvocationHandlerImpl] About to send back response null for command SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}}
> 08:23:51,796 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [CommandAwareRpcDispatcher] Responses: [sender=DistSyncL1RepeatableReadFuncTest-NodeAK-9309, received=true, suspected=false]
> 08:23:54,561 TRACE (transport-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28700-t2:) [RpcManagerImpl] Response(s) to SingleRpcCommand{cacheName='dist', command=InvalidateL1Command{num keys=1, origin=DistSyncL1RepeatableReadFuncTest-NodeAN-7764}} is {}
> 08:23:56,955 ERROR (testng-DistSyncL1RepeatableReadFuncTest:) [UnitTestTestNGListener] Test testNoEntryInL1MultipleConcurrentGetsWithInvalidation(org.infinispan.distribution.DistSyncL1RepeatableReadFuncTest) failed.
> java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask.get(FutureTask.java:201)
> at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:84)
> at org.infinispan.distribution.BaseDistSyncL1Test.testNoEntryInL1MultipleConcurrentGetsWithInvalidation(BaseDistSyncL1Test.java:217)
> 08:23:54,578 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [L1NonTxInterceptor] Allowing entry to commit as local node is owner
> 08:23:57,861 TRACE (remote-thread-DistSyncL1RepeatableReadFuncTest-NodeAM-p28701-t6:) [EntryWrappingInterceptor] About to commit entry RepeatableReadEntry(499752d9){key=key-to-the-cache, value=second-put, oldValue=first-put, isCreated=false, isChanged=true, isRemoved=false, isValid=true, skipRemoteGet=false, metadata=EmbeddedMetadata{version=null}}
> {noformat}
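The two symptoms described above can be checked mechanically: long gaps between consecutive log lines, and lines whose timestamp goes backwards (for example, the [CallInterceptor] line stamped 08:23:50,972 is printed after a line stamped 08:23:51,062). Below is a small, hypothetical `LogGaps` helper for scanning such TRACE logs; it assumes every relevant line starts with an `HH:mm:ss,SSS` timestamp, and the pause threshold is left to the caller.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports long pauses between consecutive timestamped
// lines and lines whose timestamp goes backwards (i.e. messages written late).
// Assumption: relevant lines start with an HH:mm:ss,SSS timestamp.
final class LogGaps {
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    private LogGaps() {}

    static List<String> analyze(List<String> lines, Duration pauseThreshold) {
        List<String> findings = new ArrayList<>();
        LocalTime previous = null;
        for (String line : lines) {
            // Skip stack-trace frames and any other line without a leading timestamp
            if (line.length() < 12 || !Character.isDigit(line.charAt(0))) {
                continue;
            }
            LocalTime current;
            try {
                current = LocalTime.parse(line.substring(0, 12), TS);
            } catch (DateTimeParseException e) {
                continue;
            }
            if (previous != null) {
                Duration gap = Duration.between(previous, current);
                if (gap.isNegative()) {
                    findings.add("out-of-order: " + current + " logged after " + previous);
                } else if (gap.compareTo(pauseThreshold) >= 0) {
                    findings.add("pause of " + gap.toMillis() + " ms before " + current);
                }
            }
            previous = current;
        }
        return findings;
    }
}
```

For instance, running it with a 2-second threshold over the [BaseRpcInvokingCommand], [CallInterceptor], [CommandAwareRpcDispatcher] and [RpcManagerImpl] lines above reports one out-of-order line and a 2765 ms pause before 08:23:54.561.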
[JBoss JIRA] (ISPN-4568) DistSyncL1RepeatableReadFuncTest.testNoEntryInL1MultipleConcurrentGetsWithInvalidation random failures
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-4568?page=com.atlassian.jira.plugin... ]
Pedro Zapata Fernandez closed ISPN-4568.
----------------------------------------
Resolution: Out of Date
[JBoss JIRA] (ISPN-9682) NullPointerException when put to JCache
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-9682?page=com.atlassian.jira.plugin... ]
Pedro Zapata Fernandez updated ISPN-9682:
-----------------------------------------
Priority: Critical (was: Blocker)
> NullPointerException when put to JCache
> ---------------------------------------
>
> Key: ISPN-9682
> URL: https://issues.redhat.com/browse/ISPN-9682
> Project: Infinispan
> Issue Type: Bug
> Components: JCache
> Affects Versions: 9.4.0.Final, 9.4.1.Final
> Reporter: Andrei Arkaev
> Assignee: Katia Aresti
> Priority: Critical
>
> After upgrading from 9.4.0.RC3 to 9.4.0.Final (and also 9.4.1.Final), I get the following error:
> {noformat}
> java.lang.NullPointerException: null
> at org.infinispan.functional.impl.AbstractFunctionalMap.invokeAsync(AbstractFunctionalMap.java:127)
> at org.infinispan.functional.impl.ReadWriteMapImpl.eval(ReadWriteMapImpl.java:70)
> at org.infinispan.jcache.embedded.JCache.put(JCache.java:409)
> {noformat}
> Cache configuration:
> {code:xml}
> <local-cache
>     name="SomeGlobalCache"
>     simple-cache="true"
>     statistics="false"
>     statistics-available="false">
>     <transaction mode="NONE"/>
> </local-cache>
> {code}
[JBoss JIRA] (ISPN-4996) Problem with capacityFactor=0 and restart of all nodes with capacityFactor > 0
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-4996?page=com.atlassian.jira.plugin... ]
Pedro Zapata Fernandez closed ISPN-4996.
----------------------------------------
Resolution: Out of Date
> Problem with capacityFactor=0 and restart of all nodes with capacityFactor > 0
> ------------------------------------------------------------------------------
>
> Key: ISPN-4996
> URL: https://issues.redhat.com/browse/ISPN-4996
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.2.Final
> Reporter: Enrico Olivelli
> Assignee: Dan Berindei
> Priority: Blocker
>
> I have only one DIST_SYNC cache. Most of the JVMs in the cluster are configured with capacityFactor = 0 (like the distributedlocalstorage=false property of Coherence), and some nodes are configured with capacityFactor > 0 (for instance 1000). We are talking about 100 nodes with capacityFactor = 0 and 4 nodes of the other kind; the whole cluster is inside one single "site/rack". Partition handling is off, and numOwners is 1.
> When all the nodes with capacityFactor > 0 are down, the cluster enters a degraded state.
> The problem is that even when the nodes with capacityFactor > 0 are up again, the cluster does not recover; a full restart is needed.
> If I enable partition handling, AvailabilityExceptions start to be thrown, which I think is the expected behaviour (see the "Infinispan User Guide").
>
> I think this is the problem and it is a bug:
>
> {noformat}
> 14/11/17 09:27:25 WARN topology.CacheTopologyControlCommand: ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=shared, type=JOIN, sender=testserver1@xxxxxxx-22311, site-id=xxx, rack-id=xxx, machine-id=24 bytes, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.TopologyAwareConsistentHashFactory@78b791ef, hashFunction=MurmurHash3, numSegments=60, numOwners=1, timeout=120000, totalOrder=false, distributed=true}, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, throwable=null, viewId=3}
> java.lang.IllegalArgumentException: A cache topology's pending consistent hash must contain all the current consistent hash's members
> at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:48)
> at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:43)
> at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:631)
> at org.infinispan.topology.ClusterCacheStatus.queueRebalance(ClusterCacheStatus.java:85)
> at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onJoin(PreferAvailabilityStrategy.java:22)
> at org.infinispan.topology.ClusterCacheStatus.doJoin(ClusterCacheStatus.java:540)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:158)
> at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:140)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:278)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> After that error every "put" results in:
> {noformat}
> 14/11/17 09:27:27 ERROR interceptors.InvocationContextInterceptor: ISPN000136: Execution error
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 1
> at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:93)
> at org.infinispan.interceptors.base.BaseStateTransferInterceptor.waitForTransactionData(BaseStateTransferInterceptor.java:96)
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:188)
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:95)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)
> at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:102)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:71)
> at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:35)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
> at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1576)
> at org.infinispan.cache.impl.CacheImpl.putInternal(CacheImpl.java:1054)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1046)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1646)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:245)
> {noformat}
>
> This is the actual configuration:
>
> {code:java}
> GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
> .globalJmxStatistics()
> .allowDuplicateDomains(true)
> .cacheManagerName(instanceName)
> .transport()
> .defaultTransport()
> .clusterName(clustername)
> .addProperty("configurationFile", configurationFile) // UDP stack for my cluster, approx. 100 machines
> .machineId(instanceName)
> .siteId("site1")
> .rackId("rack1")
> .nodeName(serviceName + "@" + instanceName)
> .remoteCommandThreadPool().threadPoolFactory(CachedThreadPoolExecutorFactory.create())
> .build();
> Configuration wildcard = new ConfigurationBuilder()
> .locking().lockAcquisitionTimeout(lockAcquisitionTimeout)
> .concurrencyLevel(10000).isolationLevel(IsolationLevel.READ_COMMITTED).useLockStriping(true)
> .clustering()
> .cacheMode(CacheMode.DIST_SYNC)
> .l1().lifespan(l1ttl)
> .hash().numOwners(numOwners).capacityFactor(capacityFactor)
> .partitionHandling().enabled(false)
> .stateTransfer().awaitInitialTransfer(false).timeout(initialTransferTimeout).fetchInMemoryState(false)
> .storeAsBinary().enabled(true).storeKeysAsBinary(false).storeValuesAsBinary(true)
> .jmxStatistics().enable()
> .unsafe().unreliableReturnValues(true)
> .build();
> {code}
> One workaround is to set capacityFactor = 1 instead of 0, but I do not want "simple nodes" (with less RAM) to become key owners.
> For me this is a showstopper.
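To see why the cluster has nowhere to put data once every node with capacityFactor > 0 is gone, consider a deliberately simplified model of capacity-weighted ownership. This is not Infinispan's consistent-hash factory: `CapacityModel` is a hypothetical class that assigns each segment's single primary owner in proportion to the (assumed non-negative) capacity factors, so a node with capacityFactor = 0 never owns a segment and, with numOwners = 1, a membership made only of such nodes leaves every segment without an owner. It does not model the reported failure to recover after the weighted nodes rejoin.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of capacity-weighted primary ownership
// (NOT Infinispan's consistent-hash algorithm). Capacity factors are assumed >= 0.
final class CapacityModel {
    private CapacityModel() {}

    // Returns how many segments each node would primarily own.
    static Map<String, Integer> primarySegments(Map<String, Float> capacityFactors, int numSegments) {
        double total = 0;
        for (float cf : capacityFactors.values()) {
            total += cf;
        }
        if (total <= 0) {
            throw new IllegalStateException("No member has capacityFactor > 0, so no segment can have an owner");
        }
        Map<String, Integer> owned = new LinkedHashMap<>();
        for (String node : capacityFactors.keySet()) {
            owned.put(node, 0);
        }
        for (int s = 0; s < numSegments; s++) {
            // Midpoint of segment s, mapped onto the [0, total) weight line
            double point = (s + 0.5) * total / numSegments;
            double cumulative = 0;
            for (Map.Entry<String, Float> e : capacityFactors.entrySet()) {
                cumulative += e.getValue();
                // A zero-capacity node never advances 'cumulative', so it is never picked
                if (cumulative > point) {
                    owned.merge(e.getKey(), 1, Integer::sum);
                    break;
                }
            }
        }
        return owned;
    }
}
```

With two nodes at 1000, two at 0 and numSegments = 60 (the value in the JOIN command above), the model gives 30 segments to each weighted node and none to the others; with only zero-capacity nodes it throws.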
[JBoss JIRA] (ISPN-11297) Rejoining nodes with global state may have their caches corrupted if there is a config mismatch
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11297?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11297:
-----------------------------------
Status: Open (was: New)
> Rejoining nodes with global state may have their caches corrupted if there is a config mismatch
> -----------------------------------------------------------------------------------------------
>
> Key: ISPN-11297
> URL: https://issues.redhat.com/browse/ISPN-11297
> Project: Infinispan
> Issue Type: Bug
> Components: Configuration, Core
> Affects Versions: 10.1.1.Final
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Priority: Critical
>
> With persistent global state enabled, when a node that was previously part of a cluster rejoins, it currently processes caches from the cluster state before the ones from the local state. This means that, if a cache configuration is incompatible, the local one will be overwritten with the one coming from the cluster.
> When joining, the node should perform compatibility checks between the caches in the cluster state and those in the local state before proceeding with creating them. If a mismatch is found, it should fail fast.
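A minimal sketch of the fail-fast check proposed above, assuming cache configurations can be compared as opaque strings (a real implementation would compare structured configuration objects). `JoinCompatibility.verify` is a hypothetical name; it is meant to run before any cache is created and rejects the join if a cache present in both the local and the cluster state has a different configuration. Treating caches that exist on only one side as compatible is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-join check: compare cache configurations from the persisted
// local global state with those received from the cluster, BEFORE creating any cache.
// Assumption: configurations are represented as comparable strings.
final class JoinCompatibility {
    private JoinCompatibility() {}

    static void verify(Map<String, String> localState, Map<String, String> clusterState) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> local : localState.entrySet()) {
            String remote = clusterState.get(local.getKey());
            // Caches known to only one side are not treated as conflicts (assumption)
            if (remote != null && !remote.equals(local.getValue())) {
                mismatches.add(local.getKey());
            }
        }
        // Fail fast: refuse to join rather than let the cluster overwrite local caches
        if (!mismatches.isEmpty()) {
            throw new IllegalStateException("Incompatible cache configurations, refusing to join: " + mismatches);
        }
    }
}
```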
[JBoss JIRA] (ISPN-8232) Transaction inconsistency during network partitions
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-8232?page=com.atlassian.jira.plugin... ]
Pedro Zapata Fernandez updated ISPN-8232:
-----------------------------------------
Sprint: DataGrid Sprint #40
> Transaction inconsistency during network partitions
> ---------------------------------------------------
>
> Key: ISPN-8232
> URL: https://issues.redhat.com/browse/ISPN-8232
> Project: Infinispan
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 9.1.0.Final
> Reporter: Pedro Ruivo
> Assignee: Pedro Ruivo
> Priority: Critical
> Labels: consistency
>
> In the scenario where the originator stays in the minority partition (in our test suite, the originator-isolated tests), it is possible for a transaction to be both committed and rolled back in the majority partition.
> In {{Pessimistic Locking}}, the transaction is committed in one phase using the {{PrepareCommand}}. If the partition happens while the originator is sending the {{PrepareCommand}}, the nodes in the majority partition may or may not receive it. We can have the case where some nodes receive and apply the {{PrepareCommand}} while others don't receive it.
> When the topology is updated in the majority partition, the {{TransactionTable}} rolls back all transactions whose originator isn't present. So, on the nodes where the {{PrepareCommand}} wasn't received, the transaction is rolled back.
> The originator in the minority partition detects the partition and marks the transaction as partially completed. When the merge occurs, it tries to commit the transaction again. On the nodes where the transaction was rolled back, the transaction is marked as completed, and when the {{PrepareCommand}} is received, it throws an {{IllegalStateException}} ({{TransactionTable:386, getOrCreateRemoteTransaction()}}). In this case, the transaction isn't removed from the {{PartitionHandlingManager}} and our test suite fails with {{"there are pending tx"}}.
> Another theoretical scenario is the {{PrepareCommand}} being executed when no locks are acquired.
> The same issue can happen with {{Optimistic Locking}} for the {{CommitCommand}}.
> The problem is that the transaction table can't tell whether the node left gracefully or not. A solution would be to have an {{"expected members"}} list, ideally kept separate from the {{CacheTopology}} to avoid sending it every time. It would also need some sysadmin tools for the case where a node crashes and won't be back online for a while (or, for some reason, doesn't need to be back online).
> A sysadmin could remove the node from this list ({{CacheTopology}} is updated and there is no need to increase it) and decide what to do with the pending transactions (or an automatic mechanism to auto-commit/rollback the transaction).
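The "expected members" proposal can be sketched as a small decision function, assuming the list is kept outside the `CacheTopology` and is only edited by an administrator. Unlike the current behaviour, where the `TransactionTable` rolls back every transaction whose originator is missing from the topology, a missing originator that is still expected leads to waiting for a merge. `ExpectedMembers` and its `Decision` values are hypothetical; `RESOLVE` stands for whatever commit/rollback policy the sysadmin or an automatic mechanism would apply.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the "expected members" proposal (not Infinispan code):
// a pending transaction whose originator vanished from the topology is only
// resolved once an administrator declares that originator gone for good.
final class ExpectedMembers {
    enum Decision { KEEP, WAIT_FOR_MERGE, RESOLVE }

    private final Set<String> expected = new HashSet<>();

    ExpectedMembers(Set<String> initialMembers) {
        expected.addAll(initialMembers);
    }

    // Sysadmin action: the node crashed and will not come back (soon)
    void remove(String node) {
        expected.remove(node);
    }

    Decision onTopologyUpdate(String originator, Set<String> currentMembers) {
        if (currentMembers.contains(originator)) {
            return Decision.KEEP;           // originator still visible: nothing to do
        }
        if (expected.contains(originator)) {
            return Decision.WAIT_FOR_MERGE; // possibly just partitioned away: keep the tx pending
        }
        return Decision.RESOLVE;            // originator removed by an admin: commit or roll back
    }
}
```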
[JBoss JIRA] (ISPN-11297) Rejoining nodes with global state may have their caches corrupted if there is a config mismatch
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11297?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11297:
-----------------------------------
Sprint: DataGrid Sprint #39, DataGrid Sprint #40 (was: DataGrid Sprint #39)