[JBoss JIRA] (ISPN-4996) Problem with capacityFactor=0 and restart of all nodes with capacityFactor > 0
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-4996?page=com.atlassian.jira.plugin... ]
Dan Berindei commented on ISPN-4996:
------------------------------------
There's a similar problem when starting a single node with capacityFactor == 0.
It "works" if another node with CF > 0 joins while the CF=0 node keeps retrying to join, but otherwise the join will eventually time out.
The best option is probably to detect when total CF == 0 and add a "phantom" node owning all the segments, then pretend the cache is in degraded mode.
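A minimal sketch of the detection step, assuming a hypothetical helper (this is not Infinispan's actual topology code): the coordinator would sum the members' capacity factors and take the phantom-node/degraded-mode path only when the total is zero.

```java
// Hypothetical helper, not actual Infinispan code: decide whether the
// phantom-node / degraded-mode fallback described above should kick in.
import java.util.Map;

public final class CapacityCheck {
    /** True when no current member is able to own any segment. */
    static boolean totalCapacityIsZero(Map<String, Float> capacityFactors) {
        float total = 0f;
        for (float cf : capacityFactors.values()) {
            total += cf;
        }
        return total == 0f;
    }
}
```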
> Problem with capacityFactor=0 and restart of all nodes with capacityFactor > 0
> ------------------------------------------------------------------------------
>
> Key: ISPN-4996
> URL: https://issues.redhat.com/browse/ISPN-4996
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.2.Final
> Reporter: Enrico Olivelli
> Assignee: Dan Berindei
> Priority: Blocker
>
> I have only one DIST_SYNC cache. Most of the JVMs in the cluster are configured with capacityFactor = 0 (like the distributed.localstorage=false property of Coherence) and some nodes are configured with capacityFactor > 0 (for instance 1000). We are talking about 100 nodes with capacityFactor = 0 and 4 nodes of the other kind; the whole cluster is inside one single "site/rack". Partition handling is off, numOwners is 1.
> When all the nodes with capacityFactor > 0 are down, the cluster comes to a degraded state.
> The problem is that even when nodes with capacityFactor > 0 are up again, the cluster does not recover; a full restart is needed.
> If I enable partition handling, AvailabilityExceptions start to be thrown, which I think is the expected behaviour (see the "Infinispan User Guide").
>
> I think this is the problem and it is a bug:
>
> {noformat}
> 14/11/17 09:27:25 WARN topology.CacheTopologyControlCommand: ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=shared, type=JOIN, sender=testserver1@xxxxxxx-22311, site-id=xxx, rack-id=xxx, machine-id=24 bytes, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.TopologyAwareConsistentHashFactory@78b791ef, hashFunction=MurmurHash3, numSegments=60, numOwners=1, timeout=120000, totalOrder=false, distributed=true}, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, throwable=null, viewId=3}
> java.lang.IllegalArgumentException: A cache topology's pending consistent hash must contain all the current consistent hash's members
> at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:48)
> at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:43)
> at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:631)
> at org.infinispan.topology.ClusterCacheStatus.queueRebalance(ClusterCacheStatus.java:85)
> at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onJoin(PreferAvailabilityStrategy.java:22)
> at org.infinispan.topology.ClusterCacheStatus.doJoin(ClusterCacheStatus.java:540)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:158)
> at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:140)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:278)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> After that error every "put" results in:
> {noformat}
> 14/11/17 09:27:27 ERROR interceptors.InvocationContextInterceptor: ISPN000136: Execution error
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 1
> at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:93)
> at org.infinispan.interceptors.base.BaseStateTransferInterceptor.waitForTransactionData(BaseStateTransferInterceptor.java:96)
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:188)
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:95)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)
> at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:102)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:71)
> at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:35)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
> at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1576)
> at org.infinispan.cache.impl.CacheImpl.putInternal(CacheImpl.java:1054)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1046)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1646)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:245)
> {noformat}
>
> This is the actual configuration:
>
> {code:java}
> GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
> .globalJmxStatistics()
> .allowDuplicateDomains(true)
> .cacheManagerName(instanceName)
> .transport()
> .defaultTransport()
> .clusterName(clustername)
> .addProperty("configurationFile", configurationFile) // udp transport for my cluster, approx 100 machines
> .machineId(instanceName)
> .siteId("site1")
> .rackId("rack1")
> .nodeName(serviceName + "@" + instanceName)
> .remoteCommandThreadPool().threadPoolFactory(CachedThreadPoolExecutorFactory.create())
> .build();
> Configuration wildcard = new ConfigurationBuilder()
> .locking().lockAcquisitionTimeout(lockAcquisitionTimeout)
> .concurrencyLevel(10000).isolationLevel(IsolationLevel.READ_COMMITTED).useLockStriping(true)
> .clustering()
> .cacheMode(CacheMode.DIST_SYNC)
> .l1().lifespan(l1ttl)
> .hash().numOwners(numOwners).capacityFactor(capacityFactor)
> .partitionHandling().enabled(false)
> .stateTransfer().awaitInitialTransfer(false).timeout(initialTransferTimeout).fetchInMemoryState(false)
> .storeAsBinary().enabled(true).storeKeysAsBinary(false).storeValuesAsBinary(true)
> .jmxStatistics().enable()
> .unsafe().unreliableReturnValues(true)
> .build();
> {code}
> One workaround is to set capacityFactor = 1 instead of 0, but I do not want "simple" nodes (with less RAM) to become key owners.
> For me this is a showstopper problem.
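The reporter's objection to the CF=1 workaround can be illustrated with a simple proportional-ownership model (an illustrative sketch only, not Infinispan's actual consistent-hash math): with capacityFactor = 0 a node's expected share of segments is exactly zero, while any positive factor, however small, gives it a nonzero share, i.e. it can become a key owner.

```java
// Illustrative model only (not Infinispan's real hashing): expected number
// of segments per node when segments are spread proportionally to each
// node's capacity factor.
import java.util.HashMap;
import java.util.Map;

public final class ExpectedOwnership {
    static Map<String, Double> expectedSegments(Map<String, Float> capacity, int numSegments) {
        float total = 0f;
        for (float cf : capacity.values()) {
            total += cf;
        }
        Map<String, Double> shares = new HashMap<>();
        for (Map.Entry<String, Float> e : capacity.entrySet()) {
            // A zero total would leave every share at zero (no possible owner).
            double share = total == 0f ? 0.0 : (double) numSegments * e.getValue() / total;
            shares.put(e.getKey(), share);
        }
        return shares;
    }
}
```

With the numbers from the report (numSegments = 60, one storage node at 1000 and one simple node), the simple node's expected share is 0 at CF=0 but becomes positive as soon as CF=1.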
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
6 years, 2 months
[JBoss JIRA] (ISPN-11289) Use InfinispanServerRule externally
by Gustavo Lira e Silva (Jira)
Gustavo Lira e Silva created ISPN-11289:
-------------------------------------------
Summary: Use InfinispanServerRule externally
Key: ISPN-11289
URL: https://issues.redhat.com/browse/ISPN-11289
Project: Infinispan
Issue Type: Enhancement
Components: Test Suite
Affects Versions: 10.1.1.Final
Reporter: Gustavo Lira e Silva
Assignee: Gustavo Lira e Silva
We are trying to use InfinispanServerRule in jdg-functional-tests, but we can only start the tests in container mode if we set
{code:java}
-Dserver.output.dir=jdg/server/runtime/target/rhdg-server-8.0.0
{code}
server.output.dir needs to use a relative path inside the jdg project, otherwise the Docker container doesn't start. This is happening because of this line: https://github.com/infinispan/jdg/blob/13cae7b461f934f7349b776b6269f95ce3...
In other words, if we download redhat-datagrid-8.0.0-server and set -Dserver.output.dir to the downloaded server, the Docker container doesn't start.
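One possible caller-side workaround, assuming the only problem is the path form (relative vs. absolute; the class and method names here are hypothetical, not part of the test rule): relativize the downloaded server's absolute path against the project root before passing it as -Dserver.output.dir.

```java
// Hypothetical workaround sketch: turn an absolute server distribution
// path into a path relative to the project root, since the rule appears
// to require a relative path inside the project.
import java.nio.file.Path;

public final class RelativeServerDir {
    static String relativize(Path projectRoot, Path serverDir) {
        return projectRoot.toAbsolutePath()
                          .relativize(serverDir.toAbsolutePath())
                          .toString();
    }
}
```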
--
[JBoss JIRA] (ISPN-11282) CLI: site command isn't working properly
by Pedro Ruivo (Jira)
[ https://issues.redhat.com/browse/ISPN-11282?page=com.atlassian.jira.plugi... ]
Pedro Ruivo updated ISPN-11282:
-------------------------------
Fix Version/s: 10.1.2.Final
11.0.0.Alpha1
> CLI: site command isn't working properly
> ----------------------------------------
>
> Key: ISPN-11282
> URL: https://issues.redhat.com/browse/ISPN-11282
> Project: Infinispan
> Issue Type: Bug
> Components: CLI
> Affects Versions: 10.1.1.Final
> Reporter: Pedro Ruivo
> Assignee: Pedro Ruivo
> Priority: Major
> Fix For: 10.1.2.Final, 11.0.0.Alpha1
>
>
> * {{site status}}: the {{--site}} option isn't working properly. It returns all the backups even if you use a non-existent site:
> {noformat}
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
> {
> "NYC" : "online"
> }
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=NYC
> {
> "NYC" : "online"
> }
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=ajdhds
> {
> "NYC" : "online"
> }
> {noformat}
> * {{clear-push-state-status}} operation isn't registered
> * {{bring-online}} and {{take-offline}} operations seem to fail:
> {noformat}
> [pedro-laptop-3-35787@cluster//containers/default]> site take-offline --cache=xsiteCache --site=NYC
> Not Found
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
> {
> "NYC" : "offline"
> }
> [pedro-laptop-3-35787@cluster//containers/default]> site bring-online --cache=xsiteCache --site=NYC
> Not Found
> {noformat}
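The first bullet suggests the --site option is simply not applied as a filter. A hypothetical sketch of the missing behaviour (the names here are illustrative, not the actual Infinispan CLI code): restrict the returned status map to the requested site, and return an empty result for an unknown one instead of echoing every backup.

```java
// Hypothetical sketch, not the real CLI implementation: apply the --site
// option as a filter over the per-site status map.
import java.util.Map;

public final class SiteStatusFilter {
    static Map<String, String> filter(Map<String, String> statuses, String site) {
        if (site == null) {
            return statuses; // no --site option: report every backup site
        }
        String status = statuses.get(site);
        // Unknown site: empty result instead of echoing all backups.
        return status == null ? Map.of() : Map.of(site, status);
    }
}
```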
--
[JBoss JIRA] (ISPN-11282) CLI: site command isn't working properly
by Pedro Ruivo (Jira)
[ https://issues.redhat.com/browse/ISPN-11282?page=com.atlassian.jira.plugi... ]
Pedro Ruivo updated ISPN-11282:
-------------------------------
Git Pull Request: https://github.com/infinispan/infinispan/pull/7837, https://github.com/infinispan/infinispan/pull/7838 (was: https://github.com/infinispan/infinispan/pull/7837)
> CLI: site command isn't working properly
> ----------------------------------------
>
> Key: ISPN-11282
> URL: https://issues.redhat.com/browse/ISPN-11282
> Project: Infinispan
> Issue Type: Bug
> Components: CLI
> Affects Versions: 10.1.1.Final
> Reporter: Pedro Ruivo
> Assignee: Pedro Ruivo
> Priority: Major
> Fix For: 10.1.2.Final, 11.0.0.Alpha1
>
>
> * {{site status}}: the {{--site}} option isn't working properly. It returns all the backups even if you use a non-existent site:
> {noformat}
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
> {
> "NYC" : "online"
> }
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=NYC
> {
> "NYC" : "online"
> }
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=ajdhds
> {
> "NYC" : "online"
> }
> {noformat}
> * {{clear-push-state-status}} operation isn't registered
> * {{bring-online}} and {{take-offline}} operations seem to fail:
> {noformat}
> [pedro-laptop-3-35787@cluster//containers/default]> site take-offline --cache=xsiteCache --site=NYC
> Not Found
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
> {
> "NYC" : "offline"
> }
> [pedro-laptop-3-35787@cluster//containers/default]> site bring-online --cache=xsiteCache --site=NYC
> Not Found
> {noformat}
--