[JBoss JIRA] (ISPN-11282) CLI: site command isn't working properly
by Pedro Ruivo (Jira)
[ https://issues.redhat.com/browse/ISPN-11282?page=com.atlassian.jira.plugi... ]
Pedro Ruivo updated ISPN-11282:
-------------------------------
Description:
* {{site status}}: the {{--site}} option isn't working properly. It returns the status of all backup sites even if you use a non-existent site:
{noformat}
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
{
"NYC" : "online"
}
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=NYC
{
"NYC" : "online"
}
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=ajdhds
{
"NYC" : "online"
}
{noformat}
* The {{clear-push-state-status}} operation isn't registered
* The {{bring-online}} and {{take-offline}} operations seem to fail:
{noformat}
[pedro-laptop-3-35787@cluster//containers/default]> site take-offline --cache=xsiteCache --site=NYC
Not Found
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
{
"NYC" : "offline"
}
[pedro-laptop-3-35787@cluster//containers/default]> site bring-online --cache=xsiteCache --site=NYC
Not Found
{noformat}
was:
* {{site status}}: the {{--site}} option isn't working properly. It returns the status of all backup sites even if you use a non-existent site:
{noformat}
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
{
"NYC" : "online"
}
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=NYC
{
"NYC" : "online"
}
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=ajdhds
{
"NYC" : "online"
}
{noformat}
* The {clear-push-state-status} operation isn't registered
* The {bring-online} and {take-offline} operations seem to fail:
{noformat}
[pedro-laptop-3-35787@cluster//containers/default]> site take-offline --cache=xsiteCache --site=NYC
Not Found
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
{
"NYC" : "offline"
}
[pedro-laptop-3-35787@cluster//containers/default]> site bring-online --cache=xsiteCache --site=NYC
Not Found
{noformat}
> CLI: site command isn't working properly
> ----------------------------------------
>
> Key: ISPN-11282
> URL: https://issues.redhat.com/browse/ISPN-11282
> Project: Infinispan
> Issue Type: Bug
> Components: CLI
> Affects Versions: 10.1.1.Final
> Reporter: Pedro Ruivo
> Assignee: Pedro Ruivo
> Priority: Major
>
> * {{site status}}: the {{--site}} option isn't working properly. It returns the status of all backup sites even if you use a non-existent site:
> {noformat}
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
> {
> "NYC" : "online"
> }
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=NYC
> {
> "NYC" : "online"
> }
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=ajdhds
> {
> "NYC" : "online"
> }
> {noformat}
> * The {{clear-push-state-status}} operation isn't registered
> * The {{bring-online}} and {{take-offline}} operations seem to fail:
> {noformat}
> [pedro-laptop-3-35787@cluster//containers/default]> site take-offline --cache=xsiteCache --site=NYC
> Not Found
> [pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
> {
> "NYC" : "offline"
> }
> [pedro-laptop-3-35787@cluster//containers/default]> site bring-online --cache=xsiteCache --site=NYC
> Not Found
> {noformat}
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11282) CLI: site command isn't working properly
by Pedro Ruivo (Jira)
Pedro Ruivo created ISPN-11282:
----------------------------------
Summary: CLI: site command isn't working properly
Key: ISPN-11282
URL: https://issues.redhat.com/browse/ISPN-11282
Project: Infinispan
Issue Type: Bug
Components: CLI
Affects Versions: 10.1.1.Final
Reporter: Pedro Ruivo
Assignee: Pedro Ruivo
* {{site status}}: the {{--site}} option isn't working properly. It returns the status of all backup sites even if you use a non-existent site:
{noformat}
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
{
"NYC" : "online"
}
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=NYC
{
"NYC" : "online"
}
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache --site=ajdhds
{
"NYC" : "online"
}
{noformat}
* The {clear-push-state-status} operation isn't registered
* The {bring-online} and {take-offline} operations seem to fail:
{noformat}
[pedro-laptop-3-35787@cluster//containers/default]> site take-offline --cache=xsiteCache --site=NYC
Not Found
[pedro-laptop-3-35787@cluster//containers/default]> site status --cache=xsiteCache
{
"NYC" : "offline"
}
[pedro-laptop-3-35787@cluster//containers/default]> site bring-online --cache=xsiteCache --site=NYC
Not Found
{noformat}
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11266) Split CacheTopologyControlCommand into individual commands
by Ryan Emerson (Jira)
[ https://issues.redhat.com/browse/ISPN-11266?page=com.atlassian.jira.plugi... ]
Ryan Emerson updated ISPN-11266:
--------------------------------
Status: Open (was: New)
> Split CacheTopologyControlCommand into individual commands
> ----------------------------------------------------------
>
> Key: ISPN-11266
> URL: https://issues.redhat.com/browse/ISPN-11266
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 10.1.1.Final
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 11.0.0.Alpha1
>
>
> Currently the {{CacheTopologyControlCommand}} uses a Type field and a switch statement to differentiate between the various topology actions. This worked well with the old Externalizer approach; however, it does not fit well with protobuf messages. Instead, the {{CacheTopologyControlCommand}} should be split into individual commands, e.g. a {{TopologyJoinCommand}}.
> This separates the logic of the command types, making it easier to maintain backwards compatibility in the long term. Each command will use a ProtoStream TypeId in the range 1000 -> 3999, so the two-byte cost is the same as the existing class ID plus the enum Type required by the single {{CacheTopologyControlCommand}}.
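> To make the proposal concrete, here is a minimal, hedged sketch of what one split-out command could look like; the class name, fields and the TypeId value are hypothetical and only illustrate a ProtoStream-annotated command using an id from the reserved 1000 -> 3999 range:
> {code:java}
> // Hypothetical sketch, not the actual command hierarchy or its fields.
> import org.infinispan.protostream.annotations.ProtoFactory;
> import org.infinispan.protostream.annotations.ProtoField;
> import org.infinispan.protostream.annotations.ProtoTypeId;
> @ProtoTypeId(1000) // illustrative id from the reserved 1000 -> 3999 range
> public class TopologyJoinCommand {
>    @ProtoField(number = 1)
>    final String cacheName;
>    @ProtoField(number = 2, defaultValue = "-1")
>    final int viewId;
>    @ProtoFactory
>    TopologyJoinCommand(String cacheName, int viewId) {
>       this.cacheName = cacheName;
>       this.viewId = viewId;
>    }
> }
> {code}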
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-4996) Problem with capacityFactor=0 and restart of all nodes with capacityFactor > 0
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-4996?page=com.atlassian.jira.plugin... ]
Dan Berindei commented on ISPN-4996:
------------------------------------
[~johnou] as a workaround, you can replace
{code:java}
globalConfigurationBuilder.zeroCapacityNode(true);
{code}
or
{code:java}
builder.clustering().hash().capacityFactor(0f);
{code}
with
{code:java}
builder.clustering().hash().capacityFactor(0.00001f);
{code}
That will make the node own all the segments when there is no other node with a capacity factor >= 1, and zero segments when such a node is present.
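For reference, a minimal sketch of the workaround in context; the cache mode and builder wiring are illustrative, only the {{capacityFactor}} value matters:
{code:java}
// Sketch of the workaround above; everything except capacityFactor is illustrative.
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.clustering()
      .cacheMode(CacheMode.DIST_SYNC)
      .hash()
         // Near-zero instead of 0: the node owns (almost) nothing while a node
         // with capacity factor >= 1 is present, but it can still own segments
         // when it is the only node left in the cluster.
         .capacityFactor(0.00001f);
{code}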
> Problem with capacityFactor=0 and restart of all nodes with capacityFactor > 0
> ------------------------------------------------------------------------------
>
> Key: ISPN-4996
> URL: https://issues.redhat.com/browse/ISPN-4996
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.2.Final
> Reporter: Enrico Olivelli
> Assignee: Dan Berindei
> Priority: Blocker
>
> I have only one DIST_SYNC cache. Most of the JVMs in the cluster are configured with capacityFactor = 0 (like the distributedlocalstorage=false property of Coherence) and some nodes are configured with capacityFactor > 0 (for instance 1000). We are talking about 100 nodes with capacityFactor = 0 and 4 nodes of the other kind; the whole cluster is inside one single "site/rack". Partition handling is off, numOwners is 1.
> When all the nodes with capacityFactor > 0 are down, the cluster enters a degraded state.
> The problem is that even when the nodes with capacityFactor > 0 come back up, the cluster does not recover; a full restart is needed.
> If I enable partition handling, AvailabilityExceptions start to be thrown, which I think is the expected behaviour (see the "Infinispan User Guide").
>
> I think this is the problem and it is a bug:
>
> {noformat}
> 14/11/17 09:27:25 WARN topology.CacheTopologyControlCommand: ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=shared, type=JOIN, sender=testserver1@xxxxxxx-22311, site-id=xxx, rack-id=xxx, machine-id=24 bytes, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.TopologyAwareConsistentHashFactory@78b791ef, hashFunction=MurmurHash3, numSegments=60, numOwners=1, timeout=120000, totalOrder=false, distributed=true}, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, throwable=null, viewId=3}
> java.lang.IllegalArgumentException: A cache topology's pending consistent hash must contain all the current consistent hash's members
> at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:48)
> at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:43)
> at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:631)
> at org.infinispan.topology.ClusterCacheStatus.queueRebalance(ClusterCacheStatus.java:85)
> at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onJoin(PreferAvailabilityStrategy.java:22)
> at org.infinispan.topology.ClusterCacheStatus.doJoin(ClusterCacheStatus.java:540)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:158)
> at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:140)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:278)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> After that error every "put" results in:
> {noformat}
> 14/11/17 09:27:27 ERROR interceptors.InvocationContextInterceptor: ISPN000136: Execution error
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 1
> at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:93)
> at org.infinispan.interceptors.base.BaseStateTransferInterceptor.waitForTransactionData(BaseStateTransferInterceptor.java:96)
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:188)
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:95)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)
> at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:102)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:71)
> at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:35)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
> at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1576)
> at org.infinispan.cache.impl.CacheImpl.putInternal(CacheImpl.java:1054)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1046)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1646)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:245)
> {noformat}
>
> This is the actual configuration:
>
> {code:java}
> GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
> .globalJmxStatistics()
> .allowDuplicateDomains(true)
> .cacheManagerName(instanceName)
> .transport()
> .defaultTransport()
> .clusterName(clustername)
> .addProperty("configurationFile", configurationFile) // UDP for my cluster, approx. 100 machines
> .machineId(instanceName)
> .siteId("site1")
> .rackId("rack1")
> .nodeName(serviceName + "@" + instanceName)
> .remoteCommandThreadPool().threadPoolFactory(CachedThreadPoolExecutorFactory.create())
> .build();
> Configuration wildcard = new ConfigurationBuilder()
> .locking().lockAcquisitionTimeout(lockAcquisitionTimeout)
> .concurrencyLevel(10000).isolationLevel(IsolationLevel.READ_COMMITTED).useLockStriping(true)
> .clustering()
> .cacheMode(CacheMode.DIST_SYNC)
> .l1().lifespan(l1ttl)
> .hash().numOwners(numOwners).capacityFactor(capacityFactor)
> .partitionHandling().enabled(false)
> .stateTransfer().awaitInitialTransfer(false).timeout(initialTransferTimeout).fetchInMemoryState(false)
> .storeAsBinary().enabled(true).storeKeysAsBinary(false).storeValuesAsBinary(true)
> .jmxStatistics().enable()
> .unsafe().unreliableReturnValues(true)
> .build();
> {code}
> One workaround is to set capacityFactor = 1 instead of 0, but I do not want "simple-nodes" (with less RAM) to become key owners.
> For me this is a showstopper problem.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11227) Cluster fails to startup due to initial state transfer timing out
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11227?page=com.atlassian.jira.plugi... ]
Dan Berindei closed ISPN-11227.
-------------------------------
Resolution: Duplicate Issue
> Cluster fails to startup due to initial state transfer timing out
> -----------------------------------------------------------------
>
> Key: ISPN-11227
> URL: https://issues.redhat.com/browse/ISPN-11227
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 10.1.1.Final
> Reporter: Johno Crawford
> Priority: Major
> Attachments: ISPN11227.zip
>
>
> If a zero-capacity node is part of a running cluster and all the other nodes are restarted, the nodes hang on startup.
> {code:java}
> "ForkJoinPool.commonPool-worker-2@11514" daemon prio=5 tid=0xa3 nid=NA waiting
> java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:270)
> at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1091)
> at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:513)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:693)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:632)
> at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:517)
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:498)
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:491)
> {code}
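> For context, a hedged sketch of how a zero-capacity member is typically configured (the variable names are illustrative; see ISPN-4996 for the related capacity-factor discussion):
> {code:java}
> // Sketch: starting a cache manager as a zero-capacity member.
> import org.infinispan.configuration.global.GlobalConfiguration;
> import org.infinispan.configuration.global.GlobalConfigurationBuilder;
> import org.infinispan.manager.DefaultCacheManager;
> GlobalConfiguration global = new GlobalConfigurationBuilder()
>       .zeroCapacityNode(true) // this member should own no data segments
>       .build();
> DefaultCacheManager cacheManager = new DefaultCacheManager(global);
> {code}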
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-4996) Problem with capacityFactor=0 and restart of all nodes with capacityFactor > 0
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-4996?page=com.atlassian.jira.plugin... ]
Dan Berindei updated ISPN-4996:
-------------------------------
Description:
I have only one DIST_SYNC cache. Most of the JVMs in the cluster are configured with capacityFactor = 0 (like the distributedlocalstorage=false property of Coherence) and some nodes are configured with capacityFactor > 0 (for instance 1000). We are talking about 100 nodes with capacityFactor = 0 and 4 nodes of the other kind; the whole cluster is inside one single "site/rack". Partition handling is off, numOwners is 1.
When all the nodes with capacityFactor > 0 are down, the cluster enters a degraded state.
The problem is that even when the nodes with capacityFactor > 0 come back up, the cluster does not recover; a full restart is needed.
If I enable partition handling, AvailabilityExceptions start to be thrown, which I think is the expected behaviour (see the "Infinispan User Guide").
I think this is the problem and it is a bug:
{noformat}
14/11/17 09:27:25 WARN topology.CacheTopologyControlCommand: ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=shared, type=JOIN, sender=testserver1@xxxxxxx-22311, site-id=xxx, rack-id=xxx, machine-id=24 bytes, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.TopologyAwareConsistentHashFactory@78b791ef, hashFunction=MurmurHash3, numSegments=60, numOwners=1, timeout=120000, totalOrder=false, distributed=true}, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, throwable=null, viewId=3}
java.lang.IllegalArgumentException: A cache topology's pending consistent hash must contain all the current consistent hash's members
at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:48)
at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:43)
at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:631)
at org.infinispan.topology.ClusterCacheStatus.queueRebalance(ClusterCacheStatus.java:85)
at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onJoin(PreferAvailabilityStrategy.java:22)
at org.infinispan.topology.ClusterCacheStatus.doJoin(ClusterCacheStatus.java:540)
at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:158)
at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:140)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:278)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
After that error every "put" results in:
{noformat}
14/11/17 09:27:27 ERROR interceptors.InvocationContextInterceptor: ISPN000136: Execution error
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 1
at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:93)
at org.infinispan.interceptors.base.BaseStateTransferInterceptor.waitForTransactionData(BaseStateTransferInterceptor.java:96)
at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:188)
at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:95)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)
at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:102)
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:71)
at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:35)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1576)
at org.infinispan.cache.impl.CacheImpl.putInternal(CacheImpl.java:1054)
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1046)
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1646)
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:245)
{noformat}
This is the actual configuration:
{code:java}
GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
.globalJmxStatistics()
.allowDuplicateDomains(true)
.cacheManagerName(instanceName)
.transport()
.defaultTransport()
.clusterName(clustername)
.addProperty("configurationFile", configurationFile) // UDP for my cluster, approx. 100 machines
.machineId(instanceName)
.siteId("site1")
.rackId("rack1")
.nodeName(serviceName + "@" + instanceName)
.remoteCommandThreadPool().threadPoolFactory(CachedThreadPoolExecutorFactory.create())
.build();
Configuration wildcard = new ConfigurationBuilder()
.locking().lockAcquisitionTimeout(lockAcquisitionTimeout)
.concurrencyLevel(10000).isolationLevel(IsolationLevel.READ_COMMITTED).useLockStriping(true)
.clustering()
.cacheMode(CacheMode.DIST_SYNC)
.l1().lifespan(l1ttl)
.hash().numOwners(numOwners).capacityFactor(capacityFactor)
.partitionHandling().enabled(false)
.stateTransfer().awaitInitialTransfer(false).timeout(initialTransferTimeout).fetchInMemoryState(false)
.storeAsBinary().enabled(true).storeKeysAsBinary(false).storeValuesAsBinary(true)
.jmxStatistics().enable()
.unsafe().unreliableReturnValues(true)
.build();
{code}
One workaround is to set capacityFactor = 1 instead of 0, but I do not want "simple-nodes" (with less RAM) to become key owners.
For me this is a showstopper problem.
was:
I have only one DIST_SYNC cache. Most of the JVMs in the cluster are configured with capacityFactor = 0 (like the distributedlocalstorage=false property of Coherence) and some nodes are configured with capacityFactor > 0 (for instance 1000). We are talking about 100 nodes with capacityFactor = 0 and 4 nodes of the other kind; the whole cluster is inside one single "site/rack". Partition handling is off, numOwners is 1.
When all the nodes with capacityFactor > 0 are down, the cluster enters a degraded state.
The problem is that even when the nodes with capacityFactor > 0 come back up, the cluster does not recover; a full restart is needed.
If I enable partition handling, AvailabilityExceptions start to be thrown, which I think is the expected behaviour (see the "Infinispan User Guide").
I think this is the problem and it is a bug:
14/11/17 09:27:25 WARN topology.CacheTopologyControlCommand: ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=shared, type=JOIN, sender=testserver1@xxxxxxx-22311, site-id=xxx, rack-id=xxx, machine-id=24 bytes, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.TopologyAwareConsistentHashFactory@78b791ef, hashFunction=MurmurHash3, numSegments=60, numOwners=1, timeout=120000, totalOrder=false, distributed=true}, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, throwable=null, viewId=3}
java.lang.IllegalArgumentException: A cache topology's pending consistent hash must contain all the current consistent hash's members
at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:48)
at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:43)
at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:631)
at org.infinispan.topology.ClusterCacheStatus.queueRebalance(ClusterCacheStatus.java:85)
at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onJoin(PreferAvailabilityStrategy.java:22)
at org.infinispan.topology.ClusterCacheStatus.doJoin(ClusterCacheStatus.java:540)
at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:158)
at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:140)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:278)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
After that error every "put" results in:
14/11/17 09:27:27 ERROR interceptors.InvocationContextInterceptor: ISPN000136: Execution error
org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 1
at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:93)
at org.infinispan.interceptors.base.BaseStateTransferInterceptor.waitForTransactionData(BaseStateTransferInterceptor.java:96)
at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:188)
at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:95)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)
at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:102)
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:71)
at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:35)
at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1576)
at org.infinispan.cache.impl.CacheImpl.putInternal(CacheImpl.java:1054)
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1046)
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1646)
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:245)
This is the actual configuration:
GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
.globalJmxStatistics()
.allowDuplicateDomains(true)
.cacheManagerName(instanceName)
.transport()
.defaultTransport()
.clusterName(clustername)
.addProperty("configurationFile", configurationFile) // UDP for my cluster, approx. 100 machines
.machineId(instanceName)
.siteId("site1")
.rackId("rack1")
.nodeName(serviceName + "@" + instanceName)
.remoteCommandThreadPool().threadPoolFactory(CachedThreadPoolExecutorFactory.create())
.build();
Configuration wildcard = new ConfigurationBuilder()
.locking().lockAcquisitionTimeout(lockAcquisitionTimeout)
.concurrencyLevel(10000).isolationLevel(IsolationLevel.READ_COMMITTED).useLockStriping(true)
.clustering()
.cacheMode(CacheMode.DIST_SYNC)
.l1().lifespan(l1ttl)
.hash().numOwners(numOwners).capacityFactor(capacityFactor)
.partitionHandling().enabled(false)
.stateTransfer().awaitInitialTransfer(false).timeout(initialTransferTimeout).fetchInMemoryState(false)
.storeAsBinary().enabled(true).storeKeysAsBinary(false).storeValuesAsBinary(true)
.jmxStatistics().enable()
.unsafe().unreliableReturnValues(true)
.build();
One workaround is to set capacityFactor = 1 instead of 0, but I do not want "simple-nodes" (with less RAM) to become key owners.
For me this is a showstopper problem.
> Problem with capacityFactor=0 and restart of all nodes with capacityFactor > 0
> ------------------------------------------------------------------------------
>
> Key: ISPN-4996
> URL: https://issues.redhat.com/browse/ISPN-4996
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.2.Final
> Reporter: Enrico Olivelli
> Assignee: Dan Berindei
> Priority: Blocker
>
> I have only one DIST_SYNC cache. Most of the JVMs in the cluster are configured with capacityFactor = 0 (like the distributedlocalstorage=false property of Coherence) and some nodes are configured with capacityFactor > 0 (for instance 1000). We are talking about 100 nodes with capacityFactor = 0 and 4 nodes of the other kind; the whole cluster is inside one single "site/rack". Partition handling is off, numOwners is 1.
> When all the nodes with capacityFactor > 0 are down, the cluster enters a degraded state.
> The problem is that even when the nodes with capacityFactor > 0 come back up, the cluster does not recover; a full restart is needed.
> If I enable partition handling, AvailabilityExceptions start to be thrown, which I think is the expected behaviour (see the "Infinispan User Guide").
>
> I think this is the problem and it is a bug:
>
> {noformat}
> 14/11/17 09:27:25 WARN topology.CacheTopologyControlCommand: ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=shared, type=JOIN, sender=testserver1@xxxxxxx-22311, site-id=xxx, rack-id=xxx, machine-id=24 bytes, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.TopologyAwareConsistentHashFactory@78b791ef, hashFunction=MurmurHash3, numSegments=60, numOwners=1, timeout=120000, totalOrder=false, distributed=true}, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, throwable=null, viewId=3}
> java.lang.IllegalArgumentException: A cache topology's pending consistent hash must contain all the current consistent hash's members
> at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:48)
> at org.infinispan.topology.CacheTopology.<init>(CacheTopology.java:43)
> at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:631)
> at org.infinispan.topology.ClusterCacheStatus.queueRebalance(ClusterCacheStatus.java:85)
> at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onJoin(PreferAvailabilityStrategy.java:22)
> at org.infinispan.topology.ClusterCacheStatus.doJoin(ClusterCacheStatus.java:540)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:158)
> at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:140)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:278)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> After that error every "put" results in:
> {noformat}
> 14/11/17 09:27:27 ERROR interceptors.InvocationContextInterceptor: ISPN000136: Execution error
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for topology 1
> at org.infinispan.statetransfer.StateTransferLockImpl.waitForTransactionData(StateTransferLockImpl.java:93)
> at org.infinispan.interceptors.base.BaseStateTransferInterceptor.waitForTransactionData(BaseStateTransferInterceptor.java:96)
> at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:188)
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:95)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)
> at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:102)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:71)
> at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:35)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71)
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
> at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1576)
> at org.infinispan.cache.impl.CacheImpl.putInternal(CacheImpl.java:1054)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1046)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1646)
> at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:245)
> {noformat}
>
> This is the actual configuration:
>
> {code:java}
> GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()
> .globalJmxStatistics()
> .allowDuplicateDomains(true)
> .cacheManagerName(instanceName)
> .transport()
> .defaultTransport()
> .clusterName(clustername)
> .addProperty("configurationFile", configurationFile) // UDP for my cluster, approx. 100 machines
> .machineId(instanceName)
> .siteId("site1")
> .rackId("rack1")
> .nodeName(serviceName + "@" + instanceName)
> .remoteCommandThreadPool().threadPoolFactory(CachedThreadPoolExecutorFactory.create())
> .build();
> Configuration wildcard = new ConfigurationBuilder()
> .locking().lockAcquisitionTimeout(lockAcquisitionTimeout)
> .concurrencyLevel(10000).isolationLevel(IsolationLevel.READ_COMMITTED).useLockStriping(true)
> .clustering()
> .cacheMode(CacheMode.DIST_SYNC)
> .l1().lifespan(l1ttl)
> .hash().numOwners(numOwners).capacityFactor(capacityFactor)
> .partitionHandling().enabled(false)
> .stateTransfer().awaitInitialTransfer(false).timeout(initialTransferTimeout).fetchInMemoryState(false)
> .storeAsBinary().enabled(true).storeKeysAsBinary(false).storeValuesAsBinary(true)
> .jmxStatistics().enable()
> .unsafe().unreliableReturnValues(true)
> .build();
> {code}
> One workaround is to set capacityFactor = 1 instead of 0, but I do not want "simple-nodes" (with less RAM) to become key owners.
> For me this is a showstopper problem.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11231) Transcoder lookup is inefficient
by Ryan Emerson (Jira)
[ https://issues.redhat.com/browse/ISPN-11231?page=com.atlassian.jira.plugi... ]
Ryan Emerson resolved ISPN-11231.
---------------------------------
Fix Version/s: 10.1.2.Final
Resolution: Done
> Transcoder lookup is inefficient
> --------------------------------
>
> Key: ISPN-11231
> URL: https://issues.redhat.com/browse/ISPN-11231
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.4.17.Final, 10.1.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 9.4.18.Final, 10.1.2.Final, 11.0.0.Alpha1
>
>
> {{EncoderRegistryImpl}} stores transcoders in a set, without any lookup optimization because it assumes the number of transcoders is small.
> However, {{InternalCacheFactory.bootstrap()}} registers a "different" {{TranscoderMarshallerAdapter}} for each cache, even though they all delegate to the same global marshaller.
> I believe the global marshaller transcoder adapter should be registered only once in {{EncoderRegistryFactory}}, and ideally we should have a way of looking up transcoders by media types.
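> A possible direction, sketched only as an illustration: index transcoders by the (from, to) media-type pair so that a lookup is a map get rather than a scan. The class and method names below are hypothetical and not the actual {{EncoderRegistryImpl}} API:
> {code:java}
> // Hypothetical media-type-pair index; a sketch of the idea, not the real registry.
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import org.infinispan.commons.dataconversion.MediaType;
> import org.infinispan.commons.dataconversion.Transcoder;
> final class TranscoderIndex {
>    private final Map<String, Transcoder> byMediaTypes = new ConcurrentHashMap<>();
>    void register(Transcoder transcoder) {
>       // Index every supported (from, to) combination; the first registration wins.
>       for (MediaType from : transcoder.getSupportedMediaTypes()) {
>          for (MediaType to : transcoder.getSupportedMediaTypes()) {
>             byMediaTypes.putIfAbsent(key(from, to), transcoder);
>          }
>       }
>    }
>    Transcoder lookup(MediaType from, MediaType to) {
>       return byMediaTypes.get(key(from, to));
>    }
>    private static String key(MediaType from, MediaType to) {
>       return from + "->" + to;
>    }
> }
> {code}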
--
This message was sent by Atlassian Jira
(v7.13.8#713008)