[JBoss JIRA] (ISPN-4828) Increasing default internal thread pool size
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4828?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4828:
-------------------------------
Priority: Critical (was: Major)
> Increasing default internal thread pool size
> --------------------------------------------
>
> Key: ISPN-4828
> URL: https://issues.jboss.org/browse/ISPN-4828
> Project: Infinispan
> Issue Type: Enhancement
> Components: Configuration, Core
> Affects Versions: 7.0.0.CR1
> Reporter: Matej Čimbora
> Priority: Critical
>
> With synchronous replication and a high number of concurrent clients doing put() operations over a shared set of keys, lock-acquisition timeouts occur when the thread pools involved (internal, JGroups OOB) are not sized appropriately.
> org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [3 seconds] on key [key_00000000000003B4] for requestor [Thread[OOB-66,default,node03-12795,5,main]]! Lock held by [Thread[OOB-314,default,node03-12795,5,main]]
> [org.infinispan.interceptors.InvocationContextInterceptor] (Stressor-1) ISPN000136: Execution error
> org.infinispan.util.concurrent.TimeoutException: org.infinispan.util.concurrent.TimeoutException: Node node04-24454 timed out
> This applies to both transactional and non-transactional configurations. The problem can be mitigated by increasing the size of Infinispan's internal thread pool (defined for remoteCommandsExecutor, blockingBoundedQueueThreadPool). To improve the user experience, one of the following should happen:
> a) The thread pool should grow as needed when the load increases.
> b) The default values should be high enough to handle even significant load (in terms of the number of concurrent clients per node).
> c) The documentation should describe how the end user should size the thread pools based on the expected load on the system.
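
The sizing problem described above can be reproduced in miniature with a plain JDK executor. The sketch below is only an illustration: `PoolBacklogSketch` and `backlog` are hypothetical names, the pool is a stock `java.util.concurrent.ThreadPoolExecutor` rather than Infinispan's remote command executor, and a latch stands in for a handler that is blocked waiting for a key lock. It counts how many requests are left waiting in the queue once every worker is blocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Infinispan or JGroups.
final class PoolBacklogSketch {

    /**
     * Submits `requests` tasks that all block (as if waiting for a key lock)
     * to a fixed-size pool and returns how many are left waiting in the queue.
     */
    static int backlog(int poolSize, int requests) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            for (int i = 0; i < requests; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // stands in for a blocked lock acquisition
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Every worker is stuck on its first task, so queued tasks cannot start.
            return pool.getQueue().size();
        } finally {
            release.countDown(); // unblock the workers so the pool can drain
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool=4,  requests=20 -> queued " + backlog(4, 20));
        System.out.println("pool=32, requests=20 -> queued " + backlog(32, 20));
    }
}
```

With 4 threads and 20 blocked requests, 16 requests sit in the queue and make no progress; with 32 threads none do. In a real cluster the queued requests would presumably be the ones that hit the timeouts quoted in the report, but that mapping is an assumption of this sketch, not something the issue states.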
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
11 years, 5 months
[JBoss JIRA] (ISPN-4828) Increasing default internal thread pool size
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4828?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-4828:
------------------------------------
The internal thread pool core size needs to be increased as well (or maybe just use the JGroups defaults). Currently it uses {{internal_thread_pool.min_threads="1"}}, which can lead to deadlocks with MERGE3.
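
One plausible way a core size of 1 can stall work is sketched below, assuming the pool behaves like a standard `java.util.concurrent.ThreadPoolExecutor` with a queue enabled (the comment above does not confirm this for the JGroups internal pool). Such an executor only starts threads beyond the core size once the queue is full, so a task that depends on another queued task can wait indefinitely. `CoreSizeSketch` and `threadsStarted` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it models a generic JDK pool, not JGroups itself.
final class CoreSizeSketch {

    /** Returns the number of worker threads after submitting `tasks` blocking tasks. */
    static int threadsStarted(int coreSize, int maxSize, BlockingQueue<Runnable> queue, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(coreSize, maxSize, 60, TimeUnit.SECONDS, queue);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // each task blocks until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // core=1 with room in the queue: extra tasks are queued, no new threads start.
        System.out.println(threadsStarted(1, 8, new ArrayBlockingQueue<>(100), 5));
        // Larger core size: one thread per task.
        System.out.println(threadsStarted(8, 8, new ArrayBlockingQueue<>(100), 5));
        // core=1 without a real queue: the pool grows toward maxSize instead.
        System.out.println(threadsStarted(1, 8, new SynchronousQueue<>(), 5));
    }
}
```

In this model, raising the core size or disabling the queue both let the pool grow under load; which of the two fits the JGroups defaults mentioned above is left open here.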
> Increasing default internal thread pool size
> --------------------------------------------
>
> Key: ISPN-4828
> URL: https://issues.jboss.org/browse/ISPN-4828
> Project: Infinispan
> Issue Type: Enhancement
> Components: Configuration, Core
> Affects Versions: 7.0.0.CR1
> Reporter: Matej Čimbora
>
> With synchronous replication and a high number of concurrent clients doing put() operations over a shared set of keys, lock-acquisition timeouts occur when the thread pools involved (internal, JGroups OOB) are not sized appropriately.
> org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [3 seconds] on key [key_00000000000003B4] for requestor [Thread[OOB-66,default,node03-12795,5,main]]! Lock held by [Thread[OOB-314,default,node03-12795,5,main]]
> [org.infinispan.interceptors.InvocationContextInterceptor] (Stressor-1) ISPN000136: Execution error
> org.infinispan.util.concurrent.TimeoutException: org.infinispan.util.concurrent.TimeoutException: Node node04-24454 timed out
> This applies to both transactional and non-transactional configurations. The problem can be mitigated by increasing the size of Infinispan's internal thread pool (defined for remoteCommandsExecutor, blockingBoundedQueueThreadPool). To improve the user experience, one of the following should happen:
> a) The thread pool should grow as needed when the load increases.
> b) The default values should be high enough to handle even significant load (in terms of the number of concurrent clients per node).
> c) The documentation should describe how the end user should size the thread pools based on the expected load on the system.
[JBoss JIRA] (ISPN-4784) TestResourceTracker not working properly for OSGi tests
by Ion Savin (JIRA)
[ https://issues.jboss.org/browse/ISPN-4784?page=com.atlassian.jira.plugin.... ]
Ion Savin reassigned ISPN-4784:
-------------------------------
Assignee: Ion Savin
> TestResourceTracker not working properly for OSGi tests
> --------------------------------------------------------
>
> Key: ISPN-4784
> URL: https://issues.jboss.org/browse/ISPN-4784
> Project: Infinispan
> Issue Type: Enhancement
> Components: Test Suite - Core
> Affects Versions: 7.0.0.Beta2
> Reporter: Ion Savin
> Assignee: Ion Savin
>
> The OSGi tests run in a different process from the test driver and are executed through RMI. The assumptions that the test name is contained in the thread name and that there is a one-to-one mapping from thread to test are no longer valid (a ThreadLocal is used for the test name).
> Executing the tests in integrationtests/osgi results in many log messages similar to this one:
> {noformat}
> Test name not set in unknown thread RMI TCP Connection(3)-127.0.0.1
> {noformat}
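
The ThreadLocal behaviour behind that log message can be shown with a few lines of plain Java. This is a minimal sketch, not the actual TestResourceTracker code: `ThreadLocalTestNameSketch`, `TEST_NAME` and `readFromOtherThread` are hypothetical names, and an ordinary thread with an RMI-style name stands in for the RMI connection thread.

```java
// Hypothetical illustration; TEST_NAME is not the real TestResourceTracker field.
final class ThreadLocalTestNameSketch {

    private static final ThreadLocal<String> TEST_NAME = new ThreadLocal<>();

    /** Sets the test name on the calling thread, then reads it from a new thread. */
    static String readFromOtherThread(String testName, String otherThreadName) {
        TEST_NAME.set(testName);
        String[] seen = new String[1];
        // The new thread has its own (empty) ThreadLocal slot.
        Thread other = new Thread(() -> seen[0] = TEST_NAME.get(), otherThreadName);
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            TEST_NAME.remove();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        String seen = readFromOtherThread("SomeOsgiTest", "RMI TCP Connection(3)-127.0.0.1");
        System.out.println("Test name seen by the RMI-style thread: " + seen);
    }
}
```

The second thread sees `null` because a ThreadLocal value is never shared between threads, and its name carries no test name either, so both assumptions mentioned in the description fail at once.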
[JBoss JIRA] (ISPN-4840) LockControlCommand timeouts can cause orphaned locks.
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4840?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4840:
-------------------------------
Component/s: Core
> LockControlCommand timeouts can cause orphaned locks.
> -----------------------------------------------------
>
> Key: ISPN-4840
> URL: https://issues.jboss.org/browse/ISPN-4840
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Transactions
> Affects Versions: 7.0.0.CR1
> Reporter: Erik Salter
> Assignee: Dan Berindei
>
> If an originator times out while sending a pessimistic LockControlCommand (LCC), the receiver may still get the message. In this case, the originator will send a TxCompletionCommand, and the receiver will then remove the registered remote transaction from its transaction table. If, however, another thread is still processing the remote LCC, it could acquire the lock(s) after the remoteTx that references them has been removed. The affected keys then remain locked indefinitely.
> A simple solution would be to have the LCC thread also check that the remote transaction still exists when it verifies the transaction, in addition to checking whether the transaction is completed.
> See the following TRACE messages:
> -- Local TX created
> 2014-10-10 11:27:28,899 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) Created a new local transaction: LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, lockedKeys=null, backupKeyLocks=null, isFromStateTransfer=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:local, topologyId=64, age(ms)=0
> -- Remote TX created
> 2014-10-10 11:27:28,525 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Created and registered remote transaction RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys=null, backupKeyLocks=null, missingLookedUpEntries=false, isMarkedForRollback=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:remote, topologyId=63, age(ms)=0
> -- Originator times out on LCC
> 2014-10-10 11:27:36,902 WARN [org.infinispan.remoting.rpc.RpcManagerImpl] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) ISPN000071: Caught exception when handling command LockControlCommand{cache=eigAllocation, keys=[EdgeResourceCacheKey[edgeDeviceId=2878,resourceId=11130]], flags=null, unlock=false}
> org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to 240-west-dht2.comcast.net-30190(CH2-Chicago-IL)
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:542)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:188)
> ...
> at org.infinispan.interceptors.distribution.TxDistributionInterceptor.visitLockControlCommand(TxDistributionInterceptor.java:204)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)
> at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
> at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:235)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:134)
> at org.infinispan.commands.AbstractVisitor.visitLockControlCommand(AbstractVisitor.java:177)
> at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:131)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:120)
> at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:116)
> ...
> at org.infinispan.CacheImpl.lock(CacheImpl.java:565)
> at org.infinispan.CacheImpl.lock(CacheImpl.java:548)
> ...
> Caused by: org.jgroups.TimeoutException: timeout sending message to 240-west-dht2.comcast.net-30190(CH2-Chicago-IL)
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:303)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:181)
> ... 91 more
> 2014-10-10 11:27:36,936 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) invokeNextInterceptorAndVerifyTransaction :: originatorMissing=false, alreadyCompleted=false
> 2014-10-10 11:27:36,937 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) ISPN000136: Execution error
> org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to 240-west-dht2.comcast.net-30190(CH2-Chicago-IL)
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:542)
> ...
> -- Same stack trace
> -- Remote TX removed -- must be from TxCompletionMessage since isMarkedForRollback == false
> 2014-10-10 11:27:36,530 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2963,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Removed remote transaction GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:local ? RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys=null, backupKeyLocks=null, missingLookedUpEntries=false, isMarkedForRollback=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:remote, topologyId=63, age(ms)=8004
> 2014-10-10 11:27:36,530 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2963,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Removed RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys=null, backupKeyLocks=null, missingLookedUpEntries=false, isMarkedForRollback=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:remote, topologyId=63, age(ms)=8004 from transaction table.
> -- Local TX removed
> 2014-10-10 11:27:36,937 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1353,session-resource-cluster,240-east-dht2.comcast.net-46326(CMC-Denver-CO)) Removed LocalTransaction{remoteLockedNodes=[240-east-dht2.comcast.net-46326(CMC-Denver-CO), 240-west-dht2.comcast.net-30190(CH2-Chicago-IL)], isMarkedForRollback=false, lockedKeys=null, backupKeyLocks=null, isFromStateTransfer=false} globalTx=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:local, topologyId=64, age(ms)=8038 from transaction table.
> -- Lock acquisition completes!
> 2014-10-10 11:27:39,195 TRACE [org.infinispan.util.concurrent.locks.LockManagerImpl] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Attempting to lock EdgeResourceCacheKey[edgeDeviceId=2878,resourceId=11130] with acquisition timeout of 5000 millis
> 2014-10-10 11:27:39,198 TRACE [org.infinispan.util.concurrent.locks.LockManagerImpl] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Successfully acquired lock EdgeResourceCacheKey[edgeDeviceId=2878,resourceId=11130]!
> 2014-10-10 11:27:39,202 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) Transaction=GlobalTransaction:<240-east-dht2.comcast.net-46326(CMC-Denver-CO)>:859804:remote, nodeMaxPrunedTxId=858439
> -- Not in completion map since transaction was removed without commit or rollback. Lock never releases
> 2014-10-10 11:27:39,203 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-2850,session-resource-cluster,240-west-dht2.comcast.net-30190(CH2-Chicago-IL)) invokeNextInterceptorAndVerifyTransaction :: originatorMissing=false, alreadyCompleted=false
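
The interleaving in the trace and the proposed extra check can be modelled with a toy transaction table. This is a sketch under simplifying assumptions, not Infinispan code: `OrphanedLockSketch` and all of its members are hypothetical, transactions and keys are plain strings, and the race is replayed sequentially (completion first, late lock second) so that the outcome is deterministic.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical toy model of a receiver node; not the Infinispan TransactionTable.
final class OrphanedLockSketch {

    private final Set<String> remoteTxTable = ConcurrentHashMap.newKeySet();
    private final ConcurrentMap<String, String> lockOwners = new ConcurrentHashMap<>();

    void registerRemoteTx(String gtx) {
        remoteTxTable.add(gtx);
    }

    /** Completion: forget the transaction and release the locks it holds right now. */
    void completeTx(String gtx) {
        remoteTxTable.remove(gtx);
        lockOwners.values().removeIf(gtx::equals);
    }

    /** Late lock request; with recheckTx the lock is dropped if the tx is gone. */
    boolean lock(String key, String gtx, boolean recheckTx) {
        if (lockOwners.putIfAbsent(key, gtx) != null) {
            return false; // held by someone else
        }
        if (recheckTx && !remoteTxTable.contains(gtx)) {
            lockOwners.remove(key, gtx); // nobody is left to release it later
            return false;
        }
        return true;
    }

    boolean isLocked(String key) {
        return lockOwners.containsKey(key);
    }

    /** Replays the trace order: tx registered, tx removed, then the lock is acquired. */
    static boolean lateLockLeavesKeyLocked(boolean recheckTx) {
        OrphanedLockSketch node = new OrphanedLockSketch();
        node.registerRemoteTx("gtx-1");
        node.completeTx("gtx-1");
        node.lock("key", "gtx-1", recheckTx);
        return node.isLocked("key");
    }

    public static void main(String[] args) {
        System.out.println("without re-check, key stays locked: " + lateLockLeavesKeyLocked(false));
        System.out.println("with re-check, key stays locked:    " + lateLockLeavesKeyLocked(true));
    }
}
```

Without the re-check the key stays locked after the transaction is gone, which matches the orphaned lock in the report. With the re-check the late acquisition is undone. Whether the real fix takes this shape is not stated in the issue.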
[JBoss JIRA] (ISPN-4673) Split-brain: get() returns null when all owners are removed from view
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4673?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4673:
-------------------------------
Fix Version/s: 7.0.0.CR2
> Split-brain: get() returns null when all owners are removed from view
> ---------------------------------------------------------------------
>
> Key: ISPN-4673
> URL: https://issues.jboss.org/browse/ISPN-4673
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 7.0.0.Beta1
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 7.0.0.CR2
>
>
> After a split brain, when calling {{cache.get()}} for an entry whose owners are all in the missing partition, JGroupsTransport removes the target nodes that are no longer members and then returns an empty response map.
> BaseDistributionInterceptor.invokeClusterGetCommandRemotely treats the empty map as a null response, although the entry is merely unavailable.
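
The difference between "no owner answered" and "the owners answered with null" can be made explicit in code. The sketch below is hypothetical throughout: `EmptyResponseSketch`, `EntryUnavailableException` and the method names are invented for illustration, responses are modelled as a plain map from node name to value, and the actual Infinispan fix may look quite different.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Infinispan.
final class EmptyResponseSketch {

    static final class EntryUnavailableException extends RuntimeException {
        EntryUnavailableException(String message) {
            super(message);
        }
    }

    /** Behaviour described in the report: an empty map silently becomes null. */
    static Object firstValue(Map<String, Object> responses) {
        for (Object value : responses.values()) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Stricter variant: an empty map means no owner was reachable. */
    static Object firstValueOrUnavailable(Map<String, Object> responses) {
        if (responses.isEmpty()) {
            throw new EntryUnavailableException("No owner responded; entry is unavailable");
        }
        return firstValue(responses);
    }

    static boolean reportsUnavailable(Map<String, Object> responses) {
        try {
            firstValueOrUnavailable(responses);
            return false;
        } catch (EntryUnavailableException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> noOwners = Collections.emptyMap();
        Map<String, Object> ownerSaysNull = Collections.singletonMap("nodeA", null);
        System.out.println("empty map unavailable:    " + reportsUnavailable(noOwners));
        System.out.println("null response unavailable: " + reportsUnavailable(ownerSaysNull));
    }
}
```

Only the empty map is reported as unavailable; a reachable owner that returns null still yields a normal null result.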
[JBoss JIRA] (ISPN-4673) Split-brain: get() returns null when all owners are removed from view
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4673?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4673:
-------------------------------
Priority: Blocker (was: Critical)
> Split-brain: get() returns null when all owners are removed from view
> ---------------------------------------------------------------------
>
> Key: ISPN-4673
> URL: https://issues.jboss.org/browse/ISPN-4673
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 7.0.0.Beta1
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Blocker
>
> After a split brain, when calling {{cache.get()}} for an entry whose owners are all in the missing partition, JGroupsTransport removes the target nodes that are no longer members and then returns an empty response map.
> BaseDistributionInterceptor.invokeClusterGetCommandRemotely treats the empty map as a null response, although the entry is merely unavailable.