[JBoss JIRA] (ISPN-2904) Race condition in cache startup causes state transfer timeout
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2904?page=com.atlassian.jira.plugin.... ]
Mircea Markus resolved ISPN-2904.
---------------------------------
Resolution: Out of Date
as per Paul:
I'm pretty sure this bug is a manifestation of https://issues.jboss.org/browse/AS7-5904
The fix is here: https://github.com/jbossas/jboss-as/commit/f2642412dc5ebe34d9cd2221c96ab4...
> Race condition in cache startup causes state transfer timeout
> -------------------------------------------------------------
>
> Key: ISPN-2904
> URL: https://issues.jboss.org/browse/ISPN-2904
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.1.7.Final
> Reporter: Dennis Reed
> Assignee: Mircea Markus
>
> When starting multiple caches at the same time (as EAP domain mode deployment does), one cache can timeout during state transfer and abort startup.
> This is caused by a race condition where the master node accepts requests while it can't process them because it's still starting.
> Because of this, the other node's REQUEST_JOIN is ignored, and it finally times out.
> [node1]
> 10:47:23,390 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 65) dests=[master:server-two/web], command=CacheViewControlCommand{cache=repl, type=REQUEST_JOIN, sender=master:server-one/web, newViewId=0, newMembers=null, oldViewId=0, oldMembers=null}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=60000
> 10:47:23,396 TRACE [org.jgroups.protocols.TCP] (ServerService Thread Pool -- 65) sending msg to master:server-two/web, src=master:server-one/web, headers are RequestCorrelator: id=200, type=REQ, id=7, rsp_expected=true, RSVP: REQ(4), UNICAST2: DATA, seqno=27, TCP: [channel_name=web]
> ...
> 10:48:23,404 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 65) MSC000001: Failed to start service jboss.infinispan.web.repl: org.jboss.msc.service.StartException in service jboss.infinispan.web.repl: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.BaseStateTransferManagerImpl.waitForJoinToComplete() throws java.lang.InterruptedException on object of type ReplicatedStateTransferManagerImpl
> [node2]
> 10:47:23,352 TRACE [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread 1-6) Registering component Component{instance=org.infinispan.marshall.jboss.ExternalizerTable@3f9c437d, name=org.infinispan.marshall.jboss.ExternalizerTable} under name org.infinispan.marshall.jboss.ExternalizerTable
> ...
> 10:47:23,397 TRACE [org.jgroups.protocols.TCP] (OOB-19,null) received [dst: master:server-two/web, src: master:server-one/web (4 headers), size=54 bytes, flags=OOB|DONT_BUNDLE|RSVP], headers are RequestCorrelator: id=200, type=REQ, id=7, rsp_expected=true, RSVP: REQ(4), UNICAST2: DATA, seqno=27, TCP: [channel_name=web]
> 10:47:23,398 TRACE [org.jgroups.blocks.RequestCorrelator] (OOB-19,null) calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 7
> 10:47:23,398 TRACE [org.infinispan.marshall.jboss.ExternalizerTable] (OOB-19,null) Either the marshaller has stopped or hasn't started. Read externalizers are not properly populated: {}
> 10:47:23,398 TRACE [org.infinispan.marshall.jboss.ExternalizerTable] (OOB-19,null) Cache manager is shutting down and type (id=74) cannot be resolved (thread not interrupted)
> 10:47:23,400 TRACE [org.jgroups.blocks.RequestCorrelator] (OOB-19,null) sending rsp for 7 to master:server-one/web
> ...
> 10:47:23,522 TRACE [org.infinispan.factories.GlobalComponentRegistry] (ServerService Thread Pool -- 64) Invoking start method public void org.infinispan.marshall.jboss.ExternalizerTable.start() on component org.infinispan.marshall.jboss.ExternalizerTable
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2892) View installation loop when restarting cache on multiple nodes
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2892?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2892:
--------------------------------
Assignee: Dan Berindei (was: Mircea Markus)
> View installation loop when restarting cache on multiple nodes
> --------------------------------------------------------------
>
> Key: ISPN-2892
> URL: https://issues.jboss.org/browse/ISPN-2892
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.1.7.Final
> Reporter: Dennis Reed
> Assignee: Dan Berindei
>
> Restarting a cache on multiple nodes at the same time can cause the following error:
> ERROR [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-19,node1/web) ISPN000172: Failed to prepare view CacheView{viewId=18, members=[node2/web]} for cache default-host/test, rolling back to view CacheView{viewId=17, members=[node1/web, node2/web]}: java.util.concurrent.ExecutionException: org.infinispan.CacheException: java.lang.IllegalStateException: default-host/test: Received cache view prepare request after the local node has already shut down
> After the initial error, the following error began repeating every second for a few minutes until BaseStateTransferManagerImpl.waitForJoinToComplete() timed out and the cache failed to start:
> ERROR [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-19,node1/web) ISPN000172: Failed to prepare view CacheView{viewId=21, members=[node2/web]} for cache default-host/test, rolling back to view CacheView{viewId=20, members=[]}: java.util.concurrent.ExecutionException: org.infinispan.CacheException: java.lang.IllegalStateException: Cannot prepare new view CacheView{viewId=21, members=[node2/web]} on cache default-host/test, we are currently preparing view CacheView{viewId=18, members=[node2/web]}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2910) Divide by zero exception on shutdown
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2910?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2910:
--------------------------------
Assignee: Dan Berindei (was: Mircea Markus)
> Divide by zero exception on shutdown
> ------------------------------------
>
> Key: ISPN-2910
> URL: https://issues.jboss.org/browse/ISPN-2910
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.1.Final
> Reporter: Paul Ferraro
> Assignee: Dan Berindei
>
> 19:40:50,671 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,shared=udp) ISPN000094: Received new cluster view: [perf20/web|13] [perf20/web, perf21/web]
> 19:40:50,754 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (notification-thread-0) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.lang.ArithmeticException: / by zero
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.addPrimaryOwners(DefaultConsistentHashFactory.java:130)
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.rebalanceBuilder(DefaultConsistentHashFactory.java:124)
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:86)
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:45)
> at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:306)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236)
> at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:597)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_38]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_38]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_38]
> at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_38]
> at org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_38]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_38]
> at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_38]
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2910) Divide by zero exception on shutdown
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2910?page=com.atlassian.jira.plugin.... ]
Mircea Markus commented on ISPN-2910:
-------------------------------------
seems to happen when there are no nodes in the cluster.
> Divide by zero exception on shutdown
> ------------------------------------
>
> Key: ISPN-2910
> URL: https://issues.jboss.org/browse/ISPN-2910
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.1.Final
> Reporter: Paul Ferraro
> Assignee: Dan Berindei
>
> 19:40:50,671 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,shared=udp) ISPN000094: Received new cluster view: [perf20/web|13] [perf20/web, perf21/web]
> 19:40:50,754 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (notification-thread-0) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.lang.ArithmeticException: / by zero
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.addPrimaryOwners(DefaultConsistentHashFactory.java:130)
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.rebalanceBuilder(DefaultConsistentHashFactory.java:124)
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:86)
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:45)
> at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:306)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236)
> at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:597)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_38]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_38]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_38]
> at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_38]
> at org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_38]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_38]
> at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_38]
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2910) Divide by zero exception on shutdown
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2910?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2910:
--------------------------------
Fix Version/s: 5.3.0.Alpha1
5.3.0.Final
> Divide by zero exception on shutdown
> ------------------------------------
>
> Key: ISPN-2910
> URL: https://issues.jboss.org/browse/ISPN-2910
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.1.Final
> Reporter: Paul Ferraro
> Assignee: Dan Berindei
> Fix For: 5.3.0.Alpha1, 5.3.0.Final
>
>
> 19:40:50,671 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,shared=udp) ISPN000094: Received new cluster view: [perf20/web|13] [perf20/web, perf21/web]
> 19:40:50,754 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (notification-thread-0) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.lang.ArithmeticException: / by zero
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.addPrimaryOwners(DefaultConsistentHashFactory.java:130)
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.rebalanceBuilder(DefaultConsistentHashFactory.java:124)
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:86)
> at org.infinispan.distribution.ch.DefaultConsistentHashFactory.updateMembers(DefaultConsistentHashFactory.java:45)
> at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:306)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236)
> at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:597)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_38]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_38]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_38]
> at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_38]
> at org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_38]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_38]
> at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_38]
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2913) putForExternalRead leaves locks
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2913?page=com.atlassian.jira.plugin.... ]
Mircea Markus commented on ISPN-2913:
-------------------------------------
{quote}
In TxDistributionInterceptor.remoteGetAndStoreInL1 locks are acquired. Without a transaction these locks are never released.
{quote}
This is a transactional cache, so this call should always be executing in the scope of a transaction. How did you make the invocation outside of the scope of a transaction?
Also can you please give it a try on a newer than 5.2.1 release: 5.2.5.Final.
> putForExternalRead leaves locks
> -------------------------------
>
> Key: ISPN-2913
> URL: https://issues.jboss.org/browse/ISPN-2913
> Project: Infinispan
> Issue Type: Bug
> Components: Locking and Concurrency
> Affects Versions: 5.2.1.Final
> Reporter: Sebastian Tusk
> Assignee: Mircea Markus
> Priority: Critical
>
> In TxDistributionInterceptor.remoteGetAndStoreInL1 locks are acquired. Without a transaction these locks are never released. The cache setup is Dist, Async, L1, 2 Nodes, 1 Owner, Optimistic Locking.
> In AbstractTxLockingInterceptor.visitGetKeyValueCommand locks are released explicitly if outside of transactions. I fixed this problem by doing the same in OptimisticLockingInterceptor.visitPutKeyValueCommand. It is very likely that this doesn't fix all problems. For instance OptimisticLockingInterceptor.visitPutMapCommand or PessimisticLockingInterceptor.
> Cache Config:
> <namedCache name="entity">
> <jmxStatistics enabled="true" />
>
> <clustering mode="dist">
> <stateTransfer fetchInMemoryState="false" timeout="20000" />
> <async />
> <l1 enabled="true" />
> <hash numOwners="1"/>
> </clustering>
> <locking isolationLevel="READ_COMMITTED"
> lockAcquisitionTimeout="15000" useLockStriping="false" />
>
> <eviction maxEntries="10000" strategy="LRU" />
> <expiration maxIdle="100000" wakeUpInterval="5000"/>
> <storeAsBinary storeKeysAsBinary="true" storeValuesAsBinary="false" enabled="false" />
>
> <transaction transactionMode="TRANSACTIONAL" autoCommit="false" lockingMode="OPTIMISTIC"/>
> </namedCache>
> Fixed OptimisticLockingInterceptor.visitPutKeyValueCommand:
> @Override
> public Object visitPutKeyValueCommand(InvocationContext ctx, PutKeyValueCommand command) throws Throwable {
> try {
> if (command.isConditional()) markKeyAsRead(ctx, command);
> return invokeNextInterceptor(ctx, command);
> } catch (Throwable te) {
> throw cleanLocksAndRethrow(ctx, te);
> } finally {
> //with putForExternalRead the value might be put into L1 without a transaction
> //we need to release any locks for these cases
> if (!ctx.isInTxScope()) lockManager.unlockAll(ctx);
> }
> }
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months
[JBoss JIRA] (ISPN-2923) provides an optimal way to search and retrieve the first non null value
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2923?page=com.atlassian.jira.plugin.... ]
Mircea Markus commented on ISPN-2923:
-------------------------------------
Infinispan doesn't allow inserting keys with null values, how did you end up with null values in the cache?
I think this makes sense as a general requirement, e.g. stop a map/reduce task if enough values have been found.
> provides an optimal way to search and retrieve the first non null value
> -----------------------------------------------------------------------
>
> Key: ISPN-2923
> URL: https://issues.jboss.org/browse/ISPN-2923
> Project: Infinispan
> Issue Type: Feature Request
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.5.Final
> Reporter: Mathieu Lachance
> Assignee: Vladimir Blagojevic
>
> It would be nice if infinispan could provide a way to the common problem of searching and retrieve the first non null value.
> My attempt was to use the map reduce framework, but it would still scan all the keys and value even if the result as been found. Maybe Infinispan could provide a way to "stop" the map-reduce operation on all nodes when asked, ex, when the first non null value has been found ?
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 9 months