[JBoss JIRA] (ISPN-2714) org.infinispan.distexec.mapreduce.TopologyAwareTwoNodesMapReduceTest.testInvokeMapperCancellation test fails randomly
by Anna Manukyan (JIRA)
Anna Manukyan created ISPN-2714:
-----------------------------------
Summary: org.infinispan.distexec.mapreduce.TopologyAwareTwoNodesMapReduceTest.testInvokeMapperCancellation test fails randomly
Key: ISPN-2714
URL: https://issues.jboss.org/browse/ISPN-2714
Project: Infinispan
Issue Type: Bug
Components: Distributed Execution and Map/Reduce
Affects Versions: 5.2.0.CR1
Reporter: Anna Manukyan
Assignee: Vladimir Blagojevic
The test org.infinispan.distexec.mapreduce.TopologyAwareTwoNodesMapReduceTest.testInvokeMapperCancellation fails randomly on all environments.
The error log is:
{code}
Error Message
Expected exception java.util.concurrent.CancellationException but got java.lang.AssertionError: Mapper not cancelled, root cause org.jgroups.TimeoutException: timeout sending message to TopologyAwareTwoNodesMapReduceTest-NodeB-22523(test2)
Stacktrace
org.testng.TestException:
Expected exception java.util.concurrent.CancellationException but got java.lang.AssertionError: Mapper not cancelled, root cause org.jgroups.TimeoutException: timeout sending message to TopologyAwareTwoNodesMapReduceTest-NodeB-22523(test2)
at org.testng.internal.Invoker.handleInvocationResults(Invoker.java:1503)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:764)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:907)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1237)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
at org.testng.TestRunner.privateRun(TestRunner.java:767)
at org.testng.TestRunner.run(TestRunner.java:617)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
at org.testng.SuiteRunner.access$000(SuiteRunner.java:37)
at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:368)
at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.AssertionError: Mapper not cancelled, root cause org.jgroups.TimeoutException: timeout sending message to TopologyAwareTwoNodesMapReduceTest-NodeB-22523(test2)
at org.infinispan.distexec.mapreduce.SimpleTwoNodesMapReduceTest.testInvokeMapperCancellation(SimpleTwoNodesMapReduceTest.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:715)
... 15 more
{code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2612) Problem broadcasting CH_UPDATE command
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2612?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-2612:
------------------------------------
[~mlinhard] Yes I did, but I didn't really have two different addresses on my machine - I tested with 127.0.0.1 and 127.0.0.2. Thanks for the logs!
[~an1310] I believe your issue is quite different, and you should be able to fix it by setting {{RSVP.ack_on_delivery=false}} in your JGroups configuration. See ISPN-2713 for more details.
> Problem broadcasting CH_UPDATE command
> --------------------------------------
>
> Key: ISPN-2612
> URL: https://issues.jboss.org/browse/ISPN-2612
> Project: Infinispan
> Issue Type: Bug
> Components: RPC
> Affects Versions: 5.2.0.Beta5
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR2
>
> Attachments: session-cluster.xml, test.zip, test_TRACE.zip
>
>
> Infinispan 5.2.0.Beta5
> JGroups 3.2.4.Final
> Steps to reproduce (I'm using two virtual interfaces test1, test2)
> 1. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test1 -Djava.net.preferIPv4Stack=true
> 2. wait 10 sec
> 3. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test2 -Djava.net.preferIPv4Stack=true
> After 5 seconds there should be this timeout exception:
> {code}
> 19:42:14,146 WARN [org.infinispan.topology.CacheTopologyControlCommand] (OOB-2,mlinhard-work-37329) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=___defaultcache, type=REBALANCE_CONFIRM, sender=mlinhard-work-47337, joinInfo=null, topologyId=1, currentCH=null, pendingCH=null, throwable=null, viewId=1}
> java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
> at java.util.concurrent.FutureTask.get(FutureTask.java:91)
> at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:563)
> at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastConsistentHashUpdate(ClusterTopologyManagerImpl.java:349)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:213)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
> at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:252)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:219)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
> at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
> at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
> at org.jgroups.JChannel.up(JChannel.java:703)
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
> at org.jgroups.protocols.RSVP.up(RSVP.java:172)
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
> at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:736)
> at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:414)
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:187)
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
> at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
> at org.jgroups.protocols.Discovery.up(Discovery.java:359)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1287)
> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1850)
> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1823)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:532)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:152)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:518)
> at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:545)
> at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:542)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> ... 3 more
> Caused by: org.jgroups.TimeoutException: TimeoutException
> at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
> at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
> at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
> at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
> at org.jgroups.protocols.RSVP.down(RSVP.java:118)
> at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
> at org.jgroups.JChannel.down(JChannel.java:718)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616)
> at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:173)
> at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360)
> at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103)
> at org.jgroups.blocks.Request.execute(Request.java:83)
> at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:335)
> at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:330)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:145)
> ... 8 more
> {code}
> Analysis:
> These are the messages sent after view change:
> {code}
> test1 test2
> <--- JOIN ----
> ---- REBALANCE_START --->
> <--- StateRequestCommand ----
> ---- StateResponseCommand --->
> <--- REBALANCE_CONFIRM ----
> ---- CH_UPDATE --->
> {code}
> The last CH_UPDATE message is broadcast, test2 successfully processes it, but test1 stays in waiting state, because it for some reason awaits response also from itself - local variable entry in the method RSVP.down
> (https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...)
> contained local address.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2713) REBALANCE_START and REBALANCE_CONFIRM commands deadlock when RSVP.ack_on_delivery=true
by Dan Berindei (JIRA)
Dan Berindei created ISPN-2713:
----------------------------------
Summary: REBALANCE_START and REBALANCE_CONFIRM commands deadlock when RSVP.ack_on_delivery=true
Key: ISPN-2713
URL: https://issues.jboss.org/browse/ISPN-2713
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.2.0.CR1
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 5.3.0.Final
When the coordinator sends a REBALANCE_START command, it holds a lock on the ClusterCacheStatus until it receives the responses from all the other members.
If a node doesn't need to request any new state, it sends the rebalance confirmation to the coordinator on the same thread that received the REBALANCE_START command. The REBALANCE_CONFIRM command also wants to acquire a lock on the ClusterCacheStatus on the coordinator, but because the REBALANCE_CONFIRM command is sent asynchronously, it doesn't deadlock with the thread waiting for REBALANCE_START responses on the coordinator.
At least, that's what happens when {{RSVP.ack_on_delivery=false}} (the Infinispan default). When {{RSVP.ack_on_delivery=true}} (the JGroups default), the "asynchronous" REBALANCE_CONFIRM command becomes synchronous, and it generates a deadlock. The rebalance then fails after the RSVP timeout expires (10 seconds by default).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2712) Initial state transfer doesn't appear to all be persisted when using eviction in a replicated cluster
by Horia Chiorean (JIRA)
[ https://issues.jboss.org/browse/ISPN-2712?page=com.atlassian.jira.plugin.... ]
Horia Chiorean commented on ISPN-2712:
--------------------------------------
It's also worth mentioning that the same scenario (with eviction enabled) works fine in 5.1.x versions (tested with 5.1.2 and 5.1.8).
> Initial state transfer doesn't appear to all be persisted when using eviction in a replicated cluster
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2712
> URL: https://issues.jboss.org/browse/ISPN-2712
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.0.CR1
> Reporter: Randall Hauch
> Assignee: Mircea Markus
> Priority: Critical
> Attachments: ispn_eviction_log.txt, spectrum-repository-infinispan.xml
>
>
> Using a clustered cache with 2 nodes, where the cache in each node is configured identically with replication, eviction and (non-shared) file cache store. (See attached configuration.)
> The first (coordinator) process in the cluster is started and populated with 293 entries. Then the first process continually adds a few entries every 5 seconds. After a short delay, the second process is started, at which point it joins the cluster and starts the state-transfer process; logging shows in the first process that all 293 entries are transferred to the new cluster member, and the second log shows that they are all received. The second process then attempts to look for a specific entry that was created during initial population in the first process. This fails to find the existing entry.
> By enabling trace logging and "IDE breakpoint output messages" around state transfer, it's visible that from the 293 keys, only 218 are placed into the cache, the others being lost.
> (This problem was originally discovered when clustering ModeShape, which behaves roughly in the manner described above. The initial entries that are populated upon initialization are content created when a new repository is started. The second process looks for this content, and if it finds the content it knows not to create all of this initial content. However, if it doesn't find it, it thinks the repository has not yet been initialized and that it should create the initial content. The problem described by this bug then manifests itself in ModeShape through dozens of exceptions because the repository has been corrupted. See MODE-1745 for details on this problem. ModeShape's corresponding known issue for this issue, ISPN-2712, is MODE-1754.)
> The eviction is configured like this:
> {code:xml}
> <eviction strategy="LIRS" maxEntries="1000"/>
> {code}
> The attached log file is from the second process (the "receiver" node) and it contains the following key points:
> * line 40 - the total number of keys & entries to be transferred = 293
> * line 1352 and from there onwards 1358 / 1364 / i + 6 - the data container's size stops growing at 218, while the other entries are being sent. This means that in effect, they are ignored.
> * line 1797 - the loop from {{org.infinispan.statetransfer.StateConsumerImpl#doApplyState}} finishes
> Disabling eviction fixes the problem and all 293 nodes are placed in Node2's cache.
> (I initially marked this as CRITICAL priority, though it is a blocker for our use of Infinispan 5.2.)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2711) NPE in StateConsumerImpl.addUpdatedKey
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2711?page=com.atlassian.jira.plugin.... ]
Adrian Nistor reassigned ISPN-2711:
-----------------------------------
Assignee: Adrian Nistor (was: Mircea Markus)
> NPE in StateConsumerImpl.addUpdatedKey
> --------------------------------------
>
> Key: ISPN-2711
> URL: https://issues.jboss.org/browse/ISPN-2711
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.0.CR1
> Reporter: Michal Linhard
> Assignee: Adrian Nistor
>
> I've got this in ER8.1 elasticity test 1-4-1
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/EDG-REPOR...
> during startup of the 3rd node
> {code}
> 11:58:09,975 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-105,null) ISPN000136: Execution error: java.lang.NullPointerException
> at org.infinispan.statetransfer.StateConsumerImpl.addUpdatedKey(StateConsumerImpl.java:165) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.EntryWrappingInterceptor.commitEntryIfNeeded(EntryWrappingInterceptor.java:375) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.EntryWrappingInterceptor.commitContextEntries(EntryWrappingInterceptor.java:258) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:277) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.EntryWrappingInterceptor.visitPutKeyValueCommand(EntryWrappingInterceptor.java:166) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitPutKeyValueCommand(NonTransactionalLockingInterceptor.java:82) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:132) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:62) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:216) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.statetransfer.StateTransferInterceptor.handleWriteCommand(StateTransferInterceptor.java:194) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:136) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:127) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:118) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:128) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:92) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:62) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:77) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:343) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:61) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:70) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:101) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:122) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:86) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:245) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:218) [infinispan-core-5.2.0.CR1-redhat-1.jar:5.2.0.CR1-redhat-1]
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.JChannel.up(JChannel.java:703) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.RSVP.up(RSVP.java:188) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:181) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:400) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:418) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:432) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:721) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:574) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:187) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.MERGE2.up(MERGE2.java:205) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.Discovery.up(Discovery.java:359) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2640) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1287) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1850) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1823) [jgroups-3.2.5.Final-redhat-1.jar:3.2.5.Final-redhat-1]
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_38]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_38]
> at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_38]
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2612) Problem broadcasting CH_UPDATE command
by Michal Linhard (JIRA)
[ https://issues.jboss.org/browse/ISPN-2612?page=com.atlassian.jira.plugin.... ]
Michal Linhard updated ISPN-2612:
---------------------------------
Attachment: test_TRACE.zip
here's the TRACE log anyway
> Problem broadcasting CH_UPDATE command
> --------------------------------------
>
> Key: ISPN-2612
> URL: https://issues.jboss.org/browse/ISPN-2612
> Project: Infinispan
> Issue Type: Bug
> Components: RPC
> Affects Versions: 5.2.0.Beta5
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR2
>
> Attachments: session-cluster.xml, test.zip, test_TRACE.zip
>
>
> Infinispan 5.2.0.Beta5
> JGroups 3.2.4.Final
> Steps to reproduce (I'm using two virtual interfaces test1, test2)
> 1. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test1 -Djava.net.preferIPv4Stack=true
> 2. wait 10 sec
> 3. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test2 -Djava.net.preferIPv4Stack=true
> After 5 seconds there should be this timeout exception:
> {code}
> 19:42:14,146 WARN [org.infinispan.topology.CacheTopologyControlCommand] (OOB-2,mlinhard-work-37329) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=___defaultcache, type=REBALANCE_CONFIRM, sender=mlinhard-work-47337, joinInfo=null, topologyId=1, currentCH=null, pendingCH=null, throwable=null, viewId=1}
> java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
> at java.util.concurrent.FutureTask.get(FutureTask.java:91)
> at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:563)
> at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastConsistentHashUpdate(ClusterTopologyManagerImpl.java:349)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:213)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
> at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:252)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:219)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
> at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
> at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
> at org.jgroups.JChannel.up(JChannel.java:703)
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
> at org.jgroups.protocols.RSVP.up(RSVP.java:172)
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
> at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:736)
> at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:414)
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:187)
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
> at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
> at org.jgroups.protocols.Discovery.up(Discovery.java:359)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1287)
> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1850)
> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1823)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:532)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:152)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:518)
> at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:545)
> at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:542)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> ... 3 more
> Caused by: org.jgroups.TimeoutException: TimeoutException
> at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
> at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
> at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
> at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
> at org.jgroups.protocols.RSVP.down(RSVP.java:118)
> at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
> at org.jgroups.JChannel.down(JChannel.java:718)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616)
> at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:173)
> at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360)
> at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103)
> at org.jgroups.blocks.Request.execute(Request.java:83)
> at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:335)
> at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:330)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:145)
> ... 8 more
> {code}
> Analysis:
> These are the messages sent after view change:
> {code}
> test1 test2
> <--- JOIN ----
> ---- REBALANCE_START --->
> <--- StateRequestCommand ----
> ---- StateResponseCommand --->
> <--- REBALANCE_CONFIRM ----
> ---- CH_UPDATE --->
> {code}
> The last CH_UPDATE message is broadcast, test2 successfully processes it, but test1 stays in waiting state, because it for some reason awaits response also from itself - local variable entry in the method RSVP.down
> (https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...)
> contained local address.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2612) Problem broadcasting CH_UPDATE command
by Michal Linhard (JIRA)
[ https://issues.jboss.org/browse/ISPN-2612?page=com.atlassian.jira.plugin.... ]
Michal Linhard edited comment on ISPN-2612 at 1/14/13 2:03 PM:
---------------------------------------------------------------
here's the TRACE log anyway - test_TRACE.zip
was (Author: mlinhard):
here's the TRACE log anyway
> Problem broadcasting CH_UPDATE command
> --------------------------------------
>
> Key: ISPN-2612
> URL: https://issues.jboss.org/browse/ISPN-2612
> Project: Infinispan
> Issue Type: Bug
> Components: RPC
> Affects Versions: 5.2.0.Beta5
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR2
>
> Attachments: session-cluster.xml, test.zip, test_TRACE.zip
>
>
> Infinispan 5.2.0.Beta5
> JGroups 3.2.4.Final
> Steps to reproduce (I'm using two virtual interfaces test1, test2)
> 1. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test1 -Djava.net.preferIPv4Stack=true
> 2. wait 10 sec
> 3. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test2 -Djava.net.preferIPv4Stack=true
> After 5 seconds there should be this timeout exception:
> {code}
> 19:42:14,146 WARN [org.infinispan.topology.CacheTopologyControlCommand] (OOB-2,mlinhard-work-37329) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=___defaultcache, type=REBALANCE_CONFIRM, sender=mlinhard-work-47337, joinInfo=null, topologyId=1, currentCH=null, pendingCH=null, throwable=null, viewId=1}
> java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
> at java.util.concurrent.FutureTask.get(FutureTask.java:91)
> at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:563)
> at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastConsistentHashUpdate(ClusterTopologyManagerImpl.java:349)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:213)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
> at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:252)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:219)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
> at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
> at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
> at org.jgroups.JChannel.up(JChannel.java:703)
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
> at org.jgroups.protocols.RSVP.up(RSVP.java:172)
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
> at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:736)
> at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:414)
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:187)
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
> at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
> at org.jgroups.protocols.Discovery.up(Discovery.java:359)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1287)
> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1850)
> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1823)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:532)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:152)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:518)
> at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:545)
> at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:542)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> ... 3 more
> Caused by: org.jgroups.TimeoutException: TimeoutException
> at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
> at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
> at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
> at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
> at org.jgroups.protocols.RSVP.down(RSVP.java:118)
> at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
> at org.jgroups.JChannel.down(JChannel.java:718)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616)
> at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:173)
> at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360)
> at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103)
> at org.jgroups.blocks.Request.execute(Request.java:83)
> at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:335)
> at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:330)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:145)
> ... 8 more
> {code}
> Analysis:
> These are the messages sent after view change:
> {code}
> test1 test2
> <--- JOIN ----
> ---- REBALANCE_START --->
> <--- StateRequestCommand ----
> ---- StateResponseCommand --->
> <--- REBALANCE_CONFIRM ----
> ---- CH_UPDATE --->
> {code}
> The last CH_UPDATE message is broadcast, test2 successfully processes it, but test1 stays in waiting state, because it for some reason awaits response also from itself - local variable entry in the method RSVP.down
> (https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...)
> contained local address.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2612) Problem broadcasting CH_UPDATE command
by Michal Linhard (JIRA)
[ https://issues.jboss.org/browse/ISPN-2612?page=com.atlassian.jira.plugin.... ]
Michal Linhard commented on ISPN-2612:
--------------------------------------
@Dan I just tried with 5.2.0.CR1 and it's the same error. Did you really run the Test class twice as I mention in "steps to reproduce" ?
> Problem broadcasting CH_UPDATE command
> --------------------------------------
>
> Key: ISPN-2612
> URL: https://issues.jboss.org/browse/ISPN-2612
> Project: Infinispan
> Issue Type: Bug
> Components: RPC
> Affects Versions: 5.2.0.Beta5
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR2
>
> Attachments: session-cluster.xml, test.zip
>
>
> Infinispan 5.2.0.Beta5
> JGroups 3.2.4.Final
> Steps to reproduce (I'm using two virtual interfaces test1, test2)
> 1. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test1 -Djava.net.preferIPv4Stack=true
> 2. wait 10 sec
> 3. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test2 -Djava.net.preferIPv4Stack=true
> After 5 seconds there should be this timeout exception:
> {code}
> 19:42:14,146 WARN [org.infinispan.topology.CacheTopologyControlCommand] (OOB-2,mlinhard-work-37329) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=___defaultcache, type=REBALANCE_CONFIRM, sender=mlinhard-work-47337, joinInfo=null, topologyId=1, currentCH=null, pendingCH=null, throwable=null, viewId=1}
> java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
> at java.util.concurrent.FutureTask.get(FutureTask.java:91)
> at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:563)
> at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastConsistentHashUpdate(ClusterTopologyManagerImpl.java:349)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:213)
> at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
> at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:252)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:219)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
> at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
> at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
> at org.jgroups.JChannel.up(JChannel.java:703)
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
> at org.jgroups.protocols.RSVP.up(RSVP.java:172)
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
> at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:736)
> at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:414)
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:187)
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
> at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
> at org.jgroups.protocols.Discovery.up(Discovery.java:359)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1287)
> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1850)
> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1823)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:532)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:152)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:518)
> at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:545)
> at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:542)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> ... 3 more
> Caused by: org.jgroups.TimeoutException: TimeoutException
> at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
> at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
> at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
> at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
> at org.jgroups.protocols.RSVP.down(RSVP.java:118)
> at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
> at org.jgroups.JChannel.down(JChannel.java:718)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616)
> at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:173)
> at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360)
> at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103)
> at org.jgroups.blocks.Request.execute(Request.java:83)
> at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:335)
> at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:330)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:145)
> ... 8 more
> {code}
> Analysis:
> These are the messages sent after view change:
> {code}
> test1 test2
> <--- JOIN ----
> ---- REBALANCE_START --->
> <--- StateRequestCommand ----
> ---- StateResponseCommand --->
> <--- REBALANCE_CONFIRM ----
> ---- CH_UPDATE --->
> {code}
> The last CH_UPDATE message is broadcast, test2 successfully processes it, but test1 stays in waiting state, because it for some reason awaits response also from itself - local variable entry in the method RSVP.down
> (https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...)
> contained local address.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2712) Initial state transfer doesn't appear to all be persisted when using eviction in a replicated cluster
by Randall Hauch (JIRA)
[ https://issues.jboss.org/browse/ISPN-2712?page=com.atlassian.jira.plugin.... ]
Randall Hauch updated ISPN-2712:
--------------------------------
Description:
Using a clustered cache with 2 nodes, where the cache in each node is configured identically with replication, eviction and (non-shared) file cache store. (See attached configuration.)
The first (coordinator) process in the cluster is started and populated with 293 entries. Then the first process continually adds a few entries every 5 seconds. After a short delay, the second process is started, at which point it joins the cluster and starts the state-transfer process; logging shows in the first process that all 293 entries are transferred to the new cluster member, and the second log shows that they are all received. The second process then attempts to look for a specific entry that was created during initial population in the first process. This fails to find the existing entry.
By enabling trace logging and "IDE breakpoint output messages" around state transfer, it's visible that from the 293 keys, only 218 are placed into the cache, the others being lost.
(This problem was originally discovered when clustering ModeShape, which behaves roughly in the manner described above. The initial entries that are populated upon initialization are content created when a new repository is started. The second process looks for this content, and if it finds the content it knows not to create all of this initial content. However, if it doesn't find it, it thinks the repository has not yet been initialized and that it should create the initial content. The problem described by this bug then manifests itself in ModeShape through dozens of exceptions because the repository has been corrupted. See MODE-1745 for details on this problem. ModeShape's corresponding known issue for this issue, ISPN-2712, is MODE-1754.)
The eviction is configured like this:
{code:xml}
<eviction strategy="LIRS" maxEntries="1000"/>
{code}
The attached log file is from the second process (the "receiver" node) and it contains the following key points:
* line 40 - the total number of keys & entries to be transferred = 293
* line 1352 and from there onwards 1358 / 1364 / i + 6 - the data container's size stops growing at 218, while the other entries are being sent. This means that in effect, they are ignored.
* line 1797 - the loop from {{org.infinispan.statetransfer.StateConsumerImpl#doApplyState}} finishes
Disabling eviction fixes the problem and all 293 nodes are placed in Node2's cache.
(I initially marked this as CRITICAL priority, though it is a blocker for our use of Infinispan 5.2.)
was:
Using a clustered cache with 2 nodes, where the cache in each node is configured identically with replication, eviction and (non-shared) file cache store. (See attached configuration.)
The first (coordinator) process in the cluster is started and populated with 293 entries. Then the first process continually adds a few entries every 5 seconds. After a short delay, the second process is started, at which point it joins the cluster and starts the state-transfer process; logging shows in the first process that all 293 entries are transferred to the new cluster member, and the second log shows that they are all received. The second process then attempts to look for a specific entry that was created during initial population in the first process. This fails to find the existing entry.
By enabling trace logging and "IDE breakpoint output messages" around state transfer, it's visible that from the 293 keys, only 218 are placed into the cache, the others being lost.
(This problem was originally discovered when clustering ModeShape, which behaves roughly in the manner described above. The initial entries that are populated upon initialization are content created when a new repository is started. The second process looks for this content, and if it finds the content it knows not to create all of this initial content. However, if it doesn't find it, it thinks the repository has not yet been initialized and that it should create the initial content. The problem described by this bug then manifests itself in ModeShape through dozens of exceptions because the repository has been corrupted. See MODE-1745 for details on this problem.)
The eviction is configured like this:
{code:xml}
<eviction strategy="LIRS" maxEntries="1000"/>
{code}
The attached log file is from the second process (the "receiver" node) and it contains the following key points:
* line 40 - the total number of keys & entries to be transferred = 293
* line 1352 and from there onwards 1358 / 1364 / i + 6 - the data container's size stops growing at 218, while the other entries are being sent. This means that in effect, they are ignored.
* line 1797 - the loop from {{org.infinispan.statetransfer.StateConsumerImpl#doApplyState}} finishes
Disabling eviction fixes the problem and all 293 nodes are placed in Node2's cache.
(I initially marked this as CRITICAL priority, though it is a blocker for our use of Infinispan 5.2.)
> Initial state transfer doesn't appear to all be persisted when using eviction in a replicated cluster
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2712
> URL: https://issues.jboss.org/browse/ISPN-2712
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.0.CR1
> Reporter: Randall Hauch
> Assignee: Mircea Markus
> Priority: Critical
> Attachments: ispn_eviction_log.txt, spectrum-repository-infinispan.xml
>
>
> Using a clustered cache with 2 nodes, where the cache in each node is configured identically with replication, eviction and (non-shared) file cache store. (See attached configuration.)
> The first (coordinator) process in the cluster is started and populated with 293 entries. Then the first process continually adds a few entries every 5 seconds. After a short delay, the second process is started, at which point it joins the cluster and starts the state-transfer process; logging shows in the first process that all 293 entries are transferred to the new cluster member, and the second log shows that they are all received. The second process then attempts to look for a specific entry that was created during initial population in the first process. This fails to find the existing entry.
> By enabling trace logging and "IDE breakpoint output messages" around state transfer, it's visible that from the 293 keys, only 218 are placed into the cache, the others being lost.
> (This problem was originally discovered when clustering ModeShape, which behaves roughly in the manner described above. The initial entries that are populated upon initialization are content created when a new repository is started. The second process looks for this content, and if it finds the content it knows not to create all of this initial content. However, if it doesn't find it, it thinks the repository has not yet been initialized and that it should create the initial content. The problem described by this bug then manifests itself in ModeShape through dozens of exceptions because the repository has been corrupted. See MODE-1745 for details on this problem. ModeShape's corresponding known issue for this issue, ISPN-2712, is MODE-1754.)
> The eviction is configured like this:
> {code:xml}
> <eviction strategy="LIRS" maxEntries="1000"/>
> {code}
> The attached log file is from the second process (the "receiver" node) and it contains the following key points:
> * line 40 - the total number of keys & entries to be transferred = 293
> * line 1352 and from there onwards 1358 / 1364 / i + 6 - the data container's size stops growing at 218, while the other entries are being sent. This means that in effect, they are ignored.
> * line 1797 - the loop from {{org.infinispan.statetransfer.StateConsumerImpl#doApplyState}} finishes
> Disabling eviction fixes the problem and all 293 nodes are placed in Node2's cache.
> (I initially marked this as CRITICAL priority, though it is a blocker for our use of Infinispan 5.2.)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2712) Initial state transfer doesn't appear to all be persisted when using eviction in a replicated cluster
by Randall Hauch (JIRA)
[ https://issues.jboss.org/browse/ISPN-2712?page=com.atlassian.jira.plugin.... ]
Randall Hauch updated ISPN-2712:
--------------------------------
Attachment: ispn_eviction_log.txt
spectrum-repository-infinispan.xml
Attached the Infinispan configuration (used in both processes, with actual locations of cache store specified via system properties) and the aforementioned log.
> Initial state transfer doesn't appear to all be persisted when using eviction in a replicated cluster
> ------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2712
> URL: https://issues.jboss.org/browse/ISPN-2712
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.0.CR1
> Reporter: Randall Hauch
> Assignee: Mircea Markus
> Priority: Critical
> Attachments: ispn_eviction_log.txt, spectrum-repository-infinispan.xml
>
>
> Using a clustered cache with 2 nodes, where the cache in each node is configured identically with replication, eviction and (non-shared) file cache store. (See attached configuration.)
> The first (coordinator) process in the cluster is started and populated with 293 entries. Then the first process continually adds a few entries every 5 seconds. After a short delay, the second process is started, at which point it joins the cluster and starts the state-transfer process; logging shows in the first process that all 293 entries are transferred to the new cluster member, and the second log shows that they are all received. The second process then attempts to look for a specific entry that was created during initial population in the first process. This fails to find the existing entry.
> By enabling trace logging and "IDE breakpoint output messages" around state transfer, it's visible that from the 293 keys, only 218 are placed into the cache, the others being lost.
> (This problem was originally discovered when clustering ModeShape, which behaves roughly in the manner described above. The initial entries that are populated upon initialization are content created when a new repository is started. The second process looks for this content, and if it finds the content it knows not to create all of this initial content. However, if it doesn't find it, it thinks the repository has not yet been initialized and that it should create the initial content. The problem described by this bug then manifests itself in ModeShape through dozens of exceptions because the repository has been corrupted. See MODE-1745 for details on this problem.)
> The eviction is configured like this:
> {code:xml}
> <eviction strategy="LIRS" maxEntries="1000"/>
> {code}
> The attached log file is from the second process (the "receiver" node) and it contains the following key points:
> * line 40 - the total number of keys & entries to be transferred = 293
> * line 1352 and from there onwards 1358 / 1364 / i + 6 - the data container's size stops growing at 218, while the other entries are being sent. This means that in effect, they are ignored.
> * line 1797 - the loop from {{org.infinispan.statetransfer.StateConsumerImpl#doApplyState}} finishes
> Disabling eviction fixes the problem and all 293 nodes are placed in Node2's cache.
> (I initially marked this as CRITICAL priority, though it is a blocker for our use of Infinispan 5.2.)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months