[infinispan-issues] [JBoss JIRA] (ISPN-2836) org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
Alan Field (JIRA)
jira-events at lists.jboss.org
Fri Mar 1 12:09:56 EST 2013
[ https://issues.jboss.org/browse/ISPN-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757865#comment-12757865 ]
Alan Field edited comment on ISPN-2836 at 3/1/13 12:09 PM:
-----------------------------------------------------------
Bela,
I tried to run the job again with the JGroups 3.3.0 Beta1 JAR file, but I get the following exception when trying to create the cache with two nodes:
{noformat}
11:52:12,870 DEBUG [org.radargun.Master] (main) Starting 'StartClusterStage' on 2 slave nodes. Details: StartCluster {config=dist-tcp-no-tx.xml, delayAfterFirstSlaveStarts=5.000 secs, delayBetweenStartingSlaves=0.500 secs, exitBenchmarkOnSlaveFailure=true, expectNumSlaves=null, mayFailOn=null, reachable=null, runOnAllSlaves=false, slaves=null, staggerSlaveStartup=true, useSmartClassLoading=true, validateCluster=true }
11:52:21,088 INFO [org.radargun.stages.StartClusterStage] (main) Received responses from all 2 slaves. Durations [0:7.87 seconds, 1:8.16 seconds]
11:52:21,089 WARN [org.radargun.stages.StartClusterStage] (main) Received error ack DefaultDistStageAck{slaveIndex=1, slaveAddress=/172.18.1.3, isError=true, errorMessage='null', payload=null, remoteExceptionString=
org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:883)
org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:654)
org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:643)
org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:546)
org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:199)
org.infinispan.CacheImpl.start(CacheImpl.java:520)
org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:690)
org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:653)
org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:549)
org.radargun.cachewrappers.InfinispanWrapper.setUpCache(InfinispanWrapper.java:126)
org.radargun.cachewrappers.InfinispanWrapper.setUp(InfinispanWrapper.java:72)
org.radargun.cachewrappers.InfinispanKillableWrapper.setUp(InfinispanKillableWrapper.java:56)
org.radargun.stages.helpers.StartHelper.start(StartHelper.java:69)
org.radargun.stages.StartClusterStage.executeOnSlave(StartClusterStage.java:67)
org.radargun.Slave$2.run(Slave.java:103)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)
}
11:52:21,089 INFO [org.radargun.state.MasterState] (main) Exception error on current stage, and exiting (stage's exitBenchmarkOnSlaveFailure is set to true).
{noformat}
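The error ack above reports errorMessage='null' and lists only the outer frames, so the underlying cause never reaches the master log. Below is a minimal sketch of walking Throwable.getCause() so that a report like this one could carry every "Caused by" link. The CauseChain class and its methods are hypothetical and do not reflect how RadarGun actually builds remoteExceptionString.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of RadarGun or Infinispan) for inspecting a cause chain.
public final class CauseChain {
    private CauseChain() {}

    // Collects the throwable and all of its causes, outermost first.
    // Stops if a link repeats, so a cyclic chain cannot loop forever.
    public static List<Throwable> chain(Throwable t) {
        List<Throwable> links = new ArrayList<>();
        while (t != null && !links.contains(t)) {
            links.add(t);
            t = t.getCause();
        }
        return links;
    }

    // Innermost link of the chain, or null when t is null.
    public static Throwable rootCause(Throwable t) {
        List<Throwable> links = chain(t);
        return links.isEmpty() ? null : links.get(links.size() - 1);
    }

    // One line per link, with "Caused by: " before every link after the first,
    // similar to the headers printStackTrace() prints (stack frames omitted).
    public static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        List<Throwable> links = chain(t);
        for (int i = 0; i < links.size(); i++) {
            if (i > 0) {
                sb.append("Caused by: ");
            }
            sb.append(links.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

Sending describe(e) along with the outer frames would have put the JGroups root cause in the master log directly.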
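This stack trace doesn't contain any JGroups classes, but the exception does not occur with the JGroups 3.2.1 JAR file. I found the full exception in the slave node log, and the root cause is in JGroups: a NoSuchMethodError for org.jgroups.Message.setBuffer(Buffer). This suggests that this Infinispan build is not binary-compatible with the 3.3.0 Beta1 JAR: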
{noformat}
11:52:21,076 ERROR [org.radargun.cachewrappers.InfinispanMapReduceWrapper] (pool-1-thread-1) Wrapper start failed.
org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:883)
at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:654)
at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:643)
at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:546)
at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:199)
at org.infinispan.CacheImpl.start(CacheImpl.java:520)
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:690)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:653)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:549)
at org.radargun.cachewrappers.InfinispanWrapper.setUpCache(InfinispanWrapper.java:126)
at org.radargun.cachewrappers.InfinispanWrapper.setUp(InfinispanWrapper.java:72)
at org.radargun.cachewrappers.InfinispanKillableWrapper.setUp(InfinispanKillableWrapper.java:56)
at org.radargun.stages.helpers.StartHelper.start(StartHelper.java:69)
at org.radargun.stages.StartClusterStage.executeOnSlave(StartClusterStage.java:67)
at org.radargun.Slave$2.run(Slave.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NoSuchMethodError: org.jgroups.Message.setBuffer(Lorg/jgroups/util/Buffer;)V
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.constructMessage(CommandAwareRpcDispatcher.java:263)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:299)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:178)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinator(LocalTopologyManagerImpl.java:261)
at org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:101)
at org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
... 21 more
{noformat}
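The descriptor in the error, (Lorg/jgroups/util/Buffer;)V, names a setBuffer method that takes an org.jgroups.util.Buffer and returns void. The JVM resolves a call by name and full descriptor, including the return type. The call therefore fails if the loaded JAR does not declare exactly that signature, even when a setBuffer(Buffer) with a different return type exists.

Below is a rough reflection-based sketch for checking a JAR on the classpath before starting the cluster. The BinaryCompatCheck class is hypothetical, and it only inspects public methods.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical helper: approximates the JVM's method resolution for public methods only.
public final class BinaryCompatCheck {
    private BinaryCompatCheck() {}

    // True if the named class exposes a public method (declared or inherited) whose
    // name, parameter types and return type all match. Non-public methods are not checked.
    public static boolean hasMethod(String className, String methodName,
                                    Class<?> returnType, Class<?>... paramTypes) {
        Class<?> cls;
        try {
            // initialize=false: inspect the class without running its static initializers
            cls = Class.forName(className, false, BinaryCompatCheck.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return false;
        }
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName)
                    && m.getReturnType() == returnType
                    && Arrays.equals(m.getParameterTypes(), paramTypes)) {
                return true;
            }
        }
        return false;
    }
}
```

For this case, the call would be hasMethod("org.jgroups.Message", "setBuffer", void.class, bufferClass), with bufferClass loaded as org.jgroups.util.Buffer from the same JAR. A false result would point to the same incompatibility without starting the cluster.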
was (Author: afield):
The stacktrace doesn't have any JGroups classes in it, but I don't see the exception with the JGroups 3.2.1 JAR file. I'll keep trying to figure out what the issue is.
> org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
> ---------------------------------------------------------------------------------------------
>
> Key: ISPN-2836
> URL: https://issues.jboss.org/browse/ISPN-2836
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.1.Final
> Reporter: Alan Field
> Assignee: Vladimir Blagojevic
> Attachments: afield-tcp-521-final.txt, udp-edg-perf01.txt, udp-edg-perf02.txt
>
>
> Using RadarGun and two nodes to execute the example WordCount Map/Reduce job against a cache with ~550 keys and a value size of 1 MB produces a thread deadlock. The cache is distributed with transactions disabled.
> TCP transport deadlocks without throwing an exception. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job does not complete. The nodes send "are-you-alive" messages back and forth, and I have seen the following exception:
> {noformat}
> 11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
> at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
> at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
> at org.radargun.Slave$2.run(Slave.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
> ... 9 more
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
> 11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
> at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
> at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
> ... 5 more
> Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
> 11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
> ... 11 more
> {noformat}
> With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.
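The failing step is the map phase with local reduction (MapReduceTask.executeMapPhaseWithLocalReduction in the trace above). In that phase, each node maps its own entries and combines the partial results locally before they leave the node. Below is a single-JVM sketch of the same WordCount flow in plain Java, to show what each phase does. It does not use Infinispan's Mapper/Reducer API. The WordCountSketch class is hypothetical, and the per-node partitioning of entries is supplied by the caller rather than computed from consistent hashing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-JVM model of map -> local combine -> reduce for WordCount.
public final class WordCountSketch {
    private WordCountSketch() {}

    // Map phase with local reduction: count the words in one node's local entries,
    // summing per word (the combine step) before anything would be sent remotely.
    public static Map<String, Integer> mapAndCombine(Map<String, String> localEntries) {
        Map<String, Integer> partial = new HashMap<>();
        for (String text : localEntries.values()) {
            for (String word : text.split("\\s+")) {
                if (!word.isEmpty()) {
                    partial.merge(word, 1, Integer::sum);
                }
            }
        }
        return partial;
    }

    // Reduce phase: merge the per-node partial counts into final totals.
    public static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    // Runs both phases over entries already grouped by (assumed) owner node.
    public static Map<String, Integer> run(List<Map<String, String>> entriesPerNode) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (Map<String, String> node : entriesPerNode) {
            // In a cluster this is a remote invocation per node; here it is a local call.
            partials.add(mapAndCombine(node));
        }
        return reduce(partials);
    }
}
```

With 1 MB values, each remote map/combine call in the real job carries far more work than this toy version, which is where the timeouts above were observed.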