]
Pedro Ruivo commented on ISPN-2836:
-----------------------------------
It is simple to add a timeout parameter to the MapReduceTasks to wait for remote replies.
Note that this timeout is used directly by JGroups so it is not the timeout for the task
because the current implementation waits for the replies sequentially. Also, it is
possible to wait for ever with a timeout value <= 0. The default value is the
replication timeout.
Does it make sense to you? [~afield] [~mgencur]?
org.jgroups.TimeoutException after invoking MapCombineCommand in
Map/Reduce task with 2 nodes
---------------------------------------------------------------------------------------------
Key: ISPN-2836
URL:
https://issues.jboss.org/browse/ISPN-2836
Project: Infinispan
Issue Type: Bug
Components: Distributed Execution and Map/Reduce
Affects Versions: 5.2.1.Final
Reporter: Alan Field
Assignee: Pedro Ruivo
Priority: Blocker
Labels: onboard
Fix For: 5.3.0.Final
Attachments: afield-tcp-521-final.txt, benchmark-mapreduce-multifilesize.xml,
dist-udp-no-tx.xml, jgroups-udp.xml, udp-edg-perf01.txt, udp-edg-perf02.txt
Using RadarGun and two nodes to execute the example WordCount Map/Reduce job against a
cache with ~550 keys with a value size of 1MB is producing a thread deadlock. The cache is
distributed with transactions disabled.
TCP transport deadlocks without throwing an exception. Disabling the send queue and
setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job does not
complete. The nodes send "are-you-alive" messages back and forth, and I have
seen the following exception:
{noformat}
11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed
sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed,
cause: null
at
org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
at
org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
at org.radargun.Slave$2.run(Slave.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException:
org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
at
org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
at
org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
... 9 more
Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending
message to edg-perf02-32536
at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed
sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed,
cause: null
at
org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
at
org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
at
org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
... 5 more
Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed
sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed,
cause: null
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
... 11 more
{noformat}
With UDP transport, both threads are deadlocked. I will attach thread dumps from runs
using TCP and UDP transport.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: