]
Mircea Markus commented on ISPN-2836:
-------------------------------------
I agree with [~mgencur] that 1 min for 500MB shouldn't be a matter of 60 secs, or even
secs for that metter. Note that in this case though, the Mapper task (assuming
distributeReducePhase=false) iterates over this 500 MB and sends (almost all) data to the
command originator which then Reduces it. This is not a good usage of Map/Reduce task, as
the mapper should be designed to take only a fraction of the data from remote nodes and
bring it back to the originator. I'd be curious to see some figures from Pedro on
where the time is actually spent.
org.jgroups.TimeoutException after invoking MapCombineCommand in
Map/Reduce task with 2 nodes
---------------------------------------------------------------------------------------------
Key: ISPN-2836
URL:
https://issues.jboss.org/browse/ISPN-2836
Project: Infinispan
Issue Type: Bug
Components: Distributed Execution and Map/Reduce
Affects Versions: 5.2.1.Final
Reporter: Alan Field
Assignee: Pedro Ruivo
Priority: Blocker
Labels: onboard
Fix For: 5.3.0.Final
Attachments: afield-tcp-521-final.txt, benchmark-mapreduce-multifilesize.xml,
dist-udp-no-tx.xml, jgroups-udp.xml, udp-edg-perf01.txt, udp-edg-perf02.txt
Using RadarGun and two nodes to execute the example WordCount Map/Reduce job against a
cache with ~550 keys with a value size of 1MB is producing a thread deadlock. The cache is
distributed with transactions disabled.
TCP transport deadlocks without throwing an exception. Disabling the send queue and
setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job does not
complete. The nodes send "are-you-alive" messages back and forth, and I have
seen the following exception:
{noformat}
11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed
sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed,
cause: null
at
org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
at
org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
at org.radargun.Slave$2.run(Slave.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException:
org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
at
org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
at
org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
... 9 more
Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending
message to edg-perf02-32536
at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed
sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed,
cause: null
at
org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
at
org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
at
org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
... 5 more
Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed
sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed,
cause: null
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
... 11 more
{noformat}
With UDP transport, both threads are deadlocked. I will attach thread dumps from runs
using TCP and UDP transport.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: