Alan Field edited comment on ISPN-2836 at 6/4/13 11:38 AM:
-----------------------------------------------------------
[~pruivo] OK, so I have a couple of questions:
1) Can you send me the updated executeMapReduceTask() code?
2) Does the cache configuration need to be changed to avoid the TimeoutException with
larger amounts of data in the cache?
The problem with the characters is that the bytes read from the file are not being decoded
properly. The latest version of LoadFileStage doesn't decode the bytes at all, but decoding
is necessary to put String values in the cache. I'm now trying to determine the best
approach for the data loading stages.
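A minimal sketch of the decoding step, assuming the loading stage splits each file into String values of a fixed size and that the file is UTF-8 (the charset is an assumption; use whatever encoding the input actually has). Turning arbitrary byte[] slices into Strings can cut a multi-byte character in half, which is one common way to end up with garbled characters. Streaming the bytes through an InputStreamReader lets the decoder handle sequence boundaries. ChunkedTextLoader and loadChunks are hypothetical names, not part of RadarGun's LoadFileStage.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkedTextLoader {

    // Hypothetical helper: decodes a byte stream and splits it into String values
    // of at most maxChars characters each. The InputStreamReader's decoder keeps
    // multi-byte sequences intact across read boundaries; supplementary characters
    // (surrogate pairs) may still straddle two chunks. Closes `in` when done.
    public static List<String> loadChunks(InputStream in, Charset charset, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[maxChars];
            int filled = 0;
            int read;
            while ((read = reader.read(buffer, filled, maxChars - filled)) != -1) {
                filled += read;
                if (filled == maxChars) {              // buffer full: emit one cache value
                    chunks.add(new String(buffer, 0, filled));
                    filled = 0;
                }
            }
            if (filled > 0) {                           // trailing partial chunk
                chunks.add(new String(buffer, 0, filled));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        List<String> chunks = loadChunks(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8, 4);
        System.out.println(chunks.size()); // 3
    }
}
```

Each returned String could then be put into the cache under its own key; the key scheme is left open here.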
org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
---------------------------------------------------------------------------------------------
Key: ISPN-2836
URL: https://issues.jboss.org/browse/ISPN-2836
Project: Infinispan
Issue Type: Bug
Components: Distributed Execution and Map/Reduce
Affects Versions: 5.2.1.Final
Reporter: Alan Field
Assignee: Pedro Ruivo
Priority: Blocker
Labels: onboard
Fix For: 5.3.0.Final
Attachments: afield-tcp-521-final.txt, benchmark-mapreduce-multifilesize.xml, dist-udp-no-tx.xml, jgroups-udp.xml, udp-edg-perf01.txt, udp-edg-perf02.txt
Using RadarGun and two nodes to execute the example WordCount Map/Reduce job against a
cache with ~550 keys, each with a 1MB value, produces a thread deadlock. The cache is
distributed with transactions disabled.
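For readers less familiar with the job, here is a stdlib-only simulation of the WordCount pattern in which each node maps its own values and combines the counts locally before a final reduce, which is roughly what the method name executeMapPhaseWithLocalReduction in the trace suggests. This is not Infinispan's MapReduceTask API; WordCountSimulation, mapAndCombine and reduce are hypothetical names, and the two "nodes" are just two lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSimulation {

    // Map + combine on one node: count the words in the values that node holds.
    public static Map<String, Integer> mapAndCombine(List<String> localValues) {
        Map<String, Integer> partial = new HashMap<>();
        for (String value : localValues) {
            for (String word : value.split("\\s+")) {
                if (!word.isEmpty()) {                  // skip the empty token from leading whitespace
                    partial.merge(word, 1, Integer::sum);
                }
            }
        }
        return partial;
    }

    // Reduce: merge the partial counts returned by every node.
    public static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two simulated nodes, each holding part of the cache contents.
        List<String> node1 = Arrays.asList("to be or", "not to be");
        List<String> node2 = Arrays.asList("be quick");
        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(mapAndCombine(node1));
        partials.add(mapAndCombine(node2));
        System.out.println(reduce(partials).get("be")); // 3
    }
}
```

In the real job the partial results are produced on each node and have to cross the network, so with ~1MB values the size of what each node returns grows with the data set.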
TCP transport deadlocks without throwing an exception. Disabling the send queue and
setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job still does not
complete: the nodes keep sending "are-you-alive" messages back and forth, and I have
seen the following exception:
{noformat}
11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
	at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
	at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
	at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
	at org.radargun.Slave$2.run(Slave.java:103)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
	at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
	at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
	... 9 more
Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
	at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
	at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
	at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
	... 5 more
Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
	at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
	... 11 more
11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
{noformat}
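Read bottom-up, the trace shows the org.jgroups.TimeoutException being wrapped twice: first in an org.infinispan.CacheException by Util.rewrapAsCacheException, then in the ExecutionException thrown by FutureTask.get inside MapReduceTask$TaskPart.get. A caller such as executeMapReduceTask could unwrap that chain to report the root cause directly. The sketch below is a hypothetical helper using JDK stand-ins: java.util.concurrent.TimeoutException plays org.jgroups.TimeoutException and RuntimeException plays CacheException, so it runs without Infinispan or JGroups on the classpath.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class RootCause {

    // Walks getCause() to the innermost throwable. The identity set stops the walk
    // if a malformed cause chain loops back on itself.
    public static Throwable rootCause(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        Throwable current = t;
        while (seen.add(current) && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins mirroring the nesting in the trace:
        // ExecutionException -> CacheException -> org.jgroups.TimeoutException
        Throwable timeout = new TimeoutException("timeout sending message to edg-perf02-32536");
        Throwable cacheException = new RuntimeException(timeout); // plays org.infinispan.CacheException
        Throwable fromFutureGet = new ExecutionException(cacheException);
        System.out.println(rootCause(fromFutureGet).getMessage());
        // prints: timeout sending message to edg-perf02-32536
    }
}
```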
With UDP transport, both threads are deadlocked. I will attach thread dumps from runs
using TCP and UDP transport.
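Alongside the thread dumps, the JDK's ThreadMXBean can check for a deadlock from inside a node. One caveat: findDeadlockedThreads only detects lock cycles within a single JVM. If each node is blocked waiting for a reply from the other (edg-perf01 waiting on edg-perf02 and vice versa), the threads show up as WAITING or TIMED_WAITING in the dumps rather than as deadlocked. DeadlockCheck is a hypothetical helper, not part of RadarGun or Infinispan.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class DeadlockCheck {

    // Returns the names of threads in this JVM that are deadlocked, or an empty list.
    public static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()          // monitors and java.util.concurrent locks
                : mx.findMonitorDeadlockedThreads();  // object monitors only
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;                             // null means no deadlock was found
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {                       // null if the thread has since exited
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + deadlockedThreadNames());
    }
}
```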