[infinispan-issues] [JBoss JIRA] (ISPN-2836) org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
Alan Field (JIRA)
jira-events at lists.jboss.org
Wed Jun 5 11:59:54 EDT 2013
[ https://issues.jboss.org/browse/ISPN-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779578#comment-12779578 ]
Alan Field edited comment on ISPN-2836 at 6/5/13 11:59 AM:
-----------------------------------------------------------
[~pruivo] My latest version of LoadFileStage.java will solve your decoding problem. If you set the stringData property to "true" in your benchmark file, then the cache values will use String objects. https://github.com/alanfx/radargun/blob/4932376272a0d24cf5258f287da35f7b832fd0ea/framework/src/main/java/org/radargun/stages/LoadFileStage.java
Also, if you look in the RadarGun log, you should see some statistics from the stage about how large the file is and how much data was written to the cache. Something like this:
17:01:05,882 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 1 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-10MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=8192 }
17:01:06,592 INFO [org.radargun.stages.LoadFileStage] (main) Received responses from all 1 slaves. Durations [0 = 697 milliseconds]
17:01:06,593 INFO [org.radargun.stages.LoadFileStage] (main) --------------------
17:01:06,594 INFO [org.radargun.stages.LoadFileStage] (main) Size of file '/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-10MB.txt' is 11180386 bytes
17:01:06,594 INFO [org.radargun.stages.LoadFileStage] (main) Value size is '8192' which will produce 1365 keys
17:01:06,595 INFO [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 1365 values to the cache with a total size of 11180386 bytes
17:01:06,595 INFO [org.radargun.stages.LoadFileStage] (main) --------------------
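The key count in the log follows from the file size and the value size. The sketch below is only an illustration of that arithmetic, not the actual LoadFileStage code: it assumes the stage splits the file into consecutive chunks of valueSize bytes and stores a final, shorter chunk for the remainder. The class and method names (ChunkCount, keysFor) are hypothetical.

```java
public class ChunkCount {

    // Hypothetical helper: number of fixed-size values needed to hold the
    // whole file, counting a trailing partial chunk as one extra value
    // (ceiling division). Assumes the file is split sequentially.
    static long keysFor(long fileSizeBytes, int valueSizeBytes) {
        if (fileSizeBytes < 0 || valueSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (fileSizeBytes + valueSizeBytes - 1) / valueSizeBytes;
    }

    public static void main(String[] args) {
        long fileSize = 11180386L; // file size reported in the log above
        int valueSize = 8192;      // valueSize reported in the log above
        // 1364 full 8192-byte values plus one partial value
        System.out.println(keysFor(fileSize, valueSize) + " keys"); // prints "1365 keys"
    }
}
```

With the numbers from the log, 11180386 bytes at 8192 bytes per value gives 1364 full values and one partial value, which matches the 1365 keys reported by the stage.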
> org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
> ---------------------------------------------------------------------------------------------
>
> Key: ISPN-2836
> URL: https://issues.jboss.org/browse/ISPN-2836
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.1.Final
> Reporter: Alan Field
> Assignee: Pedro Ruivo
> Priority: Blocker
> Labels: onboard
> Fix For: 5.3.0.Final
>
> Attachments: afield-tcp-521-final.txt, benchmark-mapreduce-multifilesize.xml, dist-udp-no-tx.xml, jgroups-udp.xml, udp-edg-perf01.txt, udp-edg-perf02.txt
>
>
> Using RadarGun and two nodes to execute the example WordCount Map/Reduce job against a cache with ~550 keys with a value size of 1MB is producing a thread deadlock. The cache is distributed with transactions disabled.
> TCP transport deadlocks without throwing an exception. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job does not complete. The nodes send "are-you-alive" messages back and forth, and I have seen the following exception:
> {noformat}
> 11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
> at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
> at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
> at org.radargun.Slave$2.run(Slave.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
> ... 9 more
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
> 11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
> at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
> at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
> ... 5 more
> Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
> 11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
> ... 11 more
> {noformat}
> With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.