[JBoss JIRA] (ISPN-2836) org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
by Alan Field (JIRA)
[ https://issues.jboss.org/browse/ISPN-2836?page=com.atlassian.jira.plugin.... ]
Alan Field updated ISPN-2836:
-----------------------------
Attachment: dist-udp-no-tx.xml
jgroups-udp.xml
benchmark-mapreduce-multifilesize.xml
These are the Infinispan configuration, JGroups, and RadarGun benchmark files used for the Infinispan 5.2 runs.
> org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
> ---------------------------------------------------------------------------------------------
>
> Key: ISPN-2836
> URL: https://issues.jboss.org/browse/ISPN-2836
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.1.Final
> Reporter: Alan Field
> Assignee: Pedro Ruivo
> Priority: Blocker
> Labels: onboard
> Fix For: 5.3.0.Final
>
> Attachments: afield-tcp-521-final.txt, benchmark-mapreduce-multifilesize.xml, dist-udp-no-tx.xml, jgroups-udp.xml, udp-edg-perf01.txt, udp-edg-perf02.txt
>
>
> Using RadarGun to run the example WordCount Map/Reduce job on two nodes, against a distributed cache (transactions disabled) holding ~550 keys with 1 MB values, produces a thread deadlock.
> With the TCP transport, the deadlock occurs without any exception being thrown. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job still does not complete. The nodes keep exchanging "are-you-alive" messages, and I have seen the following exception:
> {noformat}
> 11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
> at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
> at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
> at org.radargun.Slave$2.run(Slave.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
> ... 9 more
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
> 11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
> at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
> at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
> ... 5 more
> Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
> 11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
> ... 11 more
> {noformat}
> With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.
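A minimal sketch of how a word-count job like the one described is typically submitted through Infinispan 5.2's MapReduceTask API follows; the Mapper and Reducer bodies are illustrative assumptions rather than the reporter's actual RadarGun stage, and only MapReduceTask.execute() is confirmed by the stack trace above.
{noformat}
import java.io.Serializable;
import java.util.Iterator;
import java.util.Map;
import java.util.StringTokenizer;

import org.infinispan.Cache;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.MapReduceTask;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;

public class WordCountSketch {

   // Emits each word of the value with a count of 1.
   static class WordMapper implements Mapper<String, String, String, Integer>, Serializable {
      @Override
      public void map(String key, String value, Collector<String, Integer> collector) {
         StringTokenizer tokens = new StringTokenizer(value);
         while (tokens.hasMoreTokens()) {
            collector.emit(tokens.nextToken(), 1);
         }
      }
   }

   // Sums the per-word counts produced by the mappers/combiners.
   static class WordReducer implements Reducer<String, Integer>, Serializable {
      @Override
      public Integer reduce(String word, Iterator<Integer> counts) {
         int sum = 0;
         while (counts.hasNext()) {
            sum += counts.next();
         }
         return sum;
      }
   }

   // 'cache' is assumed to be the distributed, non-transactional cache from the report.
   static Map<String, Integer> countWords(Cache<String, String> cache) {
      MapReduceTask<String, String, String, Integer> task =
            new MapReduceTask<String, String, String, Integer>(cache);
      return task.mappedWith(new WordMapper())
                 .reducedWith(new WordReducer())
                 .execute(); // MapReduceTask.execute() is the frame seen in the trace above
   }
}
{noformat}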
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2836) org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-2836?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo commented on ISPN-2836:
-----------------------------------
[~afield] Could you please attach the RadarGun, Infinispan, and JGroups configuration files? Thanks!
> org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
> ---------------------------------------------------------------------------------------------
>
> Key: ISPN-2836
> URL: https://issues.jboss.org/browse/ISPN-2836
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.1.Final
> Reporter: Alan Field
> Assignee: Pedro Ruivo
> Priority: Blocker
> Labels: onboard
> Fix For: 5.3.0.Final
>
> Attachments: afield-tcp-521-final.txt, udp-edg-perf01.txt, udp-edg-perf02.txt
>
>
> Using RadarGun to run the example WordCount Map/Reduce job on two nodes, against a distributed cache (transactions disabled) holding ~550 keys with 1 MB values, produces a thread deadlock.
> With the TCP transport, the deadlock occurs without any exception being thrown. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job still does not complete. The nodes keep exchanging "are-you-alive" messages, and I have seen the following exception:
> {noformat}
> 11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
> at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
> at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
> at org.radargun.Slave$2.run(Slave.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
> ... 9 more
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
> 11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
> at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
> at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
> ... 5 more
> Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
> 11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
> ... 11 more
> {noformat}
> With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2836) org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-2836?page=com.atlassian.jira.plugin.... ]
Work on ISPN-2836 started by Pedro Ruivo.
> org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
> ---------------------------------------------------------------------------------------------
>
> Key: ISPN-2836
> URL: https://issues.jboss.org/browse/ISPN-2836
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.1.Final
> Reporter: Alan Field
> Assignee: Pedro Ruivo
> Priority: Blocker
> Labels: onboard
> Fix For: 5.3.0.Final
>
> Attachments: afield-tcp-521-final.txt, udp-edg-perf01.txt, udp-edg-perf02.txt
>
>
> Using RadarGun to run the example WordCount Map/Reduce job on two nodes, against a distributed cache (transactions disabled) holding ~550 keys with 1 MB values, produces a thread deadlock.
> With the TCP transport, the deadlock occurs without any exception being thrown. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job still does not complete. The nodes keep exchanging "are-you-alive" messages, and I have seen the following exception:
> {noformat}
> 11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
> at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
> at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
> at org.radargun.Slave$2.run(Slave.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
> ... 9 more
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
> 11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
> at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
> at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
> ... 5 more
> Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
> 11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
> ... 11 more
> {noformat}
> With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-761) Cache.keySet(), entrySet(), values(), size() ignore contents of cache loader
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-761?page=com.atlassian.jira.plugin.s... ]
William Burns updated ISPN-761:
-------------------------------
Fix Version/s: 6.0.0.Alpha1
6.0.0.Final
(was: 5.3.0.Final)
> Cache.keySet(),entrySet(),values(),size() ignore contents of cache loader
> -------------------------------------------------------------------------
>
> Key: ISPN-761
> URL: https://issues.jboss.org/browse/ISPN-761
> Project: Infinispan
> Issue Type: Bug
> Components: Loaders and Stores
> Affects Versions: 4.2.0.BETA1
> Reporter: Paul Ferraro
> Assignee: William Burns
> Priority: Critical
> Labels: onboard, potential_performance_hit
> Fix For: 6.0.0.Alpha1, 6.0.0.Final
>
>
> Passivated cache entries are not represented in values returned by the keySet(), entrySet(), values(), and size() Cache methods. This results in inconsistent behavior, since it is possible that a given cache key may not be contained in keySet() even though Cache.get(...) would return a non-null value if the entry was previously passivated.
> I think CacheLoaderInterceptor needs to implement the following methods:
> {noformat}
> Object visitSizeCommand(InvocationContext ctx, SizeCommand command) throws Throwable
> Object visitValuesCommand(InvocationContext ctx, ValuesCommand command) throws Throwable
> Object visitEntrySetCommand(InvocationContext ctx, EntrySetCommand command) throws Throwable
> Object visitKeySetCommand(InvocationContext ctx, KeySetCommand command) throws Throwable
> {noformat}
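For illustration, a sketch of what one of these overrides might look like; it is not the actual Infinispan change, and the loadPersistedKeys() helper is hypothetical:
{noformat}
import java.util.HashSet;
import java.util.Set;

import org.infinispan.commands.read.SizeCommand;
import org.infinispan.container.DataContainer;
import org.infinispan.context.InvocationContext;
import org.infinispan.interceptors.base.CommandInterceptor;

// Sketch only. visitValuesCommand, visitEntrySetCommand and visitKeySetCommand
// would follow the same pattern of merging in-memory and persisted state.
public class SizeAwareLoaderInterceptor extends CommandInterceptor {

   private DataContainer dataContainer;   // assumed to be injected, as in other interceptors

   @Override
   public Object visitSizeCommand(InvocationContext ctx, SizeCommand command) throws Throwable {
      // Union of in-memory keys and store-only keys, so passivated entries are
      // counted exactly once instead of being ignored.
      Set<Object> allKeys = new HashSet<Object>(dataContainer.keySet());
      allKeys.addAll(loadPersistedKeys());
      return allKeys.size();
   }

   // Hypothetical helper: stands in for whatever CacheLoader access the real
   // implementation would use to enumerate persisted keys.
   private Set<Object> loadPersistedKeys() {
      return new HashSet<Object>();
   }
}
{noformat}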
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2835) Issues w/ M/R test cases if cache are not explicitly started on all nodes
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-2835?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño commented on ISPN-2835:
----------------------------------------
This looks very similar to ISPN-2353...
> Issues w/ M/R test cases if cache are not explicitly started on all nodes
> -------------------------------------------------------------------------
>
> Key: ISPN-2835
> URL: https://issues.jboss.org/browse/ISPN-2835
> Project: Infinispan
> Issue Type: Bug
> Components: Core API, Distributed Execution and Map/Reduce
> Reporter: Ray Tsang
> Assignee: Galder Zamarreño
> Labels: onboard
> Fix For: 5.3.0.Final
>
> Attachments: mr-test-src.zip
>
>
> I ran into some issues while using M/R. The gist of what I was seeing:
> Start a cluster of embedded caches, e.g. 4 nodes.
> Put in 100 elements.
> Run a simple M/R job to count the number of keys.
> If I run the M/R job using the node I'm inserting elements into as the coordinator, the result is 100.
> But if I run the M/R job using a different node as the coordinator, the result is less than 100.
> More interestingly, I can pause for 5 seconds and run the M/R job again, and the result is always less than 100.
> This behavior doesn't occur if I explicitly call cacheManager.getCache() on each of the nodes...
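The workaround mentioned above amounts to forcing the cache to start on every node before the job runs. A minimal sketch, assuming a list of per-node EmbeddedCacheManager instances and an already-configured key-counting MapReduceTask (names are illustrative):
{noformat}
import java.util.List;
import java.util.Map;

import org.infinispan.distexec.mapreduce.MapReduceTask;
import org.infinispan.manager.EmbeddedCacheManager;

public class StartCachesFirst {

   // Sketch of the workaround described above. "managers" is assumed to hold one
   // EmbeddedCacheManager per node and "keyCountTask" an already-configured
   // key-counting MapReduceTask, as in the reporter's test.
   static Map<String, Integer> runWithCachesStarted(List<EmbeddedCacheManager> managers,
                                                    MapReduceTask<String, String, String, Integer> keyCountTask,
                                                    String cacheName) {
      for (EmbeddedCacheManager cm : managers) {
         cm.getCache(cacheName);   // explicitly starts the cache on this node
      }
      return keyCountTask.execute();   // with all caches started, the count matches the 100 inserted keys
   }
}
{noformat}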
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] (ISPN-2115) Relative path in rhq ant task cause build to fail
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-2115?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño updated ISPN-2115:
-----------------------------------
Assignee: Tristan Tarrant (was: Galder Zamarreño)
> Relative path in rhq ant task cause build to fail
> -------------------------------------------------
>
> Key: ISPN-2115
> URL: https://issues.jboss.org/browse/ISPN-2115
> Project: Infinispan
> Issue Type: Bug
> Components: Build process
> Reporter: Thomas Fromm
> Assignee: Tristan Tarrant
> Fix For: 5.3.0.Final
>
>
> When the directory ../../ above the Infinispan source directory is not writable, the build fails:
> {noformat}
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (deploy) on project infinispan-rhq-plugin: An Ant BuildException has occured: Directory /usr/local/inubit/jon/dev-container/jbossas/server/default/deploy/${rhq.earName}/rhq-downloads/rhq-plugins creation was not successful for an unknown reason
> [ERROR] around Ant part ...<mkdir dir="../../..//jon/dev-container/jbossas/server/default/deploy/${rhq.earName}/rhq-downloads/rhq-plugins"/>... @ 4:116 in /usr/local/inubit/tf/infinispan/rhq-plugin/target/antrun/build-main.xml
> [ERROR] -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (deploy) on project infinispan-rhq-plugin: An Ant BuildException has occured: Directory /usr/local/inubit/jon/dev-container/jbossas/server/default/deploy/${rhq.earName}/rhq-downloads/rhq-plugins creation was not successful for an unknown reason
> {noformat}
> In the current case, /usr/local/inubit/ is not writable by the user.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira