[JBoss JIRA] (ISPN-3184) The DELTA_WRITE flag should force a remote get during state transfer
by Dan Berindei (JIRA)
Dan Berindei created ISPN-3184:
----------------------------------
Summary: The DELTA_WRITE flag should force a remote get during state transfer
Key: ISPN-3184
URL: https://issues.jboss.org/browse/ISPN-3184
Project: Infinispan
Issue Type: Bug
Reporter: Dan Berindei
Assignee: Dan Berindei
AtomicHashMap and FineGrainedAtomicHashMap, as well as custom DeltaAware implementations, use PutKeyValueCommands with the DELTA_WRITE flag to execute incremental updates. These commands need the previous value of the entry in order to work.
If a node is joining and it receives a PutKeyValueCommand with the DELTA_WRITE flag before it has received the value of the affected key, it should do a remote get to retrieve the previous value and apply the change on top of that value, just like we do for conditional commands. Not doing so leads to data loss.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 7 months
[JBoss JIRA] (ISPN-3175) Upgrade the java hotrod client to support remote querying
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-3175?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-3175:
--------------------------------
Description:
Once we'll have ISPN-3169(define query fluent API), ISPN-3173(add a new query operation over hotrod) and ISPN-3174(String-based query language) in place, this is about connecting the dots: invoke the remote query on the server and present the result to the user.
h3.On the client side
As described in ISPN-3173, the query is sent as a byte[] to the server: the payload.
The request payload is query specific (not defined in the HR protocol) and at this stage has the following format: [Q_TYPE] [QUERY_ST] [FIRST_INDEX] [PAGE_SIZE]. This format accommodates the remote query requirements as defined in ISPN-3169.
- Q_TYPE (protobuf's byte) and query identifier, 1 for JPAQL (this is the query type we'll support). In future we'll add different query types as well.
- QUERY_STRING (protobuf's string)- JPAQL string generated by the fluent API (ISPN-3169). Parameters are encoded in this string (vs being sent separately)
- FIRST_INDEX + PAGE_SIZE (protobuf's int)- used for paginating/iterating over the result set.
HR response: [HR_SUCESS_FLAG] [payload]
PAYLOAD = [NUM_EL] [PROJ_SIZE] [ELEMENTS]
- even though at this stage we don't support projections (see ISPN-3169) PROJ_SIZE is added for future compatibility when projection will be supported.
Note that the payload for both request and response should be marshalled with protobuf as this information is read/written over multiple clients.
h3.On the server side
- the server module reads the query operation and the payload (byte[])
- invokes QueryFacade.query(byte[]) : byte[]
- QueryFacade is an interface defined in the server modules
- has an implementation in the remote-query module (new modules)
- the reason for adding this module: RemoteQueryImpl cannot be in the server modules (ASL) as it has a dependency on the query module (LGPL) which would "contaminate" the server module. Another option would be to merge it in the query module itself, but doesn't feel like a natural fit as its responsibility is serializing data (protobuf). OTOH if it is small we can merge it there so that we don't add a new module to the system?
For an overview on the remote querying see https://community.jboss.org/wiki/QueryingDesignInInfinispan
was:
Once we'll have ISPN-3169(define query fluent API), ISPN-3173(add a new query operation over hotrod) and ISPN-3174(String-based query language) in place, this is about connecting the dots: invoke the remote query on the server and present the result to the user.
h3.On the client side
As described in ISPN-3173, the query is sent as a byte[] to the server: the payload.
The request payload is query specific (not defined in the HR protocol) and at this stage has the following format: [Q_TYPE] [QUERY_ST] [FIRST_INDEX] [PAGE_SIZE]. This format accommodates the remote query requirements as defined in ISPN-3169.
- Q_TYPE (protobuf's byte) and query identifier, 1 for JPAQL (this is the query type we'll support). In future we'll add different query types as well.
- QUERY_STRING (protobuf's string)- JPAQL string generated by the fluent API (ISPN-3169). Parameters are encoded in this string (vs being sent separately)
- FIRST_INDEX + PAGE_SIZE (protobuf's int)- used for paginating/iterating over the result set.
HR response: [HR_SUCESS_FLAG] [payload]
PAYLOAD = [NUM_EL] [PROJ_SIZE] [ELEMENTS]
- even though at this stage we don't support projections (see ISPN-3169) PROJ_SIZE is added for future compatibility when projection will be supported.
Note that the payload for both request and response should be marshalled with protobuf as this information is read/written over multiple clients.
h3.On the server side
For an overview on the remote querying see https://community.jboss.org/wiki/QueryingDesignInInfinispan
> Upgrade the java hotrod client to support remote querying
> ---------------------------------------------------------
>
> Key: ISPN-3175
> URL: https://issues.jboss.org/browse/ISPN-3175
> Project: Infinispan
> Issue Type: Feature Request
> Reporter: Mircea Markus
> Assignee: Mircea Markus
>
> Once we'll have ISPN-3169(define query fluent API), ISPN-3173(add a new query operation over hotrod) and ISPN-3174(String-based query language) in place, this is about connecting the dots: invoke the remote query on the server and present the result to the user.
> h3.On the client side
> As described in ISPN-3173, the query is sent as a byte[] to the server: the payload.
> The request payload is query specific (not defined in the HR protocol) and at this stage has the following format: [Q_TYPE] [QUERY_ST] [FIRST_INDEX] [PAGE_SIZE]. This format accommodates the remote query requirements as defined in ISPN-3169.
> - Q_TYPE (protobuf's byte) and query identifier, 1 for JPAQL (this is the query type we'll support). In future we'll add different query types as well.
> - QUERY_STRING (protobuf's string)- JPAQL string generated by the fluent API (ISPN-3169). Parameters are encoded in this string (vs being sent separately)
> - FIRST_INDEX + PAGE_SIZE (protobuf's int)- used for paginating/iterating over the result set.
> HR response: [HR_SUCESS_FLAG] [payload]
> PAYLOAD = [NUM_EL] [PROJ_SIZE] [ELEMENTS]
> - even though at this stage we don't support projections (see ISPN-3169) PROJ_SIZE is added for future compatibility when projection will be supported.
> Note that the payload for both request and response should be marshalled with protobuf as this information is read/written over multiple clients.
> h3.On the server side
> - the server module reads the query operation and the payload (byte[])
> - invokes QueryFacade.query(byte[]) : byte[]
> - QueryFacade is an interface defined in the server modules
> - has an implementation in the remote-query module (new modules)
> - the reason for adding this module: RemoteQueryImpl cannot be in the server modules (ASL) as it has a dependency on the query module (LGPL) which would "contaminate" the server module. Another option would be to merge it in the query module itself, but doesn't feel like a natural fit as its responsibility is serializing data (protobuf). OTOH if it is small we can merge it there so that we don't add a new module to the system?
>
> For an overview on the remote querying see https://community.jboss.org/wiki/QueryingDesignInInfinispan
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 7 months
[JBoss JIRA] (ISPN-3175) Upgrade the java hotrod client to support remote querying
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-3175?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-3175:
--------------------------------
Description:
Once we'll have ISPN-3169(define query fluent API), ISPN-3173(add a new query operation over hotrod) and ISPN-3174(String-based query language) in place, this is about connecting the dots: invoke the remote query on the server and present the result to the user.
h3.On the client side
As described in ISPN-3173, the query is sent as a byte[] to the server: the payload.
The request payload is query specific (not defined in the HR protocol) and at this stage has the following format: [Q_TYPE] [QUERY_ST] [FIRST_INDEX] [PAGE_SIZE]. This format accommodates the remote query requirements as defined in ISPN-3169.
- Q_TYPE (protobuf's byte) and query identifier, 1 for JPAQL (this is the query type we'll support). In future we'll add different query types as well.
- QUERY_STRING (protobuf's string)- JPAQL string generated by the fluent API (ISPN-3169). Parameters are encoded in this string (vs being sent separately)
- FIRST_INDEX + PAGE_SIZE (protobuf's int)- used for paginating/iterating over the result set.
HR response: [HR_SUCESS_FLAG] [payload]
PAYLOAD = [NUM_EL] [PROJ_SIZE] [ELEMENTS]
- even though at this stage we don't support projections (see ISPN-3169) PROJ_SIZE is added for future compatibility when projection will be supported.
Note that the payload for both request and response should be marshalled with protobuf as this information is read/written over multiple clients.
h3.On the server side
For an overview on the remote querying see https://community.jboss.org/wiki/QueryingDesignInInfinispan
was:See #7: https://community.jboss.org/wiki/QueryingDesignInInfinispan
> Upgrade the java hotrod client to support remote querying
> ---------------------------------------------------------
>
> Key: ISPN-3175
> URL: https://issues.jboss.org/browse/ISPN-3175
> Project: Infinispan
> Issue Type: Feature Request
> Reporter: Mircea Markus
> Assignee: Mircea Markus
>
> Once we'll have ISPN-3169(define query fluent API), ISPN-3173(add a new query operation over hotrod) and ISPN-3174(String-based query language) in place, this is about connecting the dots: invoke the remote query on the server and present the result to the user.
> h3.On the client side
> As described in ISPN-3173, the query is sent as a byte[] to the server: the payload.
> The request payload is query specific (not defined in the HR protocol) and at this stage has the following format: [Q_TYPE] [QUERY_ST] [FIRST_INDEX] [PAGE_SIZE]. This format accommodates the remote query requirements as defined in ISPN-3169.
> - Q_TYPE (protobuf's byte) and query identifier, 1 for JPAQL (this is the query type we'll support). In future we'll add different query types as well.
> - QUERY_STRING (protobuf's string)- JPAQL string generated by the fluent API (ISPN-3169). Parameters are encoded in this string (vs being sent separately)
> - FIRST_INDEX + PAGE_SIZE (protobuf's int)- used for paginating/iterating over the result set.
> HR response: [HR_SUCESS_FLAG] [payload]
> PAYLOAD = [NUM_EL] [PROJ_SIZE] [ELEMENTS]
> - even though at this stage we don't support projections (see ISPN-3169) PROJ_SIZE is added for future compatibility when projection will be supported.
> Note that the payload for both request and response should be marshalled with protobuf as this information is read/written over multiple clients.
> h3.On the server side
> For an overview on the remote querying see https://community.jboss.org/wiki/QueryingDesignInInfinispan
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 7 months
[JBoss JIRA] (ISPN-2990) L1ManagerImpl doesn't reliably invalidate with async caches
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-2990?page=com.atlassian.jira.plugin.... ]
William Burns reassigned ISPN-2990:
-----------------------------------
Assignee: William Burns (was: Mircea Markus)
> L1ManagerImpl doesn't reliably invalidate with async caches
> -----------------------------------------------------------
>
> Key: ISPN-2990
> URL: https://issues.jboss.org/browse/ISPN-2990
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.1.Final
> Reporter: Sebastian Tusk
> Assignee: William Burns
> Labels: onboard
>
> B is owner of k,v1
> A has k,v1 in L1
> 1. TX: A puts k,v2
> 2. TX: A sends async PrepareCommand k,v2 to B (one-phase-commit)
> 3. TX: A removes k,v1 from L1
> 4. A putForExternalRead k,v1 and has it in L1 again
> 5. TX: B executes PrepareCommand k,v2 but doesn't send invalidation to origin
> Result: A has k,v1 and B has k,v2
> Solution: For async caches send invalidation to origin too.
> The problem is that the owner updates the cache entry asynchronously. This gives the origin of the transaction time to request the entry. Here an outdated version is received and placed in L1. The owner never invalidates the entry and as result the cache is inconsistent.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 7 months
[JBoss JIRA] (ISPN-2990) L1ManagerImpl doesn't reliably invalidate with async caches
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-2990?page=com.atlassian.jira.plugin.... ]
Work on ISPN-2990 started by William Burns.
> L1ManagerImpl doesn't reliably invalidate with async caches
> -----------------------------------------------------------
>
> Key: ISPN-2990
> URL: https://issues.jboss.org/browse/ISPN-2990
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.1.Final
> Reporter: Sebastian Tusk
> Assignee: William Burns
> Labels: onboard
>
> B is owner of k,v1
> A has k,v1 in L1
> 1. TX: A puts k,v2
> 2. TX: A sends async PrepareCommand k,v2 to B (one-phase-commit)
> 3. TX: A removes k,v1 from L1
> 4. A putForExternalRead k,v1 and has it in L1 again
> 5. TX: B executes PrepareCommand k,v2 but doesn't send invalidation to origin
> Result: A has k,v1 and B has k,v2
> Solution: For async caches send invalidation to origin too.
> The problem is that the owner updates the cache entry asynchronously. This gives the origin of the transaction time to request the entry. Here an outdated version is received and placed in L1. The owner never invalidates the entry and as result the cache is inconsistent.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 7 months
[JBoss JIRA] (ISPN-3173) Define a new query operation over hotrod
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-3173?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-3173:
--------------------------------
Description:
Add a new query predicated to the [Hot Rod Protocol | https://docs.jboss.org/author/display/ISPN/Hot+Rod+Protocol+-+Version+1.2].
For now this operation should send an byte[] (the query encoded) over the network and receive an byte[] as response. The query and the result might get further specified in the future.
This will be used by ISPN-3175 to implement the initial remote querying functionality the java hotrod client.
was:
HR request: [op_id] [payload].
The request payload is query specific (not defined in the HR protocol) and at this stage has the following format: [Q_TYPE] [QUERY_ST] [FIRST_INDEX] [PAGE_SIZE]. This format accommodates the remote query requirements as defined in ISPN-3169.
- Q_TYPE (protobuf's byte) and query identifier, 1 for JPAQL (this is the query type we'll support). In future we'll add different query types as well.
- QUERY_STRING (protobuf's string)- JPAQL string generated by the fluent API (ISPN-3169). Parameters are encoded in this string (vs being sent separately)
- FIRST_INDEX + PAGE_SIZE (protobuf's int)- used for paginating/iterating over the result set.
HR response: [HR_SUCESS_FLAG] [payload]
PAYLOAD = [NUM_EL] [PROJ_SIZE] [ELEMENTS]
- even though at this stage we don't support projections (see ISPN-3169) PROJ_SIZE is added for future compatibility when projection will be supported.
Note that the payload for both request and response should be marshalled with protobuf as this information is read/written over multiple clients.
> Define a new query operation over hotrod
> ----------------------------------------
>
> Key: ISPN-3173
> URL: https://issues.jboss.org/browse/ISPN-3173
> Project: Infinispan
> Issue Type: Feature Request
> Reporter: Mircea Markus
> Assignee: Mircea Markus
>
> Add a new query predicated to the [Hot Rod Protocol | https://docs.jboss.org/author/display/ISPN/Hot+Rod+Protocol+-+Version+1.2].
> For now this operation should send an byte[] (the query encoded) over the network and receive an byte[] as response. The query and the result might get further specified in the future.
> This will be used by ISPN-3175 to implement the initial remote querying functionality the java hotrod client.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 7 months
[JBoss JIRA] (ISPN-2836) org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
by Alan Field (JIRA)
[ https://issues.jboss.org/browse/ISPN-2836?page=com.atlassian.jira.plugin.... ]
Alan Field commented on ISPN-2836:
----------------------------------
It is possible to produce the problem with distributeReducePhase="false", if the amount of data in the cache is large enough. (See comment above. https://issues.jboss.org/browse/ISPN-2836?focusedCommentId=12763055&page=...) With distributeReducePhase="true", the problem is much easier to reproduce.
I guess it just bothers me a little that the configuration of Sync.replTimeout is so sensitive to the data size.
> org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
> ---------------------------------------------------------------------------------------------
>
> Key: ISPN-2836
> URL: https://issues.jboss.org/browse/ISPN-2836
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.1.Final
> Reporter: Alan Field
> Assignee: Pedro Ruivo
> Priority: Blocker
> Labels: onboard
> Fix For: 5.3.0.Final
>
> Attachments: afield-tcp-521-final.txt, benchmark-mapreduce-multifilesize.xml, dist-udp-no-tx.xml, jgroups-udp.xml, udp-edg-perf01.txt, udp-edg-perf02.txt
>
>
> Using RadarGun and two nodes to execute the example WordCount Map/Reduce job against a cache with ~550 keys with a value size of 1MB is producing a thread deadlock. The cache is distributed with transactions disabled.
> TCP transport deadlocks without throwing an exception. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job does not complete. The nodes send "are-you-alive" messages back and forth, and I have seen the following exception:
> {noformat}
> 11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
> at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
> at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
> at org.radargun.Slave$2.run(Slave.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
> ... 9 more
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
> 11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
> at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
> at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
> ... 5 more
> Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
> 11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
> ... 11 more
> {noformat}
> With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 7 months
[JBoss JIRA] (ISPN-2836) org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-2836?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo commented on ISPN-2836:
-----------------------------------
If you try with distributeReducePhase="false", the problem does not occur.
I checked the code and if distributeReducePhase="true", the results are put in a temporary cache (in our case, word => list of counters). This generates a round-trip communication and maybe that why it is taking too long to execute the task. And don't forget that all nodes perform this put operations. I believe this is used to share the results among the nodes to perform the distributed reduce.
On other hand, if distributeReducePhase="false", the results are sent back to the node where the map/reduce was invoked (that looks faster).
I think that is possible to optimize the distributeReducePhase="true" scenario. Instead of generating a round-trip for each key, the puts can be batched in a transaction and then it can commit multiple keys at the same time.
> org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes
> ---------------------------------------------------------------------------------------------
>
> Key: ISPN-2836
> URL: https://issues.jboss.org/browse/ISPN-2836
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 5.2.1.Final
> Reporter: Alan Field
> Assignee: Pedro Ruivo
> Priority: Blocker
> Labels: onboard
> Fix For: 5.3.0.Final
>
> Attachments: afield-tcp-521-final.txt, benchmark-mapreduce-multifilesize.xml, dist-udp-no-tx.xml, jgroups-udp.xml, udp-edg-perf01.txt, udp-edg-perf02.txt
>
>
> Using RadarGun and two nodes to execute the example WordCount Map/Reduce job against a cache with ~550 keys with a value size of 1MB is producing a thread deadlock. The cache is distributed with transactions disabled.
> TCP transport deadlocks without throwing an exception. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job does not complete. The nodes send "are-you-alive" messages back and forth, and I have seen the following exception:
> {noformat}
> 11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
> at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
> at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
> at org.radargun.Slave$2.run(Slave.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
> at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
> ... 9 more
> Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
> 11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
> at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
> at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
> ... 5 more
> Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
> 11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
> ... 11 more
> {noformat}
> With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 7 months