[https://issues.jboss.org/browse/ISPN-6388?page=com.atlassian.jira.plugin....]
Gustavo Fernandes commented on ISPN-6388:
-----------------------------------------
Here's my theory of what happened in your test.
There were failures during the iteration: either a server was down or it stopped
responding for some reason, perhaps due to GC (the reason does not matter).
When such failures occur, the client retries the segments that were not yet completed.
From the logs, you were using Hot Rod client version 8.1.0.Final, which is affected by
https://issues.jboss.org/browse/ISPN-6234: after a failover, the client would retry with
the wrong segments. Because the segments were wrong, the iteration was no longer confined
to the server the client contacted, and that server had to issue remote RPCs to obtain the
segments, ultimately provoking a cascade effect that resulted in timeouts.
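To illustrate why retrying with the right segments matters, here is a minimal, hypothetical model (not the Hot Rod client's code or API). It assumes a simple map from segment id to owning server; the names plan_retry, naive_retry, count_remote_fetches and EXAMPLE_OWNERS are invented for this sketch. A correct retry sends each unfinished segment to its owner, so every request stays local; the naive retry sends all pending segments to one server, which then has to fetch the segments it does not own remotely.

```python
from collections import defaultdict

# Hypothetical segment -> owning server map (illustration only).
EXAMPLE_OWNERS = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}


def plan_retry(owners, completed):
    """Group the segments that are not yet completed by their owning server.

    Sending each group only to its owner keeps the iteration local on that server.
    """
    plan = defaultdict(set)
    for segment, server in owners.items():
        if segment not in completed:
            plan[server].add(segment)
    return dict(plan)


def naive_retry(owners, completed, contacted):
    """Hypothetical wrong behaviour: every pending segment goes to one server."""
    pending = {s for s in owners if s not in completed}
    return {contacted: pending} if pending else {}


def count_remote_fetches(plan, owners):
    """Count segments a server would have to obtain from another node via RPC."""
    return sum(
        1
        for server, segments in plan.items()
        for segment in segments
        if owners[segment] != server
    )


if __name__ == "__main__":
    done = {0, 1}  # segments finished before the simulated failover
    good = plan_retry(EXAMPLE_OWNERS, done)
    bad = naive_retry(EXAMPLE_OWNERS, done, contacted="B")
    print("owner-aware retry:", good, "remote fetches:", count_remote_fetches(good, EXAMPLE_OWNERS))
    print("naive retry:", bad, "remote fetches:", count_remote_fetches(bad, EXAMPLE_OWNERS))
```

In this toy model the owner-aware plan needs no remote fetches, while the naive plan makes server B pull segments 4 and 5 from C. On a real cluster under load, that extra RPC traffic is the kind of work that can pile up into replication timeouts.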
I believe the timeouts should no longer occur (I was not able to reproduce them). Could you
test again with Infinispan 8.2.1.Final (both client and server) and the Spark connector 0.3?
Spark integration - TimeoutException: Replication timeout on application execution
-----------------------------------------------------------------------------------
Key: ISPN-6388
URL: https://issues.jboss.org/browse/ISPN-6388
Project: Infinispan
Issue Type: Bug
Components: Spark
Affects Versions: 8.2.0.Final
Reporter: Matej Čimbora
Assignee: Gustavo Fernandes
Attachments: app_0.txt, driver.txt, server.txt
The issue occurs sporadically while an application is executing (e.g. the WordCount
example). It seems to be affected to some degree by the number of partitions used: the
higher the count, the less likely the issue is to occur.
Using an 8-node cluster (1 worker and 1 ISPN server per physical node), connector v0.2.
Attached are sample driver, server, and application logs.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)