[jboss-jira] [JBoss JIRA] (JGRP-2406) MERGE3 not working with TCP using ForkJoinPool
Bela Ban (Jira)
issues at jboss.org
Mon Mar 9 12:41:00 EDT 2020
[ https://issues.redhat.com/browse/JGRP-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992944#comment-13992944 ]
Bela Ban commented on JGRP-2406:
--------------------------------
[~opeyrusse] OK, I think I know what happened, looking at {{timeline.txt}}.
The FJP has N worker threads, by default the number of cores available. While I'm not very familiar with the FJP, I read that it is a bad idea to submit tasks which can block (e.g. on I/O).
So when the 3 members get a {{VIEW_REQ}}, they send back a {{VIEW_RSP}}. However, those responses are not processed by the requester until *after* the merge request has timed out. This indicates that the FJP was exhausted (all of its workers were busy or blocked) and only got around to the {{VIEW_RSP}} messages _after_ the timeout.
With real thread pools (regular or internal), rejected messages are either dropped, processed by the internal thread pool, or (in the worst case) handled by spawning a new thread; they don't block until a worker is available.
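The starvation effect described above can be reproduced with the JDK alone. The following is a minimal sketch, not JGroups code: the class name {{FjpStarvationDemo}}, the method {{responseProcessedInTime}} and the latches are made up for illustration. It blocks every worker of a small ForkJoinPool and then submits a short task standing in for a {{VIEW_RSP}}; that task should stay queued until the blocked workers are released.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it does not use any JGroups API.
public class FjpStarvationDemo {

    /**
     * Blocks all workers of a ForkJoinPool with the given parallelism, then
     * submits a short "response" task. Returns true if that task ran within
     * timeoutMs, false if it was still queued when the timeout expired.
     */
    static boolean responseProcessedInTime(int parallelism, long timeoutMs) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        CountDownLatch started = new CountDownLatch(parallelism);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch responseSeen = new CountDownLatch(1);
        try {
            // Occupy every worker with a task that blocks (stand-in for blocking I/O).
            for (int i = 0; i < parallelism; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            if (!started.await(5, TimeUnit.SECONDS))
                throw new IllegalStateException("blocking tasks did not start");

            // Stand-in for the VIEW_RSP: trivial to process, but no worker is free.
            pool.execute(responseSeen::countDown);
            return responseSeen.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // unblock the workers so the pool can drain
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("response processed before timeout: "
                + responseProcessedInTime(2, 500));
    }
}
```

Plain blocking calls such as {{CountDownLatch.await()}} do not make the FJP add compensating workers, which is why the short task waits here. In this sketch the timeout plays the role of the merge timeout.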
So, in a nutshell, I don't recommend use of the FJP. Is there a reason you enabled it?
Note that you can inject your own thread pool into the transport, e.g. in cases where JGroups should use the application's thread pool.
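As a rough illustration of a pool that does not stall when all workers are busy, here is a sketch of an application-side {{ThreadPoolExecutor}} whose rejection handler spawns a new thread, similar in spirit to the worst-case fallback mentioned above. The class and method names are hypothetical, and how such an executor is handed to the transport depends on the JGroups version, so that step is left out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it does not use any JGroups API.
public class NonBlockingPoolDemo {

    /** Pool without a queue: a task is run by a pool thread or, if none is free, by a new thread. */
    static ThreadPoolExecutor newPool(int maxThreads) {
        RejectedExecutionHandler spawnThread = (task, executor) -> {
            Thread t = new Thread(task, "overflow");
            t.setDaemon(true);
            t.start();
        };
        return new ThreadPoolExecutor(0, maxThreads, 30, TimeUnit.SECONDS,
                new SynchronousQueue<>(), spawnThread);
    }

    /** Same experiment as with the FJP: block all pool threads, then submit a short task. */
    static boolean responseProcessedInTime(int maxThreads, long timeoutMs) {
        ThreadPoolExecutor pool = newPool(maxThreads);
        CountDownLatch started = new CountDownLatch(maxThreads);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch responseSeen = new CountDownLatch(1);
        try {
            for (int i = 0; i < maxThreads; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            if (!started.await(5, TimeUnit.SECONDS))
                throw new IllegalStateException("blocking tasks did not start");

            // The pool is full, so this task is rejected and run on a fresh thread.
            pool.execute(responseSeen::countDown);
            return responseSeen.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("response processed before timeout: "
                + responseProcessedInTime(2, 500));
    }
}
```

Spawning unbounded overflow threads is only one possible policy; dropping the task or running it on a separate internal pool are the other options named in the comment.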
> MERGE3 not working with TCP using ForkJoinPool
> ----------------------------------------------
>
> Key: JGRP-2406
> URL: https://issues.redhat.com/browse/JGRP-2406
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.1.8
> Reporter: Olivier Peyrusse
> Assignee: Bela Ban
> Priority: Minor
> Fix For: 4.2.1
>
> Attachments: logs.tgz, project.zip, timeline.txt
>
>
> With JDK11, using the TCP protocol with the ForkJoinPool is causing constant failures of MERGE3.
> I consistently observed the following, from the point of view of a member M
> - M asks for other coordinator views. It contacts A and B
> - A and B send their views
> - M waits for the views, times out, and aborts the merge
> - immediately after aborting the merge, M processes the messages containing the views of A and B.
> In [^timeline.txt], you will find log extracts from the various members involved.
> After many experiments, the one parameter causing this issue is in the TCP protocol.
> {code:xml}
> <TCP
> ...
> thread_pool.use_fork_join_pool="true" />
> {code}
> Setting {{thread_pool.use_fork_join_pool}} to true reproduces the problem every time, while setting it to false works fine.
> Project details:
> - as tested within Kubernetes, this project uses KUBE_PING as its discovery protocol
> - to understand the reason for the failed merges, I created the protocol MERGE4, that is MERGE3 with additional logs.
> - [^logs.tgz] contains all logs from the various members involved in the test.