[jboss-jira] [JBoss JIRA] (JGRP-2406) MERGE3 not working with TCP using ForkJoinPool

Mon Mar 9 12:41:00 EDT 2020

    [ https://issues.redhat.com/browse/JGRP-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992944#comment-13992944 ] 

Bela Ban commented on JGRP-2406:
--------------------------------

[~opeyrusse] OK, I think I know what happened, looking at {{timeline.txt}}.

The FJP has N worker threads, by default the number of cores available. While I'm not very familiar with the FJP, I read that it is a bad idea to submit tasks which can block (e.g. on I/O).

So when the 3 members get a {{VIEW_REQ}}, they send back a {{VIEW_RSP}}. However, those responses are not processed until *after* the merge request has timed out. This indicates that the FJP was exhausted and only got to the {{VIEW_RSP}} messages _after_ the timeout.
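
The effect can be reproduced outside JGroups with plain {{java.util.concurrent}}. The following is only a minimal sketch, not JGroups code: the class name, the latch-based "blocking" tasks and the 200 ms timeout are assumptions made for illustration. It fills every FJP worker with a blocking task and then submits one more task standing in for a {{VIEW_RSP}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

public class FjpExhaustionDemo {

    /** Outcome of one run: was the "response" task handled before the timeout, and after release? */
    static final class Result {
        final boolean beforeTimeout;
        final boolean afterRelease;

        Result(boolean beforeTimeout, boolean afterRelease) {
            this.beforeTimeout = beforeTimeout;
            this.afterRelease = afterRelease;
        }
    }

    static Result run(int parallelism, long timeoutMs) throws InterruptedException {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        CountDownLatch blockersStarted = new CountDownLatch(parallelism);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch responseHandled = new CountDownLatch(1);
        try {
            // Occupy every worker with a task that blocks (a stand-in for blocking I/O).
            for (int i = 0; i < parallelism; i++) {
                pool.execute(() -> {
                    blockersStarted.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            if (!blockersStarted.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("blocking tasks did not start");
            }

            // Stand-in for an incoming VIEW_RSP: it is queued, but no worker is free to run it.
            pool.execute(responseHandled::countDown);
            boolean beforeTimeout = responseHandled.await(timeoutMs, TimeUnit.MILLISECONDS);

            // Once the blocked workers return, the queued task can finally run.
            release.countDown();
            boolean afterRelease = responseHandled.await(5, TimeUnit.SECONDS);
            return new Result(beforeTimeout, afterRelease);
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = run(2, 200);
        System.out.println("handled before timeout: " + r.beforeTimeout);
        System.out.println("handled after release:  " + r.afterRelease);
    }
}
```

Under these assumptions the queued task should only run once the blocked workers are released, which mirrors the order of events in the timeline: timeout first, responses afterwards.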

With real pools (regular or internal), rejected messages are either dropped, processed by the internal thread pool or (in the worst case) processed by spawning a new thread, but they don't block until a worker is available.
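
For contrast, here is a rough sketch of the "spawn a new thread on rejection" idea using a plain {{ThreadPoolExecutor}}. It is not the JGroups implementation: the class name, the {{SynchronousQueue}} setup and the fallback handler are assumptions chosen to keep the example small.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionFallbackDemo {

    static boolean responseHandledBeforeTimeout(int maxThreads, long timeoutMs)
            throws InterruptedException {
        CountDownLatch blockersStarted = new CountDownLatch(maxThreads);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch responseHandled = new CountDownLatch(1);

        // Hypothetical fallback policy for this sketch: run a rejected task on a fresh thread.
        RejectedExecutionHandler spawnThread =
                (task, executor) -> new Thread(task, "fallback").start();

        // No queueing: a task either gets a pool thread or is rejected immediately.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, maxThreads, 30, TimeUnit.SECONDS, new SynchronousQueue<>(), spawnThread);
        try {
            // Occupy every pool thread with a task that blocks.
            for (int i = 0; i < maxThreads; i++) {
                pool.execute(() -> {
                    blockersStarted.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            if (!blockersStarted.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("blocking tasks did not start");
            }

            // Stand-in for a VIEW_RSP: the pool is full, so the handler runs it on a new thread.
            pool.execute(responseHandled::countDown);
            return responseHandled.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("handled before timeout: " + responseHandledBeforeTimeout(2, 2000));
    }
}
```

Here the extra task is rejected immediately instead of sitting behind blocked workers, so the fallback can handle it well before the timeout.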

So, in a nutshell, I don't recommend use of the FJP. Is there a reason you enabled it?
Note that you can inject your own thread pool into the transport, e.g. in cases where JGroups should use the application's thread pool.

> MERGE3 not working with TCP using ForkJoinPool
> ----------------------------------------------
>
>                 Key: JGRP-2406
>                 URL: https://issues.redhat.com/browse/JGRP-2406
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.1.8
>            Reporter: Olivier Peyrusse
>            Assignee: Bela Ban
>            Priority: Minor
>             Fix For: 4.2.1
>
>         Attachments: logs.tgz, project.zip, timeline.txt
>
>
> With JDK11, using the TCP protocol with the ForkJoinPool is causing constant failures of MERGE3.
> I consistently observed the following, from the point of view of a member M:
>  - M asks the other coordinators for their views. It contacts A and B
>  - A and B send their views
>  - M waits for the views, times out and aborts the merge
>  - immediately after aborting the merge, M processes the messages containing the views of A and B.
> In [^timeline.txt], you will see log extracts from the various members involved.
> After many experiments, the one parameter causing this issue is in the TCP protocol.
> {code:xml}
> <TCP
>    ...
>    thread_pool.use_fork_join_pool="true" />
> {code}
> Setting {{thread_pool.use_fork_join_pool}} to true consistently reproduces the problem, while setting it to false works fine.
> Project details: 
>  - as tested within Kubernetes, this project uses KUBE_PING as its discovery protocol
>  - to understand the reason for the failed merges, I created the protocol MERGE4, which is MERGE3 with additional logging.
>  - [^logs.tgz] contains all logs from the various members involved in the test.
