[infinispan-issues] [JBoss JIRA] (ISPN-6799) OOB thread pool fills with threads trying to send remote get responses

Mon Jun 27 07:15:00 EDT 2016

    [ https://issues.jboss.org/browse/ISPN-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257289#comment-13257289 ] 

Radim Vansa edited comment on ISPN-6799 at 6/27/16 7:14 AM:
------------------------------------------------------------

IIUC the queues in TPE, these don't work well for our purposes, as threads are added (above the core size) only when the queue is full. The behavior I'd prefer to see for a 'queued' TPE is to use the queue only once it has active == max threads (then it should queue the task in a possibly unlimited or, preferably, timing-out queue). Have you ever experimented with something like this? (There's no OOTB solution for this, though; you'd have to implement it through {{RejectedExecutionHandler}}.)
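
A minimal sketch of the "threads first, queue only when saturated" idea described above, using only the standard {{java.util.concurrent}} API. The names {{ThreadsFirstPool}}, {{HandoffQueue}} and {{enqueue}} are hypothetical and not part of Infinispan or JGroups. The sketch uses an unbounded queue and does not implement the timing-out variant.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ThreadsFirstPool {

    /** Queue that refuses offers from execute(), so the pool grows to max threads first. */
    static final class HandoffQueue extends LinkedBlockingQueue<Runnable> {
        @Override
        public boolean offer(Runnable task) {
            // Returning false makes ThreadPoolExecutor try to start a new worker.
            return false;
        }

        /** Real insertion, used only by the rejection handler below. */
        boolean enqueue(Runnable task) {
            return super.offer(task);
        }
    }

    static ThreadPoolExecutor create(int core, int max) {
        HandoffQueue queue = new HandoffQueue();
        // Invoked when no new worker can be added, i.e. the pool is at max (or shut down).
        RejectedExecutionHandler queueWhenSaturated = (task, pool) -> {
            if (pool.isShutdown() || !queue.enqueue(task)) {
                throw new RejectedExecutionException("pool is shut down or queue is full");
            }
        };
        return new ThreadPoolExecutor(core, max, 60, TimeUnit.SECONDS, queue, queueWhenSaturated);
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(1, 2);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 4; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // keep the worker busy until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println("threads=" + pool.getPoolSize() + " queued=" + pool.getQueue().size());
        release.countDown();
        pool.shutdown();
    }
}
```

Workers still drain the queue through {{take()}}/{{poll(timeout)}}, which are not overridden, so tasks queued by the handler run as soon as a thread becomes free.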

> OOB thread pool fills with threads trying to send remote get responses
> ----------------------------------------------------------------------
>
>                 Key: ISPN-6799
>                 URL: https://issues.jboss.org/browse/ISPN-6799
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 9.0.0.Alpha2, 8.2.2.Final
>            Reporter: Dan Berindei
>             Fix For: 9.0.0.Alpha3
>
>
> Note: This is a scenario that happens in the stress tests, with 4 nodes in dist mode, and 200+ threads per node doing only reads. I have not been able to reproduce it locally, even with a much lower OOB thread pool size and UFC.max_credits.
> We don't use the {{NO_FC}} flag, so threads sending both requests and responses can block in UFC/MFC. Remote gets are executed directly on the OOB thread, so when we run out of credits for one node, the OOB pool can quickly become full with threads waiting to send a remote get response to that node.
> While we can't send responses to that node, we won't send credits to it, either, as credits are only sent *after* the message has been processed by the application. That means OOB threads on all nodes will start blocking, trying to send remote get responses to us.
> This is made worse by our staggering of remote gets. As remote get responses block, the stagger timeout kicks in and we send even more remote gets, making it even harder for the system to recover.
> UFC/MFC can send a {{CREDIT_REQUEST}} message to ask for more credits. The {{REPLENISH}} messages are handled on JGroups' internal thread pool, so they are not blocked. However, a {{CREDIT_REQUEST}} can be sent at most once every {{UFC.max_block_time}} ms, so it can't be relied on to provide enough credits. With the default settings, the throughput would be {{max_credits / max_block_time == 2MB / 0.5s == 4MB/s}}, which is really small compared to regular throughput.
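
The throughput bound quoted above can be checked with a few lines of arithmetic. This is only a sketch: the class and method names are hypothetical, and the 2 MB / 500 ms values are the defaults quoted in the issue description, not read from a JGroups configuration.

```java
final class CreditRequestThroughput {

    /** Upper bound in bytes/s when credits arrive only in response to CREDIT_REQUEST. */
    static double bytesPerSecond(long maxCreditsBytes, long maxBlockTimeMillis) {
        // At most one replenishment of maxCreditsBytes per maxBlockTimeMillis.
        return maxCreditsBytes * 1000.0 / maxBlockTimeMillis;
    }

    public static void main(String[] args) {
        long maxCredits = 2_000_000L; // assumed: 2 MB, as quoted in the issue
        long maxBlockTime = 500L;     // assumed: 0.5 s, as quoted in the issue
        double mbPerSecond = bytesPerSecond(maxCredits, maxBlockTime) / 1_000_000.0;
        System.out.println(mbPerSecond + " MB/s");
    }
}
```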
