[
https://issues.jboss.org/browse/ISPN-6799?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-6799:
------------------------------------
> No more context switches on the hot path, please...

I'd say once the OOB thread pool is getting full, we're no longer on the hot path :)

> What about having async FC for responses only? If it's a response
> and we run out of credits, the response would be queued, instead of blocking the thread.

How is that better than using NO_FC for response messages?

> I think that the core concept for future versions of Infinispan is
> that only application threads can block.

I agree. But I also think we need a way to throttle the remote threads without unlimited
queues -- those are a sure way to OOME.
OOB thread pool fills with threads trying to send remote get responses
----------------------------------------------------------------------
Key: ISPN-6799
URL: https://issues.jboss.org/browse/ISPN-6799
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.0.0.Alpha2, 8.2.2.Final
Reporter: Dan Berindei
Fix For: 9.0.0.Alpha3
Note: This is a scenario that happens in the stress tests, with 4 nodes in dist mode, and
200+ threads per node doing only reads. I have not been able to reproduce it locally, even
with a much lower OOB thread pool size and UFC.max_credits.
We don't use the {{NO_FC}} flag, so threads sending both requests and responses can
block in UFC/MFC. Remote gets are executed directly on the OOB thread, so when we run out
of credits for one node, the OOB pool can quickly become full with threads waiting to send
a remote get response to that node.
While we can't send responses to that node, we won't send credits to it, either,
as credits are only sent *after* the message has been processed by the application. That
means OOB threads on all nodes will start blocking, trying to send remote get responses to
us.
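The way the pool fills up can be illustrated with a toy model. This is only a sketch, not JGroups or Infinispan code: the function and parameter names are hypothetical, and it assumes that no credits are replenished while the responses are stuck, which is the worst case described above.

```python
def simulate_oob_pool(pool_size, credits, response_size, incoming_requests):
    """Toy model of remote get handling under credit-based flow control.

    Each incoming remote get is handled on one pool thread, which must spend
    `response_size` credits to send its response. Assumption: no credits are
    ever replenished, so a thread that cannot send stays blocked forever.
    Returns (responses_sent, threads_blocked, requests_rejected).
    """
    sent = 0
    blocked = 0
    rejected = 0
    for _ in range(incoming_requests):
        if blocked >= pool_size:
            # Every pool thread is stuck waiting for credits.
            rejected += 1
            continue
        if credits >= response_size:
            credits -= response_size
            sent += 1
        else:
            # Not enough credits: the thread blocks and keeps its pool slot.
            blocked += 1
    return sent, blocked, rejected


# Hypothetical numbers, chosen only to keep the trace short.
print(simulate_oob_pool(pool_size=4, credits=10, response_size=3,
                        incoming_requests=10))
# prints (3, 4, 3)
```

With these made-up numbers, three responses go out, the next four requests each pin a pool thread, and everything after that finds the pool full.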
This is made worse by our staggering of remote gets. As remote get responses block, the
stagger timeout kicks in and we send even more remote gets, making it even harder for the
system to recover.
UFC/MFC can send a {{CREDIT_REQUEST}} message to ask for more credits. The {{REPLENISH}}
messages are handled on JGroups' internal thread pool, so they are not blocked.
However, a CREDIT_REQUEST can be sent at most once every {{UFC.max_block_time}} ms, so
it can't be relied on to provide enough credits. With the default settings, the
throughput would be {{max_credits / max_block_time == 2MB / 0.5s == 4MB/s}}, which is
very small compared to regular throughput.
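As a rough check of that arithmetic, here is a minimal sketch. It assumes a sender that is completely out of credits and gets exactly one full {{max_credits}} replenishment per CREDIT_REQUEST; the function name is hypothetical, and treating 2 MB as 2,000,000 bytes is also an assumption.

```python
def credit_request_throughput(max_credits_bytes, max_block_time_ms):
    """Upper bound (bytes/s) when credits arrive only via CREDIT_REQUEST.

    Assumption: one full max_credits replenishment per max_block_time.
    """
    if max_block_time_ms <= 0:
        raise ValueError("max_block_time_ms must be positive")
    return max_credits_bytes / (max_block_time_ms / 1000.0)


# Defaults quoted in the description: 2 MB of credits, 500 ms block time.
bound = credit_request_throughput(2_000_000, 500)
print(f"{bound / 1_000_000:.1f} MB/s")
# prints 4.0 MB/s
```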