[
https://issues.jboss.org/browse/ISPN-6799?page=com.atlassian.jira.plugin....
]
Radim Vansa commented on ISPN-6799:
-----------------------------------
{quote}> No more context switches on the hot path, please...
I'd say once the OOB thread pool is getting full, we're no longer on the hot
path{quote}
True, I was talking about executing all reads in the remote thread pool. Checking the
status of the thread pool and deciding based on that seems rather complicated to me; I
prefer a simple, clean design to such workarounds.
I'll add our IRC chat for context:
{quote}
(02:10:13 PM) dberindei: rvansa: I have also asked Bela about non-blocking FC, but he says
that would defeat the purpose of FC :)
(02:10:53 PM) rvansa: dberindei: I don't agree - it won't send any further
message, so it does block
(02:11:17 PM) rvansa: dberindei: I mean 'block' in the sense of the purpose
(02:11:45 PM) dberindei: rvansa: if you have a thread that sends GET_NONE messages, the
only way to really throttle it is to block
(02:11:47 PM) rvansa: dberindei: I agree that app threads should be blocked in FC
(02:13:10 PM) dberindei: rvansa: I think our threads should be prevented from sending too
many messages, too
(02:13:31 PM) dberindei: rvansa: I agree that blocking is bad, but we need to come up with
an alternative
(02:14:36 PM) rvansa: dberindei: you will throttle responses through the queue,
(02:14:52 PM) dberindei: rvansa: wouldn't the queue also block in order to throttle?
(02:15:52 PM) rvansa: dberindei: the thing is that, assuming limited number of app
threads, they'll still be throttled on the remote side, because they won't get the
RPC response
(02:16:34 PM) dberindei: rvansa: yeah, except in our tests we usually have more client
threads than OOB threads :)
(02:17:01 PM) rvansa: and?
(02:17:35 PM) dberindei: rvansa: and you still get way too many messages in that queue, I
think
(02:18:05 PM) dberindei: rvansa: it should be based on the capacity of the
"server", not on the capacity of the client to send requests
(02:19:21 PM) dberindei: rvansa: anyway, I'm not the one you need to convince :)
(02:19:26 PM) rvansa: dberindei: It's not client-server, it's p2p
(02:19:47 PM) rvansa: dberindei: you'll apply backpressure to the app threads, when
they won't get the response. True, if the app won't react to that, the
other party will blow up as the queue won't be able to handle that
(02:20:30 PM) dberindei: rvansa: exactly, we need to handle async operations just as well
as sync operations
(02:20:45 PM) dberindei: rvansa: it won't be long until the HotRod server is also
async
(02:20:55 PM) rvansa: dberindei: But then you need another mechanism to block the
demanding app
(02:21:05 PM) rvansa: dberindei: deadlock != throttling
(02:21:43 PM) dberindei: rvansa: I'm not saying we have to deadlock, I'm just
saying we need to throttle our threads as well, not just the application threads
(02:23:42 PM) rvansa: And btw., it's not up to Bela to decide what Infinispan needs,
you can always write your ASYNC_UFC/MFC. Even without the ugly capital letters :)
(02:23:50 PM) rvansa: dberindei: I don't agree here
(02:24:10 PM) rvansa: dberindei: having zillion of threads blocked won't help anyone
{quote}
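To make the "non-blocking FC" idea from the chat concrete, here is one rough shape it could take. This is only a sketch under assumed semantics, not the JGroups UFC/MFC API, and all names here are illustrative: when credits run out, the message is parked in a queue instead of blocking the sending thread, and a replenish from the receiver drains the queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch of non-blocking flow control (not the real JGroups API):
// senders never block; out-of-credit messages are queued and sent once the
// receiver replenishes credits. Backpressure surfaces as a "deferred" result
// instead of a blocked thread.
public class NonBlockingCredits {
    private long credits;
    private final Queue<byte[]> pending = new ArrayDeque<>();

    public NonBlockingCredits(long initialCredits) {
        this.credits = initialCredits;
    }

    // Never blocks: either sends immediately or defers until credits arrive.
    public synchronized boolean trySend(byte[] msg) {
        if (credits >= msg.length) {
            credits -= msg.length;
            transmit(msg);
            return true;
        }
        pending.add(msg);
        return false;
    }

    // Called when a REPLENISH arrives from the receiver: restore credits and
    // drain as much of the queue as the new credits allow.
    public synchronized void replenish(long amount) {
        credits += amount;
        byte[] head;
        while ((head = pending.peek()) != null && credits >= head.length) {
            credits -= head.length;
            transmit(pending.poll());
        }
    }

    private void transmit(byte[] msg) {
        // hand off to the transport; a no-op in this sketch
    }

    public synchronized int pendingCount() {
        return pending.size();
    }
}
```

The point of the sketch is the trade-off debated above: the thread stays free, but the caller (or the app) must cope with deferred sends instead of relying on a block for throttling.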
OOB thread pool fills with threads trying to send remote get responses
----------------------------------------------------------------------
Key: ISPN-6799
URL:
https://issues.jboss.org/browse/ISPN-6799
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.0.0.Alpha2, 8.2.2.Final
Reporter: Dan Berindei
Fix For: 9.0.0.Alpha3
Note: This is a scenario that happens in the stress tests, with 4 nodes in dist mode, and
200+ threads per node doing only reads. I have not been able to reproduce it locally, even
with a much lower OOB thread pool size and UFC.max_credits.
We don't use the {{NO_FC}} flag, so threads sending both requests and responses can
block in UFC/MFC. Remote gets are executed directly on the OOB thread, so when we run out
of credits for one node, the OOB pool can quickly become full with threads waiting to send
a remote get response to that node.
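The failure mode can be modeled in a few lines. This is a toy model, not the actual Infinispan/JGroups code: a fixed pool stands in for the OOB pool, and a semaphore that is never released stands in for the exhausted credits of one node. Once every pool thread blocks on a credit, a new remote get just sits in the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model of the scenario above (not real Infinispan/JGroups classes):
// each "OOB thread" serves a remote get and must acquire a flow-control
// credit before it can send the response. When the target node stops
// replenishing credits, every OOB thread blocks and the pool saturates.
public class OobCreditStarvation {

    static boolean poolSaturates() throws Exception {
        int oobPoolSize = 4;
        ExecutorService oobPool = Executors.newFixedThreadPool(oobPoolSize);
        Semaphore credits = new Semaphore(0); // receiver stopped sending credits
        CountDownLatch started = new CountDownLatch(oobPoolSize);

        for (int i = 0; i < oobPoolSize; i++) {
            oobPool.execute(() -> {
                started.countDown();
                try {
                    credits.acquire(); // UFC-style wait before sending the response
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        started.await();

        // A new remote get finds no free OOB thread and is stuck in the queue:
        Future<?> newGet = oobPool.submit(() -> { });
        boolean stuck;
        try {
            newGet.get(300, TimeUnit.MILLISECONDS);
            stuck = false;
        } catch (TimeoutException e) {
            stuck = true;
        }
        oobPool.shutdownNow();
        return stuck;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool saturated: " + poolSaturates());
    }
}
```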
While we can't send responses to that node, we won't send credits to it, either,
as credits are only sent *after* the message has been processed by the application. That
means OOB threads on all nodes will start blocking, trying to send remote get responses to
us.
This is made worse by our staggering of remote gets: as remote get responses block, the
stagger timeout kicks in and we send even more remote gets, making it even harder for the
system to recover.
UFC/MFC can send a {{CREDIT_REQUEST}} message to ask for more credits. The {{REPLENISH}}
messages are handled on JGroups' internal thread pool, so they are not blocked.
However, a {{CREDIT_REQUEST}} can be sent at most once every {{UFC.max_block_time}} ms, so
it can't be relied on to provide enough credits. With the default settings, the
throughput would be {{max_credits / max_block_time == 2MB / 0.5s == 4MB/s}}, which is
tiny compared to the regular throughput.
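As a quick sanity check of that bound, using the default values quoted above (one full credit refill per {{max_block_time}} period):

```java
// Back-of-the-envelope check of the throughput ceiling described above:
// at most one full credit refill per max_block_time period.
public class CreditThroughputBound {
    static double mbPerSecond(long maxCreditsBytes, double maxBlockTimeSeconds) {
        return maxCreditsBytes / maxBlockTimeSeconds / (1024 * 1024);
    }

    public static void main(String[] args) {
        long maxCredits = 2L * 1024 * 1024; // UFC.max_credits = 2MB
        double maxBlockTime = 0.5;          // UFC.max_block_time = 500ms
        System.out.println(mbPerSecond(maxCredits, maxBlockTime) + " MB/s");
    }
}
```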
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)