[infinispan-dev] Threadpools in a large cluster
Manik Surtani
msurtani at redhat.com
Mon Feb 4 09:02:56 EST 2013
I agree that an application thread pool just pushes the issue of the OOB pool running out of threads elsewhere, but that is only one of the two problems Radim has.
The other is that nodes get suspected and kicked out because heartbeat messages get blocked as well. Same thing with FC credit messages. By having a separate application pool, at least we guarantee that the cluster service messages get handled…
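A minimal sketch of the hand-off a separate application pool implies, with illustrative names only (HandOffDemo, onMessage and appPool are not real Infinispan/JGroups APIs): the transport (OOB) thread merely enqueues the work and returns, so it stays free for heartbeats and FC credits.

```java
import java.util.concurrent.*;

// Hypothetical sketch: the OOB thread only enqueues application work
// onto a separate pool and returns immediately, so transport threads
// remain available for cluster service messages.
public class HandOffDemo {
    // Bounded application pool; the size is illustrative only.
    static final ExecutorService appPool = Executors.newFixedThreadPool(4);

    // Called on a JGroups OOB thread (simulated here).
    static Future<String> onMessage(String request) {
        // Do NOT block here; submit and return so the transport
        // thread can keep delivering heartbeats and FC credits.
        return appPool.submit(() -> "processed:" + request);
    }

    public static void main(String[] args) throws Exception {
        Future<String> f = onMessage("put(k,v)");
        System.out.println(f.get()); // prints "processed:put(k,v)"
        appPool.shutdown();
    }
}
```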
On 3 Feb 2013, at 11:23, Bela Ban <bban at redhat.com> wrote:
> A new thread pool owned by Infinispan is certainly something desirable,
> as discussed in Palma, but I think it wouldn't solve the issue Radim ran
> into, namely threads being used despite the fact that they only wait for
> another blocking RPC to finish.
>
> If we made the JGroups thread return immediately by transferring control
> to an Infinispan thread, then we'd simply move the issue from the former
> to the latter pool. Eventually, the Infinispan pool would run out of
> threads.
>
> Coming back to the specific problem Radim ran into: the forwarding of a
> PUT doesn't hold any locks, so your argument below wouldn't hold.
> However, of course this is only one specific scenario, and you're
> probably right that we'd have to consider the more general case of a
> thread holding locks...
>
> All said, I believe it would still be worthwhile looking into a more
> non-blocking way of invoking RPCs, that doesn't occupy threads which
> essentially only wait on IO (network traffic)... A simple state machine
> approach could be the solution to this...
>
>
> On 2/1/13 10:54 AM, Dan Berindei wrote:
>> Yeah, I wouldn't call this a "simple" solution...
>>
>> The distribution/replication interceptors are quite high in the
>> interceptor stack, so we'd have to save the state of the interceptor
>> stack (basically the thread's stack) somehow and resume processing it
>> on the thread receiving the responses. In a language that supports
>> continuations that would be a piece of cake, but since we're in Java
>> we'd have to completely change the way the interceptor stack works.
>>
>> Actually we do hold the lock on modified keys while the command is
>> replicated to the other owners. But I think locking wouldn't be a
>> problem: we already allow locks to be owned by transactions instead of
>> threads, so it would just be a matter of creating a "lite transaction"
>> for non-transactional caches. Obviously the
>> TransactionSynchronizerInterceptor would have to go, but I see that as
>> a positive thing ;)
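In Java the closest approximation to continuations here would be composing the stack out of CompletableFuture stages. A hypothetical sketch, with invented names (AsyncStackSketch, lockingInterceptor, replicate are not real Infinispan APIs):

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: an interceptor "stack" rebuilt as a chain of
// CompletableFuture stages, so no thread is parked while the remote
// call is in flight.
public class AsyncStackSketch {
    // Stage 1: local work before replication.
    CompletableFuture<String> lockingInterceptor(String cmd) {
        return CompletableFuture.completedFuture("locked:" + cmd);
    }

    // Stage 2: the replication RPC, modelled as an async call.
    CompletableFuture<String> replicate(String cmd) {
        return CompletableFuture.supplyAsync(() -> cmd + ":replicated");
    }

    // The "stack": each stage resumes on whatever thread completes the
    // previous future, instead of one thread walking all interceptors.
    CompletableFuture<String> invoke(String cmd) {
        return lockingInterceptor(cmd)
            .thenCompose(this::replicate)
            .thenApply(r -> r + ":unlocked");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new AsyncStackSketch().invoke("put").get());
        // prints "locked:put:replicated:unlocked"
    }
}
```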
>>
>> So yeah, it could work, but it would take a huge amount of effort and
>> it's going to obfuscate the code. Plus, I'm not at all convinced that
>> it's going to improve performance that much compared to a new thread pool.
>>
>> Cheers
>> Dan
>>
>>
>> On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa <rvansa at redhat.com> wrote:
>>
>> Yeah, that would work if it is possible to break the execution path
>> out into the FutureListener from the middle of the interceptor stack
>> - I am really not sure about that, but since in the current design no
>> locks should be held when an RPC is called, it may be possible.
>>
>> Let's see what someone more informed (Dan?) would think about that.
>>
>> Thanks, Bela
>>
>> Radim
>>
>> ----- Original Message -----
>> | From: "Bela Ban" <bban at redhat.com>
>> | To: infinispan-dev at lists.jboss.org
>> | Sent: Friday, February 1, 2013 9:39:43 AM
>> | Subject: Re: [infinispan-dev] Threadpools in a large cluster
>> |
>> | It looks like the core problem is an incoming RPC-1 which triggers
>> | another blocking RPC-2: the thread delivering RPC-1 is blocked
>> | waiting for the response from RPC-2, and can therefore not be used
>> | to serve other requests for the duration of RPC-2. If RPC-2 takes a
>> | while, e.g. waiting to acquire a lock in the remote node, then it
>> | is clear that the thread pool will quickly exceed its max size.
>> |
>> | A simple solution would be to prevent invoking blocking RPCs *from
>> | within* a received RPC. Let's take a look at an example:
>> | - A invokes a blocking PUT-1 on B
>> | - B forwards the request as blocking PUT-2 to C and D
>> | - When PUT-2 returns and B gets the responses from C and D (or the
>> |   first one to respond, don't know exactly how this is implemented),
>> |   it sends the response back to A (PUT-1 terminates now at A)
>> |
>> | We could change this to the following:
>> | - A invokes a blocking PUT-1 on B
>> | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and
>> |   D, it does the following:
>> |   - B invokes PUT-2 and gets a future
>> |   - B adds itself as a FutureListener, and it also stores the
>> |     address of the original sender (A)
>> |   - When the FutureListener is invoked, B sends back the result as
>> |     a response to A
>> | - Whenever a member leaves the cluster, the corresponding futures
>> |   are cancelled and removed from the hashmaps
>> |
>> | This could probably be done differently (e.g. by sending
>> | asynchronous messages and implementing a finite state machine), but
>> | the core of the solution is the same; namely to avoid having an
>> | incoming thread block on a sync RPC.
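The scheme above could be sketched roughly as follows. All names here (ForwardingSketch, invokeAsync, onPut) are illustrative, not real Infinispan/JGroups APIs; the stored "address of A" is represented by a callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch: B never blocks a delivery thread on PUT-2.
public class ForwardingSketch {
    // Simulated asynchronous PUT-2 to the backup owners C and D.
    static CompletableFuture<String> invokeAsync(String cmd) {
        return CompletableFuture.supplyAsync(() -> "ack:" + cmd);
    }

    // Runs on the thread that delivered PUT-1 from A; returns at once.
    static CompletableFuture<Void> onPut(String cmd, Consumer<String> replyToA) {
        return invokeAsync(cmd)      // PUT-2, non-blocking
            .thenAccept(replyToA);   // listener fires when C/D reply
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<String> responseAtA = new AtomicReference<>();
        onPut("put(k,v)", responseAtA::set).join();
        System.out.println(responseAtA.get()); // prints "ack:put(k,v)"
    }
}
```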
>> |
>> | Thoughts ?
>> |
>> |
>> |
>> |
>> | On 2/1/13 9:04 AM, Radim Vansa wrote:
>> | > Hi guys,
>> | >
>> | > after dealing with the large cluster for a while I find the way
>> | > we use OOB threads in synchronous configuration non-robust.
>> | > Imagine a situation where a node which is not an owner of the key
>> | > calls PUT. Then an RPC is made to the primary owner of that key,
>> | > which reroutes the request to all other owners and, after these
>> | > reply, it replies back.
>> | > There are two problems:
>> | > 1) If we simultaneously make X requests from non-owners to the
>> | > primary owner, where X is the OOB TP size, all the OOB threads
>> | > are waiting for the responses and there is no thread left to
>> | > process the OOB response and release a thread.
>> | > 2) Node A is primary owner of keyA and non-primary owner of
>> | > keyB, and B is primary of keyB and non-primary of keyA. We get
>> | > many requests for both keyA and keyB from other nodes; therefore,
>> | > all OOB threads from both nodes call RPCs to the non-primary
>> | > owner, but there is no one left who could process the request.
>> | >
>> | > While we wait for the requests to time out, the nodes with
>> | > depleted OOB threadpools start suspecting all the other nodes
>> | > because they can't process heartbeats etc...
>> | >
>> | > You can say "increase your OOB tp size", but that's not always an
>> | > option; I currently have it set to 1000 threads and it's not
>> | > enough. In the end, I will always be limited by RAM, and
>> | > something tells me that even nodes with a few gigs of RAM should
>> | > be able to form a huge cluster. We use 160 HotRod worker threads
>> | > in JDG; that means that 160 * clusterSize = 10240 (64 nodes in my
>> | > cluster) parallel requests can be executed, and if 10% target the
>> | > same node with 1000 OOB threads, it gets stuck. It's about
>> | > scaling and robustness.
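Problem 2 above can be reproduced in miniature on one JVM. This is only an illustrative analogue (OobDeadlockDemo and the pool names are invented): two single-thread "OOB pools" whose only threads each block on a request that only the other pool's blocked thread could serve.

```java
import java.util.concurrent.*;

// Minimal local analogue of the cross-node OOB deadlock.
public class OobDeadlockDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService nodeA = Executors.newSingleThreadExecutor();
        ExecutorService nodeB = Executors.newSingleThreadExecutor();
        CyclicBarrier bothBusy = new CyclicBarrier(2);

        // A's sole OOB thread waits until B's is busy, then "RPCs" to B.
        Future<String> fromA = nodeA.submit(() -> {
            bothBusy.await();
            return nodeB.submit(() -> "served by B").get(); // queued forever
        });
        // B's sole OOB thread does the mirror image.
        Future<String> fromB = nodeB.submit(() -> {
            bothBusy.await();
            return nodeA.submit(() -> "served by A").get(); // queued forever
        });

        boolean deadlocked = false;
        try {
            fromA.get(500, TimeUnit.MILLISECONDS); // can never complete
        } catch (TimeoutException e) {
            deadlocked = true; // each pool's only thread waits on the other
        }
        System.out.println("deadlocked = " + deadlocked); // prints "deadlocked = true"
        nodeA.shutdownNow();
        nodeB.shutdownNow();
    }
}
```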
>> | >
>> | > Not that I have any good solution, but I'd really like to start a
>> | > discussion.
>> | > Thinking about it a bit, the problem is that a blocking call
>> | > (calling an RPC on the primary owner from the message handler)
>> | > can block non-blocking calls (such as an RPC response or a
>> | > command that never sends any more messages). Therefore, having a
>> | > flag on the message saying "this won't send another message"
>> | > could let the message be executed in a different threadpool,
>> | > which can never deadlock. In fact, the pools could share the
>> | > threads, but the non-blocking pool would always have a few
>> | > threads spare.
>> | > It's a bad solution, as maintaining which messages could block on
>> | > the other node is really, really hard (we can be sure only in the
>> | > case of RPC responses), especially once locks come into play. I
>> | > will welcome anything better.
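The flag idea could be sketched like this, with all names hypothetical (FlaggedDispatcher, dispatch, the pool sizes): messages marked as sending no further messages are routed to a pool that can never be exhausted by threads blocked on downstream RPCs.

```java
import java.util.concurrent.*;

// Hypothetical sketch of routing messages by a "won't send another
// message" flag into a deadlock-free pool.
public class FlaggedDispatcher {
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(8);
    private final ExecutorService terminalPool = Executors.newFixedThreadPool(2);

    // Route by the flag carried on the message.
    public <T> Future<T> dispatch(Callable<T> msg, boolean sendsNoMoreMessages) {
        return (sendsNoMoreMessages ? terminalPool : blockingPool).submit(msg);
    }

    public static void main(String[] args) throws Exception {
        FlaggedDispatcher d = new FlaggedDispatcher();
        // An RPC response never sends further messages -> terminal pool.
        Future<String> r = d.dispatch(() -> "response handled", true);
        System.out.println(r.get()); // prints "response handled"
        d.blockingPool.shutdown();
        d.terminalPool.shutdown();
    }
}
```

The hard part, as noted above, is knowing the flag's value reliably for anything other than RPC responses.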
>> |
>> | --
>> | Bela Ban, JGroups lead (http://www.jgroups.org)
>> |
>> | _______________________________________________
>> | infinispan-dev mailing list
>> | infinispan-dev at lists.jboss.org
>> | https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> |
>>
>>
>>
>>
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
>
--
Manik Surtani
manik at jboss.org
twitter.com/maniksurtani
Platform Architect, JBoss Data Grid
http://red.ht/data-grid