[infinispan-dev] Threadpools in a large cluster

Dan Berindei dan.berindei at gmail.com
Thu Feb 7 16:44:46 EST 2013


On Thu, Feb 7, 2013 at 8:05 PM, Mircea Markus <mmarkus at redhat.com> wrote:

>
> On 1 Feb 2013, at 09:54, Dan Berindei wrote:
>
> Yeah, I wouldn't call this a "simple" solution...
>
> The distribution/replication interceptors are quite high in the
> interceptor stack, so we'd have to save the state of the interceptor stack
> (basically the thread's stack) somehow and resume processing it on the
> thread receiving the responses. In a language that supports continuations
> that would be a piece of cake, but since we're in Java we'd have to
> completely change the way the interceptor stack works.
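
To make the continuation idea a bit more concrete, here is a very rough
sketch of what "resuming the stack on the response thread" could look like
with plain Java callbacks (all class and method names below are invented,
and the transport is faked - this is just the shape of it, not a design):

import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

interface ResponseCallback {
    void onResponses(Object responses);
}

class AsyncRpcManager {
    private final Executor responseExecutor = Executors.newCachedThreadPool();

    // A real implementation would register the callback with the transport;
    // here we just simulate the responses arriving on another thread.
    void invokeRemotelyAsync(Object command, final ResponseCallback callback) {
        responseExecutor.execute(new Runnable() {
            public void run() {
                Object responses = "responses from the other owners";
                callback.onResponses(responses);
            }
        });
    }
}

class DistributionInterceptorSketch {
    private final AsyncRpcManager rpcManager = new AsyncRpcManager();

    // Everything the interceptor used to do *after* the blocking RPC moves
    // into the callback, so the original thread is released immediately.
    void visitPut(Object command, final ResponseCallback restOfTheStack) {
        rpcManager.invokeRemotelyAsync(command, new ResponseCallback() {
            public void onResponses(Object responses) {
                restOfTheStack.onResponses(responses);
            }
        });
    }
}

The painful part is that every interceptor above the distribution one would
have to be split the same way.
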
>
> Actually we do hold the lock on modified keys while the command is
> replicated to the other owners. But I think locking wouldn't be a problem: we
> already allow locks to be owned by transactions instead of threads, so it
> would just be a matter of creating a "lite transaction" for
> non-transactional caches. Obviously the TransactionSynchronizerInterceptor
> would have to go, but I see that as a positive thing ;)
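
What I mean by a "lite transaction" is really just an opaque lock owner
that isn't a thread. A minimal sketch (invented names; a real lock manager
obviously also needs queuing and timeouts):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class LiteOwner {
    // identity-only token: one instance per non-transactional command
}

class KeyLockTable {
    private final ConcurrentMap<Object, Object> locks =
            new ConcurrentHashMap<Object, Object>();

    // Succeeds if the key is free or already held by this owner, so the
    // lock survives while the command hops between threads.
    boolean tryLock(Object key, Object owner) {
        Object current = locks.putIfAbsent(key, owner);
        return current == null || current.equals(owner);
    }

    void unlock(Object key, Object owner) {
        locks.remove(key, owner);
    }
}
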
>
> The TransactionSynchronizerInterceptor protected the CacheTransaction
> objects from multiple writes; we'd still need that because of the NBST
> forwarding.
>

We wouldn't need it if access to the Collection members in CacheTransaction
were properly synchronized. Perhaps "hack" is too strong a word; let's just
say I see TransactionSynchronizerInterceptor as a temporary solution :)
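
By "properly synchronized" I mean something along these lines - just a
sketch of the idea, not the actual CacheTransaction code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CacheTransactionSketch {
    // guard the mutable collection so concurrent writers (e.g. forwarded
    // commands during state transfer) don't need an external serialization
    // point like TransactionSynchronizerInterceptor
    private final List<Object> modifications =
            Collections.synchronizedList(new ArrayList<Object>());

    void addModification(Object command) {
        modifications.add(command);
    }

    List<Object> snapshotModifications() {
        synchronized (modifications) {   // required when iterating/copying
            return new ArrayList<Object>(modifications);
        }
    }
}
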


> So yeah, it could work, but it would take a huge amount of effort and it's
> going to obfuscate the code. Plus, I'm not at all convinced that it's going
> to improve performance that much compared to a new thread pool.
>
> +1
>
>
> Cheers
> Dan
>
>
> On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa <rvansa at redhat.com> wrote:
>
>> Yeah, that would work if it is possible to break the execution path into the
>> FutureListener from the middle of the interceptor stack - I am really not
>> sure about that, but since in the current design no locks should be held
>> when an RPC is called, it may be possible.
>>
>> Let's see what someone more informed (Dan?) would think about that.
>>
>> Thanks, Bela
>>
>> Radim
>>
>> ----- Original Message -----
>> | From: "Bela Ban" <bban at redhat.com>
>> | To: infinispan-dev at lists.jboss.org
>> | Sent: Friday, February 1, 2013 9:39:43 AM
>> | Subject: Re: [infinispan-dev] Threadpools in a large cluster
>> |
>> | It looks like the core problem is an incoming RPC-1 which triggers
>> | another blocking RPC-2: the thread delivering RPC-1 is blocked waiting
>> | for the response from RPC-2, and can therefore not be used to serve
>> | other requests for the duration of RPC-2. If RPC-2 takes a while, e.g.
>> | waiting to acquire a lock in the remote node, then it is clear that the
>> | thread pool will quickly exceed its max size.
>> |
>> | A simple solution would be to prevent invoking blocking RPCs *from
>> | within* a received RPC. Let's take a look at an example:
>> | - A invokes a blocking PUT-1 on B
>> | - B forwards the request as blocking PUT-2 to C and D
>> | - When PUT-2 returns and B gets the responses from C and D (or the first
>> | one to respond, don't know exactly how this is implemented), it sends
>> | the response back to A (PUT-1 terminates now at A)
>> |
>> | We could change this to the following:
>> | - A invokes a blocking PUT-1 on B
>> | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
>> | it
>> | does the following:
>> |       - B invokes PUT-2 and gets a future
>> |       - B adds itself as a FutureListener, and it also stores the
>> |         address of the original sender (A)
>> |       - When the FutureListener is invoked, B sends back the result
>> |         as a response to A
>> | - Whenever a member leaves the cluster, the corresponding futures are
>> | cancelled and removed from the hashmaps
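
For the record, the shape of what Bela describes above would be roughly
this (the interfaces below are invented for illustration - they are not
the actual JGroups or Infinispan API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface Listener<T> { void done(T result); }

interface AsyncTransport {
    void invokeAsync(String targets, Object command, Listener<Object> listener);
    void sendResponse(String originalSender, Object result);
}

class ForwardingHandler {
    private final AsyncTransport transport;
    // pending forwarded requests, keyed by request id, so they can be
    // cancelled and cleaned up when the original sender leaves the cluster
    private final Map<Long, String> pending = new ConcurrentHashMap<Long, String>();

    ForwardingHandler(AsyncTransport transport) {
        this.transport = transport;
    }

    // Runs on the OOB thread that delivered PUT-1; it never blocks.
    void handlePut(final long requestId, final String sender, Object putCommand) {
        pending.put(requestId, sender);
        transport.invokeAsync("C,D", putCommand, new Listener<Object>() {
            public void done(Object result) {
                // runs on the thread that delivered the PUT-2 responses
                if (pending.remove(requestId) != null) {
                    transport.sendResponse(sender, result);
                }
            }
        });
    }
}
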
>> |
>> | This could probably be done differently (e.g. by sending asynchronous
>> | messages and implementing a finite state machine), but the core of the
>> | solution is the same; namely to avoid having an incoming thread block on
>> | a sync RPC.
>> |
>> | Thoughts ?
>> |
>> |
>> |
>> |
>> | On 2/1/13 9:04 AM, Radim Vansa wrote:
>> | > Hi guys,
>> | >
>> | > after dealing with the large cluster for a while I find the way we
>> | > use OOB threads in synchronous configuration non-robust.
>> | > Imagine a situation where a node which is not an owner of the key
>> | > calls PUT. Then an RPC is made to the primary owner of that
>> | > key, which reroutes the request to all other owners and, after
>> | > these reply, it replies back.
>> | > There are two problems:
>> | > 1) If we make X simultaneous requests from non-owners to the primary
>> | > owner, where X is the OOB TP size, all the OOB threads are waiting
>> | > for the responses and there is no thread left to process the OOB
>> | > response and release the waiting thread.
>> | > 2) Node A is the primary owner of keyA and a non-primary owner of
>> | > keyB, while B is primary for keyB and non-primary for keyA. We get
>> | > many requests for both keyA and keyB from other nodes; therefore,
>> | > all OOB threads on both nodes call RPCs to the non-primary owner,
>> | > but there's no one left to process those requests.
>> | >
>> | > While we wait for the requests to time out, the nodes with depleted
>> | > OOB threadpools start suspecting all other nodes because they
>> | > can't receive heartbeats etc.
>> | >
>> | > You can say "increase your OOB tp size", but that's not always an
>> | > option; I have currently set it to 1000 threads and it's not
>> | > enough. In the end, I will always be limited by RAM, and something
>> | > tells me that even nodes with a few gigs of RAM should be able to
>> | > form a huge cluster. We use 160 hotrod worker threads in JDG, which
>> | > means that 160 * clusterSize = 10240 (64 nodes in my cluster)
>> | > parallel requests can be executed, and if 10% of them target the
>> | > same node with 1000 OOB threads, it gets stuck. It's about scaling
>> | > and robustness.
>> | >
>> | > Not that I'd have any good solution, but I'd really like to start a
>> | > discussion.
>> | > Thinking about it a bit, the problem is that a blocking call (calling
>> | > an RPC on the primary owner from the message handler) can block
>> | > non-blocking calls (such as an RPC response or a command that never
>> | > sends any more messages). Therefore, having a flag on the message
>> | > saying "this won't send another message" could let the message be
>> | > executed in a different threadpool, which would never deadlock. In
>> | > fact, the pools could share the threads, but the non-blocking one
>> | > would always have a few threads spare.
>> | > It's a bad solution, as keeping track of which messages could block
>> | > on the other node is really, really hard (we can be sure only in the
>> | > case of RPC responses), especially when locks come into play. I will
>> | > welcome anything better.
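
Just to visualize Radim's flag idea, it would amount to something like the
sketch below (pool sizes and names are made up, and it uses two separate
pools instead of one shared pool with a reserved portion):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class IncomingMessage {
    final boolean mayBlock;   // false = "this won't send another message"
    final Runnable work;
    IncomingMessage(boolean mayBlock, Runnable work) {
        this.mayBlock = mayBlock;
        this.work = work;
    }
}

class TwoPoolDispatcher {
    // pool for requests that may issue further blocking RPCs
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(1000);
    // small reserved pool for responses and terminal commands only
    private final ExecutorService nonBlockingPool = Executors.newFixedThreadPool(16);

    void dispatch(IncomingMessage msg) {
        (msg.mayBlock ? blockingPool : nonBlockingPool).execute(msg.work);
    }
}
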
>> |
>> | --
>> | Bela Ban, JGroups lead (http://www.jgroups.org)
>> |
>> |
>>
>
>
>
>   Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
>