A new thread pool owned by Infinispan is certainly something desirable,
as discussed in Palma, but I think it wouldn't solve the issue Radim ran
into, namely threads being used despite the fact that they only wait for
another blocking RPC to finish.
If we made the JGroups thread return immediately by transferring control
to an Infinispan thread, then we'd simply move the issue from the former
to the latter pool. Eventually, the Infinispan pool would run out of
threads.
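To make the hand-off argument concrete, here is a minimal sketch under stated assumptions: two plain `ThreadPoolExecutor` instances stand in for the JGroups pool and a hypothetical Infinispan-owned pool, and a latch stands in for the outstanding RPC-2 response. None of the names are real JGroups or Infinispan APIs. The transport threads return immediately, but the blocked work simply piles up in the second pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HandOffDemo {

    // Plain fixed-size pool with an unbounded queue (hypothetical stand-in
    // for either the JGroups OOB pool or an Infinispan-owned pool).
    static ThreadPoolExecutor fixedPool(int size) {
        return new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Delivers `requests` RPC-1 messages and returns how many of them are
    // stuck in the second pool's queue while RPC-2 is still outstanding.
    static int queuedAfterHandOff(int poolSize, int requests) throws InterruptedException {
        ThreadPoolExecutor transportPool = fixedPool(poolSize);
        ThreadPoolExecutor appPool = fixedPool(poolSize);
        CountDownLatch rpc2Response = new CountDownLatch(1);
        CountDownLatch handedOff = new CountDownLatch(requests);
        CountDownLatch blocked = new CountDownLatch(Math.min(poolSize, requests));
        try {
            for (int i = 0; i < requests; i++) {
                transportPool.execute(() -> {
                    // The transport thread only hands off and returns at once.
                    appPool.execute(() -> {
                        blocked.countDown();
                        try {
                            rpc2Response.await(); // blocking wait for RPC-2
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    handedOff.countDown();
                });
            }
            handedOff.await(); // every request has been handed off
            blocked.await();   // every thread of the second pool is now waiting
            return appPool.getQueue().size();
        } finally {
            rpc2Response.countDown(); // release the waiters so the pools can drain
            transportPool.shutdown();
            appPool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("requests stuck in second pool: " + queuedAfterHandOff(4, 10));
    }
}
```

With a pool size of 4 and 10 requests, 4 threads of the second pool block and the remaining 6 requests wait in its queue, which is the same exhaustion one pool further down.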
Coming back to the specific problem Radim ran into: the forwarding of a
PUT doesn't hold any locks, so your argument below wouldn't hold.
However, of course this is only one specific scenario, and you're
probably right that we'd have to consider the more general case of a
thread holding locks...
All said, I believe it would still be worthwhile looking into a
non-blocking way of invoking RPCs, one that doesn't occupy threads which
essentially only wait on IO (network traffic)... A simple state machine
approach could be the solution to this...
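A rough sketch of what non-blocking forwarding could look like, assuming Java 8's `CompletableFuture` as a stand-in for a JGroups future with a listener. The `Transport` interface, `handlePut`, and the `"OK"` reply are hypothetical names made up for this illustration, not JGroups or Infinispan APIs. Cancelling futures when a member leaves, and the interceptor stack itself, are not modelled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class NonBlockingForward {

    // Hypothetical transport interface; not a JGroups or Infinispan API.
    interface Transport {
        CompletableFuture<String> sendAsync(String target, String command);
    }

    // B's handler for PUT-1: sends PUT-2 to each backup owner and returns a
    // future for the reply to A. The calling thread never waits.
    static CompletableFuture<String> handlePut(Transport transport, List<String> backups, String command) {
        CompletableFuture<?>[] pending = backups.stream()
                .map(backup -> transport.sendAsync(backup, command))
                .toArray(CompletableFuture[]::new);
        // Completed by whichever thread delivers the last PUT-2 response.
        return CompletableFuture.allOf(pending).thenApply(ignored -> "OK");
    }

    public static void main(String[] args) {
        // Fake transport that records in-flight requests so the responses
        // can be delivered by hand.
        Map<String, CompletableFuture<String>> inFlight = new HashMap<>();
        Transport fake = (target, command) -> {
            CompletableFuture<String> f = new CompletableFuture<>();
            inFlight.put(target, f);
            return f;
        };
        CompletableFuture<String> replyToA = handlePut(fake, Arrays.asList("C", "D"), "PUT k v");
        System.out.println("done before responses: " + replyToA.isDone());
        inFlight.get("C").complete("ack");
        inFlight.get("D").complete("ack");
        System.out.println("reply to A: " + replyToA.join());
    }
}
```

The thread that delivered PUT-1 is free as soon as `handlePut` returns; the reply to A is produced on the thread that delivers the last response.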
On 2/1/13 10:54 AM, Dan Berindei wrote:
Yeah, I wouldn't call this a "simple" solution...
The distribution/replication interceptors are quite high in the
interceptor stack, so we'd have to save the state of the interceptor
stack (basically the thread's stack) somehow and resume processing it
on the thread receiving the responses. In a language that supports
continuations that would be a piece of cake, but since we're in Java
we'd have to completely change the way the interceptor stack works.
Actually we do hold the lock on modified keys while the command is
replicated to the other owners. But I think locking wouldn't be a
problem: we already allow locks to be owned by transactions instead of
threads, so it would just be a matter of creating a "lite transaction"
for non-transactional caches. Obviously the
TransactionSynchronizerInterceptor would have to go, but I see that as
a positive thing ;)
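As a rough illustration of locks owned by a token instead of a thread, here is a minimal sketch. `OwnerKeyedLocks` and the `"lite-tx-1"` token are hypothetical and this is not how Infinispan's lock manager is implemented; it only offers a non-blocking `tryLock`. The point is that the thread receiving the responses can release a lock that a different thread acquired.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class OwnerKeyedLocks {

    // key -> current owner token (e.g. a transaction or "lite transaction" id)
    private final ConcurrentMap<Object, Object> owners = new ConcurrentHashMap<>();

    // Acquires the key for `owner` if it is free or already held by `owner`.
    boolean tryLock(Object key, Object owner) {
        Object current = owners.putIfAbsent(key, owner);
        return current == null || current.equals(owner);
    }

    // Releases the key only if `owner` holds it; any thread may call this.
    boolean unlock(Object key, Object owner) {
        return owners.remove(key, owner);
    }

    public static void main(String[] args) throws InterruptedException {
        OwnerKeyedLocks locks = new OwnerKeyedLocks();
        Object liteTx = "lite-tx-1";
        System.out.println("acquired on main thread: " + locks.tryLock("keyA", liteTx));
        // A different thread, standing in for the one receiving the responses,
        // releases the lock using the same owner token.
        Thread responseThread = new Thread(() ->
                System.out.println("released on response thread: " + locks.unlock("keyA", liteTx)));
        responseThread.start();
        responseThread.join();
    }
}
```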
So yeah, it could work, but it would take a huge amount of effort and
it's going to obfuscate the code. Plus, I'm not at all convinced that
it's going to improve performance that much compared to a new thread pool.
Cheers
Dan
On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
Yeah, that would work if it is possible to break the execution path
into the FutureListener from the middle of the interceptor stack. I
am really not sure about that, but since in the current design no locks
should be held when an RPC is called, it may be possible.
Let's see what someone more informed (Dan?) would think about that.
Thanks, Bela
Radim
----- Original Message -----
| From: "Bela Ban" <bban(a)redhat.com>
| To: infinispan-dev(a)lists.jboss.org
| Sent: Friday, February 1, 2013 9:39:43 AM
| Subject: Re: [infinispan-dev] Threadpools in a large cluster
|
| It looks like the core problem is an incoming RPC-1 which triggers
| another blocking RPC-2: the thread delivering RPC-1 is blocked waiting
| for the response from RPC-2, and can therefore not be used to serve
| other requests for the duration of RPC-2. If RPC-2 takes a while, e.g.
| waiting to acquire a lock in the remote node, then it is clear that the
| thread pool will quickly exceed its max size.
|
| A simple solution would be to prevent invoking blocking RPCs *from
| within* a received RPC. Let's take a look at an example:
| - A invokes a blocking PUT-1 on B
| - B forwards the request as blocking PUT-2 to C and D
| - When PUT-2 returns and B gets the responses from C and D (or the
| first one to respond, don't know exactly how this is implemented), it
| sends the response back to A (PUT-1 terminates now at A)
|
| We could change this to the following:
| - A invokes a blocking PUT-1 on B
| - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D, it
| does the following:
|   - B invokes PUT-2 and gets a future
|   - B adds itself as a FutureListener, and it also stores the
|     address of the original sender (A)
|   - When the FutureListener is invoked, B sends back the result as a
|     response to A
|   - Whenever a member leaves the cluster, the corresponding futures are
|     cancelled and removed from the hashmaps
|
| This could probably be done differently (e.g. by sending asynchronous
| messages and implementing a finite state machine), but the core of the
| solution is the same; namely to avoid having an incoming thread block on
| a sync RPC.
|
| Thoughts?
|
| On 2/1/13 9:04 AM, Radim Vansa wrote:
| > Hi guys,
| >
| > after dealing with the large cluster for a while I find the way
| > we use OOB threads in synchronous configuration non-robust.
| > Imagine a situation where a node which is not an owner of the key
| > calls PUT. Then an RPC is called to the primary owner of that
| > key, which reroutes the request to all other owners and after
| > these reply, it replies back.
| > There are two problems:
| > 1) If we do simultaneously X requests from non-owners to the primary
| > owner where X is OOB TP size, all the OOB threads are waiting for
| > the responses and there is no thread to process the OOB response
| > and release the thread.
| > 2) Node A is primary owner of keyA, non-primary owner of keyB and B
| > is primary of keyB and non-primary of keyA. We got many requests
| > for both keyA and keyB from other nodes, therefore, all OOB
| > threads from both nodes call RPC to the non-primary owner but
| > there's no one who could process the request.
| >
| > While we wait for the requests to timeout, the nodes with depleted
| > OOB threadpools start suspecting all other nodes because they
| > can't receive heartbeats etc...
| >
| > You can say "increase your OOB tp size", but that's not always an
| > option, I have currently set it to 1000 threads and it's not
| > enough. In the end, I will always be limited by RAM and something
| > tells me that even nodes with few gigs of RAM should be able to
| > form a huge cluster. We use 160 hotrod worker threads in JDG, that
| > means that 160 * clusterSize = 10240 (64 nodes in my cluster)
| > parallel requests can be executed, and if 10% target the same
| > node with 1000 OOB threads, it gets stuck. It's about scaling and
| > robustness.
| >
| > Not that I'd have any good solution, but I'd really like to start a
| > discussion.
| > Thinking about it a bit, the problem is that a blocking call (calling
| > RPC on primary owner from message handler) can block non-blocking
| > calls (such as RPC response or command that never sends any more
| > messages). Therefore, having a flag on message "this won't send
| > another message" could let the message be executed in a different
| > threadpool, which will never be deadlocked. In fact, the pools
| > could share the threads but the non-blocking would always have a
| > few threads spare.
| > It's a bad solution as maintaining which message could block in the
| > other node is really, really hard (we can be sure only in case of
| > RPC responses), especially when some locks come. I will welcome
| > anything better.
|
| --
| Bela Ban, JGroups lead (http://www.jgroups.org)
|
| _______________________________________________
| infinispan-dev mailing list
| infinispan-dev(a)lists.jboss.org
| https://lists.jboss.org/mailman/listinfo/infinispan-dev