----- Original Message -----
| From: "Dan Berindei" <dan.berindei(a)gmail.com>
| To: "infinispan -Dev List" <infinispan-dev(a)lists.jboss.org>
| Sent: Thursday, March 14, 2013 10:03:10 AM
| Subject: Re: [infinispan-dev] DefaultExecutorFactory and rejection policy
|
| On Thu, Mar 14, 2013 at 9:32 AM, Radim Vansa < rvansa(a)redhat.com >
| wrote:
|
|
| | Blocking OOB threads is the thing we want to avoid, remember?
|
| Well, you have to block somewhere...
|
| I like Adrian's solution, because it's a lot better than
| CallerRunsPolicy: it's blocking the OOB thread until any other
| command finishes executing, not until one particular command
| finishes executing.
I don't like the caller-runs policy either. OOB threads shouldn't be waiting for
anything, and executing a command within an OOB thread could cause exactly that. In our case it
would effectively just increase the OOB thread pool size by the ispn thread pool size and add
some overhead on top. We should always have some OOB threads available to process the responses.
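For clarity, this is roughly the kind of blocking policy we are talking about, as opposed
to caller-runs -- a minimal sketch against plain java.util.concurrent, not the actual
DefaultExecutorFactory code:

import java.util.concurrent.*;

// Blocks the submitting (OOB) thread until a worker drains an element from
// the bounded queue, i.e. until *any* command finishes; unlike
// CallerRunsPolicy, the OOB thread never executes the command itself.
// Hypothetical sketch only.
public class BlockOnQueuePolicy implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor pool) {
        try {
            pool.getQueue().put(r); // waits until there is room in the queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("Interrupted while queueing", e);
        }
    }
}

// Usage: bounded queue + blocking policy instead of CallerRunsPolicy
ThreadPoolExecutor pool = new ThreadPoolExecutor(8, 32, 60, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(1000), new BlockOnQueuePolicy());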
|
| | ...
|
| I don't think you can throttle on the sender, because you don't know
| how many threads the recipient should allocate per sender in the
| Infinispan thread pool.
You don't need to know exactly how many threads are executing on the receiver, because
you would have an unbounded queue there which will, sooner or later, process the messages.
The semaphore is there for throttling, so that you never overwhelm the recipient
with requests.
I don't insist on semaphore-like synchronization, but the receiver should provide some
feedback that it's not able to process more messages, and the sender should be
responsible for abiding by it. AND the feedback should be provided in a way that leaves the
node still able to process other messages. Whether it's a "jammed" signal message broadcast
once the queue length grows beyond some limit, after which all senders should postpone
their messages, or divine intervention, that's just a performance issue. But if there is a
situation where the node is not able to process any more messages (at the JGroups level), the
OOB issue won't be solved, because there may be a reply that another message is waiting for
which never gets processed.
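To sketch what I have in mind (hypothetical names, not JGroups/Infinispan API; here the
receiver's feedback is modelled simply as the response arriving, though an explicit
jammed/free signal would do just as well):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Sender-side throttle: at most `permits` outstanding requests per
// destination node. A real implementation would key on the JGroups Address
// rather than a String. Hypothetical sketch only.
public class PerNodeThrottle {
    private final Map<String, Semaphore> inFlight = new ConcurrentHashMap<>();
    private final int permits;

    public PerNodeThrottle(int permits) { this.permits = permits; }

    public void beforeSend(String destination) throws InterruptedException {
        // Blocks the sender once `destination` has `permits` unanswered
        // requests -- backpressure without blocking the receiver's OOB threads.
        inFlight.computeIfAbsent(destination, d -> new Semaphore(permits))
                .acquire();
    }

    public void onResponse(String destination) {
        Semaphore s = inFlight.get(destination);
        if (s != null) s.release();
    }
}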
|
| E.g. in non-tx concurrent mode, for each user-initiated request, the
| primary owner sends out numOwners requests to the other nodes. If
| the cache is also replicated, you have the coordinator sending
| clusterSize requests to the other nodes for each user-initiated
| request. (So for each request that a node receives from the
| coordinator, it will receive exactly 0 requests from the other
| nodes.)
|
|
| If you start considering that each cache should have its own
| semaphore, because they need different numbers of threads, then
| finding an appropriate initial value is pretty much impossible.
|
Sorry, I'm missing the point here. You don't need to be able to tell exactly how many
messages may go from one node to another; you can just use common sense to limit the
maximum number of requests processed concurrently between the two nodes, until we say
"Hey, give the poor node a break, it's not responding because it's busy and another
message would not help right now".
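Continuing the sketch above, that common-sense limit would just be the initial permit
count (the numbers and the send() call are made up):

PerNodeThrottle throttle = new PerNodeThrottle(100); // max 100 in-flight requests per node

throttle.beforeSend("nodeB"); // blocks once nodeB has 100 unanswered requests
// send("nodeB", request);    // hypothetical transport call
// ... later, when the response (or an explicit "free" signal) arrives:
throttle.onResponse("nodeB");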
Radim