[infinispan-dev] DIST-SYNC, put(), a problem and a solution
Radim Vansa
rvansa at redhat.com
Wed Jul 30 08:47:15 EDT 2014
On 07/30/2014 01:59 PM, Dan Berindei wrote:
> On Wed, Jul 30, 2014 at 12:22 PM, Radim Vansa <rvansa at redhat.com> wrote:
>
>> Investigation:
>> ------------
>> When I looked at UNICAST3, I saw a lot of missing messages on the
>> receive side and unacked messages on the send side. This caused me to
>> look into the (mainly OOB) thread pools and - voila - maxed out!
>>
>> I learned from Pedro that the Infinispan internal thread pool (with a
>> default of 32 threads) can be configured, so I increased it to 300 and
>> increased the OOB pools as well.
>>
>> This mitigated the problem somewhat, but when I increased the requester
>> threads to 100, I had the same problem again. Apparently, the Infinispan
>> internal thread pool uses a rejection policy of "run" and thus uses the
>> JGroups (OOB) thread when exhausted.
>>
>>
>> We can't use another rejection policy in the remote executor
>> because the message won't be re-delivered by JGroups, and we
>> can't use a queue either.
>
> Can't we just send response "Node is busy" and cancel the
> operation? (at least in cases where this is possible - we can't do
> that safely for CommitCommand, but usually it could be doable,
> right?) And what's the problem with queues, besides that these can
> grow out of memory?
>
>
> No commit commands here, the cache is not transactional :)
Sure, but any change to OOB -> remote thread pool would likely affect
both non-tx and tx.
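
To make the "run" rejection policy concrete, here is a minimal sketch using only plain java.util.concurrent. It is not Infinispan or JGroups code: the class and method names are hypothetical, and a one-thread pool with no queue stands in for an exhausted remote executor. It only shows that, once the pool is saturated, the submitted task executes on the submitting thread (which would be the OOB thread in the scenario above).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Infinispan or JGroups.
public class CallerRunsDemo {

    /** Returns the name of the thread that ran a task submitted to a saturated pool. */
    static String overflowThreadName() {
        // One worker and no queue capacity: any second task is rejected.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        String[] ranOn = new String[1];
        try {
            // Occupy the only worker until we release it.
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();
            // The pool is saturated, so CallerRunsPolicy runs this task
            // synchronously on the thread that called execute().
            pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return ranOn[0];
    }

    public static void main(String[] args) {
        System.out.println("overflow task ran on: " + overflowThreadName());
    }
}
```

If the submitting thread is itself a scarce resource, as the OOB threads are here, this policy simply moves the exhaustion one pool upstream.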
>
> If the remote thread pool gets full on a backup node, there is no way
> to safely cancel the operation - other backup owners may have already
> applied the write. And even with numOwners=2, there are multiple
> backup owners during state transfer.
I was thinking about delaying the write until the backup responds, but
you're right, with two or more backups the situation is not that easy.
>
> We do throw an OutdatedTopologyException on the backups and retry the
> operation when the topology changes, we could do something similar
> when the remote executor thread pool is full. But 1) we have trouble
> preserving consistency when we retry, so we'd rather do it only when
> we really have to, and 2) repeated retries can be costly, as the
> primary needs to re-acquire the lock.
>
> The problem with queues is that commands are executed in the order
> they are in the queue. If a node has a remote executor thread pool of
> 100 threads and receives a prepare(tx1, put(k, v1)) command, then 1000
> prepare(tx_i, put(k, v_i)) commands, and finally a commit(tx1)
> command, the commit(tx1) command will block until all but 99 of the
> prepare(tx_i, put(k, v_i)) commands have timed out.
Makes sense
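
A scaled-down sketch of that queueing scenario, again in plain java.util.concurrent and not taken from Infinispan: a Semaphore with zero permits stands in for the lock on k held by tx1, each "prepare" task waits for it with a timeout, and the "commit" task sits behind them in the FIFO queue. All names, the thread and task counts, and the timeout are assumptions chosen for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Infinispan.
public class QueuedCommitDemo {

    /** Returns how many prepares had already timed out when the commit finally ran. */
    static int timedOutBeforeCommit(int threads, int prepares, long lockTimeoutMs) {
        // Fixed pool backed by an unbounded FIFO queue.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Zero permits: tx1 already holds the lock on key k.
        Semaphore keyLock = new Semaphore(0);
        AtomicInteger timedOut = new AtomicInteger();
        try {
            // prepare(tx_i, put(k, v_i)): each one waits for the lock on k.
            for (int i = 0; i < prepares; i++) {
                pool.execute(() -> {
                    try {
                        if (keyLock.tryAcquire(lockTimeoutMs, TimeUnit.MILLISECONDS)) {
                            keyLock.release();
                        } else {
                            timedOut.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // commit(tx1): releases the lock, but is queued behind every prepare.
            Future<Integer> commit = pool.submit(() -> {
                int seen = timedOut.get();
                keyLock.release();
                return seen;
            });
            return commit.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 2 threads, 5 prepares, 50 ms lock timeout (illustrative values).
        System.out.println("prepares timed out before commit ran: "
                + timedOutBeforeCommit(2, 5, 50));
    }
}
```

With T threads and P queued prepares, the commit cannot start until at least P - (T - 1) prepares have timed out, which matches the "all but 99" figure for 100 threads.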
>
> I have some thoughts on improving that independently of Pedro's work
> on locking [1], and I've just written that up as ISPN-4585 [2]
>
> [1] https://issues.jboss.org/browse/ISPN-2849
> [2] https://issues.jboss.org/browse/ISPN-4585
>
ISPN-2849 sounds a lot like the state machine-based interceptor stack, and
I am looking forward to that! (Although it is still a long way off:
ISPN 9, 10?)
Thanks for those answers, Dan. I should have realized most of that myself,
but I don't have the capacity to keep all the wisdom about the NBST
algorithms in my head :) I hope that some day I can find a student looking
for a diploma thesis topic who is willing to model at least the basic
Infinispan algorithms and formally verify whether they are (in)correct ;-).
Radim
--
Radim Vansa <rvansa at redhat.com>
JBoss DataGrid QA