On 07/30/2014 01:59 PM, Dan Berindei wrote:
On Wed, Jul 30, 2014 at 12:22 PM, Radim Vansa <rvansa(a)redhat.com> wrote:
> Investigation:
> ------------
> When I looked at UNICAST3, I saw a lot of missing messages on the
> receive side and unacked messages on the send side. This
> caused me to
> look into the (mainly OOB) thread pools and - voilà - maxed out!
>
> I learned from Pedro that the Infinispan internal thread pool
> (with a
> default of 32 threads) can be configured, so I increased it
> to 300 and
> increased the OOB pools as well.
>
> This mitigated the problem somewhat, but when I increased the
> requester
> threads to 100, I had the same problem again. Apparently, the
> Infinispan
> internal thread pool uses a rejection policy of "run" and
> thus uses the
> JGroups (OOB) thread when exhausted.
>
>
> We can't use another rejection policy in the remote executor
> because the message won't be re-delivered by JGroups, and we
> can't use a queue either.
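For illustration, the "run" rejection policy described above behaves like java.util.concurrent's CallerRunsPolicy: when the pool and its queue are both full, the rejected task executes synchronously on the submitting thread - here standing in for the JGroups OOB thread. A minimal sketch (not Infinispan's actual executor setup):

```java
import java.util.concurrent.*;

public class CallerRunsDemo {
    public static void main(String[] args) throws Exception {
        // 1 worker, 1 queue slot, "run" (caller-runs) rejection policy
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        CountDownLatch block = new CountDownLatch(1);
        // occupy the single worker thread
        pool.submit(() -> { try { block.await(); } catch (InterruptedException ignored) {} });
        // fill the single queue slot
        pool.submit(() -> {});

        // Pool and queue are full: this task is rejected and runs on the
        // caller's thread (the analogue of the OOB thread being hijacked).
        final String[] executedOn = new String[1];
        String caller = Thread.currentThread().getName();
        pool.execute(() -> executedOn[0] = Thread.currentThread().getName());
        System.out.println(executedOn[0].equals(caller)); // true

        block.countDown();
        pool.shutdown();
    }
}
```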
Can't we just send a "Node is busy" response and cancel the
operation? (At least in the cases where this is possible - we can't do
that safely for CommitCommand, but usually it should be doable,
right?) And what's the problem with queues, besides that they can
grow until we run out of memory?
No commit commands here, the cache is not transactional :)
Sure, but any change to OOB -> remote thread pool would likely affect
both non-tx and tx.
If the remote thread pool gets full on a backup node, there is no way
to safely cancel the operation - other backup owners may have already
applied the write. And even with numOwners=2, there are multiple
backup owners during state transfer.
I was thinking about delaying the write until the backup responds, but
you're right, with two or more backups the situation is not that easy.
We do throw an OutdatedTopologyException on the backups and retry the
operation when the topology changes, we could do something similar
when the remote executor thread pool is full. But 1) we have trouble
preserving consistency when we retry, so we'd rather do it only when
we really have to, and 2) repeated retries can be costly, as the
primary needs to re-acquire the lock.
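A retry loop along the lines discussed above could be sketched as follows; the exception type, operation interface, and backoff are all hypothetical placeholders, and the attempt bound reflects point 2) - each retry forces the primary to re-acquire the lock, so retries should not be unbounded:

```java
import java.util.concurrent.ThreadLocalRandom;

public class RetryDemo {
    // Hypothetical signal that the remote executor pool was full
    static class BusyException extends RuntimeException {}

    interface RemoteWrite { void apply() throws BusyException; }

    // Retry the write when the remote node reports it is busy,
    // with a small randomized backoff and a bound on attempts.
    static void writeWithRetry(RemoteWrite op, int maxAttempts) throws InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                op.apply();
                return;
            } catch (BusyException e) {
                if (attempt == maxAttempts) throw e;
                Thread.sleep(ThreadLocalRandom.current().nextInt(10, 50));
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // simulated remote op: busy on the first two attempts
        writeWithRetry(() -> { if (++calls[0] < 3) throw new BusyException(); }, 5);
        System.out.println(calls[0]); // succeeded on the third attempt
    }
}
```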
The problem with queues is that commands are executed in the order
they appear in the queue. If a node has a remote executor thread pool of
100 threads and receives a prepare(tx1, put(k, v1)) command, then 1000
prepare(tx_i, put(k, v_i)) commands, and finally a commit(tx1)
command, the commit(tx1) command will block until all but 99 of the
prepare(tx_i, put(k, v_i)) commands have timed out.
Makes sense
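The head-of-line blocking described above can be demonstrated in miniature: a small fixed pool stands in for the remote executor, the "prepare" tasks wait on a latch that only the queued "commit" can release, so the commit sits behind them until they time out. This is a sketch, not Infinispan code:

```java
import java.util.concurrent.*;

public class HeadOfLineBlocking {
    public static void main(String[] args) throws Exception {
        // 2 workers stand in for the remote executor pool
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch commitDone = new CountDownLatch(1);

        for (int i = 0; i < 2; i++) {
            // prepare(tx_i): blocks waiting for tx1's commit, up to its timeout
            pool.execute(() -> {
                try {
                    commitDone.await(500, TimeUnit.MILLISECONDS);
                } catch (InterruptedException ignored) {}
            });
        }
        // commit(tx1) is queued behind the prepares and cannot run,
        // even though it is the command that would unblock them
        pool.execute(commitDone::countDown);

        // After 100ms the commit still hasn't executed
        System.out.println(commitDone.await(100, TimeUnit.MILLISECONDS)); // false

        pool.shutdown();
        pool.awaitTermination(2, TimeUnit.SECONDS);
    }
}
```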
I have some thoughts on improving that independently of Pedro's work
on locking [1], and I've just written that up as ISPN-4585 [2]
[1] https://issues.jboss.org/browse/ISPN-2849
[2] https://issues.jboss.org/browse/ISPN-4585
ISPN-2849 sounds a lot like the state machine-based interceptor stack; I
am looking forward to that! (Although that's music of the far future -
ISPN 9, 10?)
Thanks for those answers, Dan. I should have realized most of that myself,
but I don't have the capacity to keep all the wisdom about NBST algorithms
loaded in my brain :) I hope one day I can find a student looking
for a diploma thesis who is willing to model at least the basic Infinispan
algorithms and formally verify that they are (in)correct ;-).
Radim
--
Radim Vansa <rvansa(a)redhat.com>
JBoss DataGrid QA
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev