<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 19, 2013 at 11:55 PM, Pedro Ruivo <span dir="ltr"><<a href="mailto:pedro@infinispan.org" target="_blank">pedro@infinispan.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><br>
<br>
On 03/19/2013 08:41 PM, Dan Berindei wrote:<br>
><br>
> On Mon, Mar 18, 2013 at 6:09 PM, Pedro Ruivo <<a href="mailto:pedro@infinispan.org">pedro@infinispan.org</a><br>
</div><div><div class="h5">> <mailto:<a href="mailto:pedro@infinispan.org">pedro@infinispan.org</a>>> wrote:<br>
><br>
> Hi all,<br>
><br>
> To solve ISPN-2808 (avoid blocking JGroups threads so that request<br>
> responses can still be delivered), I've created another thread pool to<br>
> handle the potentially blocking commands (i.e. the commands that may<br>
> block until some state is reached).<br>
><br>
> Problem description:<br>
><br>
> With this solution, the new thread pool has to be large in order to<br>
> handle the remote commands without deadlocks. The problem is that all<br>
> the threads can end up blocked, leaving no thread to process the<br>
> command that would unblock them.<br>
><br>
> Example: a bunch of commands are blocked waiting for a new topology ID<br>
> and the command that will increment the topology ID is in the thread<br>
> pool queue.<br>
><br>
> Solution:<br>
><br>
> Use a smart command dispatcher, i.e., keep the command in the queue<br>
> until we are sure that it will not wait for other commands. I've already<br>
> implemented a kind of executor service (ConditionalExecutorService,<br>
> in the ISPN-2635 and ISPN-2636 branches, Total Order stuff) that only puts<br>
> the Runnable (more precisely, a new interface called ConditionalRunnable)<br>
> in the thread pool when it is ready to be processed. Creative guys, it<br>
> may need a better name :)<br>
><br>
> The ConditionalRunnable has a new method (boolean isReady()) that should<br>
> return true only when the runnable will not block.<br>
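[Editor's sketch: the thread names ConditionalRunnable/ConditionalExecutorService come from the branches mentioned above, but the dispatcher mechanics shown here — a pending queue re-checked whenever state changes — are my guess at how it could work, not the actual implementation.]

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;

// Runnable that can report whether it would block if executed right now.
interface ConditionalRunnable extends Runnable {
    // Must be cheap and non-blocking; true means run() will not wait
    // for another command (topology install, lock release, ...).
    boolean isReady();
}

// Keeps tasks out of the real thread pool until they are ready, so a
// small pool cannot fill up with threads that are all blocked.
class ConditionalExecutorService {
    private final Queue<ConditionalRunnable> pending = new ConcurrentLinkedQueue<>();
    private final Executor realPool;

    ConditionalExecutorService(Executor realPool) {
        this.realPool = realPool;
    }

    void execute(ConditionalRunnable task) {
        if (task.isReady()) {
            realPool.execute(task);   // fast path: no need to queue
        } else {
            pending.add(task);
        }
    }

    // Called whenever relevant state changes (new topology id installed,
    // a lock released, ...): hand the now-ready tasks to the real pool.
    void checkPending() {
        Iterator<ConditionalRunnable> it = pending.iterator();
        while (it.hasNext()) {
            ConditionalRunnable task = it.next();
            if (task.isReady()) {
                it.remove();
                realPool.execute(task);
            }
        }
    }
}
```

One design question this leaves open is who calls checkPending() and how often; hooking it to topology-change and unlock events avoids polling.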
><br>
> Example of how to apply this to ISPN-2808:<br>
><br>
> Most of the commands wait for a particular topology ID and/or for lock<br>
> acquisition.<br>
><br>
><br>
> Well, the original problem description was about the<br>
> DistributionInterceptor forwarding the command from the primary owner to<br>
> the backup owners and waiting for a response from them :)<br>
> The forwarding done by StateTransferInterceptor is also synchronous and<br>
> can block.<br>
><br>
> It's true that you can't check how long a remote call will take<br>
> beforehand...<br>
><br>
> In this way, the isReady() implementation can be something<br>
> like:<br>
><br>
> isReady()<br>
> return commandTopologyId <= currentTopologyId && (for all keys; do if<br>
> !lock(key).tryLock(); return false; done)<br>
><br>
><br>
> Shouldn't you release the locks you managed to lock already if one of<br>
> the lock acquisitions failed?<br>
<br>
</div></div>You are right. I have to release the locks in pessimistic mode. In<br>
optimistic mode, the locks are only released by the rollback command.<br>
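[Editor's sketch of the isReady() pseudocode above, incorporating the release-on-failure point: if any tryLock fails, everything acquired so far is released before returning false. The class and method names are invented for illustration, not Infinispan APIs.]

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.Lock;

// Illustrative isReady() for a lock-acquiring command: the command is only
// handed to the thread pool once its topology id has been installed and
// every key lock could be grabbed without waiting.
class LockingCommandReadiness {
    static boolean isReady(int commandTopologyId, int currentTopologyId, List<Lock> keyLocks) {
        if (commandTopologyId > currentTopologyId) {
            return false;           // wait for the topology update first
        }
        Deque<Lock> acquired = new ArrayDeque<>();
        for (Lock lock : keyLocks) {
            if (lock.tryLock()) {
                acquired.push(lock);
            } else {
                // Release everything grabbed so far; otherwise two commands
                // locking the same keys in different orders can deadlock
                // while sitting half-locked in the queue.
                while (!acquired.isEmpty()) {
                    acquired.pop().unlock();
                }
                return false;
            }
        }
        return true;                // caller now owns all the key locks
    }
}
```

As noted below in the thread, this cannot work for non-tx caches, where locks are owned by the acquiring thread and so cannot be taken in the dispatcher thread and released in a pool thread.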
<div class="im">><br>
> Actually you may want to release the locks even if lock acquisition did<br>
> succeed... In non-tx mode, the locks are owned by the current thread, so<br>
> you can't lock a key on one thread and unlock it on another (though you<br>
> could skip the check completely in non-tx mode). And in transactional<br>
> mode, you could have a deadlock because 2 txs lock the same keys in a<br>
> different order.<br>
</div>Non-tx caches cannot use this optimization. I saw that problem earlier<br>
today when I started debugging it.<br>
<div class="im">><br>
> Which leads me to a different point, how would you handle deadlocks?<br>
> With pessimistic mode, if tx1 holds lock k1 and wants to acquire k2, but<br>
> tx2 holds k2 and wants to acquire k1, will the LockCommands tx1:k2 and<br>
> tx2:k1 ever be scheduled? In general, can we make the time that a<br>
> Lock/PrepareCommand spends in the ConditionalExecutorService queue count<br>
> against lockAcquisitionTimeout?<br>
<br>
</div>If I get a DeadlockDetectedException, I will send it back immediately and<br>
release all the locks.<br>
<br></blockquote><div><br></div><div>Hmmm, the default LockManager doesn't throw DeadlockDetectedExceptions; you have to enable deadlockDetection explicitly in the configuration. I'm guessing that's for performance reasons...<br>
<br></div><div>It's true that you would need a new LockManager.tryLock method anyway, so you could implement deadlock detection in LockManagerImpl.tryLock. But probably the same performance considerations would apply.<br>
<br></div> <br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
The lockAcquisitionTimeout is a problem that I haven't solved yet.<br>
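[Editor's sketch of one possible direction for the timeout question raised above — making the time a command spends in the conditional queue count against lockAcquisitionTimeout: timestamp the command on enqueue and shrink its remaining lock budget accordingly. All names here are made up.]

```java
// Tracks how much of the lockAcquisitionTimeout budget a queued command
// has left, so queue time is not "free" and stuck commands eventually
// fail with a timeout instead of waiting forever.
class QueuedCommand {
    private final long enqueuedAtNanos = System.nanoTime();
    private final long lockAcquisitionTimeoutMillis;

    QueuedCommand(long lockAcquisitionTimeoutMillis) {
        this.lockAcquisitionTimeoutMillis = lockAcquisitionTimeoutMillis;
    }

    // Budget remaining for actual lock acquisition once the command runs;
    // pass this (not the full timeout) to tryLock(timeout, unit).
    long remainingTimeoutMillis() {
        long waitedMillis = (System.nanoTime() - enqueuedAtNanos) / 1_000_000;
        return lockAcquisitionTimeoutMillis - waitedMillis;
    }

    // The dispatcher can drop expired commands and reply with a
    // TimeoutException instead of scheduling them.
    boolean expired() {
        return remainingTimeoutMillis() <= 0;
    }
}
```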
<div><div class="h5"><br>
><br>
> With this, I believe we can keep the number of threads low and avoid<br>
> thread deadlocks.<br>
><br>
> Now, I have two possible implementations:<br>
><br>
> 1) put a reference to the StateTransferManager and/or LockManager in the<br>
> commands, and invoke the methods directly (a little dirty)<br>
><br>
> 2) add a new method in the CommandInterceptor like: boolean<br>
> preProcess&lt;Command&gt;(Command, InvocationContext). Each interceptor will<br>
> check if the command would block on it (returning false) or not (invoking<br>
> the next interceptor). For example, the StateTransferInterceptor returns<br>
> false immediately if the commandTopologyId is higher than the<br>
> currentTopologyId, and the *LockingInterceptor will return false if it<br>
> cannot acquire some lock.<br>
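[Editor's sketch of option 2: each interceptor gets a veto on whether the command can run without blocking. PreProcessor stands in for the proposed preProcess hook on CommandInterceptor; none of these names are the real Infinispan API.]

```java
import java.util.List;

// Hook each interceptor would implement for option 2.
interface PreProcessor {
    // false means "this command would block on me, keep it queued"
    boolean preProcess(Object command);
}

// Walks the chain; the command is ready only if no interceptor vetoes it.
class PreProcessChain {
    private final List<PreProcessor> interceptors;

    PreProcessChain(List<PreProcessor> interceptors) {
        this.interceptors = interceptors;
    }

    boolean isReady(Object command) {
        for (PreProcessor p : interceptors) {
            if (!p.preProcess(command)) {
                return false;
            }
        }
        return true;
    }
}

// Example interceptor: vetoes commands whose topology id has not been
// installed yet (here the "command" is just its topology id, simplified).
class TopologyPreProcessor implements PreProcessor {
    volatile int currentTopologyId;

    public boolean preProcess(Object command) {
        return (Integer) command <= currentTopologyId;
    }
}
```

A locking interceptor's preProcess would do the tryLock dance from earlier in the thread; the nice property of this shape is that the dispatcher needs no references to StateTransferManager or LockManager.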
><br>
> Any other suggestions? If I was not clear let me know.<br>
><br>
><br>
> TBH I would only check for the topology id before scheduling the<br>
> commands to the Infinispan thread pool, because checking the locks is<br>
> very complicated with all the configuration options that we have.<br>
><br>
> This is how I was imagining solving the lock problem: The OOB threads<br>
> would directly execute the commands that do not acquire locks<br>
> (CommitCommand, TxCompletionNotificationCommand) and submit the<br>
> others to the ISPN thread pool; if the command's topology id was higher<br>
> than the current topology id, the OOB thread would just stick it in a<br>
> queue. A separate thread would read only the commands with the current<br>
> topology id from the queue, and process them just like an OOB thread.<br>
><br>
> However... the CommitCommands may very well have a *lower* topology id<br>
> by the time they are executed, so the OOB/executor thread may well block<br>
> while waiting for the results of the forwarding RPC. Maybe we could work<br>
> around it by having StateTransferInterceptor submit a task with the<br>
> command forwarding only to the thread pool, but again it gets quite<br>
> complicated. So for the first phase I recommend handling only the<br>
> topology id problem.<br>
</div></div>I have a similar implementation, but in your example, the CommitCommand<br>
will be scheduled to the thread pool only if the command topology ID is<br>
lower than or equal to the current topology ID (like all topology-affected<br>
commands).<br>
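[Editor's sketch boiling the dispatch rule from the last two messages down to a classification function: topology check first (for all topology-affected commands, CommitCommand included), then non-locking commands run directly on the OOB thread and the rest go to the Infinispan pool. The names are invented.]

```java
// Where an incoming remote command should be executed.
enum Dispatch { RUN_ON_OOB_THREAD, SUBMIT_TO_ISPN_POOL, QUEUE_UNTIL_TOPOLOGY }

class CommandDispatcher {
    // acquiresLocks would be false for CommitCommand and
    // TxCompletionNotificationCommand, true for Lock/PrepareCommands.
    // Note: this does not capture the caveat above that a CommitCommand
    // can still block on a forwarding RPC inside StateTransferInterceptor.
    static Dispatch classify(boolean acquiresLocks, int commandTopologyId, int currentTopologyId) {
        if (commandTopologyId > currentTopologyId) {
            return Dispatch.QUEUE_UNTIL_TOPOLOGY;   // wait for the topology
        }
        return acquiresLocks ? Dispatch.SUBMIT_TO_ISPN_POOL
                             : Dispatch.RUN_ON_OOB_THREAD;
    }
}
```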
<br>
Currently, I only have 3 tests failing in<br>
org.infinispan.replication.FlagsReplicationTest (2 assertions and 1<br>
timeout) that I'm checking now...<br>
<div class="HOEnZb"><div class="h5">><br></div></div></blockquote><div><br></div><div>I guess it's time to look at your code :)<br></div></div></div></div>