[infinispan-dev] Remote command smarter dispatcher (merge ISPN-2808 and ISPN-2849)

Dan Berindei dan.berindei at gmail.com
Wed Mar 20 14:18:40 EDT 2013


On Wed, Mar 20, 2013 at 12:59 PM, Pedro Ruivo <pedro at infinispan.org> wrote:

>
>
> On 03/20/2013 07:53 AM, Dan Berindei wrote:
> >
> > On Tue, Mar 19, 2013 at 11:55 PM, Pedro Ruivo <pedro at infinispan.org> wrote:
> >
> >
> >
> >     On 03/19/2013 08:41 PM, Dan Berindei wrote:
> >      >
> >      > On Mon, Mar 18, 2013 at 6:09 PM, Pedro Ruivo
> >      > <pedro at infinispan.org> wrote:
> >      >
> >      >     Hi all,
> >      >
> >      >     To solve ISPN-2808 (avoid blocking JGroups threads, so that
> >      >     the request responses can be delivered), I've created another
> >      >     thread pool to move the possibly blocking commands into (i.e.
> >      >     the commands that may block until some state is reached).
> >      >
> >      >     Problem description:
> >      >
> >      >     With this solution, the new thread pool should be large in
> >      >     order to be able to handle the remote commands without
> >      >     deadlocks. The problem is that all the threads can be blocked
> >      >     while the command that may unblock them sits in the queue.
> >      >
> >      >     Example: a bunch of commands are blocked waiting for a new
> >      >     topology ID and the command that will increment the topology
> >      >     ID is in the thread pool queue.
> >      >
> >      >     Solution:
> >      >
> >      >     Use a smart command dispatcher, i.e. keep the command in the
> >      >     queue until we are sure that it will not wait for other
> >      >     commands. I've already implemented some kind of executor
> >      >     service (ConditionalExecutorService, in the ISPN-2635 and
> >      >     ISPN-2636 branches, Total Order stuff) that only puts the
> >      >     Runnable (more precisely, a new interface called
> >      >     ConditionalRunnable) in the thread pool when it is ready to
> >      >     be processed. Creative guys, it may need a better name :)
> >      >
> >      >     The ConditionalRunnable has a new method (boolean isReady())
> >      >     that should return true when the runnable should not block.
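> >      >
> >      >     A rough sketch of the idea (the executor internals below are
> >      >     just an illustration, not the actual branch code):
> >      >
> >      >     import java.util.Iterator;
> >      >     import java.util.Queue;
> >      >     import java.util.concurrent.ConcurrentLinkedQueue;
> >      >     import java.util.concurrent.ExecutorService;
> >      >     import java.util.concurrent.Executors;
> >      >
> >      >     interface ConditionalRunnable extends Runnable {
> >      >        // true when run() is not expected to block
> >      >        boolean isReady();
> >      >     }
> >      >
> >      >     class ConditionalExecutorService {
> >      >        private final Queue<ConditionalRunnable> pending =
> >      >              new ConcurrentLinkedQueue<ConditionalRunnable>();
> >      >        private final ExecutorService realExecutor =
> >      >              Executors.newFixedThreadPool(8);
> >      >
> >      >        void execute(ConditionalRunnable task) {
> >      >           pending.add(task);
> >      >        }
> >      >
> >      >        // called from a single dispatcher thread whenever state
> >      >        // changes (new topology installed, a lock released, ...)
> >      >        void poll() {
> >      >           Iterator<ConditionalRunnable> it = pending.iterator();
> >      >           while (it.hasNext()) {
> >      >              ConditionalRunnable task = it.next();
> >      >              if (task.isReady()) {
> >      >                 it.remove();
> >      >                 realExecutor.execute(task);
> >      >              }
> >      >           }
> >      >        }
> >      >     }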
> >      >
> >      >     Example how to apply this to ISPN-2808:
> >      >
> >      >     Most of the commands wait for a particular topology ID
> >      >     and/or for lock acquisition.
> >      >
> >      >
> >      > Well, the original problem description was about the
> >      > DistributionInterceptor forwarding the command from the primary
> >      > owner to the backup owners and waiting for a response from them :)
> >      > The forwarding done by StateTransferInterceptor is also
> >      > synchronous and can block.
> >      >
> >      > It's true that you can't check how long a remote call will take
> >      > beforehand...
> >      >
> >      >     In this way, the isReady() implementation can be something
> >      >     like:
> >      >
> >      >     isReady()
> >      >        return commandTopologyId <= currentTopologyId &&
> >      >               (for all keys: if !lock(key).tryLock() return false)
> >      >
> >      >
> >      > Shouldn't you release the locks you managed to lock already if
> >      > one of the lock acquisitions failed?
> >
> >     You are right. I have to release the locks in pessimistic mode. In
> >     optimistic mode, the locks are only released by the rollback command.
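> >
> >     Something like this, maybe (a sketch: the per-key Lock lookup is
> >     assumed, not an existing LockManager API):
> >
> >     import java.util.ArrayList;
> >     import java.util.Collection;
> >     import java.util.List;
> >     import java.util.Map;
> >     import java.util.concurrent.locks.Lock;
> >
> >     class LockPreAcquirer {
> >        private final Map<Object, Lock> locks;   // per-key locks (assumed)
> >
> >        LockPreAcquirer(Map<Object, Lock> locks) {
> >           this.locks = locks;
> >        }
> >
> >        boolean tryLockAll(Collection<Object> keys) {
> >           List<Lock> acquired = new ArrayList<Lock>();
> >           for (Object key : keys) {
> >              Lock lock = locks.get(key);
> >              if (!lock.tryLock()) {
> >                 // back off: release what we got so far, so the command
> >                 // holding the missing lock can make progress
> >                 for (Lock l : acquired)
> >                    l.unlock();
> >                 return false;
> >              }
> >              acquired.add(lock);
> >           }
> >           return true;
> >        }
> >     }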
> >      >
> >      > Actually you may want to release the locks even if lock
> >      > acquisition did succeed... In non-tx mode, the locks are owned by
> >      > the current thread, so you can't lock a key on one thread and
> >      > unlock it on another (though you could skip the check completely
> >      > in non-tx mode). And in transactional mode, you could have a
> >      > deadlock because 2 txs lock the same keys in a different order.
> >     Non-tx caches cannot use this optimization. I saw that problem
> >     earlier today when I started debugging it.
> >      >
> >      > Which leads me to a different point: how would you handle
> >      > deadlocks? With pessimistic mode, if tx1 holds the lock on k1
> >      > and wants to acquire k2, but tx2 holds k2 and wants to acquire
> >      > k1, will the LockCommands tx1:k2 and tx2:k1 ever be scheduled?
> >      > In general, can we make the time that a Lock/PrepareCommand
> >      > spends in the ConditionalExecutorService queue count against
> >      > lockAcquisitionTimeout?
> >
> >     If I get a DeadLockException, I will send it back immediately and
> >     release all the locks.
> >
> >
> > Hmmm, the default LockManager doesn't throw DeadlockDetectedExceptions;
> > you have to enable deadlockDetection explicitly in the configuration.
> > I'm guessing that's for performance reasons...
> >
> > It's true that you would need a new LockManager.tryLock method anyway,
> > so you could implement deadlock detection in LockManagerImpl.tryLock.
> > But probably the same performance considerations would apply.
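> >
> > For illustration, the new method could look something like this
> > (hypothetical signature, modelled on the existing lock methods):
> >
> >    // non-blocking variant: fail fast instead of waiting up to
> >    // lockAcquisitionTimeout when another owner holds the lock
> >    boolean tryLock(Object key, Object lockOwner);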
>
> Oh... I thought you were talking about the DeadlockDetectingLockManager.
> With the normal one, I am not able to detect whether we have a deadlock
> or not.
>
>
The issue I'm seeing is that without deadlock detection the commands that
would normally time out are never executed on the thread pool, so after
running for a while you can end up with a lot of "expired" commands in the
queue. The sender will give up after the replication timeout expires, but
the recipient will keep on checking if the commands can acquire their locks.

Actually the sender should roll back the transaction when it gets a RPC
timeout, so the commands should be able to run at some point. It's just
that you'll be ignoring lockAcquisitionTimeout, which is usually smaller
than replTimeout. I'm not sure how big an issue this would be... Mircea,
Manik?



> Actually, I think I can avoid the deadlocks if I use what you suggest.
> (and here I'm talking about tx caches).
>
> When isReady() fails to acquire a lock, I should release the locks
> already acquired. This way, the only reason isReady() can fail to
> acquire a lock is if the lock is held by a running command (i.e. a
> command that is already in the thread pool).
>
> Do you think this may work?
>
>
If you release the locks, then you allow another lock/prepare command to
execute before you and acquire those locks, so you will now block in the
LockingInterceptor to acquire the locks again - just what you were trying
to avoid. Under high load I think such collisions would be pretty common,
so I don't think you'd gain anything over executing the commands without
any preprocessing.

The deadlock can be distributed (k1 is on node A and k2 is on node B, and
tx1 and tx2 try to acquire both in a different order). In a pessimistic
cache, I don't think A will ever see the lock command for k2, and B won't
see the lock command for k1, so deadlock detection won't work. (Mircea, can
you confirm?)

You can also have a deadlock without having any properly locked keys, if
the locks were acquired on another node and transferred to the current node
via state transfer (speaking of which, you will have to duplicate
waitForTransactionsToComplete as well in your preProcess method).

>
> >
> >     The lockAcquisitionTimeout is a problem that I haven't solved yet.
> >
>

I would definitely like to have this; I have a feeling more users are using
a low lockAcquisitionTimeout to handle deadlocks than the
DeadlockDetectingLockManager.
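
A minimal sketch of how queue time could count against the timeout (the
enqueue-time bookkeeping below is an assumption, not existing code):

    abstract class QueuedCommand implements ConditionalRunnable {
       private final long enqueueTime = System.nanoTime();
       private final long lockTimeoutNanos;   // from lockAcquisitionTimeout

       QueuedCommand(long lockTimeoutNanos) {
          this.lockTimeoutNanos = lockTimeoutNanos;
       }

       // time spent waiting in the dispatcher queue counts against
       // the lock acquisition timeout
       final boolean isExpired() {
          return System.nanoTime() - enqueueTime > lockTimeoutNanos;
       }
    }

The dispatcher could then drop expired commands from the queue and reply
with a TimeoutException instead of retrying their isReady() forever.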



> >      >
> >      >     With this, I believe we can keep the number of threads low
> >      >     and avoid the thread deadlocks.
> >      >
> >      >     Now, I have two possible implementations:
> >      >
> >      >     1) put a reference to the StateTransferManager and/or
> >      >     LockManager in the commands, and invoke the methods directly
> >      >     (a little dirty)
> >      >
> >      >     2) add a new method in the CommandInterceptor like: boolean
> >      >     preProcess<command>(Command, InvocationContext). Each
> >      >     interceptor will check if the command would block on it
> >      >     (returning false) or not (invoking the next interceptor).
> >      >     For example, the StateTransferInterceptor returns false
> >      >     immediately if the commandTopologyId is higher than the
> >      >     currentTopologyId, and the *LockingInterceptor will return
> >      >     false if it cannot acquire some lock (see the sketch below).
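> >      >
> >      >     For option 2, something like this (strawman names, assuming
> >      >     the usual getNext() chaining):
> >      >
> >      >     // in CommandInterceptor: by default, ask the rest of the
> >      >     // chain
> >      >     protected boolean preProcessPrepare(PrepareCommand command,
> >      >           InvocationContext ctx) {
> >      >        return getNext().preProcessPrepare(command, ctx);
> >      >     }
> >      >
> >      >     // in StateTransferInterceptor
> >      >     @Override
> >      >     protected boolean preProcessPrepare(PrepareCommand command,
> >      >           InvocationContext ctx) {
> >      >        if (command.getTopologyId() > currentTopologyId())
> >      >           return false;   // would block waiting for a newer
> >      >                           // topology
> >      >        return getNext().preProcessPrepare(command, ctx);
> >      >     }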
> >      >
> >      >     Any other suggestions? If I was not clear, let me know.
> >      >
> >      >
> >      > TBH I would only check for the topology id before scheduling the
> >      > commands to the Infinispan thread pool, because checking the
> >      > locks is very complicated with all the configuration options
> >      > that we have.
> >      >
> >      > This is how I was imagining solving the lock problem: the OOB
> >      > threads would execute the commands that do not acquire locks
> >      > (CommitCommand, TxCompletionNotificationCommand) directly and
> >      > submit the others to the ISPN thread pool; if the command's
> >      > topology id was higher than the current topology id, the OOB
> >      > thread would just stick it in a queue. A separate thread would
> >      > read only the commands with the current topology id from the
> >      > queue, and process them just like an OOB thread.
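> >      >
> >      > In pseudo-code (isLockFree(), topologyId(), execute() and the
> >      > queue/executor names are made up for illustration):
> >      >
> >      > // runs on the JGroups OOB thread for each remote command
> >      > void dispatch(final ReplicableCommand cmd) {
> >      >    if (isLockFree(cmd)) {
> >      >       // CommitCommand, TxCompletionNotificationCommand, ...
> >      >       execute(cmd);                  // directly on the OOB thread
> >      >    } else if (topologyId(cmd) > currentTopologyId()) {
> >      >       topologyQueue.add(cmd);        // drained on topology change
> >      >    } else {
> >      >       ispnExecutor.execute(new Runnable() {
> >      >          public void run() {
> >      >             execute(cmd);
> >      >          }
> >      >       });
> >      >    }
> >      > }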
> >      >
> >      > However... the CommitCommands may very well have a *lower*
> >      > topology id by the time they are executed, so the OOB/executor
> >      > thread may well block while waiting for the results of the
> >      > forwarding RPC. Maybe we could work around it by having
> >      > StateTransferInterceptor submit only the command forwarding to
> >      > the thread pool as a task, but again it gets quite complicated.
> >      > So for the first phase I recommend handling only the topology
> >      > id problem.
> >     I have a similar implementation, but in your example, the
> >     CommitCommand will be scheduled to the thread pool only if the
> >     command topology ID is lower than or equal to the current topology
> >     ID (like all topology-affected commands).
> >
> >     Currently, I only have 3 tests failing in
> >     org.infinispan.replication.FlagsReplicationTest (2 assertions and 1
> >     timeout) that I'm checking now...
> >      >
> >
> >
> > I guess it's time to look at your code :)
> >
> >