[infinispan-issues] [JBoss JIRA] (ISPN-2240) Per-key lock container leads to superfluous TimeoutExceptions on concurrent access to same key

Wed Oct 24 12:22:01 EDT 2012

    [ https://issues.jboss.org/browse/ISPN-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728931#comment-12728931 ] 

Dan Berindei commented on ISPN-2240:
------------------------------------

Mircea, it doesn't seem to be the same as ISPN-2381 to me.

It looks like Robert is using a non-transactional cache, and NonTransactionalLockingInterceptor still acquires the lock on the originator and on all the owners - not just on the primary owner.

I'm pretty sure the fix in the description is not correct, because `l.hasQueuedThreads()` can return false even while another thread is busy working with the key.
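
To illustrate the point about `hasQueuedThreads()`, here is a minimal sketch that uses only java.util.concurrent. The class name HasQueuedThreadsDemo and its observe() helper are hypothetical and are not Infinispan code. The worker thread stands in for a waiter that acquired the lock right after the previous owner's unlock(). At that moment nobody is queued, yet the key is still in use, so a `locks.remove(key)` guarded only by `!l.hasQueuedThreads()` would drop a lock that is currently held.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class HasQueuedThreadsDemo {

    /** Returns {hasQueuedThreads, isLocked}, sampled while a worker thread holds the lock. */
    public static boolean[] observe() {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // The worker plays the thread that is "busy working with the key".
        Thread worker = new Thread(() -> {
            lock.lock();
            try {
                acquired.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        worker.start();

        try {
            acquired.await();
            // Nobody is waiting for the lock, but it is held by the worker.
            boolean queued = lock.hasQueuedThreads();
            boolean locked = lock.isLocked();
            return new boolean[] { queued, locked };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Always let the worker finish, then wait for it.
            release.countDown();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println("hasQueuedThreads=" + r[0] + ", isLocked=" + r[1]);
    }
}
```

If a third thread then asks the container for the same key, it finds no entry in the map, creates a fresh lock and acquires it immediately, so two threads end up working with the key at the same time.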

The solution described in https://issues.jboss.org/browse/ISPN-2240?focusedCommentId=12715426&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12715426 seems reasonable, mimicking the 2-phase commit in transactional caches. Perhaps doing the RPCs in the DistributionInterceptor would be more reasonable though.

I'm not sure about the problem with L1 invalidations arriving later. I thought we explicitly do not send L1 invalidations to the origin of the PutKeyValueCommand, but it could be that the check is only used when the L1 invalidation threshold is non-zero (0 is the default). This probably warrants a separate issue.

> Per-key lock container leads to superfluous TimeoutExceptions on concurrent access to same key
> ----------------------------------------------------------------------------------------------
>
>                 Key: ISPN-2240
>                 URL: https://issues.jboss.org/browse/ISPN-2240
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Locking and Concurrency
>    Affects Versions: 5.1.6.FINAL, 5.1.x
>            Reporter: Robert Stupp
>            Assignee: Manik Surtani
>             Fix For: 5.2.0.Final
>
>         Attachments: ISPN-2240_fix_TimeoutExceptions.patch, somehow.zip
>
>
> Hi,
> I've encountered a lot of TimeoutExceptions while running a load test against an Infinispan cluster.
> I tracked down the reason and found that the code in org.infinispan.util.concurrent.locks.containers.AbstractPerEntryLockContainer#releaseLock() causes these superfluous TimeoutExceptions.
> I have a small test case that just prints out timeouts and too-late timeouts, and "paints" a lot of dots to the console (more dots per second on the console means better throughput ;-)
> In a short test I extended the class ReentrantPerEntryLockContainer and changed the implementation of releaseLock() as follows:
> {noformat}
>     public void releaseLock(Object lockOwner, Object key) {
>         ReentrantLock l = locks.get(key);
>         if (l != null) {
>             if (!l.isHeldByCurrentThread())
>                 throw new IllegalStateException("Lock for [" + key + "] not held by current thread " + Thread.currentThread());
>             while (l.isHeldByCurrentThread())
>                 unlock(l, lockOwner);
>             if (!l.hasQueuedThreads())
>                 locks.remove(key);
>         }
>         else
>             throw new IllegalStateException("No lock for [" + key + ']');
>     }
> {noformat}
> The main improvement is that locks are not removed from the concurrent map as long as other threads are waiting on that lock.
> If the lock is removed from the map while other threads are waiting for it, they may run into timeouts, and the resulting TimeoutExceptions are thrown to the client.
> The above method "paints more dots per second", meaning it gives better throughput for concurrent accesses to the same key.
> The re-implemented method should also fix some replication timeout exceptions.
> Please, please add this to 5.1.7, if possible.
