[infinispan-dev] ISPN-293 getAsync impl requires more reengineering

Galder Zamarreño galder at redhat.com
Wed Feb 9 05:25:15 EST 2011


On Feb 9, 2011, at 10:54 AM, Mircea Markus wrote:

> 
> On 9 Feb 2011, at 08:14, Galder Zamarreño wrote:
> 
>> Hi,
>> 
>> Re: https://issues.jboss.org/browse/ISPN-293
>> 
>> I have an issue with my implementation, which simply wraps the realRemoteGet call in DistributionInterceptor in a Callable:
>> 
>> Assume a cache configured with Distribution(numOwners=1, l1=enabled), no transactions, and a cluster of 2 nodes:
>> 
>> - [main-thread] Put k0 in a cache that should own it.
>> - [main-thread] Do a getAsync for k0 from a node that does not own it:
>> - [async-thread] This leads to a remote get that updates the L1 and also updates the context created by the main thread, putting the updated entry in there (however, this thread does not release the locks).
> Can't you use a different InvocationContext instance for this? But even with the same IC, I'm wondering why the IC doesn't get cleaned up: InvocationContextInterceptor.handleAll calls ic.reset() in its finally block. Are you starting the async thread after InvocationContextInterceptor?

The async thread does get started after InvocationContextInterceptor, because you only want to start it if you have to go remote with the get, and right now you only know that in DistributionInterceptor. This is consistent with the other async* operations: the only part that's made asynchronous is the bit where you actually have to go remote:

    * Asynchronous version of {@link #put(Object, Object, long, TimeUnit, long, TimeUnit)}.  This method does not block
    * on remote calls, even if your cache mode is synchronous.  Has no benefit over {@link #put(Object, Object, long,
    * TimeUnit, long, TimeUnit)} if used in LOCAL mode.
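
For illustration, this is roughly the shape of what I have now, boiled down to plain java.util.concurrent with made-up stand-in types (RemoteGetter is not the real interceptor API, just a placeholder for realRemoteGet()):

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class AsyncGetSketch {

       // Hypothetical stand-in for DistributionInterceptor.realRemoteGet().
       interface RemoteGetter {
          Object realRemoteGet(Object key) throws Exception;
       }

       private final ExecutorService asyncExecutor = Executors.newCachedThreadPool();

       // Wrap the remote get in a Callable so the caller gets a Future back
       // instead of blocking on the network call.
       Future<Object> getAsync(final RemoteGetter getter, final Object key) {
          return asyncExecutor.submit(new Callable<Object>() {
             public Object call() throws Exception {
                // The problem: this runs on the async thread, but it mutates
                // the InvocationContext created by the calling thread, and
                // the locks it acquires are never released on this thread.
                return getter.realRemoteGet(key);
             }
          });
       }
    }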

Now, using a different IC for when you know that the async thread needs to go remote *and* needs to store in L1 might be a viable option. However, my worry here goes back to locks. You'd have created a new IC, then acquired the locks on X, then reset the IC without clearing the locks. Would the fact that you didn't clear the locks cause issues? Hmmmm....
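
To make the idea concrete, something along these lines, again with hypothetical types (LockManager and AsyncInvocationContext are placeholders, not the real Infinispan classes): the async task gets its own context and releases whatever it locked in a finally block, on the same thread that acquired:

    import java.util.HashSet;
    import java.util.Set;
    import java.util.concurrent.Callable;

    public class SeparateContextSketch {

       // Hypothetical lock API, keyed by an owner object.
       interface LockManager {
          void acquire(Object key, Object owner) throws InterruptedException;
          void release(Object key, Object owner);
       }

       // Fresh per-task context, never shared with the main thread.
       static class AsyncInvocationContext {
          final Set<Object> lockedKeys = new HashSet<Object>();
       }

       static Callable<Object> remoteGetTask(final LockManager lm, final Object key,
                                             final Callable<Object> remoteGet) {
          return new Callable<Object>() {
             public Object call() throws Exception {
                AsyncInvocationContext ctx = new AsyncInvocationContext();
                lm.acquire(key, ctx);
                ctx.lockedKeys.add(key);
                try {
                   Object value = remoteGet.call();
                   // ... the L1 update would happen here, under the same lock ...
                   return value;
                } finally {
                   // Crucial bit: release on the same thread and context that
                   // acquired, so the main thread never sees these locks.
                   for (Object k : ctx.lockedKeys) lm.release(k, ctx);
                }
             }
          };
       }
    }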

>> - [main-thread] Next up, we put a new key, i.e. k1, in the node that didn't own k0 (the node where we updated the L1 cache):
>> - Now this thread has used the same context that the async thread used, so when it comes to releasing locks it finds two entries in the context: the one locked by the async thread and the one for the main thread, and it fails with java.lang.IllegalMonitorStateException
>> 
>> Now, what's happening here is that the async thread acquires the lock but does not release it, because the release happens in DistLockInterceptor/LockingInterceptor, a different interceptor from the one where the lock is acquired. So, in theory, the solution would be for DistLockInterceptor to wrap invokeNext() and afterCall() for an async get, so that "remote get", "L1 update" and "release locks" all happen in the separate thread. However, for this to work, it will need to figure out whether a remoteGet() will be necessary in the first place, otherwise it's useless. Whether the remoteGet should happen is determined by this code in DistributionInterceptor:
>> 
>> if (ctx.isOriginLocal() && !(isMappedToLocalNode = dm.isLocal(key)) && isNotInL1(key)) {
>> 
>> Also, if DistLockInterceptor does this check, we need to make sure DistributionInterceptor does not do it again, otherwise it's wasted work. I think this might work, although it does require some further reengineering.
>> 
>> I'm going to try to implement this, but I'm wondering whether anyone can see any potential flaws here, or whether anyone has any better ideas :)
> I might be missing something, but don't we already do the same thing for asyncPut operations, and can't we follow the exact same pattern?

No, it's a very different pattern. In the rest of the async* ops, the thing that's made async is the actual sending of the replication, once you've already updated the local cache. So you never find yourself having to update the local cache in the async thread.
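
Roughly, that pattern looks like this sketch (Replicator is a made-up stand-in for the RPC layer): the local update runs synchronously on the caller's thread, and only the replication is handed off:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class PutAsyncPatternSketch {

       // Hypothetical stand-in for the replication/RPC machinery.
       interface Replicator {
          void replicate(Object key, Object value);
       }

       private final ConcurrentMap<Object, Object> localStore =
             new ConcurrentHashMap<Object, Object>();
       private final ExecutorService asyncExecutor = Executors.newCachedThreadPool();

       Future<?> putAsync(final Replicator r, final Object key, final Object value) {
          // Local update: done here, synchronously, with the caller's context.
          localStore.put(key, value);
          // Only the remote call goes async; no cache state is touched on the
          // async thread, which is why locks are a non-issue there.
          return asyncExecutor.submit(new Runnable() {
             public void run() {
                r.replicate(key, value);
             }
          });
       }
    }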

But the problem here is that after the async get more stuff happens (the L1 update), so, as Manik says in the JIRA, you can't just do the remote get and forget to update the L1 cache. This forces you to do a put within the async thread.
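
For contrast, a sketch of what getAsync is forced into (same disclaimer, all hypothetical names): the async thread has to do a cache write, not just a read:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class GetAsyncL1Sketch {

       // Hypothetical stand-in for the remote fetch.
       interface RemoteGetter {
          Object realRemoteGet(Object key) throws Exception;
       }

       private final ConcurrentMap<Object, Object> l1Cache =
             new ConcurrentHashMap<Object, Object>();
       private final ExecutorService asyncExecutor = Executors.newCachedThreadPool();

       Future<Object> getAsync(final RemoteGetter getter, final Object key) {
          return asyncExecutor.submit(new Callable<Object>() {
             public Object call() throws Exception {
                Object value = getter.realRemoteGet(key);
                if (value != null) {
                   // This is the part putAsync never has to do: a cache write
                   // on the async thread, which drags locking into it.
                   l1Cache.put(key, value);
                }
                return value;
             }
          });
       }
    }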

I think the solution might involve a combination of my suggestion to make sure the locks are cleared *and* using a different context for the async thread (only when L1 is enabled).

>> 
>> Cheers,
>> --
>> Galder Zamarreño
>> Sr. Software Engineer
>> Infinispan, JBoss Cache

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache