On Sep 15, 2011, at 1:31 PM, Dan Berindei wrote:
On Thu, Sep 15, 2011 at 12:25 PM, Galder Zamarreño
<galder(a)redhat.com> wrote:
>
> On Sep 14, 2011, at 11:40 AM, Dan Berindei wrote:
>
>> Going back to your original question Galder, the exception is most
>> likely thrown because of this sequence of events:
>>
>> 0. Given a cluster {A, B}, a key k and a node C joining.
>> 1. Put acquires the "transaction lock" on node A (blocking rehashing)
>> 2. Put acquires lock for key k on node A
>> 3. Rehashing starts on node B, blocking transactions
>> 4. Put tries to acquire transaction lock on node B
>>
>> Since it's impossible to finish rehashing while the put operation
>> keeps the transaction lock on node A, the best option was to kill the
>> put operation by throwing a RehashInProgressException.
>>
>> I was thinking in the context of transactions when I wrote this code
>> (see
>> https://github.com/danberindei/infinispan/commit/6ed94d3b2e184d4a48d4e781...,
>> this scenario became just a footnote to the generic case with multiple
>> caches), but the scenario also occurs without transactions. Actually I
>> renamed it "state transfer lock" and I moved it to a separate
>> interceptor in my ISPN-1194 branch.
>
> Right, but it shouldn't happen when transactions are off, right?
>
It will still happen with transactions off, because state transfer
acquires the state transfer lock in exclusive mode and, with
transactions off, WriteCommands (rather than PrepareCommands) acquire
it in shared mode.
If we don't acquire the state transfer lock then we run the risk of
pushing stale data to other nodes. Are you saying that without
transactions enabled this risk is acceptable?
Hmmm, at first glance no, it does not look acceptable. Refactoring it as
you did, to move it somewhere else, is in order though, because clearly
it's logic that's required by both transactional and non-transactional
caches.
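
The shared/exclusive arrangement described above can be illustrated with a
plain java.util.concurrent read-write lock. This is only a minimal sketch
under assumptions, not the code from the ISPN-1194 branch: the class name
StateTransferLockSketch, its method names, the timeouts and the nested
exception class are all hypothetical stand-ins, and a single JVM-local lock
ignores the fact that the real lock is taken on several nodes.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, NOT Infinispan's actual interceptor code.
final class StateTransferLockSketch {

    // Stand-in for the RehashInProgressException named in the thread.
    static final class RehashInProgressException extends RuntimeException {
        RehashInProgressException(String message) {
            super(message);
        }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Write commands (or prepare commands, with transactions on) take the
    // lock in shared mode, so many of them can run concurrently.
    void acquireShared(long timeoutMillis) {
        if (!tryLock(lock.readLock(), timeoutMillis)) {
            throw new RehashInProgressException("state transfer holds the lock");
        }
    }

    void releaseShared() {
        lock.readLock().unlock();
    }

    // State transfer takes the lock in exclusive mode. It cannot start
    // while any command still holds the lock in shared mode.
    boolean tryAcquireExclusive(long timeoutMillis) {
        return tryLock(lock.writeLock(), timeoutMillis);
    }

    void releaseExclusive() {
        lock.writeLock().unlock();
    }

    // Timed acquisition; an interrupt is treated as a failed attempt.
    private static boolean tryLock(Lock l, long timeoutMillis) {
        try {
            return l.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

While a command holds the lock in shared mode, tryAcquireExclusive times out
and returns false, which mirrors the point that rehashing cannot finish while
the put keeps its lock on node A. Timing out and throwing is one way to break
that cycle.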
>>
>> Maybe the locking changes in 5.1 will eliminate this scenario, but
>> otherwise we could improve the user experience by retrying the command
>> after the rehashing finishes.
>
> I'd prefer that, otherwise users are likely to code similar logic, which
> is not nice.
>
After looking at the scenario again it became clear that step 2 is
not necessary at all and the locking enhancements will not change
anything.
We could change DistributionInterceptor to not hold the state transfer
lock during remote calls, which will allow rehashing to proceed on the
origin. But then the ownership of the keys might change during the
call, so we'll need a retry phase anyway.
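
A retry phase of the kind mentioned above might look roughly like the
following. It is a hedged sketch only: RetryOnRehashSketch, invokeWithRetry,
the awaitRehashEnd callback and the attempt limit are hypothetical names
chosen for illustration, and the nested exception is a stand-in for the real
RehashInProgressException. A check that detects an ownership change during
the remote call could signal the need to retry in the same way.

```java
import java.util.function.Supplier;

// Hypothetical sketch of retrying a command after rehashing finishes.
final class RetryOnRehashSketch {

    // Stand-in for the RehashInProgressException named in the thread.
    static final class RehashInProgressException extends RuntimeException {
        RehashInProgressException(String message) {
            super(message);
        }
    }

    private RetryOnRehashSketch() {
    }

    // Runs the command; if it is rejected because a rehash is in progress,
    // waits for the rehash to end (via the supplied callback) and tries
    // again, up to maxAttempts attempts in total.
    static <T> T invokeWithRetry(Supplier<T> command,
                                 Runnable awaitRehashEnd,
                                 int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RehashInProgressException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return command.get();
            } catch (RehashInProgressException e) {
                last = e;
                awaitRehashEnd.run();
            }
        }
        // Every attempt was rejected: surface the last failure to the caller.
        throw last;
    }
}
```

Bounding the number of attempts keeps a command from spinning forever if
joins keep arriving. Whether a bound is appropriate, and what it should be,
is a design choice this sketch does not settle.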
Dan
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache