[infinispan-dev] API behaviour when using DIST

Mircea Markus mircea.markus at jboss.com
Tue Apr 21 13:01:24 EDT 2009


Manik Surtani wrote:
>
> On 21 Apr 2009, at 16:54, Mircea Markus wrote:
>
>> Manik Surtani wrote:
>>> Ok, after discussing this on IRC and the clustering conf call, here 
>>> is what we will do:
>>>
>>> 1.  Non-transactional
>>>
>>> * Only support DIST in SYNC mode.  Throw a configuration exception 
>>> if DIST and ASYNC are detected.
>>> * Allow DIST_ASYNC if an unsafe flag is set to true (breakApi=true?) 
>>> in the config, so that the user is explicitly aware that API return 
>>> values will be unreliable.
>>> * Make sure this is documented in the Javadocs as well as the FAQs
>> I like that the user needs to explicitly enable that, i.e. to ack 
>> that he/she won't rely on return values.
>> One more thing (and I think this might also apply to 2 and 3):
>> what if the user has previously retrieved the values in some sort of 
>> batch operation? I.e.:
>> cache.retrieve("key1", "key2", "keyn");
>> // then operate on these keys only
>> oldVal = cache.replace("key1", "newVal");
>> // at this point the data would already be in the context/L1 and
>> // the API is not broken.
>> If the user tries to use a "keyX" that was not fetched, then fail with 
>> an exception (unless breakApi is true).
>
> The breakApi workaround (calling this unreliableReturnValues) is a 
> configuration sanity check flag.  It only applies when the cache 
> starts up, and not while it is running.
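A start-up-only check like the one described above could be sketched roughly as follows. This is only an illustration: DistConfigCheck, its CacheMode enum, the local ConfigurationException and the validate() method are hypothetical stand-ins, not actual Infinispan classes, and the flag name simply follows the unreliableReturnValues suggestion.

```java
// Hypothetical sketch; none of these types are real Infinispan classes.
final class DistConfigCheck {
    enum CacheMode { LOCAL, REPL_SYNC, REPL_ASYNC, DIST_SYNC, DIST_ASYNC }

    /** Local stand-in for whatever exception the configuration layer throws. */
    static final class ConfigurationException extends RuntimeException {
        ConfigurationException(String message) { super(message); }
    }

    /** Meant to run once when the cache starts; it is not consulted afterwards. */
    static void validate(CacheMode mode, boolean unreliableReturnValues) {
        if (mode == CacheMode.DIST_ASYNC && !unreliableReturnValues) {
            throw new ConfigurationException(
                "DIST with ASYNC requires unreliableReturnValues=true, "
                + "because API return values cannot be relied upon in this mode");
        }
    }

    public static void main(String[] args) {
        validate(CacheMode.DIST_SYNC, false);  // accepted
        validate(CacheMode.DIST_ASYNC, true);  // accepted, the user opted in
        try {
            validate(CacheMode.DIST_ASYNC, false);
        } catch (ConfigurationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```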
>
>> The advantage of the retrieve operation would be that it minimizes network 
>> round trips, i.e. "key1" and "key2" might be on the same node, so 
>> only one round trip is needed.
>> (We could use cache.startBatch() and cache.endBatch() instead of 
>> cache.retrieve(), though I think the latter is more descriptive.)
>
> Not so sure I understand.  Return values need to be provided on a 
> per-invocation basis.  How would you preempt which keys would be 
> needed?  :-)  Or are you suggesting making this an explicit API call 
> that the user would invoke before calling the write methods?
Yes, indeed. That's what cache.retrieve("key1", "key2", "keyn"...) 
would do: fetch all the remote values at once (multiple keys mapped to 
one node will result in one aggregated get).
>   Sounds pretty clunky though ...
Might be. I think it is easy to grasp though, and can have significant 
benefits for clients that know in advance the full key set they will 
manipulate in one session.
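
To make the aggregation concrete, here is a rough sketch of how the keys passed to a retrieve-style call might be grouped per owning node, so that each node receives a single aggregated get. BatchedRetrieve and groupByOwner are hypothetical names, and the key-to-owner function is a placeholder for whatever the real consistent hash would provide; with several owners per key a real implementation would also have to pick one of them. The example uses List.of/Map.of, so it assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not part of any existing cache API.
final class BatchedRetrieve {
    /** Groups keys by owning node so that each node gets one aggregated get. */
    static <K> Map<String, List<K>> groupByOwner(Collection<K> keys,
                                                 Function<K, String> ownerOf) {
        Map<String, List<K>> perNode = new LinkedHashMap<>();
        for (K key : keys) {
            perNode.computeIfAbsent(ownerOf.apply(key), node -> new ArrayList<>())
                   .add(key);
        }
        return perNode;
    }

    public static void main(String[] args) {
        // Assumed placement: key1 and key2 live on node A, keyn on node B.
        Map<String, String> owners = Map.of("key1", "A", "key2", "A", "keyn", "B");
        Map<String, List<String>> plan =
            groupByOwner(List.of("key1", "key2", "keyn"), owners::get);
        // One remote call per entry in the plan, rather than one per key.
        System.out.println(plan.size() + " round trips for 3 keys: " + plan);
    }
}
```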
>
>>> 2.  Transactional - Async
>>>
>>> * As above, do not support unless breakApi = true.  In which case 
>>> this is simple - no return values, only broadcast modifications in 
>>> prepare as we do with REPL_SYNC.
>>>
>>> 3.  Transactional - Sync
>>>
>>> * Offer 2 options here.  Option 1 is simple and we already have 
>>> this.  Option 2 will need to be written.
>>>
>>> Option 1:  break API.  This will behave the same as async, in that 
>>> we don't care about return values, but we still use a 2-phase commit 
>>> protocol for the transaction boundary.  Users will still need to 
>>> specify breakApi = true so this is explicit.
>>>
>>> Option 2: Before any write RPC call is broadcast, a remote get is 
>>> performed.  This pulls back the necessary values so that a reliable 
>>> return value is available.
>> I guess this will also use 2PC? Also, are we locking progressively or 
>> at commit time only, when broadcasting the modifications?
>
> Yes, 2PC.  And yes, when the prepare command is broadcast.
>
> Although now that I think about this, this could result in 
> inconsistent return values again!  E.g.,
>
> Cluster = {A, B, C, D}
> K maps to A and B
>
> on C
> ----
> 1.  tx.begin()
> 2.  C.remove(K) // does a remote get from A or B, no locks yet on A or 
> B.  Return value accordingly based on local operation.
> 3.  tx.commit() // broadcast, acquire locks, etc
>
> on D
> ----
> 1.  tx.begin()
> 2.  D.remove(K) // this happens *before* C commits.  So we still see 
> the old value before C's remove is applied to D
> 3.  tx.commit()
>
> I guess this is a classic read-committed case which is permitted, or a 
> write-skew case in repeatable_read, which could even happen in local 
> mode.
>
> The other option is eager command RPC but that is inefficient for a 
> number of reasons (locks held longer than necessary, plus extra work in 
> fitting this with our existing transaction processing sequences).
>
> Cheers
> -- 
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>
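
The C/D interleaving quoted above can be reproduced with a small single-threaded simulation. This is only a sketch under simplifying assumptions: a plain map stands in for the owners A and B, the hypothetical Tx class reads remotely at invocation time without locking, and the queued removal is applied only at commit (no real 2PC, locking or networking is modelled).

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation; it does not use any real transaction manager.
final class DeferredWriteRace {
    /** Reads from the owners at invocation time, writes only at commit. */
    static final class Tx {
        private final Map<String, String> owners;   // stand-in for nodes A and B
        private final Set<String> pendingRemoves = new LinkedHashSet<>();

        Tx(Map<String, String> owners) { this.owners = owners; }

        String remove(String key) {
            String seen = owners.get(key);  // "remote get", no lock taken yet
            pendingRemoves.add(key);        // the modification is only queued
            return seen;
        }

        void commit() {
            // In the scenario above, locks would be acquired here, at prepare time.
            for (String key : pendingRemoves) {
                owners.remove(key);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> owners = new HashMap<>();
        owners.put("K", "v");

        Tx onC = new Tx(owners);
        Tx onD = new Tx(owners);
        String seenByC = onC.remove("K");
        String seenByD = onD.remove("K");  // runs before C commits
        onC.commit();
        onD.commit();

        // Both callers are told they removed "v", although only one removal took effect.
        System.out.println("C saw " + seenByC + ", D saw " + seenByD);
    }
}
```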



