[infinispan-dev] API behaviour when using DIST

Manik Surtani manik at jboss.org
Fri Apr 24 10:41:18 EDT 2009


Ok, so this is my final, final solution.  :-)

Both sync and async are supported in DIST.  DIST will always do a get
before a write, and this get will be synchronous.  (The actual write
command will be executed remotely either sync or async, depending on
the cache mode.)

This will ensure proper return values as well as proper behaviour of  
conditional invocations such as putIfAbsent, replace, etc.
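
A minimal sketch of the get-before-write flow described above.  All
names here (GetBeforeWriteSketch, remoteGet, remoteWrite,
awaitLastWrite) are hypothetical and are not Infinispan classes or
methods; the "remote owner" is just an in-memory map, and the async
write is simulated with a CompletableFuture.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a DIST write path; not Infinispan's actual code.
final class GetBeforeWriteSketch {
    // Stands in for the data held by the remote owners of a key.
    private final Map<String, String> remoteOwner = new ConcurrentHashMap<>();
    private final boolean syncMode;
    // Tracks the most recent remote write so a caller can wait for it.
    private CompletableFuture<Void> lastWrite =
            CompletableFuture.completedFuture(null);

    GetBeforeWriteSketch(boolean syncMode) {
        this.syncMode = syncMode;
    }

    // Always synchronous: the caller blocks until the value is known.
    private String remoteGet(String key) {
        return remoteOwner.get(key);
    }

    // Executed sync or async depending on the cache mode.
    private void remoteWrite(String key, String value) {
        if (syncMode) {
            remoteOwner.put(key, value);
            lastWrite = CompletableFuture.completedFuture(null);
        } else {
            lastWrite = CompletableFuture.runAsync(() -> remoteOwner.put(key, value));
        }
    }

    // The previous value comes from the synchronous get, so it is
    // reliable even when the write itself is asynchronous.
    String put(String key, String value) {
        String previous = remoteGet(key);
        remoteWrite(key, value);
        return previous;
    }

    // The condition is evaluated against the value fetched by the get.
    String putIfAbsent(String key, String value) {
        String previous = remoteGet(key);
        if (previous == null) {
            remoteWrite(key, value);
        }
        return previous;
    }

    // Test helper: wait until the last (possibly async) write has landed.
    void awaitLastWrite() {
        lastWrite.join();
    }
}
```

In this sketch the return value of put and putIfAbsent does not depend
on whether the write is sent sync or async, which is the point of
making the get synchronous in both modes.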

I will still hang on to the <unsafe unreliableReturnValues /> option
though.  It will apply to both sync and async mode and will allow
skipping the get-before-write, but only where the get is unnecessary
for the accurate functioning of a method.

I.e., even if we set unreliableReturnValues to true, a
get-before-write will *still* be performed in the following case:
performing a conditional command (putIfAbsent, both replace variants,
conditional remove) within a tx.

This provides total correctness for all method operations in all
cases; the only thing that can optionally be broken is the return
value contract, and nothing else.
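
The rule above can be sketched as a small decision function.  This is
only an illustration: the class name, the Op enum and the method names
are hypothetical, and the sketch encodes just the one exception named
in this message (conditional command inside a tx).  Treating every
other combination as skippable when the flag is set is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical policy sketch; not part of the Infinispan API.
final class GetBeforeWritePolicy {
    enum Op { PUT, REMOVE, PUT_IF_ABSENT, REPLACE, REPLACE_IF_EQUALS, REMOVE_IF_EQUALS }

    // putIfAbsent, the two replace variants and the conditional remove.
    private static final Set<Op> CONDITIONAL = EnumSet.of(
            Op.PUT_IF_ABSENT, Op.REPLACE, Op.REPLACE_IF_EQUALS, Op.REMOVE_IF_EQUALS);

    static boolean isConditional(Op op) {
        return CONDITIONAL.contains(op);
    }

    static boolean needsGetBeforeWrite(Op op, boolean inTx, boolean unreliableReturnValues) {
        if (!unreliableReturnValues) {
            // Default behaviour: always get before a write.
            return true;
        }
        // Flag set: skip the get, except for conditional commands in a tx.
        return isConditional(op) && inTx;
    }
}
```

For example, needsGetBeforeWrite(Op.PUT, true, true) is false in this
sketch, while needsGetBeforeWrite(Op.PUT_IF_ABSENT, true, true) is
true.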

On 24 Apr 2009, at 10:35, Manik Surtani wrote:

>
> On 24 Apr 2009, at 06:52, Bela Ban wrote:
>
>> Whatever you do, I think quick'n'dirty fire&forget calls are  
>> important: a user might want to push data into the grid at a very  
>> high rate and does not care about return values (for now).
>>
>> Maybe an additional API ?
>>
>> void putAsync(key, value);
>>
>>
>> Or a batch API:
>>
>> void putBatch(Map<K,V> data);
>
> Well, putAll(Map<K, V> data) behaves like this since it returns  
> void...
>
>
>>
>>
>> ?
>>
>>
>>
>> Manik Surtani wrote:
>>> Thinking about this a bit more (and implementing tests, etc for  
>>> this) I think even for the async case we need to do the remote get  
>>> first (and as a side-effect this would provide reliable return  
>>> values).  The reason is that not doing this causes txs to behave
>>> very weirdly and would need a lot of hacks to behave cleanly without
>>> doing the eager get.
>>>
>>> Take this example (assuming dist-async)
>>>
>>> 1.  tx.begin
>>> 2.  putIfAbsent(k) // k exists elsewhere
>>> 3.  get(k) // this will return the OLD val of k, since the tx
>>> hasn't completed and the owners haven't seen the WriteCommand in 2
>>> yet!
>>>
>>> We could hack this to make a record of commands that will be
>>> executed later, but in the case of conditional writes (like
>>> putIfAbsent) we don't know if they will succeed.  So we could do a
>>> get first as well, but in that case we may as well stick with
>>> a get-before-write approach and thereby provide reliable retvals.
>>>
>>> The actual commit would still be 1-phase and async though.
>>>
>>> WDYT?
>>>
>>> Also, for consistency, I suggest the same for non-tx writes.  This
>>> then has the added benefit of removing the weird unsafe flag,
>>> etc.  Ok, so it means doing a remote get before a put, but the
>>> cost of this is mitigated because a) it is unicast to a small set
>>> of servers, b) the RPC call returns as soon as we get the first
>>> valid response and c) with MVCC, the get is very quick - no
>>> locking needed on the remote end.
>>>
>>> Comments?
>>>
>>>
>>> On 21 Apr 2009, at 18:24, Manik Surtani wrote:
>>>
>>>>
>>>> On 21 Apr 2009, at 18:18, Mircea Markus wrote:
>>>>
>>>>> Manik Surtani wrote:
>>>>>>
>>>>>> On 21 Apr 2009, at 18:01, Mircea Markus wrote:
>>>>>>
>>>>>>> yes, indeed. that's what cache.retrieve("key1", "key2",  
>>>>>>> "keyn"...); would do, fetch all the remote values at once  
>>>>>>> (multiple keys mapped to one node will result in one  
>>>>>>> aggregated get).
>>>>>>>> Sounds pretty clunky though ...
>>>>>>> Might be. I think it is easy to grasp though, and can have  
>>>>>>> significant benefits for clients that know all the key set  
>>>>>>> they will manipulate in one session.
>>>>>>
>>>>>> But the keys retrieved could still be wiped out.
>>>>>>
>>>>>> 1.  start tx
>>>>>> 2.  retrieve(k1, k2, k3)
>>>>>> 3.  // go make coffee; other processes changing stuff, which  
>>>>>> removes keys from the L1, negating the effect of step 2
>>>>> Isn't that exactly what happens now with read mvcc entries being
>>>>> held in context? This won't break either read_committed or
>>>>> repeatable_read.
>>>>
>>>> Not quite.  The return value is calculated atomically when the  
>>>> command is performed, even though the old value is cached in  
>>>> context.  E.g.,  locally,
>>>>
>>>> 1.  tx.begin
>>>> 2.  read K
>>>> 3.  // go make coffee
>>>> 4.  replace K.  This command is atomic and the retval is  
>>>> extracted from the datacontainer as this command is  
>>>> perform()'ed.  So what this invocation returns is accurate  
>>>> regardless of interleaving writes between step 2 & 4
>>>> 5. ...
>>>>
>>>>>> 4.  replace(k1, v1) // will return incorrect retval.  Or will  
>>>>>> need to do a remote get again at this point
>>>>>> 5.  end tx
>>>>>>
>>>>>> -- 
>>>>>> Manik Surtani
>>>>>> manik at jboss.org
>>>>>> Lead, Infinispan
>>>>>> Lead, JBoss Cache
>>>>>> http://www.infinispan.org
>>>>>> http://www.jbosscache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>
>> -- 
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss - a division of Red Hat
>>
>

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org