[infinispan-dev] HotRod?? [ISPN-29 and a "custom protocol"]

Manik Surtani manik at jboss.org
Mon Dec 14 07:39:51 EST 2009


On 11 Dec 2009, at 17:42, Alex Kluge wrote:

>> In a cluster, you can't just have an atomic long starting at 0 that you 
>> increase every time a node is modified, because you could easily find 
>> yourself in a situation where two nodes modifying the same key might 
>> generate the same cas id.
> 
> Hmmm, in a cluster, no matter who writes the key/value, they have to wind up
> modifying the same data, no? Otherwise the cluster becomes inconsistent.
> That is the point where, on modification, you can increment the counter.
> 
> Eviction, however, raises more challenges. Clearly, when a value is
> evicted, you can't simply restart the counter when the data is rewritten.
> This would confuse the clients. However, I also get uncomfortable putting
> something time consuming into the main write path. I guess I am a bit
> jaded from when System.currentTimeMillis() took a long time.
> 
> But then again:
>  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6440250
> 
> Also note from this that sometimes nanoTime just returns currentTimeMillis*10^6.

All valid points above.  System.nanoTime() should never, IMO, be on any critical path.

> Perhaps setting an initial value in a time consuming way, then a simple
> increment thereafter. The goal is to get consistent, usable behaviour,
> while maintaining performance.
> 
> We definitely shouldn't use completely independent CAS values for
> different protocols.
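
A minimal sketch of the "expensive initial value, cheap increment thereafter" idea, assuming a hypothetical SeededCasCounter class (illustrative only, not existing Infinispan code). The seed is computed once, off the write path, and each modification only pays for an atomic increment. Shifting the wall-clock seed left by 20 bits is my own assumption: a restarted node then seeds above anything its previous incarnation handed out, provided that incarnation averaged fewer than roughly a million increments per millisecond and the clock did not go backwards.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper, not part of Infinispan: seed once, then increment. */
final class SeededCasCounter {
    private final AtomicLong current;

    SeededCasCounter(long seed) {
        this.current = new AtomicLong(seed);
    }

    /** Pays the clock cost once, at startup, rather than on every write. */
    static SeededCasCounter timeSeeded() {
        // Assumption: wall-clock millis shifted left by 20 bits, leaving the
        // low 20 bits free for increments within the same millisecond.
        return new SeededCasCounter(System.currentTimeMillis() << 20);
    }

    /** Called on each modification; a single atomic increment. */
    long nextVersion() {
        return current.incrementAndGet();
    }
}
```

This does not by itself settle how nodes in a cluster agree on the value for a given key; it only shows that the time-consuming part can stay out of the main write path.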
> 
>                            Got to run,
>                                     Alex
> 
> --- On Fri, 12/11/09, Galder Zamarreno <galder at redhat.com> wrote:
> 
>> From: Galder Zamarreno <galder at redhat.com>
>> Subject: Re: [infinispan-dev] HotRod?? [ISPN-29 and a "custom protocol"]
>> To: infinispan-dev at lists.jboss.org
>> Date: Friday, December 11, 2009, 2:30 AM
>> 
>> 
>> On 12/10/2009 08:19 PM, Alex Kluge wrote:
>>>>>> Yeah, you can get the cas id via the gets command.
>>>>>>
>>>>>>>
>>>>>>>> In the txt protocol, I implemented the cas id using System.nanoTime().
>>>>>>>
>>>>>>> Is this cas id meant to be unique?  If so, System.nanoTime() may not be ...
>>>>>>
>>>>>> Not over time. You want to be unique from the minute you retrieved it
>>>>>> until you use it and for operations on that particular key.
>>>>>>
>>>>>> The cas Id is simply used for comparison, to check whether someone else
>>>>>> has changed the entry since you retrieved it. System.nanoTime() gives
>>>>>> you precisely that, anyone that modifies that entry in that machine will
>>>>>> definitely have a different nano time. Also, since nano time is based
>>>>>> off an arbitrary time, the chances that a different machine will produce
>>>>>> the same nano time when modifying that very same key are very, very
>>>>>> small.
>>>>>
>>>>> Improbable but certainly not impossible, even in 1 machine.
>>>>>
>>>>> Why not just use what we use for 'versioning' internally - object references?
>>>>> System.identityHashCode(), for example?
>>>> 
>>>> Remember that you have to deal with eviction/expiration too. You could
>>>> get a case like this:
>>>> - get a cas based on identity hash code
>>>> - the entry is expired
>>>> - a new entry is put, with diff values, which happens to have the same
>>>>   identity hash code.
>>>> - doing a cas will succeed when it shouldn't.
>>>>
>>>> I don't think this can happen with System.nanoTime().
>>>>
>>>> When looking into this, I checked java.util.UUID but it didn't work, at
>>>> least for the memcached txt protocol, where a 64bit value is required
>>>> and UUID is 128bits. Maybe we could try to compress it somehow?
>>>
>>>> Actually, according to the memcached-txt protocol definition, this is a
>>>> 64-bit integer, so no real chance of compacting that. You need to
>>>> provide a long of some sort:
>>>>
>>>> http://github.com/memcached/memcached/blob/master/doc/protocol.txt
>>> 
>>>    Is there any expected use of this value where it will be used outside of
>>> the context of a specific key? If not, then it does not have to be
>>> globally unique, and indeed, can simply be a modification count.
>>> Easily computed, and easily fit within 64 bits even over a long lifetime
>>> of a server.
>> 
>> To do that, might as well use System.nanoTime(). It's much better than a
>> modification count because each VM will start off at a different offset.
>> 
>> In a cluster, you can't just have an atomic long starting at 0 that you
>> increase every time a node is modified, because you could easily find
>> yourself in a situation where two nodes modifying the same key might
>> generate the same cas id. If you don't start at 0, that's pretty much
>> what you get with System.nanoTime() and I think it's better to call that
>> all the time than taking the given long and adding one at a time.
>> 
>> Someone might think that you could have a counter per cache entry but
>> this definitely cannot start at 0, since after eviction, when the entry
>> is gone, it'll go back to 0 and you can have issues like the one I
>> explained above.
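
To make that failure concrete, here is a small sketch using a hypothetical single-node PerEntryCounterCache (illustrative only, not Infinispan code) whose per-entry version restarts at 0 whenever an entry is created. A client that read version 0 before the eviction can still win a compare-and-set against the unrelated entry that replaced it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy cache: each entry carries a counter that starts at 0. */
final class PerEntryCounterCache {
    private static final class Entry {
        String value;
        long version; // restarts at 0 for every newly created entry

        Entry(String value) {
            this.value = value;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Stores a value and returns the entry's version after the write. */
    long put(String key, String value) {
        Entry e = entries.get(key);
        if (e == null) {
            e = new Entry(value);
            entries.put(key, e);
        } else {
            e.value = value;
            e.version++;
        }
        return e.version;
    }

    /** Eviction drops the entry together with its counter. */
    void evict(String key) {
        entries.remove(key);
    }

    String get(String key) {
        Entry e = entries.get(key);
        return e == null ? null : e.value;
    }

    /** Succeeds only if the stored version equals the one the client saw. */
    boolean compareAndSet(String key, long expectedVersion, String newValue) {
        Entry e = entries.get(key);
        if (e == null || e.version != expectedVersion) {
            return false;
        }
        e.value = newValue;
        e.version++;
        return true;
    }
}
```

With this toy class, put("k", "a") yields version 0, and after evict("k") a later put("k", "b") yields version 0 again, so compareAndSet("k", 0, ...) from the first client succeeds although the entry it read no longer exists.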
>> 
>> Given the constraints of at least the memcached txt protocol, I think
>> System.nanoTime() is the best option in that case. For hotrod, a 128-bit
>> UUID would avoid issues mentioned here. We'd only need to make sure to
>> find a performant solution.
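
A rough sketch of what carrying a UUID as the version might look like, assuming a fixed 16-byte field. That field layout and the UuidVersion helper are hypothetical; nothing here is a settled Hot Rod wire format. Only java.util.UUID and java.nio.ByteBuffer calls are used.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper: packs a UUID into 16 bytes and back. */
final class UuidVersion {
    private UuidVersion() {
    }

    /** Most significant 8 bytes first, then the least significant 8. */
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    /** Inverse of toBytes; rejects anything that is not exactly 16 bytes. */
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }
}
```

On the performance point: UUID.randomUUID() draws from a cryptographically strong random number generator, so generating one per write would need measuring before it goes on the write path.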
>> 
>>> 
>>>    For now though the real question is can we fit a quantity that will
>>> fulfil the requirements for the field into the space allotted. I think
>>> we can.
>>> 
>>> 
>>>                                     Alex
>>> 
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
>> -- 
>> Galder Zamarreño
>> Sr. Software Engineer
>> Infinispan, JBoss Cache
>> 
> 

--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org







