[infinispan-dev] HotRod: Digging further into the encoding of data

Galder Zamarreno galder at redhat.com
Fri Feb 19 08:59:19 EST 2010


Hmmm, good point. I do realise that supporting a small subset of encodings
is probably worse, since it's half-baked, than simply having no
encoding and allowing clients to give us stuff as byte[]. Besides, we'll
always have situations where people store byte[], or keys that do not have
a reasonable String representation, so we need to be able to deal with
those. I can see a complex encoding system being hard to maintain as
well.

Ok, let's go with Manik's suggestion and keep everything as byte[] so that
it remains as simple as possible for both the server and the client side.

I do like the idea of the client having pluggable marshalling.
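
To make that concrete, here is a rough sketch of what pluggable
marshalling on the client could look like, assuming the server only ever
sees byte[]. The names (Marshaller, Utf8StringMarshaller, toHex) are
hypothetical and are not part of any Hot Rod client API. The hex helper
is only there to show one way of keeping opaque byte[] keys readable in
logs.

```java
import java.nio.charset.StandardCharsets;

public class MarshallingSketch {

    // Hypothetical client-side plug-in point: the server never sees T,
    // only the byte[] produced here.
    interface Marshaller<T> {
        byte[] toBytes(T obj);
        T fromBytes(byte[] bytes);
    }

    // One possible implementation for String keys/values, using UTF-8.
    static final class Utf8StringMarshaller implements Marshaller<String> {
        @Override
        public byte[] toBytes(String s) {
            return s.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String fromBytes(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Renders an opaque byte[] as lowercase hex so it can be logged.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Marshaller<String> marshaller = new Utf8StringMarshaller();
        byte[] key = marshaller.toBytes("k1");
        System.out.println(toHex(key));                // prints 6b31
        System.out.println(marshaller.fromBytes(key)); // prints k1
    }
}
```

A client that wants Java serialization, or some other format, would
simply plug in a different Marshaller; the server side stays unchanged.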

I'll amend the Hot Rod wiki later today.

On Thu, 18 Feb 2010 18:07:01 +0100, Manik Surtani <manik at jboss.org> wrote:

>
> On 18 Feb 2010, at 16:39, Galder Zamarreno wrote:
>
>> On Thu, 18 Feb 2010 17:11:49 +0100, Manik Surtani <manik at jboss.org>  
>> wrote:
>>
>>>
>>> On 18 Feb 2010, at 15:51, Galder Zamarreno wrote:
>>>
>>>> That's indeed an option, but then debugging from logs could be quite a
>>>> nightmare if all we store in the cache is a byte[] and there is no
>>>> readable representation for it. This would also be a problem if in the
>>>> future we decided to allow some tool to inspect the contents of the
>>>> cache.
>>>>
>>>> The idea behind the encoding is to at least have a way to encode the
>>>> simplest objects that can be passed: primitives, Strings and
>>>> collections of these. Obviously, if users want to simply store a blob,
>>>> e.g. an image, then that's a byte[] and nothing can be done about that.
>>>>
>>>> I agree that maybe some stuff can be simplified. We could simply
>>>> support primitives and Strings and treat the rest as a byte[].
>>>
>>> That makes more sense, although you still then have the issue of
>>> collections and arrays.
>>
>> What issue exactly? Anyone sending collections or arrays needs to
>> marshal them into a byte[] and pass that. I don't see a problem.
>> Mircea, maybe you have thought of how you expect to present this to the
>> clients?
>
> Well, if you want to store strings and primitives as they are for
> debugging purposes, then you lose that benefit the moment someone passes
> in an array of strings or primitives.  :)
>
>>
>>> Personally, I'd be in favour of just using byte[]s, unless we have a
>>> really good argument against this.  No point in reimplementing a
>>> marshalling layer and object encoding for HotRod.
>>
>> Is debugging not enough of a reason to at least support the basic types?
>> Imagine having to debug through an Infinispan Hot Rod server storing
>> String k,v pairs but having to figure out each String's byte[] format to
>> find the one where things fail. For anything other than primitive types
>> we have no other option. We'd just have to make sure that if client logs
>> are available, things are logged properly. At the very least, we should
>> make it easy to pass Strings and maybe forget about the rest of the
>> types.
>
> You will also need to deal with collections of {Strings, primitives,  
> byte[]s}.  See above.
>
>>
>>>
>>> Cheers
>>> Manik
>>>
>>>>
>>>> Thoughts?
>>>>
>>>> On Thu, 18 Feb 2010 16:40:32 +0100, Manik Surtani <manik at jboss.org>
>>>> wrote:
>>>>
>>>>> Not sure I understand.  Why does this need to be as complex as this?
>>>>> When you say encoding of data, surely all this data is, is a byte[]?
>>>>>
>>>>> On 18 Feb 2010, at 15:26, Galder Zamarreno wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been looking again at the encoding of data in Hot Rod
>>>>>> (http://community.jboss.org/wiki/HotRodProtocol) and there are a few
>>>>>> things I'm not too happy about or that are not totally clear:
>>>>>>
>>>>>> 1. The type appears to be pretty wasteful because the majority of
>>>>>> types cannot be combined with others. For example, you cannot have a
>>>>>> type that is Boolean and Long at the same time, or Double and
>>>>>> Character. However, you can potentially have an Array of Serialized.
>>>>>> So, I propose instead separating between a meta type and a type.
>>>>>>
>>>>>> Meta types would be: Array, Map, Primitive and Compressed. The meta
>>>>>> type would be encoded using bit ops, so you could combine them in
>>>>>> different ways, e.g. Array of primitives. I don't think we should
>>>>>> support combinations of different collections, e.g. Map of Arrays or
>>>>>> Array of Maps. It would add complexity and I don't foresee an
>>>>>> immediate requirement for this.
>>>>>>
>>>>>> Type would be: Byte, Boolean, Character, String, Date, Double,
>>>>>> Float, Integer, Long, Short, Serialized, StringBuilder, and Any.
>>>>>> These would be literals from 1 to N. Note that I've added Any to
>>>>>> distinguish between two different kinds of collection. For example,
>>>>>> if you send an Array of String, each individual element just follows
>>>>>> together with its size. However, if you send an Array of Any, each
>>>>>> individual entry must define its type.
>>>>>>
>>>>>> For Maps, we've got two options: first, make no type assumptions and
>>>>>> let each key/value define its own type; or second, allow the Map
>>>>>> meta type to be followed by not one but two type fields. Even if
>>>>>> the map was of mixed types, you could have Any, Any. My preference
>>>>>> is for the latter.
>>>>>>
>>>>>> Both type and metatype would be variable length integers.
>>>>>>
>>>>>> 2. Serialized will be stored as byte[] internally; no attempt at
>>>>>> unmarshalling will be made in Hot Rod. Clients decide how they want
>>>>>> to marshal these Serialized types. They just need to give us a
>>>>>> byte[] and its length.
>>>>>>
>>>>>> 3. To clarify something that Alex mentioned in the previous encoding
>>>>>> data email, Arrays and Maps are followed by the number of items in
>>>>>> Hot Rod and not the number of bytes. In the case of an Array of Any,
>>>>>> each individual field gives us its size, type and data. In the case
>>>>>> of an Array of Boolean, each individual field comes with size and
>>>>>> data. Size could have been optional in each field of an Array of
>>>>>> Boolean, but including it simplifies dealing with an Array of
>>>>>> Serialized, where each individual field is a byte[] of arbitrary
>>>>>> length.
>>>>>>
>>>>>> For Maps, a similar thing happens. If we have a Map of String,
>>>>>> Boolean, we get key/value pairs like: k=[size+data],v=[size+data].
>>>>>> If it's a Map of Any, Any, we get
>>>>>> k=[type+size+data],v=[type+size+data].
>>>>>
>>>>>
>>>>> --
>>>>> Manik Surtani
>>>>> manik at jboss.org
>>>>> Lead, Infinispan
>>>>> Lead, JBoss Cache
>>>>> http://www.infinispan.org
>>>>> http://www.jbosscache.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Galder Zamarreño
>>>> Sr. Software Engineer
>>>> Infinispan, JBoss Cache
>>>>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>
>>
>


-- 
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
