[infinispan-dev] HotRod: Digging further into the encoding of data

Alex Kluge java_kluge at yahoo.com
Thu Feb 18 14:25:41 EST 2010


          > The type appears to be pretty wasteful because the majority of types
          > cannot be combined with others. For example, you cannot have a type that
          > is Boolean and Long at the same time, or Double and Character. However,
          > you can potentially have an Array of Serialized. So, I propose instead
          > separating between a meta type and a type.
 
           Well, in a case like that, simply set the array bit, but set no type bit. The encoder
           would then encode each object separately, and the decoder would decode each object
           separately. However, in the case that they are all ints, or all characters, providing
           that information at the beginning of the stream allows the data to be written without
           a per-element type.
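
A minimal Java sketch of the array-bit idea described above, for illustration only. The class name, the flag values (ARRAY, TYPE_INT, TYPE_STRING) and the exact byte layout are assumptions made for this example; they are not taken from the Hot Rod specification or from the code discussed in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ArrayHeaderSketch {
    // Hypothetical bit assignments; the thread does not fix actual values.
    static final int ARRAY = 0x80;
    static final int TYPE_INT = 0x01;
    static final int TYPE_STRING = 0x02;

    /** Homogeneous array: the element type is stated once, in the header. */
    static byte[] encodeInts(int[] values) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(ARRAY | TYPE_INT);   // array bit plus type bit
            out.writeInt(values.length);       // number of items
            for (int v : values) {
                out.writeInt(v);               // data only, no per-element type
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Mixed array: array bit only, so every element carries its own type. */
    static byte[] encodeMixed(Object[] values) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(ARRAY);              // array bit set, no type bit
            out.writeInt(values.length);
            for (Object v : values) {
                if (v instanceof Integer) {
                    out.writeByte(TYPE_INT);
                    out.writeInt((Integer) v);
                } else if (v instanceof String) {
                    out.writeByte(TYPE_STRING);
                    out.writeUTF((String) v);  // 2-byte length prefix, then bytes
                } else {
                    throw new IllegalArgumentException("type not covered by this sketch");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

For three ints, the typed form saves one byte per element compared with the mixed form, which is the saving the header is meant to buy.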

          > Meta types would be: Array, Map, Primitive and Compressed. The meta type
          > would be encoded using bit ops, so you could combine them in different
          > ways, e.g. Array of Primitives. I don't think we should support
          > combinations of different collections, e.g. Map of Arrays or Array of
          > Maps. It would add complexity and I don't foresee an immediate
          > requirement for this.

          This follows pretty naturally from my comments above, including arrays of
          maps, maps of arrays, etc.

          > For Maps, we've got two options: First, no type assumptions made and let
          > each key/value define its own type.

          Which is what I did.

          > Or allow maps meta-type definitions to be followed by not one but two type
          > fields. Even if the map was of mixed types, you could have Any, Any. My
          > preference is for the latter.

          Now that we have more time, this becomes a viable alternative.
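
A rough Java sketch of what a map header with two type fields could look like. The meta-type bit value, the type literals (numbered here in the order the proposal lists the types) and the 7-bits-per-byte variable-length integer format are all assumptions for illustration; the thread only says that type and meta type would be variable-length integers.

```java
import java.io.ByteArrayOutputStream;

final class MapHeaderSketch {
    // Hypothetical values, not taken from the Hot Rod specification.
    static final int META_MAP = 0x02;
    static final int TYPE_BOOLEAN = 2;
    static final int TYPE_STRING = 4;
    static final int TYPE_ANY = 13;

    /** Writes an unsigned variable-length int: 7 bits per byte, low-order group first. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write(value);                     // last byte, high bit clear
    }

    /** Header layout assumed here: meta type, key type, value type, entry count. */
    static byte[] mapHeader(int keyType, int valueType, int entryCount) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, META_MAP);
        writeVInt(out, keyType);
        writeVInt(out, valueType);
        writeVInt(out, entryCount);
        return out.toByteArray();
    }
}
```

A mixed map would use mapHeader(TYPE_ANY, TYPE_ANY, n), in which case each key and each value would have to carry its own type.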

          > 2. Serialized will be stored as byte[] internally; no attempt at
          > unmarshalling will be done in Hot Rod. Clients decide how they want to
          > marshal these Serialized types. They just need to give us a byte[] and
          > its length.

          If they pass a byte array, then store it as a byte array. However, if they pass an
          object that doesn't match any known type, then serialize it. But, you do have
          to be careful not to attempt to deserialize these objects within the cache. It is
          more than likely that the cache has no definition for the object.
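
As a hedged illustration of the rule above, here is a small Java sketch that stores a byte array untouched and falls back to standard Java serialization for anything else. The class and method names are hypothetical, and java.io serialization is only one possible choice of marshaller. The sketch deliberately has no deserialization method, since the cache side should treat the bytes as opaque.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class OpaqueValueSketch {
    /** Returns the bytes to store; the cache never turns them back into objects. */
    static byte[] toStoredBytes(Object value) {
        if (value instanceof byte[]) {
            return (byte[]) value;            // already bytes: store as is
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(value);           // unknown type: serialize on the client side
            out.flush();
        } catch (IOException e) {
            // Includes NotSerializableException for objects that cannot be serialized.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```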

          > 3. To clarify something that Alex mentioned in the previous encoding data
          > email, Arrays and Maps are followed by the number of items in Hot Rod and
          > not the number of bytes. In case of Arrays of Any, each individual field
          > gives us its size, type and the data. In case of Array of Booleans, each
           > individual field comes with size and data. Size might have been optional
           > in each field of Array of Boolean, but it simplifies dealing with Array
           > of Serialized, where each individual field is a byte[] of arbitrary length.

           My original plan was to use the count of objects, and in most cases this works.
           However, when the data is compressed, you need to read the whole compressed
           block (at least as I had implemented the code). We have the time now, perhaps,
           to come up with a more elegant approach.
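
One possible way to handle the compressed case, sketched in Java: prefix the compressed block with its byte length, so the reader can pull the whole block off the stream before inflating it. This is an assumption-laden illustration, not the code referred to above; the frame layout (compressed length, uncompressed length, body) and the use of java.util.zip are choices made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class CompressedBlockSketch {
    /** Frame assumed here: [compressed length:int][uncompressed length:int][compressed bytes]. */
    static byte[] frame(byte[] payload) {
        Deflater deflater = new Deflater();
        deflater.setInput(payload);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            compressed.write(chunk, 0, n);
        }
        deflater.end();
        byte[] body = compressed.toByteArray();
        return ByteBuffer.allocate(8 + body.length)
                .putInt(body.length)      // lets the reader fetch the whole block
                .putInt(payload.length)   // lets the reader size the output buffer
                .put(body)
                .array();
    }

    /** Reads the whole compressed block using the byte length, then inflates it. */
    static byte[] unframe(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed);
        byte[] body = new byte[in.getInt()];
        byte[] payload = new byte[in.getInt()];
        in.get(body);
        Inflater inflater = new Inflater();
        inflater.setInput(body);
        try {
            int off = 0;
            while (off < payload.length && !inflater.finished()) {
                int n = inflater.inflate(payload, off, payload.length - off);
                if (n == 0) {
                    break;                // no progress: stop rather than spin
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed block", e);
        } finally {
            inflater.end();
        }
        return payload;
    }
}
```

With a frame like this, the item count can still be encoded inside the compressed payload, while the outer byte length is only used to delimit the block.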

--- On Thu, 2/18/10, Manik Surtani <manik at jboss.org> wrote:

> From: Manik Surtani <manik at jboss.org>
> Subject: Re: [infinispan-dev] HotRod: Digging further into the encoding of data
> To: "infinispan -Dev List" <infinispan-dev at lists.jboss.org>
> Date: Thursday, February 18, 2010, 9:40 AM
> Not sure I understand. Why does this need to be as complex as this?
> When you say encoding of data, surely all this data is, is a byte[]?
>
> On 18 Feb 2010, at 15:26, Galder Zamarreno wrote:
>
> Hi,
> 
> I've been looking again at the encoding of data in Hot Rod
> (http://community.jboss.org/wiki/HotRodProtocol) and there are a few
> things I'm not too happy about or that are not totally clear:
> 
> 1. The type appears to be pretty wasteful because the majority of types
> cannot be combined with others. For example, you cannot have a type that
> is Boolean and Long at the same time, or Double and Character. However,
> you can potentially have an Array of Serialized. So, I propose instead
> separating between a meta type and a type.
> 
> Meta types would be: Array, Map, Primitive and Compressed. The meta type
> would be encoded using bit ops, so you could combine them in different
> ways, e.g. Array of Primitives. I don't think we should support
> combinations of different collections, e.g. Map of Arrays or Array of
> Maps. It would add complexity and I don't foresee an immediate
> requirement for this.
> 
> Type would be: Byte, Boolean, Character, String, Date, Double, Float,
> Integer, Long, Short, Serialized, StringBuilder, and Any. These would be
> literals from 1 to N. Note that I've added Any to distinguish between two
> different kinds of collection. For example, if you send an Array of
> String, each individual element just follows together with its size.
> However, if you send an Array of Any, each individual entry must define
> its type.
> 
> For Maps, we've got two options: First, no type assumptions made and let
> each key/value define its own type. Or allow maps meta-type definitions
> to be followed by not one but two type fields. Even if the map was of
> mixed types, you could have Any, Any. My preference is for the latter.
> 
> Both type and metatype would be variable length integers.
> 
> 2. Serialized will be stored as byte[] internally; no attempt at
> unmarshalling will be done in Hot Rod. Clients decide how they want to
> marshal these Serialized types. They just need to give us a byte[] and
> its length.
> 
> 3. To clarify something that Alex mentioned in the previous encoding data
> email, Arrays and Maps are followed by the number of items in Hot Rod and
> not the number of bytes. In case of Arrays of Any, each individual field
> gives us its size, type and the data. In case of Array of Booleans, each
> individual field comes with size and data. Size might have been optional
> in each field of Array of Boolean, but it simplifies dealing with Array
> of Serialized, where each individual field is a byte[] of arbitrary
> length.
> 
> For Maps, a similar thing happens. If we have a Map of String, Boolean, we
> get key/value pairs like: k=[size+data],v=[size+data]. If it's a Map of
> Any, Any, we get k=[type+size+data],v=[type+size+data].
> 
> 
> 
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org