[infinispan-dev] Enconding of data WAS Re: Hot Rod - pt3

Alex Kluge java_kluge at yahoo.com
Mon Feb 15 15:01:38 EST 2010


Hi,

  There were a few questions, and the answers will be easier to follow with a
 little background on the implementation. The key and value objects are mapped to
 bytes for transmission over the network by an encoder object, an instance of
     org.jboss.cache.tcpcache.messages.encoder.BaseEncoder,
 and then decoded by a decoder object, an instance of
     org.jboss.cache.tcpcache.messages.decoder.BaseDecoder.
 One goal of these objects is to map the data, where possible, into a
 representation that can be interpreted on any platform. As long as no information
 is lost, these classes can be adjusted as needed to encode and decode data fields
 efficiently.

  > Wrt the discussion below about encoding data, I wanted to know what
  > exactly COMPRESSED type meant in your wiki, or what exactly it represents.

 If a value encodes to a byte array longer than a threshold value, compressionThreshold
 in the encoder, an instance of java.util.zip.Deflater is used to compress the bytes before
 transmission over the wire, and the COMPRESSED flag is set. On receipt, the decoder
 sees the COMPRESSED flag and decompresses the bytes, then interprets them in the
 normal way. So, compression happens after converting the data to bytes, and
 decompression happens before converting from bytes.
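
 The ordering described above can be sketched roughly as follows. This is not the
 BaseEncoder/BaseDecoder source: the class name CompressionSketch, the Encoded holder,
 the COMPRESSED bit position and the threshold of 128 bytes are all assumptions made
 for illustration. Only java.util.zip.Deflater and the name compressionThreshold come
 from the real code.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch, not the actual encoder/decoder classes.
final class CompressionSketch {
    // Assumed bit position; the real flag value is not stated in this thread.
    static final int COMPRESSED = 1 << 30;
    // Assumed threshold in bytes, standing in for compressionThreshold.
    static final int COMPRESSION_THRESHOLD = 128;

    /** Flags plus the bytes that would be written to the wire. */
    static final class Encoded {
        final int flags;
        final byte[] bytes;
        Encoded(int flags, byte[] bytes) { this.flags = flags; this.bytes = bytes; }
    }

    // Encoder side: runs after the value has already been converted to bytes.
    static Encoded maybeCompress(int typeFlags, byte[] raw) {
        if (raw.length <= COMPRESSION_THRESHOLD) {
            return new Encoded(typeFlags, raw);          // short values go out as-is
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return new Encoded(typeFlags | COMPRESSED, out.toByteArray());
    }

    // Decoder side: runs before the bytes are interpreted in the normal way.
    static byte[] maybeDecompress(Encoded e) {
        if ((e.flags & COMPRESSED) == 0) {
            return e.bytes;
        }
        Inflater inflater = new Inflater();
        inflater.setInput(e.bytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported compressed data");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException ex) {
            throw new IllegalStateException("corrupt compressed data", ex);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Encoded e = maybeCompress(0, new byte[4096]);
        System.out.println("compressed flag set: " + ((e.flags & COMPRESSED) != 0));
        System.out.println("bytes restored: " + maybeDecompress(e).length);
    }
}
```

 The type flags pass through untouched, so once the bytes are decompressed the
 decoder can interpret them exactly as it would for an uncompressed value.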

 > Also, I wanted to know what bits you send when the object sent accross
 > is an instance of java.lang.Long. Apart from marking LONG, do you also
 > marked as SERIALIZED? Or do you just use SERIALIZED for user-specific
 > classes?

 A java.lang.Long would have only the long bit set. A long (the primitive, not an
 object) would have the long and primitive bits set. An object that the encoder does
 not understand would be serialized using Java object serialization, and the
 serialized bit would be set. The long and Long values are encoded identically;
 the decoder returns the appropriate type as determined by the flags.
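
 As a rough illustration of the flag combinations just described, here is a small
 sketch. The bit values, the class name LongFlagsSketch and its method names are
 assumptions for this example and are not taken from the encoder; the eight-byte
 big-endian layout is likewise only assumed.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the flag choices for long, Long and unknown objects.
final class LongFlagsSketch {
    // Assumed bit assignments; the real values are not given in this thread.
    static final int LONG = 1 << 0;
    static final int PRIMITIVE = 1 << 1;
    static final int SERIALIZED = 1 << 2;

    // A primitive long carries both the long and the primitive bits.
    static int flagsFor(long value) {
        return LONG | PRIMITIVE;
    }

    // A java.lang.Long carries only the long bit. Anything this sketch does
    // not understand falls back to Java serialization (the real encoder
    // recognises many more types than this).
    static int flagsFor(Object value) {
        return (value instanceof Long) ? LONG : SERIALIZED;
    }

    // long and Long share the same data bytes (assumed: 8 bytes, big-endian).
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    // The decoder picks the type to hand back from the flags alone.
    static Class<?> typeFor(int flags) {
        if ((flags & LONG) == 0) {
            throw new IllegalArgumentException("not a long value");
        }
        return (flags & PRIMITIVE) != 0 ? long.class : Long.class;
    }

    public static void main(String[] args) {
        System.out.println(typeFor(flagsFor(7L)));               // primitive overload
        System.out.println(typeFor(flagsFor(Long.valueOf(7L)))); // object overload
    }
}
```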

 > With regards to individual fields in Map/Array, according to your wiki, 
 > each field then only contains the size of the field and the field. So, 
 > in a map of booleans, each field would be represented as: 0x01 and 
 > 0x01/0x00 with 1st being the size and 2nd being the actual value. Correct?

 Arrays and maps are treated a bit differently. Arrays are generally of a single known
 type, whereas maps tend to hold keys and values of arbitrary types. The encoding can
 describe the single type of an array once, but not the multiple types of a map.

 As a result, arrays are encoded with a single size field, followed by all of the data bytes.
 The current, suboptimal, implementation writes a byte count covering everything that
 follows, then the type flags, then the data. Thus an array of two Integers would be
 encoded as

   a byte count of 12,
   four bytes for flags (the INTEGER and ARRAY bits),
   four bytes for integer 1,
   four bytes for integer 2.

 This requires the whole array of encoded integers to be loaded at once, which should not
 be necessary. However, I was a bit pressed for time.
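
 The array layout above could be written out along these lines. This is a sketch
 under several assumptions: the byte count is taken to be a four-byte field, the
 byte order is ByteBuffer's default (big-endian), and the INTEGER and ARRAY bit
 values and the class name ArrayLayoutSketch are made up for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the [byte count][flags][data...] array layout.
final class ArrayLayoutSketch {
    // Assumed bit assignments; the real values are not given in this thread.
    static final int INTEGER = 1 << 0;
    static final int ARRAY = 1 << 1;

    static byte[] encode(Integer[] values) {
        int byteCount = 4 + 4 * values.length;              // flags plus data bytes
        ByteBuffer buf = ByteBuffer.allocate(4 + byteCount); // assumed 4-byte count field
        buf.putInt(byteCount);
        buf.putInt(INTEGER | ARRAY);                         // one flags field for the whole array
        for (Integer v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    static Integer[] decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        int byteCount = buf.getInt();
        int flags = buf.getInt();
        if (flags != (INTEGER | ARRAY)) {
            throw new IllegalArgumentException("not an Integer array");
        }
        // The element count is derived from the byte count, so the whole
        // encoded array has to be available before decoding can finish.
        Integer[] values = new Integer[(byteCount - 4) / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] wire = encode(new Integer[] {1, 2});
        System.out.println("byte count field: " + ByteBuffer.wrap(wire).getInt());
    }
}
```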

 A map is of necessity treated differently. The map also starts with the total byte
 count, followed by the MAP flag. Each key and each value in the map is then
 encoded as an individual object with its own size, type and data fields.
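
 A sketch of that map layout, again under assumptions: every size field is taken to
 be four bytes and to count the flags plus the data (mirroring the array example),
 strings are assumed to be UTF-8, and the bit values and the class name
 MapLayoutSketch are invented for the example. It handles only String and Integer
 to stay short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: [total bytes][MAP flag] then [size][flags][data] per key and value.
final class MapLayoutSketch {
    // Assumed bit assignments; the real values are not given in this thread.
    static final int INTEGER = 1 << 0;
    static final int STRING = 1 << 1;
    static final int MAP = 1 << 2;

    // Wraps data as [size][flags][data]; size is assumed to count flags + data.
    static byte[] frame(int flags, byte[] data) {
        return ByteBuffer.allocate(8 + data.length)
                .putInt(4 + data.length)
                .putInt(flags)
                .put(data)
                .array();
    }

    // Each key or value carries its own type flags, so types may differ freely.
    static byte[] encodeObject(Object o) {
        if (o instanceof Integer) {
            return frame(INTEGER, ByteBuffer.allocate(4).putInt((Integer) o).array());
        }
        if (o instanceof String) {
            return frame(STRING, ((String) o).getBytes(StandardCharsets.UTF_8));
        }
        throw new IllegalArgumentException("this sketch handles Integer and String only");
    }

    static byte[] encodeMap(Map<?, ?> map) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<?, ?> e : map.entrySet()) {
            byte[] key = encodeObject(e.getKey());
            byte[] value = encodeObject(e.getValue());
            body.write(key, 0, key.length);       // [key] ...
            body.write(value, 0, value.length);   // ... immediately followed by [value]
        }
        return frame(MAP, body.toByteArray());    // total byte count comes first
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("a", 1);
        System.out.println("encoded length: " + encodeMap(m).length);
    }
}
```

 Because every key and value is self-describing, the decoder can read pairs in
 sequence until the total byte count is exhausted.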

 I would like to find a way of writing only the number of fields at the beginning of
 these. However, time pressure and the need to handle compressed data properly forced
 me to place the total number of bytes as the first field.
 
                                                                  Alex

--- On Tue, 1/12/10, Galder Zamarreno <galder at redhat.com> wrote:

> From: Galder Zamarreno <galder at redhat.com>
> Subject: Re: [infinispan-dev] Enconding of data WAS Re:  Hot Rod - pt3
> To: infinispan-dev at lists.jboss.org
> Date: Tuesday, January 12, 2010, 10:48 AM
> 
> 
> On 01/12/2010 05:46 PM, Galder Zamarreno wrote:
> > And another question:
> >
> > With regards to individual fields in Map/Array, according to your wiki,
> > each field then only contains the size of the field and the field. So,
> > in a map of booleans, each field would be represented as: 0x01 and
> > 0x01/0x00 with 1st being the size and 2nd being the actual value. Correct?
> 
> Well, this would be more like an array actually.
> 
> By the way, in a Map, how do you send key/value pairs? I suppose
> [key][value][key][value]? And how do you provide type of key vs type of
> value?
> 
> >
> > On 01/12/2010 04:46 PM, Galder Zamarreno wrote:
> >> Hi Alex,
> >>
> >> Wrt the discussion below about encoding data, I wanted to know what
> >> exactly COMPRESSED type meant in your wiki, or what exactly it represents.
> >>
> >> Also, I wanted to know what bits you send when the object sent accross
> >> is an instance of java.lang.Long. Apart from marking LONG, do you also
> >> marked as SERIALIZED? Or do you just use SERIALIZED for user-specific
> >> classes?
> >>
> >> Cheers,
> >>
> >> On 01/05/2010 10:52 AM, Galder Zamarreno wrote:
> >>>
> >>>
> >>> On 01/04/2010 10:44 PM, Alex Kluge wrote:
> >>>>>>      </snip>
> >>>
> >>>>
> >>>>>>      - What happens if the key or the value is not text? I have a way of
> >>>>>>        representing the data to allow for a wide variety of data types,
> >>>>>>        even allowing for arrays or maps. This will make the protocol more
> >>>>>>        complex, but the assumption that the data is a string is rather
> >>>>>>        limiting. This is already sketched out in the wiki.
> >>>
> >>> Hmmmmm, I don't think I've made any assumptions in the wiki that keys or
> >>> values are Strings unless I've made a mistake somewhere (maybe in the
> >>> example where I've used a particular encoding for Strings?). My thoughts
> >>> around this was that I was gonna treat them both as byte[]...
> >>>
> >>>>>
> >>>>> </snip>
> >>>>
> >>>>      The idea is to prefix each data block with a data type. This is a
> >>>>      lightweight binary protocol, so a full fledged mime type would be
> >>>>      overkill. There is a discussion and example in the Encoding Data
> >>>>      section of this page:
> >>>>
> >>>>        http://community.jboss.org/wiki/RemoteCacheInteractions
> >>>>
> >>>>      Data types are limited to things like integer, byte, string, boolean, etc.
> >>>>      Or, if it isn't a recognised type, the native platform serialisation is
> >>>>      used. There can be arrays of these types, or maps as well.
> >>>>
> >>>>      Each data type is represented by a bit, and they can be used in
> >>>>      combinations. An array of bytes would have the array, byte and
> >>>>      primitive bits set. The set of recognised data types can of course be
> >>>>      expanded.
> >>>
> >>> ... but that looks rather useful (at least based on the cached version
> >>> of that wiki that Google shows ;) - no wiki access right now :( ),
> >>> particularly the way you can combine different bytes to create composed
> >>> types. I don't see a problem with including this in Hot Rod.
> >>>
> >>> I suppose you included type length into length field so that in the
> >>> future you can support other types and possibly longer type fields?
> >>>
> >
> 
> -- 
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache


      



