Re: [infinispan-dev] Enconding of data WAS Re: Hot Rod - pt3

Monday, 15 February 2010

Hi,

  There were a few questions, and the answers will be easier to follow with a
 little background on the implementation. The key and value objects are mapped to
 bytes for transmission over the network by an encoder object, an instance of
     org.jboss.cache.tcpcache.messages.encoder.BaseEncoder,
 and then decoded by a decoder object, an instance of
     org.jboss.cache.tcpcache.messages.decoder.BaseDecoder.
 One of the goals of these objects is to map the data, where possible, into a
 representation that can be interpreted on any platform. As long as no information is
 lost, these classes can be adjusted as needed to encode and decode data fields
 efficiently.

 > Wrt the discussion below about encoding data, I wanted to know what
 > exactly COMPRESSED type meant in your wiki, or what exactly it represents.
 If a value encodes to a byte array longer than a threshold value (compressionThreshold
 in the encoder), an instance of java.util.zip.Deflater is used to compress the bytes
 before transmission over the wire, and the COMPRESSED flag is set. On receipt, the
 decoder sees the COMPRESSED flag and decompresses the bytes, then interprets them in
 the normal way. So compression happens after the data is converted to bytes, and
 decompression happens before the bytes are converted back.
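A minimal Java sketch of that ordering, assuming nothing about the real BaseEncoder/BaseDecoder API: the COMPRESSED bit position, the Wire holder class and the maybeCompress/maybeDecompress helpers are hypothetical, and only java.util.zip.Deflater and Inflater are real. The bytes are compressed only when they are longer than the threshold, and the flag tells the receiving side to inflate them before normal decoding.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of compress-after-encoding / decompress-before-decoding.
// COMPRESSED's bit position and the threshold value are hypothetical.
public class CompressionSketch {
    static final int COMPRESSED = 1 << 30; // hypothetical flag bit

    // The type flags plus the payload bytes that actually go over the wire.
    static final class Wire {
        final int flags;
        final byte[] payload;

        Wire(int flags, byte[] payload) {
            this.flags = flags;
            this.payload = payload;
        }
    }

    // Compress only when the encoded bytes are longer than the threshold.
    static Wire maybeCompress(int flags, byte[] encoded, int compressionThreshold) {
        if (encoded.length <= compressionThreshold) {
            return new Wire(flags, encoded);
        }
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(encoded);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return new Wire(flags | COMPRESSED, out.toByteArray());
        } finally {
            deflater.end();
        }
    }

    // If the COMPRESSED bit is set, inflate first; the result is then decoded normally.
    static byte[] maybeDecompress(Wire wire) {
        if ((wire.flags & COMPRESSED) == 0) {
            return wire.payload;
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(wire.payload);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or invalid compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] encoded = new byte[4096]; // all zeros, so it compresses well
        Wire wire = maybeCompress(0, encoded, 1024);
        System.out.println("compressed=" + ((wire.flags & COMPRESSED) != 0)
                + " wireBytes=" + wire.payload.length);
        System.out.println("roundTrip=" + Arrays.equals(encoded, maybeDecompress(wire)));
    }
}
```

Because the flag travels with the payload, short values never pay the cost of compression and the decoder never has to guess whether to inflate.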

 > Also, I wanted to know what bits you send when the object sent across
 > is an instance of java.lang.Long. Apart from marking LONG, do you also
 > mark it as SERIALIZED? Or do you just use SERIALIZED for user-specific
 > classes?
 A java.lang.Long would only have the long bit set. A long (a primitive, not an object)
 would have the long and primitive bits set. An object that the encoder does not
 understand is serialized using Java object serialization, and the serialized bit is set.
 The long and Long values are encoded identically; however, the decoder returns
 the appropriate type as determined by the flags.
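To illustrate those flag combinations, here is a hedged sketch; the bit positions, the Encoded holder and the method names are made up for illustration and are not taken from the encoder. It shows Long and long producing identical data bytes with different flags, and an unrecognised object falling back to Java serialization with the SERIALIZED bit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Sketch of how type bits distinguish Long, long and serialized objects.
// All bit positions are hypothetical.
public class FlagSketch {
    static final int PRIMITIVE = 1 << 0;   // hypothetical
    static final int LONG = 1 << 3;        // hypothetical
    static final int SERIALIZED = 1 << 10; // hypothetical

    static final class Encoded {
        final int flags;
        final byte[] data;

        Encoded(int flags, byte[] data) {
            this.flags = flags;
            this.data = data;
        }
    }

    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // java.lang.Long: only the LONG bit.
    static Encoded encodeBoxed(Long value) {
        return new Encoded(LONG, longBytes(value));
    }

    // Primitive long: LONG and PRIMITIVE bits; the data bytes match encodeBoxed.
    static Encoded encodePrimitive(long value) {
        return new Encoded(LONG | PRIMITIVE, longBytes(value));
    }

    // Anything not recognised: Java object serialization plus the SERIALIZED bit.
    static Encoded encodeOther(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Encoded(SERIALIZED, bytes.toByteArray());
    }

    // The flags, not the data bytes, decide which Java type the decoder hands back.
    static Class<?> decodedType(int flags) {
        if ((flags & LONG) != 0) {
            return (flags & PRIMITIVE) != 0 ? long.class : Long.class;
        }
        if ((flags & SERIALIZED) != 0) {
            return Serializable.class;
        }
        throw new IllegalArgumentException("flags not handled in this sketch: " + flags);
    }

    static long decodeLong(Encoded e) {
        return ByteBuffer.wrap(e.data).getLong();
    }

    static Object decodeSerialized(Encoded e) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(e.data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

In Java, returning a single long through an Object-typed API boxes it anyway, so the PRIMITIVE bit matters most for typed accessors and for arrays, where long[] and Long[] are genuinely different types.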

 > With regards to individual fields in Map/Array, according to your wiki,
 > each field then only contains the size of the field and the field. So,
 > in a map of booleans, each field would be represented as: 0x01 and
 > 0x01/0x00 with 1st being the size and 2nd being the actual value. Correct?
 Arrays and maps are treated a bit differently. Arrays generally hold a single known
 type, whereas maps tend to map between arbitrary types of objects. The encoding can
 describe the single element type of an array, but not the multiple types of a map.

 As a result, arrays are encoded with a single size field, followed by all of the data
 bytes. The current, suboptimal, implementation places a byte count for all of the data,
 followed by the type flags, followed by the data. Thus an array of two Integers would be
 encoded as:

   a byte count of 12 (covering the flags and the two integers),
   four bytes for flags (INTEGER and ARRAY bits set),
   four bytes for integer 1,
   four bytes for integer 2.
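That layout can be sketched with java.nio.ByteBuffer. This assumes a 4-byte, big-endian byte count and hypothetical INTEGER and ARRAY bit values; the count covers the flags and the data but not itself, matching the 12 in the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of the current array layout: [byte count][flags][data...].
// The 4-byte big-endian count and the flag bit values are assumptions.
public class ArraySketch {
    static final int INTEGER = 1 << 2; // hypothetical
    static final int ARRAY = 1 << 8;   // hypothetical

    static byte[] encodeIntegerArray(Integer[] values) {
        int byteCount = Integer.BYTES + Integer.BYTES * values.length; // flags + data
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + byteCount); // + the count field
        buf.putInt(byteCount);
        buf.putInt(INTEGER | ARRAY);
        for (Integer v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // The whole block must be in memory before the first element can be returned.
    static Integer[] decodeIntegerArray(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        int byteCount = buf.getInt();
        int flags = buf.getInt();
        if ((flags & (INTEGER | ARRAY)) != (INTEGER | ARRAY)) {
            throw new IllegalArgumentException("not an INTEGER array: " + flags);
        }
        Integer[] values = new Integer[(byteCount - Integer.BYTES) / Integer.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] wire = encodeIntegerArray(new Integer[] {1, 2});
        System.out.println(wire.length + " bytes on the wire, byte count field = "
                + ByteBuffer.wrap(wire).getInt());
        System.out.println(Arrays.toString(decodeIntegerArray(wire)));
    }
}
```

For {1, 2} the count field holds 12 and the buffer is 16 bytes in total. decodeIntegerArray needs the complete byte array before it can return anything, which is the drawback described in the following paragraph.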

 This requires the whole array of encoded integers to be loaded at once, which should not
 be necessary; however, I was a bit pressed for time.

 A map is of necessity treated differently. The map also starts with the total byte
 count, followed by the MAP flag. Each key and value in the map is then encoded as an
 individual object with its own size, type and data fields.
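A sketch of the map layout, which also addresses the [key][value][key][value] question further down the thread: each key and each value is written as its own [size][type][data] element, so keys and values can have different types. The flag values, the 4-byte size fields (assumed to cover type plus data, mirroring the array layout) and the small set of supported types are all assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of [total byte count][MAP flag]([key element][value element])*,
// where every element is [size][type][data]. Flag values are hypothetical.
public class MapSketch {
    static final int BOOLEAN = 1 << 1; // hypothetical
    static final int INTEGER = 1 << 2; // hypothetical
    static final int STRING = 1 << 4;  // hypothetical
    static final int MAP = 1 << 9;     // hypothetical

    // Writes one self-describing element; size counts the type field plus the data.
    static void writeElement(DataOutputStream out, Object o) throws IOException {
        int type;
        byte[] data;
        if (o instanceof Integer) {
            type = INTEGER;
            data = ByteBuffer.allocate(Integer.BYTES).putInt((Integer) o).array();
        } else if (o instanceof Boolean) {
            type = BOOLEAN;
            data = new byte[] {(byte) ((Boolean) o ? 1 : 0)};
        } else if (o instanceof String) {
            type = STRING;
            data = ((String) o).getBytes(StandardCharsets.UTF_8);
        } else {
            throw new IllegalArgumentException("type not handled in this sketch: " + o);
        }
        out.writeInt(Integer.BYTES + data.length);
        out.writeInt(type);
        out.write(data);
    }

    static Object readElement(ByteBuffer buf) {
        int size = buf.getInt();
        int type = buf.getInt();
        byte[] data = new byte[size - Integer.BYTES];
        buf.get(data);
        if (type == INTEGER) return ByteBuffer.wrap(data).getInt();
        if (type == BOOLEAN) return data[0] != 0;
        if (type == STRING) return new String(data, StandardCharsets.UTF_8);
        throw new IllegalArgumentException("unknown type flag: " + type);
    }

    static byte[] encodeMap(Map<?, ?> map) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(body)) {
            out.writeInt(MAP);
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                writeElement(out, entry.getKey());   // key carries its own size and type
                writeElement(out, entry.getValue()); // so does the value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] bodyBytes = body.toByteArray();
        return ByteBuffer.allocate(Integer.BYTES + bodyBytes.length)
                .putInt(bodyBytes.length) // total byte count comes first
                .put(bodyBytes)
                .array();
    }

    static Map<Object, Object> decodeMap(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        int byteCount = buf.getInt();
        int end = buf.position() + byteCount;
        if (buf.getInt() != MAP) {
            throw new IllegalArgumentException("not a MAP block");
        }
        Map<Object, Object> result = new LinkedHashMap<>();
        while (buf.position() < end) {
            Object key = readElement(buf);
            Object value = readElement(buf);
            result.put(key, value);
        }
        return result;
    }
}
```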

 I would like to find a way of writing only the number of fields at the beginning of
 these; however, time pressure and the need to handle compressed data properly forced me
 to place the total number of bytes as the first field.
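For comparison, a hypothetical sketch of the count-prefixed layout described as the preferred direction: the header records the number of elements rather than the number of bytes, so the reader can hand back elements one at a time from a stream. It deliberately ignores compression, which is exactly the complication that led to the total byte count being used; the flag values and class name are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical layout: [element count][flags][elem][elem]...
// The reader consumes elements lazily instead of buffering the whole block.
public class CountPrefixedSketch {
    static final int INTEGER = 1 << 2; // hypothetical
    static final int ARRAY = 1 << 8;   // hypothetical

    static byte[] encode(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length); // number of elements, not bytes
            out.writeInt(INTEGER | ARRAY);
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the header eagerly, then one element per call to next().
    static Iterator<Integer> decode(DataInputStream in) {
        int count;
        try {
            count = in.readInt();
            int flags = in.readInt();
            if (flags != (INTEGER | ARRAY)) {
                throw new IllegalArgumentException("unexpected flags: " + flags);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        final int total = count;
        return new Iterator<Integer>() {
            private int read = 0;

            @Override
            public boolean hasNext() {
                return read < total;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                try {
                    int value = in.readInt();
                    read++;
                    return value;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }
}
```

The trade-off is that a reader can no longer skip over the whole block in one step without knowing each element's size, and a compressed payload still needs some way to mark where it ends.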

                                                                  Alex

--- On Tue, 1/12/10, Galder Zamarreno <galder(a)redhat.com> wrote:

 From: Galder Zamarreno <galder(a)redhat.com>
 Subject: Re: [infinispan-dev] Enconding of data WAS Re:  Hot Rod - pt3
 To: infinispan-dev(a)lists.jboss.org
 Date: Tuesday, January 12, 2010, 10:48 AM

 On 01/12/2010 05:46 PM, Galder Zamarreno wrote:
 > And another question:
 >
 > With regards to individual fields in Map/Array, according to your wiki,
 > each field then only contains the size of the field and the field. So,
 > in a map of booleans, each field would be represented as: 0x01 and
 > 0x01/0x00 with 1st being the size and 2nd being the actual value. Correct?

 Well, this would be more like an array actually.

 By the way, in a Map, how do you send key/value pairs? I suppose
 [key][value][key][value]? And how do you provide the type of the key vs the
 type of the value?

 >
 > On 01/12/2010 04:46 PM, Galder Zamarreno wrote:
 >> Hi Alex,
 >>
 >> Wrt the discussion below about encoding data, I wanted to know what
 >> exactly COMPRESSED type meant in your wiki, or what exactly it represents.
 >>
 >> Also, I wanted to know what bits you send when the object sent across
 >> is an instance of java.lang.Long. Apart from marking LONG, do you also
 >> mark it as SERIALIZED? Or do you just use SERIALIZED for user-specific
 >> classes?
 >>
 >> Cheers,
 >>
 >>
 >> On 01/05/2010 10:52 AM, Galder Zamarreno wrote:
 >>>
 >>>
 >>> On 01/04/2010 10:44 PM, Alex Kluge wrote:
 >>>>>>      </snip>
 >>>
 >>>>
 >>>>>>      - What happens if the key or the value is not text? I have a way of
 >>>>>>        representing the data to allow for a wide variety of data types,
 >>>>>>        even allowing for arrays or maps. This will make the protocol more
 >>>>>>        complex, but the assumption that the data is a string is rather
 >>>>>>        limiting. This is already sketched out in the wiki.
 >>>
 >>> Hmmmmm, I don't think I've made any assumptions in the wiki that keys or
 >>> values are Strings unless I've made a mistake somewhere (maybe in the
 >>> example where I've used a particular encoding for Strings?). My thoughts
 >>> around this were that I was gonna treat them both as byte[]...
 >>>
 >>>>>
 >>>>> </snip>
 >>>>
 >>>>      The idea is to prefix each data block with a data type. This is a
 >>>>      lightweight binary protocol, so a full-fledged MIME type would be
 >>>>      overkill. There is a discussion and example in the Encoding Data
 >>>>      section of this page:
 >>>>
 >>>>        http://community.jboss.org/wiki/RemoteCacheInteractions
 >>>>
 >>>>      Data types are limited to things like integer, byte, string, boolean, etc.
 >>>>      Or, if it isn't a recognised type, the native platform serialisation is
 >>>>      used. There can be arrays of these types, or maps as well.
 >>>>
 >>>>      Each data type is represented by a bit, and they can be used in
 >>>>      combinations. An array of bytes would have the array, byte and
 >>>>      primitive bits set. The set of recognised data types can of course be
 >>>>      expanded.
 >>>
 >>> ... but that looks rather useful (at least based on the cached version
 >>> of that wiki that Google shows ;) - no wiki access right now :( ),
 >>> particularly the way you can combine different bytes to create composed
 >>> types. I don't see a problem with including this in Hot Rod.
 >>>
 >>> I suppose you included the type length in the length field so that in the
 >>> future you can support other types and possibly longer type fields?
 >>>
 >

 -- 
 Galder Zamarreño
 Sr. Software Engineer
 Infinispan, JBoss Cache
 _______________________________________________
 infinispan-dev mailing list
 infinispan-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
