[infinispan-dev] Enconding of data WAS Re: Hot Rod - pt3
Galder Zamarreno
galder at redhat.com
Thu Feb 18 07:12:35 EST 2010
Alex,
Thanks for the clarifications. One last question: is the implementation
you wrote open source or closed source?
Cheers,
On Mon, 15 Feb 2010 21:01:38 +0100, Alex Kluge <java_kluge at yahoo.com>
wrote:
> Hi,
>
> There were a few questions, and it will be easier to understand the
> answers with a little background on the implementation. The key and
> value objects are mapped to bytes for transmission over the network
> using an encoder object, an instance of
> org.jboss.cache.tcpcache.messages.encoder.BaseEncoder,
> and then decoded by a decoder object, which is an instance of
> org.jboss.cache.tcpcache.messages.decoder.BaseDecoder.
> One of the goals of these objects is to, where possible, map the data
> into a representation that can be interpreted on any platform. As long
> as there is no loss of information, these classes can be adjusted as
> needed to encode and decode data fields efficiently.
>
> > Wrt the discussion below about encoding data, I wanted to know what
> > exactly COMPRESSED type meant in your wiki, or what exactly it
> > represents.
>
> If a value encodes to a byte array longer than a threshold value
> (compressionThreshold in the encoder), an instance of
> java.util.zip.Deflater is used to compress the bytes before
> transmission over the wire, and the COMPRESSED flag is set. On receipt,
> the decoder sees the COMPRESSED flag and decompresses the bytes, then
> interprets them in the normal way. So, compression happens after
> converting the data to bytes, and decompression happens before
> converting from bytes.
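
The compress-after-encoding step described above could be sketched roughly as follows. This is not the BaseEncoder/BaseDecoder code, which is not shown in this thread: the class name, the COMPRESSED bit position, the threshold of 64 bytes, and the Encoded holder are all assumptions made for illustration. Only java.util.zip.Deflater and Inflater are real JDK classes.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class CompressionSketch {
    // Hypothetical values: the thread gives neither the bit nor the threshold.
    static final int COMPRESSED = 1 << 0;
    static final int COMPRESSION_THRESHOLD = 64;

    /** Flags plus payload, as they would be written to the wire. */
    static final class Encoded {
        final int flags;
        final byte[] payload;
        Encoded(int flags, byte[] payload) { this.flags = flags; this.payload = payload; }
    }

    /** Compresses already-encoded bytes only when they exceed the threshold. */
    static Encoded encode(byte[] raw, int typeFlags) {
        if (raw.length <= COMPRESSION_THRESHOLD) {
            return new Encoded(typeFlags, raw);
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return new Encoded(typeFlags | COMPRESSED, out.toByteArray());
    }

    /** Decompresses first if COMPRESSED is set; type decoding would follow. */
    static byte[] decode(Encoded e) {
        if ((e.flags & COMPRESSED) == 0) {
            return e.payload;
        }
        Inflater inflater = new Inflater();
        inflater.setInput(e.payload);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated compressed payload");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException ex) {
            throw new IllegalStateException("corrupt compressed payload", ex);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Because the flag travels with the payload, the receiver needs no knowledge of the sender's threshold.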
>
> > Also, I wanted to know what bits you send when the object sent across
> > is an instance of java.lang.Long. Apart from marking LONG, do you also
> > mark it as SERIALIZED? Or do you just use SERIALIZED for user-specific
> > classes?
>
> A java.lang.Long would only have the long bit set. A long (not an
> object) would have the long and primitive bits set. An object that the
> encoder does not understand would then be serialized using Java object
> serialization, and the serialized bit would be set. The long and Long
> values will be encoded identically; however, the decoder will return
> the appropriate type as determined by the flags.
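
As a rough illustration of the flag rules above, the following sketch picks flags by type and encodes a long value. The bit positions, the class and method names, and the 8-byte big-endian layout are assumptions; the thread only states which bits are set for which type.

```java
import java.nio.ByteBuffer;

final class LongFlagsSketch {
    // Hypothetical bit assignments; the wiki's actual values are not quoted here.
    static final int LONG = 1 << 1;
    static final int PRIMITIVE = 1 << 2;
    static final int SERIALIZED = 1 << 3;

    /** Chooses flags following the rules described in the reply above. */
    static int flagsFor(Class<?> type) {
        if (type == long.class) {
            return LONG | PRIMITIVE;   // primitive long: both bits
        }
        if (type == Long.class) {
            return LONG;               // boxed Long: long bit only
        }
        return SERIALIZED;             // unknown type: Java serialization
    }

    /** long and Long share one byte encoding (assumed 8 bytes, big-endian). */
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    static long decodeLong(byte[] data) {
        return ByteBuffer.wrap(data).getLong();
    }
}
```

Only the flags differ between the two cases, so a decoder can return either a long or a Long from the same eight data bytes.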
>
> > With regards to individual fields in Map/Array, according to your
> > wiki, each field then only contains the size of the field and the
> > field. So, in a map of booleans, each field would be represented as:
> > 0x01 and 0x01/0x00 with 1st being the size and 2nd being the actual
> > value. Correct?
>
> Arrays and maps are treated a bit differently. Arrays are generally of
> a single known type, whereas maps tend to be between arbitrary types of
> objects. The encoding can encode the single type of an array, but not
> the multiple types of a map.
>
> As a result, arrays are encoded with a single size field, followed by
> all of the data bytes. The current, suboptimal, implementation places
> a byte count for all of the data followed by the type flags, followed
> by the data. Thus an array of two Integers would be encoded as
>
> A byte count of 12,
> four bytes for flags (INTEGER & ARRAY),
> four bytes for integer 1,
> four bytes for integer 2.
>
> This requires the whole array of encoded integers to be loaded at
> once, which should not be necessary. However, I was a bit pressed for
> time.
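
The array layout described above (a byte count of 12 covering four flag bytes and two four-byte integers) might look like this in code. The flag bit positions, the four-byte width of the count field, and big-endian byte order are assumptions; "INTEGER & ARRAY" in the reply is read here as both bits being set, combined with a bitwise OR.

```java
import java.nio.ByteBuffer;

final class ArrayLayoutSketch {
    // Hypothetical bit assignments for illustration only.
    static final int INTEGER = 1 << 4;
    static final int ARRAY = 1 << 5;

    /** Writes [byte count][flags][v0][v1]..., count covering flags + data. */
    static byte[] encode(Integer[] values) {
        int byteCount = 4 + 4 * values.length;          // flags + data
        ByteBuffer buf = ByteBuffer.allocate(4 + byteCount);
        buf.putInt(byteCount);
        buf.putInt(INTEGER | ARRAY);
        for (Integer v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    /** Reads the layout back; needs the whole array in memory, as noted above. */
    static Integer[] decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        int byteCount = buf.getInt();
        int flags = buf.getInt();
        if (flags != (INTEGER | ARRAY)) {
            throw new IllegalArgumentException("not an Integer array");
        }
        Integer[] values = new Integer[(byteCount - 4) / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }
}
```

With these assumptions, encoding two Integers produces 16 bytes on the wire: the 4-byte count field holding 12, followed by the 12 bytes it describes.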
>
> A map is of necessity treated differently. The map also starts with the
> total byte count, and is then followed by the MAP flag. Each key and
> value in the map is then encoded as an individual object with its own
> size, type and data fields.
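
A map encoded along the lines described above could be sketched like this, for the simple case of String keys and Integer values. The flag bits, four-byte count fields, UTF-8 for strings, and the rule that each count covers flags plus data (borrowed from the array example) are all assumptions, as are the class and method names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class MapLayoutSketch {
    // Hypothetical bit assignments for illustration only.
    static final int INTEGER = 1 << 4;
    static final int STRING = 1 << 6;
    static final int MAP = 1 << 7;

    /** One self-describing object: [byte count][flags][data]. */
    static byte[] encodeObject(int flags, byte[] data) {
        return ByteBuffer.allocate(8 + data.length)
                .putInt(4 + data.length).putInt(flags).put(data).array();
    }

    /** Total count and MAP flag first, then key, value, key, value... */
    static byte[] encode(Map<String, Integer> map) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            byte[] key = encodeObject(STRING,
                    e.getKey().getBytes(StandardCharsets.UTF_8));
            byte[] value = encodeObject(INTEGER,
                    ByteBuffer.allocate(4).putInt(e.getValue()).array());
            body.write(key, 0, key.length);
            body.write(value, 0, value.length);
        }
        return encodeObject(MAP, body.toByteArray());
    }

    static Map<String, Integer> decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        int total = buf.getInt();
        if (buf.getInt() != MAP) {
            throw new IllegalArgumentException("not a map");
        }
        int end = buf.position() + total - 4;   // count includes the flags
        Map<String, Integer> out = new LinkedHashMap<>();
        while (buf.position() < end) {
            byte[] key = new byte[buf.getInt() - 4];
            buf.getInt();                        // key flags (STRING here)
            buf.get(key);
            buf.getInt();                        // value byte count (8 here)
            buf.getInt();                        // value flags (INTEGER here)
            out.put(new String(key, StandardCharsets.UTF_8), buf.getInt());
        }
        return out;
    }
}
```

Since every key and value carries its own type flags, a fuller decoder would dispatch on those flags instead of assuming String and Integer as this sketch does.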
>
> I would like to find a way of writing only the number of fields at the
> beginning of these; however, time pressure and the need to properly
> handle compressed data forced me to place the total number of bytes as
> the first field.
> Alex
>
> --- On Tue, 1/12/10, Galder Zamarreno <galder at redhat.com> wrote:
>
>> From: Galder Zamarreno <galder at redhat.com>
>> Subject: Re: [infinispan-dev] Enconding of data WAS Re: Hot Rod - pt3
>> To: infinispan-dev at lists.jboss.org
>> Date: Tuesday, January 12, 2010, 10:48 AM
>>
>>
>> On 01/12/2010 05:46 PM, Galder Zamarreno wrote:
>> > And another question:
>> >
>> > With regards to individual fields in Map/Array, according to your
>> > wiki, each field then only contains the size of the field and the
>> > field. So, in a map of booleans, each field would be represented as:
>> > 0x01 and 0x01/0x00 with 1st being the size and 2nd being the actual
>> > value. Correct?
>>
>> Well, this would be more like an array actually.
>>
>> By the way, in a Map, how do you send key/value pairs? I suppose
>> [key][value][key][value]? And how do you provide type of key vs type
>> of value?
>>
>> >
>> > On 01/12/2010 04:46 PM, Galder Zamarreno wrote:
>> >> Hi Alex,
>> >>
>> >> Wrt the discussion below about encoding data, I wanted to know what
>> >> exactly COMPRESSED type meant in your wiki, or what exactly it
>> >> represents.
>> >>
>> >> Also, I wanted to know what bits you send when the object sent across
>> >> is an instance of java.lang.Long. Apart from marking LONG, do you also
>> >> mark it as SERIALIZED? Or do you just use SERIALIZED for user-specific
>> >> classes?
>> >>
>> >> Cheers,
>> >>
>> >> On 01/05/2010 10:52 AM, Galder Zamarreno wrote:
>> >>>
>> >>>
>> >>> On 01/04/2010 10:44 PM, Alex Kluge wrote:
>> >>>>>> </snip>
>> >>>
>> >>>>
>> >>>>>> - What happens if the key or the value is not text? I have a way of
>> >>>>>> representing the data to allow for a wide variety of data types,
>> >>>>>> even allowing for arrays or maps. This will make the protocol more
>> >>>>>> complex, but the assumption that the data is a string is rather
>> >>>>>> limiting. This is already sketched out in the wiki.
>> >>>
>> >>> Hmmmmm, I don't think I've made any assumptions in the wiki that keys or
>> >>> values are Strings unless I've made a mistake somewhere (maybe in the
>> >>> example where I've used a particular encoding for Strings?). My thoughts
>> >>> around this was that I was gonna treat them both as byte[]...
>> >>>
>> >>>>>
>> >>>>> </snip>
>> >>>>
>> >>>> The idea is to prefix each data block with a data type. This is a
>> >>>> lightweight binary protocol, so a full fledged mime type would be
>> >>>> overkill. There is a discussion and example in the Encoding Data
>> >>>> section of this page:
>> >>>>
>> >>>> http://community.jboss.org/wiki/RemoteCacheInteractions
>> >>>>
>> >>>> Data types are limited to things like integer, byte, string, boolean, etc.
>> >>>> Or, if it isn't a recognised type, the native platform serialisation is
>> >>>> used. There can be arrays of these types, or maps as well.
>> >>>>
>> >>>> Each data type is represented by a bit, and they can be used in
>> >>>> combinations. An array of bytes would have the array, byte and
>> >>>> primitive bits set. The set of recognised data types can of course be
>> >>>> expanded.
>> >>>
>> >>> ... but that looks rather useful (at least based on the cached version
>> >>> of that wiki that Google shows ;) - no wiki access right now :( ),
>> >>> particularly the way you can combine different bytes to create composed
>> >>> types. I don't see a problem with including this in Hot Rod.
>> >>>
>> >>> I suppose you included type length into length field so that in the
>> >>> future you can support other types and possibly longer type fields?
>> >>>
>> >
>>
>> --
>> Galder Zamarreño
>> Sr. Software Engineer
>> Infinispan, JBoss Cache
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache