See below:
On Jul 22, 2010, at 2:46 AM, Mircea Markus wrote:
On 21 Jul 2010, at 12:06, Galder Zamarreño wrote:
> Hi all,
>
> Re:
https://jira.jboss.org/browse/ISPN-508. This is a bit of a long one, so take your
time :)
>
> Over the past few days I've been trying to come up with an Infinispan Marshaller
implementation based on one of those portable serialization libraries (e.g. Protobuf,
Thrift, etc.) and I've reached the conclusion that it is possible to do so, but it
requires a fair bit of work on our side and performance/usability would decrease. Before
explaining my findings, let me explain the model that I'm trying to achieve:
>
> I want to build a generic marshalling/unmarshalling implementation that's based
on language neutral serialization libraries and which does not require the unmarshalling
code to have knowledge of the target type. Think about Java Serialization or JBoss
Marshalling where the code written can be generic enough w/o needing to know what
we're trying to deserialize. The payload in these cases has enough information for the
underlying library to figure out the type, instantiate it and populate it accordingly.
>
> The problem is that there's no such library in the space of portable
serialization libraries that provides this out of the box and let me explain why:
>
> Pretty much any code using those libraries must use class type or schema name to
deserialize the payload. However, Infinispan based Marshaller does not have such
information when it's trying to deserialize the payload. Infinispan Marshaller based
on JDK serialization or JBoss Marshalling uses information in the payload to instantiate
the object. However, none of the libraries out there use this mechanism and instead some
(e.g. Protobuf) force the client code to do things like Pojo.parseFrom(byte[]) to
generate instances of Pojo. In these cases, for example, nothing stops you from writing a
UTF-8 string with the class name and putting it at the beginning of the payload so the
deserialization part can be class agnostic, but this payload would not be portable. What
would a Python client do with a String containing the Java class name?
I only know Protobuf from the literature, so I might be wrong about this.
As an example, let's say you want to keep in the grid a Person object that can be read
from both a Java and a C++ client through Hot Rod.
The way I see it is: you first describe it in Person.proto from which you generate the
Proto.java and Proto.cpp using protoc.
Now for each existing client you need a "ProtoBufMarshaller" that
serializes a Person object as follows:
- first it writes the .proto name of that object (in this case "Person")
- then it writes the Person using Protobuf (language neutral)
When reading:
- it reads "Person"
- based on a configuration/mapping it knows that "Person" maps to a Person
object
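The write/read steps above could be sketched roughly as follows in Java. This is only a minimal illustration under assumptions: NamePrefixedMarshaller, register, marshal and unmarshal are hypothetical names, the body bytes stand in for whatever Protobuf would produce, and the name is written as length-prefixed UTF-8 so that a non-Java client could read it too.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: frames a payload as [name length][name][body] and picks
// the decoder from a name -> decoder mapping supplied by configuration.
final class NamePrefixedMarshaller {
    private final Map<String, Function<byte[], Object>> decoders = new HashMap<>();

    // Configuration step: map a schema name such as "Person" to a decoder.
    void register(String typeName, Function<byte[], Object> decoder) {
        decoders.put(typeName, decoder);
    }

    // Writes the schema name first, then the already-serialized body.
    byte[] marshal(String typeName, byte[] body) {
        byte[] name = typeName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + name.length + body.length);
        buf.putInt(name.length).put(name).put(body);
        return buf.array();
    }

    // Reads the schema name, looks up its decoder and hands over the body.
    Object unmarshal(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        String typeName = new String(name, StandardCharsets.UTF_8);
        Function<byte[], Object> decoder = decoders.get(typeName);
        if (decoder == null) {
            throw new IllegalArgumentException("No mapping for type: " + typeName);
        }
        return decoder.apply(body);
    }
}
```

With a real Protobuf class, the registered decoder would presumably wrap the generated parseFrom call; each other language client would need its own equivalent mapping.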
I reckon the serialisation of metadata in a language neutral way is the most difficult
thing - unless you can define metadata itself in a .proto file?
This is pretty much what I've done in
https://jira.jboss.org/secure/attachment/12335620/protobuf-sandbox2.zip with
DynamicMarshallingProtobuf and what I explained below. I write the proto name of the
descriptor and then use it on the reading part to map it to java class.
However, consider the disadvantages of doing this. It requires an extra compilation step to
generate the file descriptor set, and then some configuration for users to pass it to us. I
haven't seen any client code that reads .proto files directly, and the idea of
FileDescriptorSet is that all .protos involved are bundled into a binary file. On top of
that, the approach relies on using reflection to deserialize, which is slower than pure
Person.parseFrom(byte[]). So, the advantages are not so clear when the user can simply
pass the byte[] to us and they can very easily transform it back to a Pojo in a single
line. All this without any extra configuration, no extra compilation, no extra bytes in
payload, and no reflection.
>
>
> Based on the FileDescriptorSet information in
http://code.google.com/apis/protocolbuffers/docs/techniques.html I was able to hack
something that might work in a portable way. Given a FileDescriptorSet generated at
compile time, I was able to match the protobuf name of a class with its Java counterpart.
So, before writing the protobuf-generated byte[], I prepend it with the protobuf class name so
that when reading, I can take the name, get the Java class name and, using reflection, call
the parseFrom method to convert the byte[] into a pojo. Note that the DynamicMessage class hinted
at in the techniques page won't work cos it cannot create instances of pojos. It can
only create generic objects with fields that are accessed in a reflection style.
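The reflective lookup described above might look something like this. It is a sketch under assumptions, not the code from the sandbox zip: ReflectiveParser is a hypothetical name, PersonStandIn stands in for a protoc-generated class (only its static parseFrom(byte[]) shape matters), and the name-to-class map is assumed to have been built from the FileDescriptorSet beforehand.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Stand-in for a generated class; a real one would come from protoc.
final class PersonStandIn {
    final String name;

    private PersonStandIn(String name) {
        this.name = name;
    }

    public static PersonStandIn parseFrom(byte[] data) {
        return new PersonStandIn(new String(data, StandardCharsets.UTF_8));
    }
}

// Hypothetical sketch: resolves a protobuf name to a Java class name and
// invokes that class's static parseFrom(byte[]) reflectively.
final class ReflectiveParser {
    private final Map<String, String> protoNameToJavaClass;

    ReflectiveParser(Map<String, String> protoNameToJavaClass) {
        this.protoNameToJavaClass = protoNameToJavaClass;
    }

    Object parse(String protoName, byte[] body) {
        String className = protoNameToJavaClass.get(protoName);
        if (className == null) {
            throw new IllegalArgumentException("Unknown protobuf name: " + protoName);
        }
        try {
            Class<?> type = Class.forName(className);
            Method parseFrom = type.getMethod("parseFrom", byte[].class);
            // Static method, so the target instance is null.
            return parseFrom.invoke(null, (Object) body);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot parse " + protoName, e);
        }
    }
}
```

The class lookup and Method.invoke on every read are where the reflection overhead comes from; caching the Method per name would reduce it but not remove it.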
>
> I also looked at what Avro offers but it does not fully fit either. They have a
reflection based serialization mechanism that doesn't require any precompilation, but
it requires some kind of type knowledge on the client code to deserialize, plus Avro
themselves recommend against it and I'm not sure how performant it'd be. Avro also
includes other marshalling mechanisms: specific (like the Protobuf one, with precompiled
classes) and generic, which is used to build dynamic objects on the fly. Neither of these two fits the
bill. The specific one is like Protobuf's, with the disadvantage of having ugly code like
http://is.gd/dzwj3 where a static object has a strong reference to a <String,Class>
CHM, which would leak in an AS env.
>
> Thrift has the same problems as stated for Protobuf, but I couldn't see an
equivalent way to get the file descriptor set. Documentation is way below what Protobuf offers
and the latest version, which is 0.2, has issues generating classes as stated in JIRA.
>
> MessagePack has the capability to deserialize an object given a String representation
of the schema, so a solution like the protobuf one might be hackable. However, the
generated classes do not have an equals implementation (??) which is rather odd. Maybe
it's due to lack of maturity? The latest version is 0.3, so that might explain it. API
wise, MessagePack provides the API best suited to what we'd want to do and avoids
having to use reflection to resolve the payload. However, looking around I couldn't
see a similar API for Python, for example, and as with Protobuf, we'd have
to prepend the schema name key to the payload so that the reading part can look up the
entire schema based on it.
>
> So, I can see two solutions here bearing in mind that pretty much any solution would
require precompiling some classes:
>
> - Try to build a marshaller using Protobuf or MessagePack where we enhance it to pass
a string key that permits the reading part to deserialize the payload in a generic way.
>
> - Or try to build some wikis on how to integrate Protobuf/Thrift/MessagePack with Hot
Rod client so that they can generate the byte[] with these libraries and pass it to the
corresponding Hot Rod client. For the moment, we'd do this for the Hot Rod java
client. We would add more info once other language clients are available. The reason I
mention potentially covering various libraries is that, whereas Protobuf only supports
Java/Python/C++, Thrift supports loads more languages (C++, Java, Python, PHP, Ruby,
Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml).
>
> In terms of usability, I think the second option is a bit better because building
the marshaller requires steps for clients to generate the FileDescriptorSet and somehow
pass this to Infinispan so that the marshaller can use it. The second option has none of
this and the only added code on the client side is retrieving the byte[] and calling
Pojo.parseFrom(byte[]). Performance wise, the second option would also be better cos there's
no extra hacking needed to pass the String key around, so fewer bytes are sent,
and there's no need to use reflection to resolve the payload.
>
> It's a pity that I couldn't find a tool that fully fits our use case and I
wonder whether the need to deal with different languages makes coming up with such a
solution difficult. I believe reflection based instantiation is present in C++ or Python
but not sure about other languages. It's also true that our use case is very specific
where we're building a tool where we don't have control over what people will put
in the cache. Generics could give us some hints for example but it's not mandatory and
would not solve the issue entirely.
>
> Thoughts?
Coherence is using POF for the same purpose [1], but I think the same thing can be achieved with
Protobuf more easily: POF requires the client to write the serialisation by hand; it also
doesn't support circular dependencies (does Protobuf support that?)
[1] http://coherence.oracle.com/display/COH35UG/The+Portable+Object+Format
Yeah, I'm leaning towards something like that, where we can use Protobuf to write basic
types/collections in a portable way, and anyone wanting to use custom objects would do
something along those lines. These custom objects would use a facade we'd provide to
write strings, ints, etc. Behind that facade there'll be Protobuf
or something similar that writes stuff in a portable way.
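As a rough idea of what such a facade could look like, here is a minimal Java sketch. All names (PortableWriter, PortableObject, SimplePortableWriter, PersonRecord) are hypothetical, and the backend is a plain big-endian, length-prefixed encoding used purely as a stand-in for Protobuf or a similar library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical facade that custom objects would call to write their fields.
interface PortableWriter {
    void writeInt(int value);
    void writeString(String value);
}

// Hypothetical contract for custom objects that serialize themselves by hand.
interface PortableObject {
    void writeTo(PortableWriter writer);
}

// Stand-in backend: big-endian ints and length-prefixed UTF-8 strings.
// A real implementation would delegate to Protobuf or something similar.
final class SimplePortableWriter implements PortableWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    @Override
    public void writeInt(int value) {
        out.write(value >>> 24);
        out.write(value >>> 16);
        out.write(value >>> 8);
        out.write(value);
    }

    @Override
    public void writeString(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeInt(utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}

// Example custom object that only ever talks to the facade.
final class PersonRecord implements PortableObject {
    private final int age;
    private final String name;

    PersonRecord(int age, String name) {
        this.age = age;
        this.name = name;
    }

    @Override
    public void writeTo(PortableWriter writer) {
        writer.writeInt(age);
        writer.writeString(name);
    }
}
```

The point of the indirection is that custom objects never see the wire format, so the backend could be swapped without touching user code. A matching reader facade would be needed on the other side.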
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache