Hi,
I would like to summarize a discussion we had on IRC to get some more feedback and come to
a decision on how to move forward.
I currently need to extend our serialization support for the distributed Search
deployment scenarios. Basically, in this case we serialize our different LuceneWork
instances from slave to master. This includes things like Lucene's Document instances,
which are part of add/update operations. Historically, this need arose when Lucene dropped
all serialization support for its classes, which forced us to implement our own custom
serialization. To do so we defined an SPI (org.hibernate.search.indexes.serialization.spi.*)
and provided two implementations, one based on native Java serialization and one based on
Avro [1]. The two implementations are provided as separate artifacts (the
serialization/java and serialization/avro modules in our build), and theoretically it
should be possible to switch between them by exchanging jar files.
I say theoretically because I found out during my recent work that the Java serialization
module is broken in several places. In its current state it would not work. (I guess we
never noticed since the default is Avro and we do not even document the possibility to
change the implementation. However, it also shows that no one has even tried.)
The question is, what do we do now? Do we want two implementations, and should the Java
serialization be fixed and then extended with the new functionality (btw, I need to
serialize DocValues now)? Or is it time to drop this module, reducing the amount of code
we have to maintain and making it a bit easier to implement new serialization
requirements? By dropping the module I mean removing the serialization/java module and
leaving everything else in place. So you can still write your own serialization
implementation; however, we would provide no alternative to our preferred choice of Avro
(which is afaik considerably faster than native Java serialization, which was one of the
driving factors for using it).
I think on IRC we already "kind of" agreed that we should drop native Java serialization.
I just wanted to put it out once more for everyone to comment/vote.
--Hardy
[1] http://avro.apache.org/