|
Hardy Ferentschik, let me provide some context on the serialisation code.
We / I considered having a dedicated intermediary object model for the indexed document rather than Lucene's, but ended up deciding against it: there was no significant value in it, and the intermediary object model is already represented by the Avro schema. If you look at the Avro (de)serialisation, it already extracts only the meaningful data from the set of LuceneWork and serialises it into something that is as agnostic as necessary.
Having to change the code to serialise the objects from Lucene 4 was the plan all along. The key is that the Avro schema is sufficiently generic that the Lucene 4 operations should be representable. The protocol carries a major / minor version number to support backward / forward compatible messages and to express when things will break. The receiver reads the version number and uses the matching deserialiser.
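To make the version dispatch concrete, here is a minimal sketch of the idea, not the actual Hibernate Search code. The class name `VersionedDispatch`, the two-byte `[major, minor]` header and the `String` payload type are all assumptions made for illustration, as is the rule that only a major version change is breaking.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: names and wire layout are assumptions, not the real API.
final class VersionedDispatch {

    // One deserialiser per major protocol version (payload simplified to String).
    private final Map<Integer, Function<byte[], String>> byMajor = new HashMap<>();

    void register(int major, Function<byte[], String> deserialiser) {
        byMajor.put(major, deserialiser);
    }

    // Prefix the payload with an assumed two-byte header: [major, minor].
    static byte[] frame(int major, int minor, byte[] payload) {
        byte[] message = new byte[payload.length + 2];
        message[0] = (byte) major;
        message[1] = (byte) minor;
        System.arraycopy(payload, 0, message, 2, payload.length);
        return message;
    }

    // Read the header, then hand the payload to the deserialiser registered
    // for that major version. The minor version is only reported on failure:
    // in this sketch, minor changes are assumed to be compatible.
    String read(byte[] message) {
        if (message.length < 2) {
            throw new IllegalArgumentException("Missing version header");
        }
        int major = message[0] & 0xFF;
        int minor = message[1] & 0xFF;
        Function<byte[], String> deserialiser = byMajor.get(major);
        if (deserialiser == null) {
            throw new IllegalStateException(
                    "Unsupported protocol version " + major + "." + minor);
        }
        return deserialiser.apply(Arrays.copyOfRange(message, 2, message.length));
    }
}
```

A receiver would register one deserialiser per major version it understands and call `read` on each incoming message. A message with an unknown major version fails loudly instead of being misread.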
BTW, it's unfortunately more than just strings and key/value pairs. Numeric fields, binary fields, customisable fields etc. have to be accounted for.
|