I think one of the driving forces is that we try to serialize Lucene internals, which has proven to be fragile. I'd rather see us serialize our own "model/data". The bulk of the data is the key/value pairs for the document, which are basically just strings. Then we have the different indexing options. Here we can either create our own classes or, even better, send some sort of indexing "schema" to the backend up front. During serialization we would then just send a schema id and the actual index data. The backend would know how to build the Lucene Document by looking up the right schema and reading the raw indexing data. I see two main benefits:
- Serialization is more independent from Lucene classes
- We are sending less data over the wire which can boost performance
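
To make the idea a bit more concrete, here is a rough sketch of what I have in mind. It is only an illustration: every name in it (`SchemaSerializationSketch`, `FieldOptions`, `registerSchema`, `buildBackendFields`) is hypothetical, the wire layout is made up, and it deliberately avoids any Lucene classes. The last step just describes the fields, where a real backend would create the Lucene Document from the looked-up options.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaSerializationSketch {

    /** Hypothetical per-field indexing options, sent to the backend once, up front. */
    static final class FieldOptions {
        final boolean stored;
        final boolean tokenized;

        FieldOptions(boolean stored, boolean tokenized) {
            this.stored = stored;
            this.tokenized = tokenized;
        }
    }

    /** What arrives over the wire: a schema id plus raw key/value strings. */
    static final class DecodedDocument {
        final int schemaId;
        final Map<String, String> fields;

        DecodedDocument(int schemaId, Map<String, String> fields) {
            this.schemaId = schemaId;
            this.fields = fields;
        }
    }

    /** Backend-side registry: schema id -> (field name -> options). */
    static final Map<Integer, Map<String, FieldOptions>> SCHEMAS = new HashMap<>();

    static void registerSchema(int schemaId, Map<String, FieldOptions> schema) {
        SCHEMAS.put(schemaId, schema);
    }

    /** Sender side: only the schema id and the plain strings are written. */
    static byte[] serialize(int schemaId, Map<String, String> fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(schemaId);
            out.writeInt(fields.size());
            for (Map.Entry<String, String> field : fields.entrySet()) {
                // writeUTF is limited to 65535 encoded bytes per string;
                // good enough for a sketch, not for large field values.
                out.writeUTF(field.getKey());
                out.writeUTF(field.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Backend side: read the schema id and the raw key/value pairs back. */
    static DecodedDocument deserialize(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            int schemaId = in.readInt();
            int count = in.readInt();
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                fields.put(name, in.readUTF());
            }
            return new DecodedDocument(schemaId, fields);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Backend side: look up the schema and pair each raw value with its options. */
    static List<String> buildBackendFields(DecodedDocument doc) {
        Map<String, FieldOptions> schema = SCHEMAS.get(doc.schemaId);
        if (schema == null) {
            throw new IllegalStateException("Unknown schema id " + doc.schemaId);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> field : doc.fields.entrySet()) {
            FieldOptions options = schema.get(field.getKey());
            if (options == null) {
                throw new IllegalStateException("Field not in schema: " + field.getKey());
            }
            result.add(field.getKey() + "=" + field.getValue()
                    + " (stored=" + options.stored + ", tokenized=" + options.tokenized + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, FieldOptions> schema = new HashMap<>();
        schema.put("title", new FieldOptions(true, false));
        registerSchema(1, schema);

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Hello");

        byte[] wire = serialize(1, fields);
        System.out.println(buildBackendFields(deserialize(wire)));
    }
}
```

The point is that the indexing options never travel with the document. They are registered once under a schema id, and each message carries only that id and the strings.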