>> 1) Remember all operations are implicitly scoped to a single
>> index, so for example you don't need to distinguish between
>> Optimize(classType) and OptimizeAll; they will do the same thing: optimize
>> the index.
>
> That's a good point. Is that your last word, though? Will we want to limit
> messages passing through by regrouping several backends under one message?
I'm 99% confident in it; do you foresee a good reason to re-shard or
split the index again?
[...]
advantage of it. You're designing an upgradeable protocol, right :P?
I've removed Optimize
> So far I'm looking at MessagePack without much success.
> Documentation is sparse and they don't seem to support reading the version before the
> rest of the message.
Since MessagePack is "JSON-like", as far as I understood it should
always be able to succeed in parsing, so we would need the version number
only *after* it parsed the message, to see how we can interpret it.
Though I'm inclined to think that Protocol Buffers is a better fit, if
you're considering external libraries, as it helps with the problem of
adding/removing data across releases.
I gave up on MessagePack, the doc is simply too sparse. But from my trials I don't
think you are correct. The only way to make it read a byte[] was to have a corresponding
object. You can't, say, store the version and then store the rest. Nor is the protocol
self-documenting like JSON is. I might be wrong; if someone wants to take another look,
feel free.
Protocol Buffers requires a schema and class generation (i.e. not as flexible as
JSON).
BERT might be a good candidate for what you want to achieve
http://bert-rpc.org/ But I
don't think there is a Java implementation.
I went for Apache Avro, which I think will do what we want (though it's not a JSON-like
model).
Avro requires a schema but has an API to dynamically read and use the data (based on a
given schema), so we don't need to generate classes. Avro has some well-defined rules
for a reader at version n receiving a message written by a writer at version m. For
example:
- you can add enum values: as long as the messages don't use the new enum value, the
reader will be able to parse and process them for n < m (and always for n > m)
- you can add new HSEARCH operations (a new element in a union): as long as the messages
don't use the new operation, the reader will be able to parse and process them for n
< m (and always for n > m)
So we could have a soft forward compatibility.
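To make the enum and union points concrete, a schema along these lines could declare the message payload (all record, field, and namespace names here are hypothetical, not taken from the branch). Adding a symbol to the enum, or a new operation record to the union in `items`, is exactly the kind of change the rules above cover:

```json
{
  "type": "record",
  "name": "Message",
  "namespace": "org.hibernate.search.remote",
  "fields": [
    { "name": "status",
      "type": { "type": "enum", "name": "Status",
                "symbols": [ "OK", "RETRY" ] } },
    { "name": "operations",
      "type": { "type": "array", "items": [
        { "type": "record", "name": "Add",
          "fields": [ { "name": "id", "type": "string" } ] },
        { "type": "record", "name": "Delete",
          "fields": [ { "name": "id", "type": "string" } ] }
      ] } }
  ]
}
```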
For stronger breaks, we will need to add a version *before* the byte[] of serialized Avro
data, i.e. `<version><avro bytes>`, so that we can tell whether we can read the schema or
not. By keeping older versions of the schema around, we can read old but incompatible data
and do our best.
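A minimal sketch of that framing in plain Java (class and method names are mine, not from the branch): one version byte is written ahead of the opaque payload, so the reader can choose a schema, or reject the message, before handing the remaining bytes to the Avro decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VersionedFrame {

    // Write <version><payload bytes>: a single leading version byte,
    // then the serialized message as an opaque blob.
    static byte[] frame(int version, byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeByte(version);   // one byte is enough for a protocol version
        out.write(payload);       // e.g. the serialized Avro data
        out.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] avroBytes = {1, 2, 3};          // stand-in for serialized Avro data
        byte[] message = frame(1, avroBytes);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        int version = in.readUnsignedByte();   // read the version *before* the payload
        if (version > 1) {
            throw new IOException("Unknown protocol version " + version);
        }
        byte[] rest = new byte[message.length - 1];
        in.readFully(rest);                    // these bytes go to the Avro decoder
        System.out.println("version=" + version + " payloadLength=" + rest.length);
    }
}
```

The reader only needs to understand the one-byte header; everything after it can evolve per schema version.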
You can have a look at
http://github.com/emmanuelbernard/hibernate-search/tree/745, esp.
AvroTest (and in test/resource).
Note that I have not wired LuceneWork and Avro together yet, though all the structure is
there.
Emmanuel