On Wed 2013-04-10 18:55, Manik Surtani wrote:
On 10 Apr 2013, at 18:18, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
> I favor the first option for a few reasons:
>
> - much easier client-side implementations
> Frankly, rewriting Lucene's analyzer logic in every language is
> not a piece of cake, and you are out of luck for custom analysers
I'm not suggesting all the analyser logic. Just the extraction of indexed fields
into name/value pairs, to be sent alongside the blob value.
Which means you have already made a selection and have possibly already
reduced your precision for a given field. That makes reindexing impossible.
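
To make the trade-off concrete, here is a minimal sketch of the "name/value
pairs alongside the blob" idea. Everything in it (IndexedPut, extract_fields,
the field names) is hypothetical and only illustrates the shape of such a
request; it is not an existing Infinispan or Hot Rod API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class IndexedPut:
    """Hypothetical put request: an opaque value plus client-chosen index fields."""
    key: str
    blob: bytes                                   # opaque to the server
    fields: dict = field(default_factory=dict)    # name/value pairs to index


def extract_fields(entity: dict, indexed: list) -> dict:
    """Keep only the fields the client decided to index."""
    return {name: entity[name] for name in indexed if name in entity}


entity = {"title": "Infinispan in Action", "year": 2013, "body": "long text"}

put = IndexedPut(
    key="book:1",
    blob=json.dumps(entity).encode("utf-8"),      # assumption: any client-side encoding
    fields=extract_fields(entity, ["title", "year"]),
)

# The server only ever sees "title" and "year" as indexable data.
# If "body" should be indexed later, the server cannot recover it from
# the blob unless it also understands the client's serialization format.
```

In this sketch the selection happens on the client, so a server-side reindex
with a different field set is only possible if the blob itself is readable
on the server.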
> - more robust client implementation: if we change how indexing is done
> clients don't have to change
> - reindexing: if there is a need to rebuild the index, or if the user
> decides to reindex data differently, you must be able to read the data
> on the server side
> - validation: if you want to implement (cross entry) validation, the
> server needs to be able to read the data.
> - async: validation and indexing can be done asynchronously on the
> server, avoiding perceived latency between a client request and the
> result
Valid points above though.
> I'm not sure JSON should be the format though. As you said, it's quite
> verbose, and a string is not exactly the most efficient way to process
> data.
What would that format be, then?
Good question :) BSON is not necessarily smaller than JSON; it is meant
to be easier to parse, afair. I did use Avro in Hibernate Search, as I find
Protocol Buffers and the others too rigid for my need to pass arbitrary
datasets. But if we have a schema and expect a given object type, then
we can start saving a lot of space.
In other words, no idea; that needs to be investigated.
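
As a rough illustration of the "schema saves space" point, the sketch below
encodes the same record as JSON and as a fixed binary layout that both sides
are assumed to know in advance. The record and the layout are made up for the
example, and Python's struct module stands in for a real schema-based format
such as Avro or Protocol Buffers.

```python
import json
import struct

record = {"id": 42, "year": 2013, "rating": 4.5}

# Self-describing text encoding: field names travel with every record.
as_json = json.dumps(record).encode("utf-8")

# Hypothetical schema shared by client and server (little-endian):
# unsigned 32-bit id, unsigned 16-bit year, 32-bit float rating.
SCHEMA = struct.Struct("<IHf")
FIELD_NAMES = ("id", "year", "rating")

as_binary = SCHEMA.pack(*(record[name] for name in FIELD_NAMES))

# The receiver needs the same schema to turn the bytes back into fields.
decoded = dict(zip(FIELD_NAMES, SCHEMA.unpack(as_binary)))

print(len(as_json), len(as_binary))
```

The binary form is smaller because field names and delimiters are not sent,
at the cost of both sides having to agree on the schema, which is the rigidity
mentioned above.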