[infinispan-dev] data interoperability and remote querying

Wed Apr 10 14:46:28 EDT 2013

On Wed 2013-04-10 18:55, Manik Surtani wrote:
> 
> On 10 Apr 2013, at 18:18, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> 
> > I favor the first option for a few reasons:
> > 
> > - much easier client side implementations
> >  Frankly, rewriting the analyzer logic of Lucene in every language is
> >  no piece of cake, and you are out of luck for custom analyzers
> 
> I'm not suggesting all the analyser logic.  Just the extraction of indexed fields into name/value pairs, to be sent alongside the blob value.

Which means the client has already made a selection, and has possibly
already reduced the precision of a given field. That makes reindexing
impossible.
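
To make that concrete, here is a rough Python sketch (not Infinispan
code). The entry, the field names and both helper functions are made up
for illustration; JSON stands in for whatever server-readable format we
end up with. It only shows that a server holding client-extracted
name/value pairs plus an opaque blob cannot recover a field the client
dropped or rounded, while a server that can read the stored value can.

```python
import json

# Hypothetical entry as the application sees it.
entry = {"title": "Data Interoperability", "price": 12.3456, "body": "remote querying"}

def client_extract(e):
    """Client picks the indexed fields and ships them next to an opaque blob."""
    # The client has already selected fields and reduced precision here.
    fields = {"title": e["title"].lower(), "price": round(e["price"], 2)}
    blob = b"\x00opaque-client-specific-bytes"  # the server cannot decode this
    return fields, blob

def reindex_from_fields(fields, wanted):
    """Server side: only what the client chose to send is available."""
    return {name: fields[name] for name in wanted if name in fields}

def reindex_from_value(raw, wanted):
    """Server side: the stored value is readable, so any field can be indexed."""
    e = json.loads(raw)
    return {name: e[name] for name in wanted if name in e}

fields, blob = client_extract(entry)
raw = json.dumps(entry).encode("utf-8")

# Later the user decides to index "body" as well and wants full price precision.
wanted = ["title", "price", "body"]
from_fields = reindex_from_fields(fields, wanted)
from_value = reindex_from_value(raw, wanted)
```

With the name/value pairs, "body" is simply gone and "price" stays
rounded; with the readable value, both come back as stored.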

> 
> > - more robust client implementation: if we change how indexing is done
> >  clients don't have to change
> > - reindexing: if there is a need to rebuild the index, or if the user
> >  decides to reindex data differently, you must be able to read the data
> >  on the server side
> > - validation: if you want to implement (cross entry) validation, the
> >  server needs to be able to read the data.
> > - async: validation and indexing can be done asynchronously on the
> >  server, which reduces the latency perceived between a client request
> >  and its result
> 
> Valid points above though.
> 
> > I'm not sure JSON should be the format though. As you said it's quite
> > verbose, and strings are not exactly the most efficient way to process
> > data.
> 
> What would that format be, then?

Good question :) BSON is not necessarily smaller than JSON; afair it is
meant to be easier to parse. I used Avro in Hibernate Search because I
find Protocol Buffers and the others too rigid for my need to pass
arbitrary datasets. But if we have a schema and expect a given object
type, then we can start saving a lot of space.
In other words, no idea; that needs to be investigated.
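
As a rough illustration of the space argument only, here is a small
Python sketch. The record, its field names and the fixed binary layout
are assumptions made up for the example; this is not Avro, Protocol
Buffers or BSON, just the standard struct module standing in for "both
sides know the schema". Once the field names and types are agreed on in
advance, only the values need to travel.

```python
import json
import struct

# Hypothetical record; field names and types are invented for this sketch.
record = {"id": 42, "price": 19.99, "active": True}

# Self-describing encoding: field names travel with every value.
as_json = json.dumps(record).encode("utf-8")

# Schema-based encoding: both sides agree on (int32, float64, bool),
# little-endian with no padding, so only the values are written.
SCHEMA = "<id?"
as_binary = struct.pack(SCHEMA, record["id"], record["price"], record["active"])

# Decoding needs the same schema to give the bytes their meaning back.
rec_id, price, active = struct.unpack(SCHEMA, as_binary)
decoded = {"id": rec_id, "price": price, "active": active}

json_size = len(as_json)
binary_size = len(as_binary)  # 4 + 8 + 1 bytes for this layout
```

The flip side is the rigidity mentioned above: without the schema the
bytes are meaningless, and an arbitrary dataset does not fit a fixed
layout.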