[infinispan-dev] data interoperability and remote querying

Wed Apr 10 15:57:53 EDT 2013

Weird, when I wrote my previous reply there were no other answers;
the rest of the thread only appeared to me just now.

Good to see that Emmanuel had replied highlighting the same problems;
we can continue from there on this topic.
Just read mine to understand that there are a lot of options that need
to be defined for each field: specifying that it's a "varchar" is not
enough.

some more thoughts inline:

On 10 April 2013 19:46, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> On Wed 2013-04-10 18:55, Manik Surtani wrote:
>>
>> On 10 Apr 2013, at 18:18, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>>
>> > I favor the first options for a few reasons:
>> >
>> > - much easier client side implementations
>> >  Frankly rewriting the analyzer logic of Lucene in every language is
>> >  not a piece of cake and you are out of luck for custom analysers
>>
>> I'm not suggesting all the analyser logic.  Just the extraction of indexed fields into name/value pairs, to be sent alongside the blob value.
>
> Which means you make a selection already and possibly already reduce
> your precision for a given field. Which makes reindexing impossible.

+1
It also adds larger payloads, plus complexity and overhead on the
clients, while the user might not be able to scale the clients' compute
capacity the way they can scale the data grid.
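
To make the reindexing point concrete, here is a minimal sketch in
Python. Everything in it is hypothetical (the entry, the field names and
both helper functions); it is not Infinispan or Hot Rod API, and JSON
only stands in for whatever serialization ends up being chosen.

```python
import json

# Hypothetical entry as a client would hold it; not an Infinispan type.
entry = {"title": "Data Grids", "author": "sanne",
         "body": "remote querying over a grid"}


def client_side_extract(obj, indexed_fields):
    """Client picks the fields to index and ships them next to a blob."""
    # The blob stands in for any serialization the server cannot read.
    blob = json.dumps(obj).encode("utf-8")
    pairs = {name: obj[name] for name in indexed_fields}
    return blob, pairs


def server_side_extract(readable_value, indexed_fields):
    """Server can read the stored value, so it picks the fields itself."""
    return {name: readable_value[name] for name in indexed_fields}


# Option A: the server only ever sees the pairs chosen at write time.
blob, pairs = client_side_extract(entry, ["title"])
# A later decision to index "body" cannot be applied to existing data:
# the field is not among the pairs and the blob is opaque to the server.
can_reindex_a = "body" in pairs

# Option B: the server stores a value it can parse, so a new index
# definition can be applied to data that was written earlier.
stored = json.loads(json.dumps(entry))
reindexed = server_side_extract(stored, ["title", "body"])
```

With option A the selection made by the client is final for that entry;
with option B the index definition can change after the write.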

>
>>
>> > - more robust client implementation: if we change how indexing is done
>> >  clients don't have to change
>> > - reindexing: if there is a need to rebuild the index, or if the user
>> >  decides to reindex data differently, you must be able to read the data
>> >  on the server side
>> > - validation: if you want to implement (cross entry) validation, the
>> >  server needs to be able to read the data.
>> > - async: validation and indexing can be done in an async way on the
>> >  server, avoiding perceived latency between a client request and the
>> >  result
>>
>> Valid points above though.
>>
>> > I'm not sure JSON should be the format though. As you said it's quite
>> > verbose and string is not exactly the most efficient way to process
>> > data.
>>
>> What would that format be, then?
>
> Good question :) BSON is not necessarily smaller than JSON, it is meant
> to be more parseable afair. I did use Avro in Hibernate Search as I find
> Protocol Buffers and the others too rigid for my needs to pass arbitrary
> datasets. But if we have a schema and expect a given object type, then
> we can start saving a lot of space.
> In other words, no idea; that needs to be investigated.

Right, let's keep this to collecting requirements:
 - being able to upgrade the server without losing data
 - being able to change the (soft) schema on the server
 - read/write fields from different languages
 - deal with multi-version control of values (i.e. being able to read
an older value through an evolved schema, and comparing the same
value even if it was stored using different schema generations)

Sanne