[infinispan-dev] data interoperability and remote querying

Randall Hauch rhauch at redhat.com
Wed Apr 10 16:42:00 EDT 2013


Although I think the indexing functionality should generally be transparent to clients, ModeShape does need more control over how the indexable information is extracted from the cached values.

Therefore, it would be great if there were a way for clients to specify the actual "metadata" representation (perhaps another POJO) that could be processed as discussed earlier.

The simple reason ModeShape needs something like this is that the value objects ModeShape puts into the Infinispan cache are DeltaAware objects that each wrap a single JSON/BSON document, and there's no annotated POJO that Hibernate Search can directly understand. Also, the fields within the JSON/BSON documents contain namespaced values, and ModeShape's namespace registry can change at any time, so any "bridge" object created by Infinispan would need a reference to the ModeShape repository instance.
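To make the request above more concrete, here is a minimal Java sketch of what a client-supplied extraction hook could look like. Every name in it (`MetadataExtractor`, `NamespaceRegistry`, `DocumentValue`, `IndexableMetadata`, `NamespaceAwareExtractor`) is hypothetical and not an existing Infinispan or ModeShape API, and the wrapped BSON document is modelled as a plain `Map`. What it illustrates is an extractor that holds a reference to a live namespace registry and expands prefixed field names at extraction time, so registry changes are picked up without rebuilding the bridge.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MetadataExtractionSketch {

    /** Hypothetical hook: the client turns a cached value into an indexable "metadata" object. */
    interface MetadataExtractor<V, M> {
        M extract(V cachedValue);
    }

    /** Hypothetical stand-in for ModeShape's namespace registry, which may change at any time. */
    static final class NamespaceRegistry {
        private final Map<String, String> prefixToUri = new ConcurrentHashMap<>();

        void register(String prefix, String uri) {
            prefixToUri.put(prefix, uri);
        }

        /** Expands "prefix:local" to "{uri}local"; names without a known prefix are returned unchanged. */
        String expand(String name) {
            int colon = name.indexOf(':');
            if (colon < 0) {
                return name;
            }
            String uri = prefixToUri.get(name.substring(0, colon));
            return uri == null ? name : "{" + uri + "}" + name.substring(colon + 1);
        }
    }

    /** Stand-in for the DeltaAware value wrapping a single JSON/BSON document (modelled as a Map). */
    record DocumentValue(Map<String, Object> document) {}

    /** Plain metadata POJO handed to the indexer: fully qualified field name to value. */
    record IndexableMetadata(Map<String, Object> fields) {}

    /** Extractor holding a registry reference, like the "bridge" that needs the repository instance. */
    static final class NamespaceAwareExtractor
            implements MetadataExtractor<DocumentValue, IndexableMetadata> {
        private final NamespaceRegistry registry;

        NamespaceAwareExtractor(NamespaceRegistry registry) {
            this.registry = registry;
        }

        @Override
        public IndexableMetadata extract(DocumentValue value) {
            Map<String, Object> fields = new LinkedHashMap<>();
            // Prefixes are resolved on every call, so the current registry state is always used.
            value.document().forEach((name, v) -> fields.put(registry.expand(name), v));
            return new IndexableMetadata(fields);
        }
    }

    public static void main(String[] args) {
        NamespaceRegistry registry = new NamespaceRegistry();
        registry.register("jcr", "http://www.jcp.org/jcr/1.0");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("jcr:title", "Quarterly report");
        doc.put("size", 42);

        MetadataExtractor<DocumentValue, IndexableMetadata> extractor =
                new NamespaceAwareExtractor(registry);
        System.out.println(extractor.extract(new DocumentValue(doc)).fields());
    }
}
```

Because the prefix is resolved on each extraction, re-registering a prefix changes the field names produced for values extracted afterwards; entries indexed earlier would still carry the old names until they are reindexed.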



On Apr 10, 2013, at 2:57 PM, Sanne Grinovero <sanne at infinispan.org> wrote:

> Weird: when I wrote my previous reply there were no other answers, and
> the rest of the thread only appeared to me just now.
> 
> Good to see that Emmanuel had replied highlighting the same problems.
> We can continue from there on this topic; just read my reply to see
> that there are a lot of options that need to be defined for each
> field: specifying that it's a "varchar" is not enough.
> 
> some more thoughts inline:
> 
> On 10 April 2013 19:46, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>> On Wed 2013-04-10 18:55, Manik Surtani wrote:
>>> 
>>> On 10 Apr 2013, at 18:18, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>>> 
>>>> I favor the first options for a few reasons:
>>>> 
>>>> - much easier client side implementations
>>>> Frankly, rewriting the analyzer logic of Lucene in every language is
>>>> not a piece of cake, and you are out of luck for custom analysers
>>> 
>>> I'm not suggesting all the analyser logic.  Just the extraction of indexed fields into name/value pairs, to be sent alongside the blob value.
>> 
>> That means you have already made a selection and possibly reduced
>> the precision of a given field, which makes reindexing impossible.
> 
> +1
> It also means larger payloads, plus more complexity and overhead on
> the clients, while the user might not be able to scale client compute
> capacity the way they can scale the data grid.
> 
>> 
>>> 
>>>> - more robust client implementation: if we change how indexing is done
>>>> clients don't have to change
>>>> - reindexing: if there is a need to rebuild the index, or if the user
>>>> decides to reindex data differently, you must be able to read the data
>>>> on the server side
>>>> - validation: if you want to implement (cross entry) validation, the
>>>> server needs to be able to read the data.
>>>> - async: validation and indexing can be done asynchronously on the
>>>> server, avoiding perceived latency between a client request and the
>>>> result
>>> 
>>> Valid points above though.
>>> 
>>>> I'm not sure JSON should be the format though. As you said it's quite
>>>> verbose and string is not exactly the most efficient way to process
>>>> data.
>>> 
>>> What would that format be, then?
>> 
>> Good question :) BSON is not necessarily smaller than JSON; it is meant
>> to be more easily parseable, AFAIR. I used Avro in Hibernate Search
>> because I find Protocol Buffers and the others too rigid for passing
>> arbitrary datasets. But if we have a schema and expect a given object
>> type, then we can start saving a lot of space.
>> In other words, no idea; that needs to be investigated.
> 
> Right, let's keep this to collecting requirements:
> - being able to upgrade the server without losing data
> - being able to change the (soft) schema on the server
> - read/write fields from different languages
> - deal with multi-version control of values (i.e. being able to read
> an older value through an evolved schema, and comparing the same
> value even if it was stored using different schema generations)
> 
> Sanne
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
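
Sanne's last requirement (reading an older value through an evolved schema, and comparing the same value across schema generations) can be sketched in Java as below. This is a minimal, hypothetical illustration that is not tied to JSON, BSON, Avro or Protocol Buffers: it assumes each stored value carries the schema version it was written with, and the reader upgrades older generations to the latest view by filling in defaults for fields that did not exist yet. `StoredValue`, `PersonV2` and the `"unknown"` default are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class SchemaEvolutionSketch {

    /** A stored value tagged with the schema generation it was written with (hypothetical layout). */
    record StoredValue(int schemaVersion, Map<String, Object> fields) {}

    /** Latest (evolved) view of a "person" entry; generation 2 added the "country" field. */
    record PersonV2(String name, int age, String country) {}

    static final String DEFAULT_COUNTRY = "unknown";

    /** Reads any supported generation through the latest schema. */
    static PersonV2 read(StoredValue value) {
        Map<String, Object> f = value.fields();
        switch (value.schemaVersion()) {
            case 1:
                // Generation 1 had no "country" field, so the default is filled in.
                return new PersonV2((String) f.get("name"), (Integer) f.get("age"), DEFAULT_COUNTRY);
            case 2:
                return new PersonV2((String) f.get("name"), (Integer) f.get("age"),
                        (String) f.get("country"));
            default:
                throw new IllegalArgumentException("Unsupported schema version " + value.schemaVersion());
        }
    }

    /** Compares two stored values by their latest-schema view, whatever generation stored them. */
    static boolean sameValue(StoredValue a, StoredValue b) {
        return Objects.equals(read(a), read(b));
    }

    public static void main(String[] args) {
        Map<String, Object> v1 = new LinkedHashMap<>();
        v1.put("name", "Ada");
        v1.put("age", 36);

        Map<String, Object> v2 = new LinkedHashMap<>(v1);
        v2.put("country", DEFAULT_COUNTRY);

        StoredValue old = new StoredValue(1, v1);
        StoredValue current = new StoredValue(2, v2);

        System.out.println(read(old));
        System.out.println(sameValue(old, current));
    }
}
```

Comparing through the latest view is what lets a value written under generation 1 be recognised as equal to the same value rewritten under generation 2. Schema-based formats such as Avro offer a comparable mechanism through reader/writer schema resolution with field defaults.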



