[infinispan-dev] Design change in Infinispan Query

Tue Feb 18 08:27:03 EST 2014

On 18 February 2014 13:01, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> On Tue 2014-02-18 14:02, Adrian Nistor wrote:
>> Well, OGM and Infinispan are different species :) So, Infinispan being what
>> it is today - a non-homogenous, schema-less KV store, without support for
>> entity associations (except embedding) - which simplifies the whole thing a
>> lot, should we or should we not provide transparent cross-cacheManager
>> search capabilities, in this exact context? Vote?
>
> Yes it makes sense to do queries like
>
>     where name or title = "foo" AND description or content contains "bar"
>
> over a heterogeneous set (say books and DVDs)

Right
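
To make that kind of query concrete, here is a toy sketch in plain
Java. It is not Infinispan Query or Hibernate Search code: entries are
modelled as simple field maps, and the class and method names
(HeterogeneousQuerySketch, entry, search) are made up for illustration.
It only shows that one predicate can span books and DVDs as long as
each entry exposes at least one of the alternative fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: each entry is a bag of named fields standing in for an
// indexed document. All names here are hypothetical.
class HeterogeneousQuerySketch {

    // Builds an entry from alternating field names and values.
    static Map<String, String> entry(String... kv) {
        Map<String, String> fields = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            fields.put(kv[i], kv[i + 1]);
        }
        return fields;
    }

    // True if any of the named fields equals the value exactly.
    static boolean anyEquals(Map<String, String> e, String value, String... names) {
        for (String name : names) {
            if (value.equals(e.get(name))) {
                return true;
            }
        }
        return false;
    }

    // True if any of the named fields contains the fragment.
    static boolean anyContains(Map<String, String> e, String fragment, String... names) {
        for (String name : names) {
            String v = e.get(name);
            if (v != null && v.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    // where (name or title = exact) AND (description or content contains fragment)
    static List<Map<String, String>> search(List<Map<String, String>> entries,
                                            String exact, String fragment) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> e : entries) {
            if (anyEquals(e, exact, "name", "title")
                    && anyContains(e, fragment, "description", "content")) {
                hits.add(e);
            }
        }
        return hits;
    }
}
```

A book with title "foo" and a DVD with name "foo" both match, provided
their content or description contains "bar"; no join between entries is
involved.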

>
> But if you had in mind to do joins between different entries in the
> cache, then this would require some cross-cache map reduce and be
> inefficient so that's not a good use case.

+1

>
>>
>> There were some points raised previously like /"if you search for more than
>> one cache transparently, then you probably need to CRUD for more than one
>> cache transparently as well"/. In the SQL world you would also probably CRUD
>> against a table or set of tables and then query against a view - a bit like
>> what we're doing here. I don't see any problem with this in principle. There
>> is however something currently missing in the query result set API - it
>> currently does not provide you the keys of the matching entities. People
>
> Really? I think we have the info in the index at least when the
> "ProvidedId" and the keys are the same.

We have this info in the engine, but the results to the user don't
usually include the keys.
For some this is a bit unnatural: a different perspective would be to
return _only_ the keys and avoid doing the lookup.

We provide a "LazyIterator" on the results, which fetches each
matching entry only on demand. I think that covers a good deal of use
cases, but there might be other usages for these keys.

It would be great if we had Lambda support to allow users to say what
they want us to do with the resultset, rather than fetching it.
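
A minimal sketch of both ideas, assuming the engine has already
produced the list of matching keys. This is not the actual
"LazyIterator" implementation: LazyResultsSketch and its lookups()
counter are hypothetical names, and the cache is just a java.util.Map.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: iterates over matching keys and only touches the
// cache when the caller actually asks for the next value.
class LazyResultsSketch<K, V> implements Iterator<V> {
    private final Iterator<K> keys;
    private final Map<K, V> cache;
    private int lookups = 0;

    LazyResultsSketch(List<K> matchingKeys, Map<K, V> cache) {
        this.keys = matchingKeys.iterator();
        this.cache = cache;
    }

    @Override
    public boolean hasNext() {
        return keys.hasNext();
    }

    @Override
    public V next() {
        K key = keys.next(); // throws NoSuchElementException when exhausted
        lookups++;           // one cache lookup per entry actually consumed
        return cache.get(key);
    }

    // Number of cache lookups performed so far (for illustration only).
    int lookups() {
        return lookups;
    }
}
```

With Java 8 lambdas, a caller could pass the processing step instead of
collecting the results, for example
results.forEachRemaining(v -> process(v)), where process is whatever
the user wants done with each entry.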

>
>> work around this by storing the key in the entity.  Now with the addition of
>> the cross-cacheManager search we'll probably need to fix the result api  and
>> also provide a reference to the cache (or just the name?) where the entity
>> is stored.
>
> Right, I'm not sure Sanne agrees with me yet but you need to store the
> cache name in the index. Hibernate Search can reason at query time to
> see if it can avoid using this term to speed things up (massively). That
> will depend on whether or not indexes are shared between caches.

I do agree that this would be required, but I'm sad about the
implications this has.
To allow those not familiar with Lucene to understand the
consequences: deleting a single entry from the index by using a single
term - like the key could be - is many orders of magnitude more
efficient than deleting from an index by "composite keys", like it
would be if we need to delete by tuples { cachename, typename, id }.

Considering that in Infinispan I can never be sure if a key already
existed or not (which is a fundamental difference when comparing to
Search/ORM), ANY WRITE on Infinispan triggers a delete operation
first.
Not least, such a delete requires an index flush, while we normally
just flush at the end of the batch (transaction).

In other words, if we could avoid needing to discriminate an index
entry by Cache Name, each and every operation would be many orders of
magnitude more efficient.
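
To illustrate why the cheap single-term delete is only safe when keys
are unambiguous, here is a toy model of a shared index. It is not
Lucene code and says nothing about cost; it only shows the semantics of
"every write is a delete followed by an add". The class
SharedIndexSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy shared index: each document records which cache and type it came
// from, plus its id (the cache key). All names are hypothetical.
class SharedIndexSketch {

    static final class Doc {
        final String cacheName;
        final String typeName;
        final String id;

        Doc(String cacheName, String typeName, String id) {
            this.cacheName = cacheName;
            this.typeName = typeName;
            this.id = id;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    // Write using a single-term delete: removes every document with this id,
    // regardless of which cache or type it belongs to.
    void writeDeletingByKey(String cacheName, String typeName, String id) {
        docs.removeIf(d -> d.id.equals(id));
        docs.add(new Doc(cacheName, typeName, id));
    }

    // Write using a composite delete on { cachename, typename, id }.
    void writeDeletingByTuple(String cacheName, String typeName, String id) {
        docs.removeIf(d -> d.cacheName.equals(cacheName)
                && d.typeName.equals(typeName)
                && d.id.equals(id));
        docs.add(new Doc(cacheName, typeName, id));
    }

    int size() {
        return docs.size();
    }
}
```

If two caches share the index and both store key "42", the key-only
variant makes the second write wipe out the first cache's entry, while
the tuple variant keeps both. When each index is known to serve a
single cache and type, the key alone is enough.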

Note that even today we aren't achieving this higher-efficiency mode,
because we're using the tuple { typename, id }. That's a legacy mapping
related to how Search could handle multi-table structures, and I was
planning to finally enable this very interesting optimization in the
next few weeks, in the scope of Search5.

I do agree that supporting Queries on multiple Caches (cross-cache but
no joins) makes sense, but if only we could figure out a way to move
away from "dynamically defined indexed types" we could apply many of
these optimizations transparently, when we know there is no risk of
key ambiguity.

We've been through a lot of trouble just to allow the user to not
register his indexed types upfront, but I don't think it's worth it.
After all, the user still has to annotate or provide a schema: listing
the types would be the lesser pain.

- Sanne
