[infinispan-dev] Infinispan Query entity discovery (Was: Re: new Infinispan Query API - ISPN-194)

Fri Apr 29 15:16:16 EDT 2011

2011/4/29 Israel Lacerra <israeldl at gmail.com>:
> We want to search for Example objects with example.name = "israel". At the
> moment, Infinispan still does not know this class.
>
> If we are in scenario D):
> We search specifying the class, and we have an empty result... as we don't
> have any Example object in index.
>
>
> If we use the classes collected in QueryInterceptor:
> We search just specifying the field... we will not have any Example in the
> result anyway. Maybe we have other classes with the specified field...
>
> So, as I see it, by default Infinispan could use the classes collected in
> QueryInterceptor. And the user could filter by class too...
>
>
> I know that I am probably missing some point... but what point!? :)

No problem, I'll take the opportunity to restate the problem for everyone:

Let's say you restart a node, or you just add an existing node to a
running cluster, and you are sharing the existing index.
Or - very similar - after some maintenance activity, you restart all
nodes, as you had nice CacheLoaders configured and all grid values and
indexes are safely backed up elsewhere.

Now both the index and the grid will contain instances of Example
entities, but the QueryInterceptor doesn't know about them yet.

So now if you run:
searchManager.getQuery( query, Example.class ).list();

you'll find the example instances as expected,
but if you run instead
searchManager.getQuery( query ).list()
you will not.

Or as Emmanuel pointed out,
searchManager.getQuery(query, SuperClassOfExample.class ).list()
will not return them either, while this should be the case.

Of course you don't have this issue if you happened to run any query
targeting Example beforehand, or even if you put a single instance of
Example in the cache, as the QueryInterceptor will notice, pick it up
and reconfigure itself.
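
To make the behaviour above concrete, here is a toy model in plain Java. It is
only a sketch under stated assumptions: ToySearchManager, its knownTypes set
and the empty Example / SuperClassOfExample classes are hypothetical names
invented for illustration, and none of this is the actual Infinispan Query or
Hibernate Search code. It only mimics the rule "queries see known types, and
known types are learned from puts and from explicit query targets".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these are NOT Infinispan or Hibernate Search classes.
class SuperClassOfExample {}
class Example extends SuperClassOfExample {}

final class ToySearchManager {
    // Types this node knows about (the role played by what QueryInterceptor collects).
    private final Set<Class<?>> knownTypes = new LinkedHashSet<>();
    // Index content shared between nodes; it outlives any single ToySearchManager.
    private final List<Object> sharedIndex;

    ToySearchManager(List<Object> sharedIndex) {
        this.sharedIndex = sharedIndex;
    }

    // A put through this node: the value is indexed and its type becomes known.
    void put(Object value) {
        knownTypes.add(value.getClass());
        sharedIndex.add(value);
    }

    // No targets means "all known types"; explicit targets become known types
    // and are expanded to their already known subtypes only.
    List<Object> query(Class<?>... targets) {
        for (Class<?> target : targets) {
            knownTypes.add(target);
        }
        Set<Class<?>> effective = new LinkedHashSet<>();
        for (Class<?> known : knownTypes) {
            if (targets.length == 0) {
                effective.add(known);
            }
            for (Class<?> target : targets) {
                if (target.isAssignableFrom(known)) {
                    effective.add(known);
                }
            }
        }
        List<Object> hits = new ArrayList<>();
        for (Object indexed : sharedIndex) {
            if (effective.contains(indexed.getClass())) {
                hits.add(indexed);
            }
        }
        return hits;
    }
}
```

In this model, a fresh ToySearchManager created over a shared list that already
holds one Example returns nothing for query() and for
query(SuperClassOfExample.class), finds the instance with query(Example.class),
and from then on the other two calls find it as well.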

So currently the proper way to use it is to "warm up" a newly created
SearchManager by explicitly running a query targeting all your indexed
classes. A warmup query is usually recommended anyway for Lucene
performance reasons, but I still consider this a bit unpolished and am
unhappy to have it unsolved at the release candidate phase.
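
A warmup along these lines could be wrapped in a small helper. The sketch below
is hedged: QueryRunner is a hypothetical stand-in for whatever part of the
SearchManager the application uses to run a query against a set of target
classes, and SearchWarmup is an invented name; neither is part of Infinispan
Query. The only point it shows is that one query naming every indexed class is
run right after the search manager is created.

```java
import java.util.List;

// Hypothetical stand-in for the query entry point; not the real SearchManager API.
interface QueryRunner {
    // Runs some cheap query against the given target classes and returns the hit count.
    int countMatches(Class<?>... targets);
}

final class SearchWarmup {
    private SearchWarmup() {
    }

    // Runs a single query that names every indexed class explicitly, so that
    // all of them are registered as known types before real queries arrive.
    static int warmUp(QueryRunner runner, List<Class<?>> indexedClasses) {
        if (indexedClasses.isEmpty()) {
            throw new IllegalArgumentException("no indexed classes to warm up");
        }
        return runner.countMatches(indexedClasses.toArray(new Class<?>[0]));
    }
}
```

The application still has to supply the list of indexed classes itself, which
is the same open question as option C) in the quoted thread below.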

Sanne

>
>
> Israel
>
> On Fri, Apr 29, 2011 at 3:46 PM, Israel Lacerra <israeldl at gmail.com> wrote:
>>
>> Now I'm a little confused about why we can't use the classes collected in
>> QueryInterceptor...
>>
>> > Infinispan Query as it might not have discovered all types yet (1)
>>
>> Probably I am missing some point, but to me it is not a problem.
>>
>> On Fri, Apr 29, 2011 at 3:36 PM, Sanne Grinovero
>> <sanne.grinovero at gmail.com> wrote:
>>>
>>> 2011/4/29 Israel Lacerra <israeldl at gmail.com>:
>>> > What about using D) and also giving the user a way to specify the
>>> > default classes for all queries?
>>>
>>> Yes that's the idea; but we need to figure out how the user specifies
>>> the default classes; so far nobody liked any proposal.
>>>
>>> Sanne
>>>
>>> >
>>> > On Wed, Apr 27, 2011 at 5:10 AM, Emmanuel Bernard
>>> > <emmanuel at hibernate.org>
>>> > wrote:
>>> >>
>>> >> On 27 avr. 2011, at 08:57, Sanne Grinovero wrote:
>>> >>
>>> >> > 2011/4/27 Emmanuel Bernard <emmanuel at hibernate.org>:
>>> >> >> Users can put indexed or not indexed superclasses in the query
>>> >> >> target type. That would not work for you as you can't discover
>>> >> >> known subtypes without scanning or having a closure of types
>>> >> >> somewhere.
>>> >> >
>>> >> > Sure, they can with Hibernate Search, but should they be able to
>>> >> > with Infinispan Query?
>>> >> > If the answer is yes, then we still need to find an alternative.
>>> >>
>>> >> Well it's an OO query and thus subtype polymorphism should apply.
>>> >>
>>> >> >
>>> >> >
>>> >> >> On 26 avr. 2011, at 23:32, Sanne Grinovero
>>> >> >> <sanne.grinovero at gmail.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >>> Hello,
>>> >> >>> I'm forking off this thread, as we never resolved how to cope with
>>> >> >>> the
>>> >> >>> main issue:
>>> >> >>>
>>> >> >>> how is Infinispan Query going to be aware of which entities are to
>>> >> >>> be
>>> >> >>> considered as default targets for a Query?
>>> >> >>>
>>> >> >>> the realistic ideas so far:
>>> >> >>> A) class scanning: seems nobody liked this idea, but I'll still
>>> >> >>> mention it as the other options aren't looking great either.
>>> >> >>> B) scan known indexes (need to define what the "known indexes" are
>>> >> >>> as
>>> >> >>> we usually infer that from the classes)
>>> >> >>>   -- could enforce a single index
>>> >> >>> C) have to list all fully qualified class names in the
>>> >> >>> configuration
>>> >> >>> D) don't care: consider it a good practice to specify all targeted
>>> >> >>> types when performing a Query.
>>> >> >>> E) please suggest :)
>>> >> >>>
>>> >> >>> The currently implemented solution is D, as it requires no coding
>>> >> >>> at
>>> >> >>> all :)
>>> >> >>> considering the simplicity of it I'm liking it more the more I
>>> >> >>> think
>>> >> >>> about it; I could even polish the approach by adding a single line
>>> >> >>> to
>>> >> >>> log a warning when the user doesn't respect the best practice, or
>>> >> >>> to
>>> >> >>> mandate it.
>>> >> >>>
>>> >> >>> Considering that when a Query is invoked specifying the target
>>> >> >>> types
>>> >> >>> there is no doubt we know the classes, I could add a warning in
>>> >> >>> case
>>> >> >>> the Query is performed without specifying the type: in that case
>>> >> >>> it
>>> >> >>> usually implies the query targets all known types, which is always
>>> >> >>> fine when using Hibernate Search, but could be inconsistent with
>>> >> >>> Infinispan Query as it might not have discovered all types yet [1],
>>> >> >>> so a very simple solution is to mandate the type parameter.
>>> >> >>>
>>> >> >>> [1] - when the Cache interceptor hits an event adding a new type,
>>> >> >>> the
>>> >> >>> Search engine is reconfigured.
>>> >> >>>
>>> >> >>> thoughts?
>>> >> >>>
>>> >> >>> Sanne
>>> >> >>>
>>> >> >>>
>>> >> >>> 2011/4/5 Emmanuel Bernard <emmanuel at hibernate.org>:
>>> >> >>>>
>>> >> >>>> On 5 avr. 2011, at 13:38, Sanne Grinovero wrote:
>>> >> >>>>
>>> >> >>>>> 2011/4/5 Emmanuel Bernard <emmanuel at hibernate.org>:
>>> >> >>>>>>
>>> >> >>>>>> On 5 avr. 2011, at 12:20, Galder Zamarreño wrote:
>>> >> >>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> On Apr 4, 2011, at 6:23 PM, Sanne Grinovero wrote:
>>> >> >>>>>>>
>>> >> >>>>>>>> </snip>
>>> >> >>>>>>>>
>>> >> >>>>>>>> there's one catch:
>>> >> >>>>>>>> when searching for a class type, it will only include results from
>>> >> >>>>>>>> known subtypes. The targeted type is automatically added to the
>>> >> >>>>>>>> known classes, but any already existing subtypes are not
>>> >> >>>>>>>> discovered.
>>> >> >>>>>>>>
>>> >> >>>>>>>> Bringing this issue to an extreme, if the query is not targeting
>>> >> >>>>>>>> any type, and no indexed types were added to the grid (even if
>>> >> >>>>>>>> some exist already as they might have been inserted by other JVMs
>>> >> >>>>>>>> or previous runs), all queries will return no results.
>>> >> >>>>>>>> How to solve this?
>>> >> >>>>>>>> - class scanning?
>>> >> >>>>>>>
>>> >> >>>>>>> Nope, too expensive.
>>> >> >>>>>>>
>>> >> >>>>>>>> - explicitly list indexed entities in Infinispan
>>> >> >>>>>>>> configuration?
>>> >> >>>>>>>
>>> >> >>>>>>> No
>>> >> >>>>>>>
>>> >> >>>>>>>> - a metadata cache maintaining a distributed&stored copy of
>>> >> >>>>>>>> known
>>> >> >>>>>>>> types
>>> >> >>>>>>>
>>> >> >>>>>>> That sounds more appealing. It could be a good middle ground
>>> >> >>>>>>> until
>>> >> >>>>>>> Search can search for types.
>>> >> >>>>>>
>>> >> >>>>>> Do you have any specific idea in mind?
>>> >> >>>>>>
>>> >> >>>>>> To magically find types:
>>> >> >>>>>>  - we scan every file system, databases, caches available to
>>> >> >>>>>> the
>>> >> >>>>>> app and look for Lucene metadata => unrealistic
>>> >> >>>>>>  - there is some kind of convention on where the indexes are and we
>>> >> >>>>>> do index scanning at startup => index scanning is very likely to be
>>> >> >>>>>> slower than class scanning (potential remote access, bigger dataset
>>> >> >>>>>> etc)
>>> >> >>>>>>  - we enforce one or a fixed number of Lucene indexes for all
>>> >> >>>>>> data
>>> >> >>>>>> in Infinispan => not sure that's a good idea but this can be
>>> >> >>>>>> explored
>>> >> >>>>>>  - we somehow ask the framework using HSearch to fill up
>>> >> >>>>>> classes
>>> >> >>>>>>
>>> >> >>>>>> other approaches?
>>> >> >>>>>
>>> >> >>>>> Why was class scanning discarded in the first answer? As H. Search
>>> >> >>>>> can auto-discover classes by working on top of JPA entity
>>> >> >>>>> autodiscovery, I guess that each application node could look into
>>> >> >>>>> its own known classpath.
>>> >> >>>>> After all if some type is not visible to it as it was added from
>>> >> >>>>> another node from a different app, it won't be able to return
>>> >> >>>>> instances of it either.
>>> >> >>>>> We could face the opposite problem of building metadata of classes
>>> >> >>>>> people don't mean to index in this cache.
>>> >> >>>>
>>> >> >>>> Right. scanning (class or index) will be a bit aggressive and
>>> >> >>>> could
>>> >> >>>> build unneeded metadata (or even worse, return unexpected
>>> >> >>>> classes).
>>> >> >>
>>> >>
>>> >>
>>> >
>>> >
>>
>
>