[infinispan-dev] Design change in Infinispan Query

Tue Apr 10 13:10:04 EDT 2012

Hello all,
currently Infinispan Query is an interceptor registering on the
specific Cache instance which has indexing enabled; one such
interceptor is doing all what it needs to do in the sole scope of the
cache it was registered in.

If you enable indexing - for example - on 3 different caches, there
will be 3 different Hibernate Search engines started in background,
and they are all unaware of each other.

After some design discussions with Ales for CapeDwarf, but also
calling attention on something that bothered me since some time, I'd
evaluate the option to have a single Hibernate Search Engine
registered in the CacheManager, and have it shared across indexed
caches.

Current design limitations:

  A- If they are all configured to use the same base directory to
store indexes, and happen to have same-named indexes, they'll share
the index without being aware of each other. This is going to break
unless the user configures some tricky parameters, and even so
performance won't be great: instances will lock each other out, or at
best write in alternate turns.
  B- The search engine isn't particularly "heavy", still it would be
nice to share some components and internal services.
  C- Configuration details which need some care - like injecting a
JGroups channel for clustering - needs to be done right isolating each
instance (so large parts of configuration would be quite similar but
not totally equal)
  D- Incoming messages into a JGroups Receiver need to be routed not
only among indexes, but also among Engine instances. This prevents
Query to reuse code from Hibernate Search.

Problems with a unified Hibernate Search Engine:

   1#- Isolation of types / indexes. If the same indexed class is
stored in different (indexed) caches, they'll share the same index. Is
it a problem? I'm tempted to consider this a good thing, but wonder if
it would surprise some users. Would you expect that?
   2#- configuration format overhaul: indexing options won't be set on
the cache section but in the global section. I'm looking forward to
use the schema extensions anyway to provide a better configuration
experience than the current <properties />.
   3#- Assuming 1# is fine, when a search hit is found I'd need to be
able to figure out from which cache the value should be loaded.
      3#A  we could have the cache name encoded in the index, as part
of the identifier: {PK,cacheName}
      3#B  we actually shard the index, keeping a physically separate
index per cache. This would mean searching on the joint index view but
extracting hits from specific indexes to keep track of "which index"..
I think we can do that but it's definitely tricky.

It's likely easier to keep indexed values from different caches in
different indexes. that would mean to reject #1 and mess with the user
defined index name, to add for example the cache name to the user
defined string.

Any comment?

Cheers,
Sanne