Hi there
On 4 May 2010, at 20:42, Israel Lacerra wrote:
I'm studying ISPN-200 because I'm thinking about resolving this issue
in my M.Sc. topic. I have a couple of questions about it (and maybe they
don't make sense):
- Currently, if we have "-Dinfinispan.query.indexLocalOnly=true" the indexes
are just local, right? And if "-Dinfinispan.query.indexLocalOnly=false", the
indexes are global and shared. Am I right?
Yes. Basically, Lucene handles and stores the indexes. Now you could have two scenarios.
Scenario 1: each node has its own private, non-shared set of indexes. Scenario 2:
there is a shared, global index that every node writes to and updates
(perhaps stored on NFS, etc). Which scenario applies depends on how you configure Lucene.
Now the switch in Infinispan controls which node(s) in the cluster actually do the
indexing whenever there is a change in data in the cluster. If you have configured Lucene
to maintain non-shared indexes, then *every* node in the cluster needs to update its own
private index whenever there is a change in any entry, anywhere in the cluster.
-Dinfinispan.query.indexLocalOnly=false will force Infinispan nodes to index changes that
happen anywhere in the cluster.
If the indexes are global and shared, then there is no need for each node to update the
indexes. Only the node that initiated the change should update the indexes, and
-Dinfinispan.query.indexLocalOnly=true will force this behaviour.
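To make the switch concrete, here is a minimal sketch in Java of the decision each node
makes when an entry changes. It is not Infinispan's actual code: the class and method
names (IndexingDecision, shouldIndex) are hypothetical, and it only models the behaviour
described above.

```java
// Hypothetical model of the indexLocalOnly decision; not Infinispan's real implementation.
final class IndexingDecision {
    private IndexingDecision() {}

    /**
     * @param indexLocalOnly value of -Dinfinispan.query.indexLocalOnly
     * @param originLocal    true if the change was initiated on this node
     * @return true if this node should update its Lucene index for the change
     */
    static boolean shouldIndex(boolean indexLocalOnly, boolean originLocal) {
        // indexLocalOnly=true:  only the initiating node indexes (shared, global index).
        // indexLocalOnly=false: every node indexes every change (private indexes).
        return !indexLocalOnly || originLocal;
    }
}
```

With indexLocalOnly=false the method returns true for every change, local or remote; with
indexLocalOnly=true it returns true only on the node where the change originated.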
- So, how will ISPN-200 work with these two possibilities?
As for ISPN-200, this is part of what we need to think about. The only approach that will
truly scale is for each node to maintain not a complete index, shared or non-shared, but
a fragment of the global index: the fragment that pertains to just the data it owns.
So, assume we have this setup with 4 nodes:
Caches (nodes): {A, B, C, D}
Keys (with the nodes that own each key):
K1 -> {A, B}
K2 -> {B, C}
K3 -> {C, D}
A's index would have {K1}
B's index would have {K1, K2}
C's index would have {K2, K3}
D's index would have {K3}
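The mapping from key ownership to per-node index fragments can be sketched in Java as
below. This is only an illustration of the idea: IndexFragments and fragments are
hypothetical names, and ownership is passed in as a plain map rather than taken from
Infinispan's own distribution logic.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: inverts "key -> owner nodes" into "node -> keys to index".
final class IndexFragments {
    private IndexFragments() {}

    static Map<String, Set<String>> fragments(Map<String, List<String>> owners) {
        // Sorted collections are used only to make the output easy to read.
        Map<String, Set<String>> byNode = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : owners.entrySet()) {
            for (String node : entry.getValue()) {
                // Each owner of a key holds that key in its index fragment.
                byNode.computeIfAbsent(node, n -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return byNode;
    }
}
```

Given K1 -> [A, B], K2 -> [B, C] and K3 -> [C, D], the returned map prints as
{A=[K1], B=[K1, K2], C=[K2, K3], D=[K3]}, matching the layout above.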
So if we were to write a query that matches K1, that query would be sent to every node in
the cluster and the results returned would look like:
A: {K1}
B: {K1}
C: {}
D: {}
Similarly, if we were to write a query that matches K1 and K2, that query would be sent to
every node in the cluster and the results returned would look like:
A: {K1}
B: {K1, K2}
C: {K2}
D: {}
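The fan-out in these two examples can be modelled with a short Java sketch. The names
(ScatterQuery, run) are hypothetical, and the query is reduced to a predicate over key
names purely for illustration; a real query would be executed by Lucene against each
node's index fragment.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical model: send the same query to every node and collect per-node hits.
final class ScatterQuery {
    private ScatterQuery() {}

    static Map<String, Set<String>> run(Map<String, Set<String>> fragments,
                                        Predicate<String> query) {
        Map<String, Set<String>> partialResults = new TreeMap<>();
        for (Map.Entry<String, Set<String>> node : fragments.entrySet()) {
            Set<String> hits = new TreeSet<>();
            for (String key : node.getValue()) {
                // A node can only report keys present in its own fragment.
                if (query.test(key)) {
                    hits.add(key);
                }
            }
            // Nodes with no matches still answer, with an empty result.
            partialResults.put(node.getKey(), hits);
        }
        return partialResults;
    }
}
```

With the fragments from the setup above and a predicate matching K1 and K2, the result
prints as {A=[K1], B=[K1, K2], C=[K2], D=[]}.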
Now the tricky part will be to efficiently collate these partial results into a proper
resultset to pass back to the user, including removing duplicates, proper ranking and
ordering, etc.
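As a rough sketch of that collation step, the Java below merges per-node hits, drops
duplicates and orders the keys by score. Everything here is an assumption made for
illustration: the names (ResultCollator, Hit, collate) are hypothetical, each hit is
assumed to carry a numeric score, and scores from different nodes are assumed to be
directly comparable, which may not hold when each node scores against its own fragment.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical collation of partial results; not part of Infinispan or Lucene.
final class ResultCollator {
    private ResultCollator() {}

    // One match reported by one node (assumed shape).
    static final class Hit {
        final String key;
        final double score;

        Hit(String key, double score) {
            this.key = key;
            this.score = score;
        }
    }

    static List<String> collate(Collection<List<Hit>> partialResults) {
        // De-duplicate: keep the best score seen for each key across all nodes.
        Map<String, Double> best = new HashMap<>();
        for (List<Hit> partial : partialResults) {
            for (Hit hit : partial) {
                best.merge(hit.key, hit.score, Double::max);
            }
        }
        // Rank: highest score first, ties broken by key for a stable order.
        List<String> keys = new ArrayList<>(best.keySet());
        keys.sort((a, b) -> {
            int byScore = Double.compare(best.get(b), best.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return keys;
    }
}
```

Keeping the maximum score per key is just one possible de-duplication rule; since the
owners of a key index the same data, any single copy of the hit would do equally well.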
Hope this helps!
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org