Thanks Manik!
Israel Lacerra
On Wed, May 5, 2010 at 6:27 AM, Manik Surtani <manik(a)jboss.org> wrote:
Hi there
On 4 May 2010, at 20:42, Israel Lacerra wrote:
> I'm studying ISPN-200 because I'm thinking about resolving this issue as my
> M.Sc. topic. I'd like to ask a couple of questions about it (and maybe they
> don't make sense):
>
> - Currently, if we have "-Dinfinispan.query.indexLocalOnly=true" the
> indexes are just local, right? And with
> "-Dinfinispan.query.indexLocalOnly=false", the indexes are global and
> shared. Am I right?
Yes. Basically Lucene handles and stores the indexes. Now you could have
2 scenarios. Scenario 1: where each node has its own private, non-shared
set of indexes. Scenario 2: there is a shared, global index, where each
node writes to and updates this global index (perhaps stored on NFS, etc).
The relevant scenario depends on how you configure Lucene.
Now the switch in Infinispan controls which node(s) in the cluster actually
do the indexing whenever there is a change in data in the cluster. If you
have configured Lucene to maintain non-shared indexes, then *every* node in
the cluster needs to update its own private index whenever there is a change
in any entry, anywhere in the cluster.
-Dinfinispan.query.indexLocalOnly=false will force Infinispan nodes to
index changes that happen anywhere in the cluster.
If the indexes are global and shared, then there is no need for each node
to update the indexes. Only the node that initiated the change should
update the indexes, and -Dinfinispan.query.indexLocalOnly=true will force
this behaviour.
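To make the two settings concrete, here is a minimal sketch of the decision
each node makes when an entry changes. This is not Infinispan's actual code:
the class and the shouldIndex method are hypothetical names invented for
illustration. Only the system property name comes from the discussion above.

```java
final class IndexLocalOnlySketch {
    // Property name as used in this thread.
    static final String PROP = "infinispan.query.indexLocalOnly";

    /**
     * Hypothetical helper: should this node update its index for a change?
     * indexLocalOnly=true  -> only the node that initiated the change indexes it
     *                         (suits a shared, global index).
     * indexLocalOnly=false -> every node indexes every change
     *                         (suits private, non-shared indexes).
     */
    static boolean shouldIndex(boolean indexLocalOnly, boolean changeOriginatedHere) {
        return !indexLocalOnly || changeOriginatedHere;
    }

    public static void main(String[] args) {
        // Boolean.getBoolean returns false when the property is unset; that is
        // JDK behaviour and says nothing about Infinispan's own default.
        boolean indexLocalOnly = Boolean.getBoolean(PROP);
        System.out.println("remote change indexed here: "
                + shouldIndex(indexLocalOnly, false));
        System.out.println("local change indexed here:  "
                + shouldIndex(indexLocalOnly, true));
    }
}
```

Run with -Dinfinispan.query.indexLocalOnly=true to see the remote change
skipped on this node.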
> - So, how will ISPN-200 work with these two possibilities?
As for ISPN-200, this is part of what we need to think about. Ideally, the
only approach that will truly scale is for each node to maintain not just
shared or non-shared indexes, but a fragment of the global index: the
fragment that pertains to just the data it owns. So, assume we have this
setup with 4 nodes:
Caches: {A, B, C, D}
Keys:
K1 -> {A, B}
K2 -> {B, C}
K3 -> {C, D}
A's index would have {K1}
B's index would have {K1, K2}
C's index would have {K2, K3}
D's index would have {K3}
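The per-node fragments above are simply the key-to-owners mapping inverted.
A small sketch, with hypothetical class and method names (nothing here is
Infinispan API), reproduces them:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class IndexFragmentSketch {
    /** Invert key -> owners into node -> keys that node would index. */
    static Map<String, Set<String>> fragments(List<String> nodes,
                                              Map<String, List<String>> owners) {
        Map<String, Set<String>> byNode = new TreeMap<>();
        for (String node : nodes) {
            byNode.put(node, new TreeSet<>()); // every node gets a (possibly empty) fragment
        }
        for (Map.Entry<String, List<String>> e : owners.entrySet()) {
            for (String owner : e.getValue()) {
                byNode.computeIfAbsent(owner, n -> new TreeSet<>()).add(e.getKey());
            }
        }
        return byNode;
    }

    public static void main(String[] args) {
        Map<String, List<String>> owners = Map.of(
                "K1", List.of("A", "B"),
                "K2", List.of("B", "C"),
                "K3", List.of("C", "D"));
        // prints {A=[K1], B=[K1, K2], C=[K2, K3], D=[K3]}
        System.out.println(fragments(List.of("A", "B", "C", "D"), owners));
    }
}
```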
So if we were to write a query that matches K1, that query would be sent to
every node in the cluster and the results returned would look like:
A: {K1}
B: {K1}
C: {}
D: {}
Similarly, if we were to write a query that matches K1 and K2, that query
would be sent to every node in the cluster and the results returned would
look like:
A: {K1}
B: {K1, K2}
C: {K2}
D: {}
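The scatter step in these two examples can be sketched as follows. This is
an illustration only: the query is modelled as a plain set of matching keys
rather than a real Lucene query, and all names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ScatterQuerySketch {
    /** Each node answers from its own fragment only. */
    static Map<String, Set<String>> scatter(Map<String, Set<String>> fragments,
                                            Set<String> matchingKeys) {
        Map<String, Set<String>> partial = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : fragments.entrySet()) {
            Set<String> hits = new TreeSet<>(e.getValue());
            hits.retainAll(matchingKeys); // keep only keys the query matches
            partial.put(e.getKey(), hits);
        }
        return partial;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fragments = Map.of(
                "A", Set.of("K1"),
                "B", Set.of("K1", "K2"),
                "C", Set.of("K2", "K3"),
                "D", Set.of("K3"));
        // prints {A=[K1], B=[K1], C=[], D=[]}
        System.out.println(scatter(fragments, Set.of("K1")));
        // prints {A=[K1], B=[K1, K2], C=[K2], D=[]}
        System.out.println(scatter(fragments, Set.of("K1", "K2")));
    }
}
```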
Now the tricky part will be to efficiently collate these partial results
into a proper resultset to pass back to the user, including removing
duplicates, proper ranking and ordering, etc.
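As a rough sketch of that collation step: assume each node returns its hits
as key -> relevance score. The scores, the "keep the highest score per key"
rule, and every name below are assumptions made for illustration, not a
proposed design for ISPN-200.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ResultCollatorSketch {
    /** Merge per-node hits: drop duplicates, then order by score, best first. */
    static List<String> collate(Collection<Map<String, Double>> partialResults) {
        Map<String, Double> best = new HashMap<>();
        for (Map<String, Double> nodeHits : partialResults) {
            for (Map.Entry<String, Double> hit : nodeHits.entrySet()) {
                // The same key may arrive from several owners; keep one entry.
                best.merge(hit.getKey(), hit.getValue(), Math::max);
            }
        }
        List<String> keys = new ArrayList<>(best.keySet());
        keys.sort((a, b) -> {
            int byScore = Double.compare(best.get(b), best.get(a)); // higher score first
            return byScore != 0 ? byScore : a.compareTo(b);         // stable tie-break
        });
        return keys;
    }

    public static void main(String[] args) {
        // Made-up scores for the "matches K1 and K2" example above.
        List<Map<String, Double>> partial = List.of(
                Map.of("K1", 0.9),            // A
                Map.of("K1", 0.9, "K2", 0.4), // B
                Map.of("K2", 0.4),            // C
                Map.of());                    // D
        System.out.println(collate(partial)); // prints [K1, K2]
    }
}
```

A real implementation would also have to deal with scores that are not
directly comparable across separate index fragments, which this sketch
ignores.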
Hope this helps!
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org