Re: [infinispan-dev] [hibernate-dev] Distributed queries

Thursday, 24 September 2009

Hmm, this is what I have in mind (sorry if some of it is repetitive  
and has already been discussed).

Phase 1 (Basic index handling)

Indexes may either be shared (stored on NFS or a shared disk, or shared
via Infinispan; see comments re: Lukasz's InfinispanDirectory,
somewhat based on my earlier JBoss Cache Directory) or local (local
disk, RAMDirectory).

* Shared
    * Indexes need only be generated by the node where changes occur.   
For this, you use the -Dinfinispan.query.indexLocalOnly=true param. [1]
    * Queries can be executed on any node, and correct results will be  
returned since indexes are shared.
    * Will work with any cache mode.

* Non-shared
    * Indexes need to be generated by everyone.  For this, you use the
-Dinfinispan.query.indexLocalOnly=false param.
    * Queries can be executed on any node.
    * But this only works with the REPLICATED cache mode, so that all
nodes see changes and can build/update indexes.
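A minimal sketch of the decision this flag drives, assuming a hypothetical IndexingPolicy class and a hook that knows whether a modification originated on this node. It only illustrates the two bullet lists above and is not Infinispan's actual interceptor code:

```java
/**
 * Hypothetical helper (not part of Infinispan's API) showing what
 * -Dinfinispan.query.indexLocalOnly decides for each modification.
 */
final class IndexingPolicy {
    /** System property name taken from the text; everything else is assumed. */
    static final String INDEX_LOCAL_ONLY = "infinispan.query.indexLocalOnly";

    private final boolean indexLocalOnly;

    IndexingPolicy(boolean indexLocalOnly) {
        this.indexLocalOnly = indexLocalOnly;
    }

    /** Boolean.getBoolean is true only if the property is set to "true" (any case). */
    static IndexingPolicy fromSystemProperties() {
        return new IndexingPolicy(Boolean.getBoolean(INDEX_LOCAL_ONLY));
    }

    /**
     * @param originLocal true if the change was made on this node, false if it
     *                    arrived from another node (e.g. via replication)
     * @return true if this node should update its index for the change
     */
    boolean shouldIndex(boolean originLocal) {
        // indexLocalOnly=true (e.g. a shared index): only the originating node writes.
        // indexLocalOnly=false (non-shared, REPLICATED): every node indexes every
        // change it sees, so each node's own index stays complete.
        return originLocal || !indexLocalOnly;
    }
}
```

With a shared index, letting every node write would apply the same change several times to one index, which is why the shared setup restricts writes to the originating node.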

Phase 2 (Distributed querying)

This is dependent on some other work in Infinispan [2] to provide a  
mechanism to distribute workloads.  This is the 'holy grail' I want to  
reach.  Essentially, this is what it entails:

* Indexes are _never_ shared.
    * Each node maintains local indexes for state it is responsible  
for (-Dinfinispan.query.indexLocalOnly=true).
    * Indexes could be in memory or disk.
* Queries themselves are distributed.
    * The query object is built and broadcast to the entire cluster.
    * Each node executes the query on its own _local_ index, returning  
results.
    * The calling node returns a CacheQuery impl that lazily fetches  
and collates results from the cluster.
    * I expect this Map/Reduce model to perform very well since the  
workload is split up and happens in parallel across multiple CPUs  
against much smaller (individual) datasets.
    * Works with all cache modes, including DIST.
    * Need to make sure duplicates are handled, as well as failover.
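A minimal scatter/gather sketch of the Phase 2 idea, assuming each node is modelled as a hypothetical function from a query string to hits from its local index. It collates eagerly for brevity (the CacheQuery described above would fetch lazily), drops duplicates by cache key, and fails the whole query if any node fails rather than silently returning partial results:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical scatter/gather query; not Infinispan's API. */
final class DistributedQuery {
    /** One hit from a node's local index. */
    static final class Hit {
        final String key;    // cache key, used to drop duplicates
        final double score;
        Hit(String key, double score) { this.key = key; this.score = score; }
    }

    /** Each element of nodes stands in for a cluster member searching its local index. */
    static List<Hit> execute(List<Function<String, List<Hit>>> nodes, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            // Scatter: broadcast the same query to every node in parallel.
            List<CompletableFuture<List<Hit>>> pending = new ArrayList<>();
            for (Function<String, List<Hit>> node : nodes) {
                pending.add(CompletableFuture.supplyAsync(() -> node.apply(query), pool));
            }
            // Gather: join() rethrows a node's failure as a CompletionException,
            // so a missing node fails the query instead of being skipped.
            Map<String, Hit> byKey = new LinkedHashMap<>();
            for (CompletableFuture<List<Hit>> f : pending) {
                for (Hit h : f.join()) {
                    // Keep one hit per key: the highest-scoring copy.
                    byKey.merge(h.key, h, (a, b) -> a.score >= b.score ? a : b);
                }
            }
            // Reduce: one global ranking, best score first.
            List<Hit> merged = new ArrayList<>(byKey.values());
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping the best-scoring copy is only one possible duplicate policy; it also leaves the per-node scores un-normalised, which is a separate problem.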

TODO
====

We already have Phase 1 in tech preview.  For Phase 1 to come out of  
tech preview, we need:
    * A few more JIRAs to be resolved (see [3] for what is outstanding)
    * Detailed docs (perhaps explaining the options available, as I  
did in the bullet points under "Phase 1" above, maybe with some pretty  
pictures and diagrams)
    * Lukasz's work to be rolled into a proper release

And for Phase 2, we need [2] in place, and the issue with dupes/failover
for indexes to be resolved.

If folks want to help with any of the above, please raise your hand -  
this is pretty exciting stuff!  ;)

Cheers
Manik

[1] This param will be replaced with a proper config option in  
Infinispan 4.1.0, once the Query API is out of 'tech preview'.
[2] https://jira.jboss.org/jira/browse/ISPN-39
[3] https://jira.jboss.org/jira/browse/ISPN-194

On 22 Sep 2009, at 10:27, Michael Neale wrote:

...
 well as part of a separate project to do with "cloud" stuff we will
 have appliances/images (or will have) which can help set up
 reasonably large clusters for testing (and if we can do the testing
 within an hour, it should only cost a couple of dollars at a time for
 100's of nodes).

 On Tue, Sep 22, 2009 at 7:23 PM, Navin Surtani <nsurtani(a)redhat.com>
 wrote:
>
> On 22 Sep 2009, at 03:00, Michael Neale wrote:
>
>> I guess that could make sense for some cases - if the data changes are
>> small-ish, and the index calculation cost isn't huge... I guess if the
>> objects you are parking in Infinispan are large, then it could end up
>> more efficient to only do the index once and then spread it around
>> (sticky to the data that it represents).
>
>
> I believe the plan is to build in a few different configs that would
> work for different use-cases.  For example, a lot of "small" objects
> but not necessarily many nodes, so they all share the same index; or a
> lot of "big" objects sitting on disk on each individual node (where
> replication could be expensive).
>
> Or at least, this is what I understood when speaking with Manik a
> couple of weeks ago :-).
>
>
>
>>
>> On Mon, Sep 21, 2009 at 11:09 PM, Ray Hilton <ray(a)wirestorm.net>
>> wrote:
>>> I'm guessing that something similar already happens so that
>>> Infinispan can re-jig the data around the grid.  Forgive my lack of
>>> intimate knowledge of how Infinispan works here, but at some point
>>> the data that was hosted by a bad node needs to be re-distributed?
>>>
>>> On Mon, Sep 21, 2009 at 11:02 PM, Emmanuel Bernard
>>> <emmanuel(a)hibernate.org> wrote:
>>>> Could be possible.  That would likely be chatty though, each time a
>>>> node comes or goes.
>>>> Typically when a node goes down, potentially due to a network error,
>>>> you don't wanna be chatty I imagine ;)
>>>>
>>>> On 21 sept. 09, at 14:59, Ray Hilton wrote:
>>>>
>>>>> Yes, point taken.
>>>>>
>>>>> Is there perhaps a way to only index an object on one node?  For
>>>>> example, if each node knew there were currently 3 copies, then the
>>>>> node with the lowest id would index the document.  When a new node
>>>>> joins or a node fails, the strategy is re-applied and the node-local
>>>>> indices are updated accordingly.
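Ray's rule can be sketched as a purely local decision, assuming each node can compute the same owner list for a key (for instance from a consistent hash). The class and method names are hypothetical, not Infinispan API. In this sketch, choosing the indexer needs no messages at all; the chattiness concern applies to the re-indexing a topology change triggers, which onTopologyChange only flags:

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical: "only the lowest-id owner indexes a key". */
final class LowestOwnerIndexing {
    /**
     * @param owners ids of the nodes currently holding a copy of the key
     * @param self   this node's id
     * @return true if this node is the owner with the lowest id
     */
    static boolean shouldIndex(List<Integer> owners, int self) {
        return !owners.isEmpty() && Collections.min(owners) == self;
    }

    /**
     * Re-applies the rule after a node joins or fails.
     *
     * @return 1 if this node must start indexing the key, -1 if it should
     *         drop the key from its local index, 0 if nothing changes
     */
    static int onTopologyChange(List<Integer> oldOwners, List<Integer> newOwners, int self) {
        return Boolean.compare(shouldIndex(newOwners, self), shouldIndex(oldOwners, self));
    }
}
```

For example, if owners {1, 3, 7} lose node 1 and gain node 9, node 3 returns 1 and starts indexing the key; nobody else has to be told.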
>>>>>
>>>>> On Mon, Sep 21, 2009 at 6:06 PM, Emmanuel Bernard
>>>>> <emmanuel(a)hibernate.org> wrote:
>>>>>> Hello
>>>>>> See inline
>>>>>>
>>>>>> On 20 sept. 09, at 06:01, Ray Hilton wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I've been following the distributed query stuff with interest, but
>>>>>>> this is the first time I'm posting, so please excuse the lack of
>>>>>>> intimate knowledge of Infinispan.  Basically, I have been working on
>>>>>>> a project that could really do with the Holy Grail of a distributed
>>>>>>> query-able cache and I really liked the look of using JBossCache +
>>>>>>> the Lucene Directory implementation that Manik wrote a while back.
>>>>>>> I then noticed Infinispan and talk of building querying directly into
>>>>>>> the project and figured that it would be worthwhile waiting to see
>>>>>>> how that panned out.
>>>>>>>
>>>>>>> I've thought a bit about how something like this might work.  I'm
>>>>>>> not sure if this will be in any way helpful, but here goes:  I guess
>>>>>>> there are two approaches:  1) store the index (or partitioned
>>>>>>> indices) in the grid and sync it to a node to do a particular query,
>>>>>>> or 2) each node has an index for the data it currently caches.  We
>>>>>>> preferred the second idea as it offers a natural way to partition the
>>>>>>> indices (i.e. however Infinispan is configured to do it).  The first
>>>>>>> option would mean you end up with either a monolithic index in the
>>>>>>> grid, or partitions based on, say, date, that have to be sync'd
>>>>>>> en masse to whichever node(s) are doing a query.  I realise that the
>>>>>>> second technique would produce duplicates, but I'm sure there would
>>>>>>> be a way to eliminate dupes based on the object's UUID (something I'm
>>>>>>> pretty sure Infinispan already has a notion of).
>>>>>>
>>>>>> Well, 2 looks nicer but I don't know an obvious way to solve the
>>>>>> duplication issues:
>>>>>>  - returning the same content several times alters the scoring of
>>>>>> other documents
>>>>>>  - it prevents efficient pagination, as somehow you need to skip
>>>>>> several results.
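The pagination problem can be made concrete with a sketch that merges per-node hit lists (each already sorted by descending score), keeps the best-scoring copy of each key, and only then slices out a page. The Hit type and the keep-the-best rule are illustrative assumptions. Because duplicates shift positions, a node's local rank does not give the global rank, so the caller has to pull the top offset + limit hits from every node rather than just the requested slice:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical de-duplicating pager over per-node result lists. */
final class DedupPager {
    static final class Hit {
        final String key;
        final double score;
        Hit(String key, double score) { this.key = key; this.score = score; }
    }

    /** Read position in one node's list (sorted by descending score). */
    private static final class Cursor {
        final List<Hit> hits;
        int pos;
        Cursor(List<Hit> hits) { this.hits = hits; }
        Hit head() { return hits.get(pos); }
    }

    /**
     * Keys for the page [offset, offset + limit) of the merged, de-duplicated
     * ranking. Assumes a key appears at most once per node, in which case each
     * node's top offset + limit hits are enough.
     */
    static List<String> page(List<List<Hit>> perNode, int offset, int limit) {
        // k-way merge: always take the highest-scoring head across all nodes.
        PriorityQueue<Cursor> heap = new PriorityQueue<Cursor>(
                (a, b) -> Double.compare(b.head().score, a.head().score));
        for (List<Hit> hits : perNode) {
            if (!hits.isEmpty()) heap.add(new Cursor(hits));
        }
        Set<String> seen = new HashSet<>();
        List<String> page = new ArrayList<>();
        int rank = 0; // position in the de-duplicated global ranking
        while (!heap.isEmpty() && page.size() < limit) {
            Cursor c = heap.poll();
            Hit h = c.head();
            if (seen.add(h.key)) {           // first, and therefore best, copy
                if (rank >= offset) page.add(h.key);
                rank++;
            }
            c.pos++;
            if (c.pos < c.hits.size()) heap.add(c); // re-queue with its next hit
        }
        return page;
    }
}
```

Note that this only addresses the pagination half of the concern; the distorted scoring from duplicated content is untouched.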
>>>>>>
>>>>>>>
>>>>>>> We would also need to come up with a way of normalising the scoring
>>>>>>> across all partitions (regardless of which method is used).  I have
>>>>>>> seen this done before, and it would basically involve, per-query,
>>>>>>> finding out the term frequency of the various keywords across the
>>>>>>> entire index, or at least enough of it to produce a representative
>>>>>>> value.  This would be used to calculate the score for each hit when
>>>>>>> doing the actual search, and thus the ranking.
>>>>>>
>>>>>> I believe Lucene does normalize the score properly when using the
>>>>>> remote IndexSearcher, as the normalization is done on the "client"
>>>>>> side.
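One way to read the normalisation step, as a sketch: each partition reports its document frequency and document count for a query term, the caller sums them, and every partition then scores with the same cluster-wide idf. The idf formula follows the classic Lucene DefaultSimilarity shape, 1 + ln(numDocs / (docFreq + 1)); the TermStats type is hypothetical, and the sums assume each document is counted in only one partition (replicated copies would inflate them):

```java
import java.util.List;

/** Hypothetical helper for cluster-wide term statistics. */
final class GlobalIdf {
    /** Statistics one partition reports for a single query term. */
    static final class TermStats {
        final long docFreq;  // documents in the partition containing the term
        final long numDocs;  // total documents in the partition
        TermStats(long docFreq, long numDocs) {
            this.docFreq = docFreq;
            this.numDocs = numDocs;
        }
    }

    /** Sums per-partition statistics into cluster-wide totals. */
    static TermStats aggregate(List<TermStats> partitions) {
        long df = 0;
        long n = 0;
        for (TermStats s : partitions) {
            df += s.docFreq;
            n += s.numDocs;
        }
        return new TermStats(df, n);
    }

    /** idf = 1 + ln(numDocs / (docFreq + 1)), classic Lucene DefaultSimilarity shape. */
    static double idf(TermStats s) {
        return 1.0 + Math.log(s.numDocs / (double) (s.docFreq + 1));
    }
}
```

With partitions (docFreq 1, numDocs 100) and (9, 100), the local idfs are about 4.91 and 3.30, so the same term would rank differently depending on where a document lives; the shared value of about 3.90 removes that bias.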
>>>>>>
>>>>>>>
>>>>>>> We have had issues with index corruption in the past as well
>>>>>>> (probably due to programming bugs rather than Lucene).  Making each
>>>>>>> node responsible for its own index will make it very easy to throw
>>>>>>> corrupt indices away and re-generate new ones.
>>>>>>>
>>>>>>> I did take a look at the visitor stuff in Infinispan before, but I
>>>>>>> wasn't really sure where the best place to hook in would be to find
>>>>>>> out which objects are being stored locally or evicted.  If someone
>>>>>>> has a good idea of where to start, I'd be happy to lend a hand to
>>>>>>> this effort!
>>>>>>>
>>>>>>> Ray
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Sep 19, 2009 at 8:43 PM, Michael Neale <michael.neale(a)gmail.com>
>>>>>>> wrote:
>>>>>>>> I think you just stuck a pin in the bubble that normally says
>>>>>>>> "magic happens here" ;)
>>>>>>>>
>>>>>>>> How much of this did you tackle regarding Hibernate Search that
>>>>>>>> could be applied here?
>>>>>>>>
>>>>>>>> (Your final point re duplication may have some "flexibility", I
>>>>>>>> think?)
>>>>>>>>
>>>>>>>> On Fri, Sep 18, 2009 at 6:18 PM, Emmanuel Bernard
>>>>>>>> <emmanuel(a)hibernate.org> wrote:
>>>>>>>>> Neither 1 nor 2 implies *distributed* queries.
>>>>>>>>>
>>>>>>>>> The hard parts with distributed queries (i.e. executed on a grid
>>>>>>>>> and recomposed) are:
>>>>>>>>>  - making sure you ask all the nodes the index is distributed
>>>>>>>>> across (you can't miss a node)
>>>>>>>>>  - finding a way to index only a subset of the data in a given
>>>>>>>>> index (on a given node). Applying the Infinispan distribution
>>>>>>>>> routine to the InfinispanDirectory does not do that; it chunks
>>>>>>>>> data arbitrarily.
>>>>>>>>>  - being able to rebuild a given index on a given node (i.e.
>>>>>>>>> remember which elements were indexed)
>>>>>>>>>  - finding a way to distribute your data without duplication. If
>>>>>>>>> a key is indexed multiple times, then you end up with duplicated
>>>>>>>>> results that can't trivially be de-duplicated.
>>>>>>>>>
>>>>>>>>> Happy thinking.
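The second and third points can be sketched under a simplifying assumption: a stand-in ownership rule (key hash modulo cluster size, not Infinispan's actual consistent hash) gives every key exactly one indexing owner. A node then indexes only the keys it owns, and rebuilding its index amounts to re-running the same filter over the data it holds. Names are hypothetical:

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical ownership-based index partitioning. */
final class IndexPartitioning {
    /** Stand-in ownership rule; a real grid would use its consistent hash. */
    static int ownerOf(String key, int clusterSize) {
        return Math.floorMod(key.hashCode(), clusterSize);
    }

    /**
     * Keys this node should hold in its local index. Running this again over
     * the node's local data is also how the index would be rebuilt, so the
     * "which elements were indexed" question is answered by the rule itself.
     */
    static Set<String> keysToIndex(Collection<String> localKeys, int self, int clusterSize) {
        Set<String> result = new LinkedHashSet<>();
        for (String key : localKeys) {
            if (ownerOf(key, clusterSize) == self) {
                result.add(key);
            }
        }
        return result;
    }
}
```

Because each key has exactly one owner, the node-local indexes partition the data with no duplicates. The sketch deliberately leaves open the harder case of ownership changing while a rebuild is in progress.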
>>>>>>>>>
>>>>>>>>> On 17 sept. 09, at 10:32, Sanne Grinovero wrote:
>>>>>>>>>
>>>>>>>>>> 2009/9/17 Michael Neale <michael.neale(a)gmail.com>:
>>>>>>>>>>> I am still not entirely sure what I am asking, but I look forward
>>>>>>>>>>> to your merged-in changes (they are in another branch right now,
>>>>>>>>>>> yes?).
>>>>>>>>>>>
>>>>>>>>>>> Yes, I mean querying objects - I was under the impression that
>>>>>>>>>>> Lucene was used to index the data to service these queries?
>>>>>>>>>>
>>>>>>>>>> Sure, to clarify: there's work going on in two different areas,
>>>>>>>>>> which complement each other in the ideal setup:
>>>>>>>>>>
>>>>>>>>>> 1) Being able to query a Lucene index (wherever you store it) to
>>>>>>>>>> find objects which are located inside Infinispan; this is about
>>>>>>>>>> how to search them and how to keep the index in sync with
>>>>>>>>>> Infinispan's content.
>>>>>>>>>>
>>>>>>>>>> 2) Storing a Lucene index inside Infinispan instead of, for
>>>>>>>>>> example, the filesystem.  In this case we're not concerned about
>>>>>>>>>> what you index; the Lucene interface is the usual one and you
>>>>>>>>>> should be able to replace the Directory implementation in existing
>>>>>>>>>> applications.
>>>>>>>>>>
>>>>>>>>>> So 1) is the branch you've found, and Navin is working on that.
>>>>>>>>>> 2) is not yet in Subversion; the latest patch is attached to
>>>>>>>>>> another thread by Łukasz, and is to be applied on Hibernate
>>>>>>>>>> Search's trunk (and depends on Infinispan).
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 16, 2009 at 10:32 PM, Navin Surtani
>>>>>>>>>>> <nsurtani(a)redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 16 Sep 2009, at 12:25, Michael Neale wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Oh ok, nice - could you point me at which branch to look in to
>>>>>>>>>>>>> find some tests to play with?
>>>>>>>>>>>>
>>>>>>>>>>>> If you're talking about querying objects in Infinispan:
>>>>>>>>>>>>
>>>>>>>>>>>> The eventual goal is to be able to have different configurations
>>>>>>>>>>>> for how you want to index your data.  Manik has given me the 'OK'
>>>>>>>>>>>> to push a simple query interface for CR1 on Monday/Tuesday.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm kind of pressed with getting the code working for this, and,
>>>>>>>>>>>> between moving house and the lack of internet there, I'll be a bit
>>>>>>>>>>>> quiet.  However, I'll get a wiki up by the end of the week about
>>>>>>>>>>>> how this all works.
>>>>>>>>>>>>
>>>>>>>>>>>> If you're not, then I assume you're talking about using Lucene to
>>>>>>>>>>>> index into Infinispan?
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 16, 2009 at 6:05 PM, Sanne Grinovero
>>>>>>>>>>>>> <sanne.grinovero(a)gmail.com> wrote:
>>>>>>>>>>>>>> 2009/9/16 Michael Neale <michael.neale(a)gmail.com>:
>>>>>>>>>>>>>>> Regarding indexing and queries - is the current aim to not
>>>>>>>>>>>>>>> require that the index for the entire data grid exist on a
>>>>>>>>>>>>>>> single node?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (Asking as a potential user who is wrestling with Lucene
>>>>>>>>>>>>>>> indexes at the moment and is curious.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, the concept is to store the Lucene index itself in the
>>>>>>>>>>>>>> grid, so it will be distributed, and the segments you use most
>>>>>>>>>>>>>> get cached locally.  At the moment you have to select only one
>>>>>>>>>>>>>> node to write to the index, but all other nodes should be able
>>>>>>>>>>>>>> to read.  Feel free to test it, as we need feedback.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Michael D Neale
>>>>>>>>>>>>>>> home: www.michaelneale.net
>>>>>>>>>>>>>>> blog: michaelneale.blogspot.com
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> infinispan-dev mailing list
>>>>>>>>>>>>>>> infinispan-dev(a)lists.jboss.org
>>>>>>>>>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Navin Surtani
>>>>>>>>>>>>
>>>>>>>>>>>> Intern Infinispan
>>>>>>>>>>>> Intern JBoss Cache Searchable
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> hibernate-dev mailing list
>>>>>>>>>> hibernate-dev(a)lists.jboss.org
>>>>>>>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>>>>>>
>>>>>>> --
>>>>>>> Ray Hilton
>>>>>>> -
>>>>>>>         email: ray(a)wirestorm.net
>>>>>>> melbourne: +61 (0) 3 9077 0513
>>>>>>>       mobile: +61 (0) 430 484 708
>>>>>>>

--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
