Re: [infinispan-dev] Map/Reduce or other batch processing on CacheLoader stored entries

Friday, 25 May 2012

On 25 May 2012 12:10, Manik Surtani <manik(a)jboss.org> wrote:
...
 Well, the start() bit may be less useful, but how do you know when the
processor has been fed everything it needs, to be able to clean up?
Too much usage of separate threads is blinding you from usage of known
patterns :-)
The Processor is a Visitor, so processEntriesWith( processor ) has the
CacheLoader invoke the processor on its own entries; it's all coordinated
by the same thread which did initialize() and which will call
#shutdownWorkers() after all entries are known to have been loaded and
processed.
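
To make the single-threaded flow concrete, here is a minimal sketch. Everything in it besides the names process(), processEntriesWith(), initialize() and shutdownWorkers() is an assumption made for illustration: Entry and InMemoryLoader are hypothetical stand-ins, not Infinispan's CacheEntry or CacheLoader, and the average computation is just one of the use cases mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorSketch {

    /** Hypothetical stand-in for a stored entry; not Infinispan's CacheEntry. */
    static final class Entry {
        final String key;
        final int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    /** The visitor: a single callback per entry. */
    interface Processor {
        void process(Entry e);
    }

    /** Hypothetical loader; processEntriesWith returns only after every entry was visited. */
    static final class InMemoryLoader {
        private final List<Entry> stored = new ArrayList<>();

        void store(String key, int value) { stored.add(new Entry(key, value)); }

        void processEntriesWith(Processor p) {
            for (Entry e : stored) {
                p.process(e); // runs on the caller's thread
            }
        }
    }

    /** Example processor: average of a numeric attribute. */
    static final class AverageCollector implements Processor {
        private long sum;
        private long count;
        private boolean open;

        // Not part of Processor: set-up is the caller's business.
        void initialize() { sum = 0; count = 0; open = true; }

        @Override
        public void process(Entry e) {
            if (!open) throw new IllegalStateException("not initialized");
            sum += e.value;
            count++;
        }

        // Not part of Processor: called once the blocking visit has returned.
        double shutdownWorkers() {
            open = false;
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    /** The same thread initializes, drives the visit, and cleans up. */
    static double averageOf(InMemoryLoader loader) {
        AverageCollector collector = new AverageCollector();
        collector.initialize();
        loader.processEntriesWith(collector); // blocking
        return collector.shutdownWorkers();
    }

    public static void main(String[] args) {
        InMemoryLoader loader = new InMemoryLoader();
        loader.store("a", 10);
        loader.store("b", 20);
        loader.store("c", 60);
        System.out.println(averageOf(loader)); // prints 30.0
    }
}
```

Because the visit blocks, the caller needs no completion callback: returning from processEntriesWith() is the signal that clean-up can start.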

If the Processor is using an internal async queue, that's not known to
the API contract; the shutdownWorkers() implementation will have to
deal with it, but that's not hard as it's invoked after all entries
have been pushed to that queue.
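
A rough sketch of that case, assuming the processor hides a small worker pool behind process(). The Processor signature here is simplified to key/value strings, and AsyncWordCounter and countWords are hypothetical names; only shutdownWorkers() comes from the thread. Since shutdownWorkers() runs after the last entry was submitted, draining the queue is a plain shutdown-and-wait on the executor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncProcessorSketch {

    /** Simplified visitor contract; the async queue is invisible here. */
    interface Processor {
        void process(String key, String value);
    }

    /** Word counter that hands each entry to an internal worker pool. */
    static final class AsyncWordCounter implements Processor {
        private final ExecutorService workers = Executors.newFixedThreadPool(2);
        private final Map<String, Integer> counts = new ConcurrentHashMap<>();

        @Override
        public void process(String key, String value) {
            // Enqueue only; the actual counting happens on a worker thread.
            workers.execute(() -> {
                for (String word : value.split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum); // atomic per key
                    }
                }
            });
        }

        /** Invoked after all entries were pushed: stop intake, wait for the queue to drain. */
        Map<String, Integer> shutdownWorkers() {
            workers.shutdown();
            try {
                if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("workers did not finish in time");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining", e);
            }
            return counts;
        }
    }

    /** Stand-in for the blocking visit followed by clean-up on the same thread. */
    static Map<String, Integer> countWords(Map<String, String> entries) {
        AsyncWordCounter counter = new AsyncWordCounter();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            counter.process(e.getKey(), e.getValue());
        }
        return counter.shutdownWorkers();
    }

    public static void main(String[] args) {
        Map<String, String> entries = new HashMap<>();
        entries.put("k1", "to be or not to be");
        entries.put("k2", "be quick");
        System.out.println(countWords(entries).get("be")); // prints 3
    }
}
```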

...

 On 25 May 2012, at 12:02, Sanne Grinovero wrote:

> On 25 May 2012 11:33, Manik Surtani <manik(a)jboss.org> wrote:
>> Yes, as a one-off, but there should be a mechanism to set up internal structures
>> and clean up/send finalisation messages to Hibernate Search or completion RPCs, etc.
>
> Ah, got it. Maybe we need it, but I was initially - maybe naively -
> expecting to deal with initialization myself:
>
> MassIndexingWorkCollector pwc = new MassIndexingWorkCollector();
> // implements Processor
> pwc.initialize(.custom stuff..); // not defined on Processor
> [cacheLoader?].processEntriesWith(pwc); // Blocking! So we know when
> // we have finished loading all entries.
> pwc.shutdownWorkers(); // not defined on Processor
>
> minimal API ;-)
> But I guess when implementing for real I might need something like that.
>
> Sanne
>
>>
>> On 25 May 2012, at 11:31, Sanne Grinovero wrote:
>>
>>> On 25 May 2012 10:57, Manik Surtani <manik(a)jboss.org> wrote:
>>>> #processEntriesWith(Processor p)
>>>>
>>>> Processor extends Lifecycle { // Lifecycle for start() and stop() methods…
>>>>   void process(CacheEntry e);
>>>>   void process(Collection<CacheEntry> e);
>>>>   boolean processMoreEntries();
>>>> }
>>>
>>> Why the Lifecycle start/stop?
>>> I expect to use it as a one-off, not as something which is
>>> "permanently hooked": looks like you're thinking about a different
>>> problem?
>>>
>>> The use case I'm thinking of is when we need to iterate on all entries
>>> in the cachestore, such as:
>>> - Map/Reduce
>>>  - evaluating the average value of some attribute
>>>  - word counting
>>> - MassIndexer
>>>
>>> In all the use cases I have in mind, you want to process all
>>> entries, and only once.
>>> So #processMoreEntries would be redundant, and I think we should
>>> choose just one between CacheEntry or Collection<CacheEntry>. Let's
>>> go with the simple CacheEntry?
>>> We should be able to avoid creation of short-lived collections, and when
>>> passing collections one would likely need to iterate on each element
>>> anyway to route the invocation to some internal process(CacheEntry e);
>>>
>>> -- Sanne
>>>
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev(a)lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> --
>> Manik Surtani
>> manik(a)jboss.org
>> twitter.com/maniksurtani
>>
>> Lead, Infinispan
>> http://www.infinispan.org

