[infinispan-dev] Map/Reduce or other batch processing on CacheLoader stored entries

Manik Surtani manik at jboss.org
Fri May 25 07:19:23 EDT 2012


On 25 May 2012, at 12:16, Sanne Grinovero wrote:

> On 25 May 2012 12:10, Manik Surtani <manik at jboss.org> wrote:
>> Well, the start() bit may be less useful, but how do you know when the processor has been fed everything it needs, to be able to clean up?
> 
> Too much usage of separate threads is blinding you from usage of known
> patterns :-)

ROFL!  Good point.

> The Processor is a Visitor, so processEntriesWith( processor ) has the
> CacheLoader invoke on itself, so it's all being coordinated by the
> same thread which did initialize() and will do #shutdownWorkers()
> after all entries are known to have been loaded and processed.
> 
> If the Processor is using an internal async queue, that's not known to
> the API contract; the shutdownWorkers() implementation will have to
> deal with it, but that's not hard as it's invoked after all entries
> have been pushed to that queue.
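
A minimal sketch of the point above, assuming a processor that hands each entry to an internal worker pool. `AsyncCollector`, `AsyncDemo`, and the plain `String` values are hypothetical stand-ins and not Infinispan API; only the `java.util.concurrent` calls are real. It illustrates why draining inside shutdownWorkers() is straightforward once the loading thread has finished pushing entries:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical processor with an internal async queue (the executor's task queue).
class AsyncCollector {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final AtomicInteger handled = new AtomicInteger();

    // Called by the loading thread: only enqueues work, then returns.
    void process(String value) {
        workers.execute(() -> {
            // Real per-entry work on `value` would go here.
            handled.incrementAndGet();
        });
    }

    // Called by the same thread after every entry has been pushed,
    // so no new work can arrive while the queue drains.
    int shutdownWorkers() {
        workers.shutdown(); // stop accepting tasks; queued ones still run
        try {
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return handled.get();
    }
}

class AsyncDemo {
    static int run() {
        AsyncCollector collector = new AsyncCollector();
        // This loop stands in for the loader invoking the processor per entry.
        for (String value : new String[] {"a", "b", "c", "d"}) {
            collector.process(value);
        }
        return collector.shutdownWorkers();
    }
}
```

The caller never sees the queue: the API contract stays a plain blocking visit, and the asynchrony is an implementation detail of the processor.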
> 
>> 
>> On 25 May 2012, at 12:02, Sanne Grinovero wrote:
>> 
>>> On 25 May 2012 11:33, Manik Surtani <manik at jboss.org> wrote:
>>>> Yes, as a one-off, but there should be a mechanism to set up internal structures and clean up/send finalisation messages to Hibernate Search or completion RPCs, etc.
>>> 
>>> Ah got it. Maybe we need it, but I was initially - maybe naively -
>>> expecting to deal with initialization myself:
>>> 
>>> // MassIndexingWorkCollector implements Processor
>>> MassIndexingWorkCollector pwc = new MassIndexingWorkCollector();
>>> pwc.initialize(.custom stuff..); // not defined on Processor
>>> // Blocking! So we know when we have finished loading all entries.
>>> [cacheLoader?].processEntriesWith(pwc);
>>> pwc.shutdownWorkers(); // not defined on Processor
>>> 
>>> minimal API ;-)
>>> But I guess when implementing for real I might need something like that.
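
The one-off sequence described here can be sketched as follows, assuming a loader whose processEntriesWith() call blocks until every entry has been visited. `Visitor`, `ListBackedLoader`, `RecordingCollector`, and `OneOffDemo` are hypothetical names invented for illustration; none of them is Infinispan or Hibernate Search API. The collector simply records the order of calls:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal visitor contract: no lifecycle methods at all.
interface Visitor {
    void process(String value);
}

// Hypothetical stand-in for a cache loader backed by a fixed list.
class ListBackedLoader {
    private final List<String> values;

    ListBackedLoader(List<String> values) {
        this.values = values;
    }

    // Blocking: returns only after the visitor has seen every value.
    void processEntriesWith(Visitor visitor) {
        for (String value : values) {
            visitor.process(value);
        }
    }
}

// Setup and teardown live on the concrete class, outside the contract.
class RecordingCollector implements Visitor {
    final List<String> events = new ArrayList<>();

    void initialize() {
        events.add("initialize");
    }

    @Override
    public void process(String value) {
        events.add("process:" + value);
    }

    void shutdownWorkers() {
        events.add("shutdown");
    }
}

class OneOffDemo {
    static List<String> run() {
        RecordingCollector collector = new RecordingCollector();
        collector.initialize();
        new ListBackedLoader(Arrays.asList("k1", "k2")).processEntriesWith(collector);
        collector.shutdownWorkers(); // safe: the blocking call above has returned
        return collector.events;
    }
}
```

Because the same thread drives all three steps, the caller knows when loading is complete without any callback on the shared interface.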
>>> 
>>> Sanne
>>> 
>>>> 
>>>> On 25 May 2012, at 11:31, Sanne Grinovero wrote:
>>>> 
>>>>> On 25 May 2012 10:57, Manik Surtani <manik at jboss.org> wrote:
>>>>>> #processEntriesWith(Processor p)
>>>>>> 
>>>>>> interface Processor extends Lifecycle { // Lifecycle for start() and stop() methods…
>>>>>>   void process(CacheEntry e);
>>>>>>   void process(Collection<CacheEntry> e);
>>>>>>   boolean processMoreEntries();
>>>>>> }
>>>>> 
>>>>> Why the Lifecycle start/stop?
>>>>> I expect to use it as a one-off, not as something which is
>>>>> "permanently hooked": looks like you're thinking about a different
>>>>> problem?
>>>>> 
>>>>> The use case I'm thinking of is when we need to iterate over all entries
>>>>> in the cache store, such as:
>>>>> - Map/Reduce
>>>>>  - evaluating the average value of some attribute
>>>>>  - word counting
>>>>> - MassIndexer
>>>>> 
>>>>> In all the use cases I have in mind, you want to process all
>>>>> entries, and only once.
>>>>> So #processMoreEntries would be redundant, and I think we should
>>>>> choose just one of CacheEntry or Collection<CacheEntry>. Let's
>>>>> go with the simple CacheEntry?
>>>>> That should avoid the creation of short-lived collections, and when
>>>>> passing collections one would likely need to iterate on each element
>>>>> anyway to route the invocation to some internal process(CacheEntry e).
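
A rough sketch of the single-method variant argued for above, applied to the "average value of some attribute" use case. `EntryProcessor`, `InMemoryStore`, `AverageProcessor`, and `AverageDemo` are hypothetical names, and `Map.Entry` stands in for CacheEntry; this is an illustration under those assumptions, not the Infinispan API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical visitor with one method: every entry is visited exactly once.
interface EntryProcessor<K, V> {
    void process(Map.Entry<K, V> entry);
}

// Hypothetical stand-in for a cache store.
class InMemoryStore<K, V> {
    private final Map<K, V> data = new LinkedHashMap<>();

    void put(K key, V value) {
        data.put(key, value);
    }

    // Hands entries over one at a time; no intermediate collections are built.
    void processEntriesWith(EntryProcessor<K, V> processor) {
        for (Map.Entry<K, V> entry : data.entrySet()) {
            processor.process(entry);
        }
    }
}

// Keeps a running sum and count so the average is available after the pass.
class AverageProcessor implements EntryProcessor<String, Integer> {
    private long sum;
    private long count;

    @Override
    public void process(Map.Entry<String, Integer> entry) {
        sum += entry.getValue();
        count++;
    }

    double average() {
        return count == 0 ? 0.0 : (double) sum / count;
    }
}

class AverageDemo {
    static double run() {
        InMemoryStore<String, Integer> store = new InMemoryStore<>();
        store.put("a", 10);
        store.put("b", 20);
        store.put("c", 30);
        AverageProcessor average = new AverageProcessor();
        store.processEntriesWith(average);
        return average.average();
    }
}
```

With a per-entry callback the store decides how to iterate, and a processor that wants batches can still buffer internally.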
>>>>> 
>>>>> -- Sanne
>>>>> 
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>> 
>>>> --
>>>> Manik Surtani
>>>> manik at jboss.org
>>>> twitter.com/maniksurtani
>>>> 
>>>> Lead, Infinispan
>>>> http://www.infinispan.org
>>> 
>> 
> 

--
Manik Surtani
manik at jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org





