[infinispan-dev] Map/Reduce or other batch processing on CacheLoader stored entries

Sanne Grinovero sanne at infinispan.org
Fri May 25 07:16:28 EDT 2012


On 25 May 2012 12:10, Manik Surtani <manik at jboss.org> wrote:
> Well, the start() bit may be less useful, but how do you know when the processor has been fed everything it needs, so that it can clean up?

Too much usage of separate threads is blinding you to the usage of known
patterns :-)
The Processor is a Visitor: processEntriesWith( processor ) has the
CacheLoader itself invoke the processor on each entry, so it's all
coordinated by the same thread which did initialize() and which will call
#shutdownWorkers() after all entries are known to have been loaded and
processed.
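
A minimal sketch of that single-threaded Visitor flow, in plain Java.
Everything here is a hypothetical stand-in and not real Infinispan API:
CacheEntry, InMemoryLoader and AverageCollector are invented for the
example, and only the method names processEntriesWith(), initialize() and
shutdownWorkers() come from this thread. It uses the "average value of
some attribute" use case mentioned earlier.

```java
import java.util.ArrayList;
import java.util.List;

class VisitorSketch {

    // Hypothetical stand-in for a cache entry; not Infinispan's real type.
    static final class CacheEntry {
        final String key;
        final int value;
        CacheEntry(String key, int value) { this.key = key; this.value = value; }
    }

    // The minimal callback discussed in this thread: one method, no lifecycle.
    interface Processor {
        void process(CacheEntry e);
    }

    // Hypothetical loader: it drives the iteration itself (the Visitor part).
    static final class InMemoryLoader {
        private final List<CacheEntry> stored = new ArrayList<>();

        void store(String key, int value) { stored.add(new CacheEntry(key, value)); }

        // Blocking: returns only after every stored entry has been visited.
        void processEntriesWith(Processor p) {
            for (CacheEntry e : stored) {
                p.process(e);
            }
        }
    }

    // Caller-owned setup and teardown, deliberately not part of Processor.
    static final class AverageCollector implements Processor {
        private long sum;
        private int count;

        void initialize() { sum = 0; count = 0; }

        @Override
        public void process(CacheEntry e) { sum += e.value; count++; }

        // Safe to call once processEntriesWith() has returned.
        double shutdownWorkers() { return count == 0 ? 0.0 : (double) sum / count; }
    }

    static double run() {
        InMemoryLoader loader = new InMemoryLoader();
        loader.store("a", 10);
        loader.store("b", 20);
        loader.store("c", 30);

        AverageCollector collector = new AverageCollector();
        collector.initialize();
        loader.processEntriesWith(collector); // same thread does all three steps
        return collector.shutdownWorkers();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because processEntriesWith() blocks, the caller needs no completion
callback: the return of that call is the "everything has been fed" signal.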

If the Processor is using an internal async queue, that's not part of
the API contract; the shutdownWorkers() implementation will have to
deal with it, but that's not hard as it's invoked after all entries
have been pushed to that queue.
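
For the async variant, here is a rough sketch of what such a
shutdownWorkers() could look like, assuming the Processor hands each entry
to a java.util.concurrent ExecutorService. AsyncWordCounter and the
simplified process(key, value) signature are hypothetical; the loop in
run() merely stands in for a blocking processEntriesWith() call. It uses
the word counting use case mentioned earlier.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncProcessorSketch {

    // Simplified, hypothetical callback; the thread proposes process(CacheEntry).
    interface Processor {
        void process(String key, String value);
    }

    static final class AsyncWordCounter implements Processor {
        private final Map<String, Integer> counts = new ConcurrentHashMap<>();
        private ExecutorService workers;

        // Not part of Processor: the caller sets up the internal queue.
        void initialize(int threads) {
            workers = Executors.newFixedThreadPool(threads);
        }

        // Called by the loading thread; the real work is queued for the workers.
        @Override
        public void process(String key, String value) {
            workers.execute(() -> {
                for (String word : value.split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            });
        }

        // Invoked after all entries were pushed, so draining the queue is enough.
        Map<String, Integer> shutdownWorkers() {
            workers.shutdown(); // accept no new tasks, finish the queued ones
            try {
                if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("workers did not finish in time");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining queue", e);
            }
            return counts;
        }
    }

    static Map<String, Integer> run() {
        List<String> values = Arrays.asList("a b a", "b c", "a");
        AsyncWordCounter counter = new AsyncWordCounter();
        counter.initialize(2);
        int i = 0;
        for (String v : values) {          // stands in for processEntriesWith(counter)
            counter.process("k" + i++, v);
        }
        return counter.shutdownWorkers();  // blocks until the queue is drained
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The queue stays an implementation detail of the Processor: the loader only
ever sees process(), and the caller only needs to call shutdownWorkers()
after the blocking load has returned.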

>
> On 25 May 2012, at 12:02, Sanne Grinovero wrote:
>
>> On 25 May 2012 11:33, Manik Surtani <manik at jboss.org> wrote:
>>> Yes, as a one-off, but there should be a mechanism to set up internal structures and clean up/send finalisation messages to Hibernate Search or completion RPCs, etc.
>>
>> Ah got it. Maybe we need it, but I was initially - maybe naively -
>> expecting to deal with initialization myself:
>>
>> MassIndexingWorkCollector pwc = new MassIndexingWorkCollector(); // implements Processor
>> pwc.initialize(.custom stuff..); // not defined on Processor
>> [cacheLoader?].processEntriesWith(pwc); // Blocking! So we know when we finished loading all entries.
>> pwc.shutdownWorkers(); // not defined on Processor
>>
>> minimal API ;-)
>> But I guess when implementing for real I might need something like that.
>>
>> Sanne
>>
>>>
>>> On 25 May 2012, at 11:31, Sanne Grinovero wrote:
>>>
>>>> On 25 May 2012 10:57, Manik Surtani <manik at jboss.org> wrote:
>>>>> #processEntriesWith(Processor p)
>>>>>
>>>>> Processor extends Lifecycle { // Lifecycle for start() and stop() methods…
>>>>>   void process(CacheEntry e);
>>>>>   void process(Collection<CacheEntry> e);
>>>>>   boolean processMoreEntries();
>>>>> }
>>>>
>>>> Why the Lifecycle start/stop?
>>>> I expect to use it as a one-off, not as something which is
>>>> "permanently hooked": looks like you're thinking about a different
>>>> problem?
>>>>
>>>> The use case I'm thinking of is when we need to iterate on all entries
>>>> in the cachestore, such as:
>>>> - Map/Reduce
>>>>  - evaluating the average value of some attribute
>>>>  - word counting
>>>> - MassIndexer
>>>>
>>>> In all the use cases I have in mind, you want to process all
>>>> entries, and only once.
>>>> So #processMoreEntries would be redundant, and I think we should
>>>> choose just one of CacheEntry or Collection<CacheEntry>.. let's
>>>> go with the simple CacheEntry?
>>>> That should avoid the creation of short-lived collections, and when
>>>> passing collections one would likely need to iterate on each element
>>>> anyway to route the invocation to some internal process(CacheEntry e);
>>>> -- Sanne
>>>>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>> --
>>> Manik Surtani
>>> manik at jboss.org
>>> twitter.com/maniksurtani
>>>
>>> Lead, Infinispan
>>> http://www.infinispan.org
>>
>


