[infinispan-dev] Map/Reduce or other batch processing on CacheLoader stored entries
Manik Surtani
manik at jboss.org
Fri May 25 07:19:23 EDT 2012
On 25 May 2012, at 12:16, Sanne Grinovero wrote:
> On 25 May 2012 12:10, Manik Surtani <manik at jboss.org> wrote:
>> Well, the start() bit may be less useful, but how do you know when the processor has been fed everything it needs, so that it can clean up?
>
> Too much use of separate threads is blinding you to known
> patterns :-)
ROFL! Good point.
> The Processor is a Visitor, so processEntriesWith(processor) has the
> CacheLoader invoke the processor on each of its entries, so it's all
> coordinated by the same thread which called initialize() and will call
> #shutdownWorkers() after all entries are known to have been loaded and processed.
>
> If the Processor uses an internal async queue, that's not part of
> the API contract; the shutdownWorkers() implementation will have to
> deal with it, but that's not hard as it's invoked after all entries
> have been pushed to that queue.
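A minimal sketch of the Visitor-style flow described above, assuming a single-method Processor. CacheEntry, InMemoryLoader, QueueingCollector and the poison-pill shutdown are hypothetical illustrations, not Infinispan's API; only the names initialize(), processEntriesWith() and shutdownWorkers() come from this thread. The point it shows: because processEntriesWith() blocks on the caller's thread, shutdownWorkers() runs only after every entry is already in the internal queue.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Demo entry point: same call sequence as the MassIndexingWorkCollector snippet.
class VisitorDemo {
    public static void main(String[] args) throws InterruptedException {
        InMemoryLoader loader = new InMemoryLoader();
        for (int i = 0; i < 5; i++) loader.store("k" + i, "v" + i);

        QueueingCollector pwc = new QueueingCollector();
        pwc.initialize();               // not part of the Processor contract
        loader.processEntriesWith(pwc); // blocks until every entry has been handed over
        pwc.shutdownWorkers();          // not part of the Processor contract
        System.out.println(pwc.processedCount()); // prints 5
    }
}

// Hypothetical stand-in for a cache entry: just a key/value pair.
final class CacheEntry {
    final Object key;
    final Object value;
    CacheEntry(Object key, Object value) { this.key = key; this.value = value; }
}

// The minimal single-method contract proposed in the thread.
interface Processor {
    void process(CacheEntry e);
}

// Hypothetical loader: visits each stored entry exactly once on the caller's thread.
final class InMemoryLoader {
    private final Map<Object, Object> entries = new LinkedHashMap<>();

    void store(Object key, Object value) { entries.put(key, value); }

    void processEntriesWith(Processor p) {
        for (Map.Entry<Object, Object> e : entries.entrySet()) {
            p.process(new CacheEntry(e.getKey(), e.getValue()));
        }
    }
}

// Hands entries to a background worker through a queue; the queue is an
// implementation detail the Processor contract never sees.
final class QueueingCollector implements Processor {
    private static final CacheEntry POISON = new CacheEntry(null, null);
    private final BlockingQueue<CacheEntry> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger processed = new AtomicInteger();
    private Thread worker;

    void initialize() {
        worker = new Thread(() -> {
            try {
                for (CacheEntry e = queue.take(); e != POISON; e = queue.take()) {
                    processed.incrementAndGet(); // real work (e.g. indexing e) goes here
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
    }

    @Override
    public void process(CacheEntry e) { queue.add(e); }

    // Called only after processEntriesWith returned, so every entry is queued
    // ahead of the poison pill; joining the worker waits for all of them.
    void shutdownWorkers() throws InterruptedException {
        queue.add(POISON);
        worker.join();
    }

    int processedCount() { return processed.get(); }
}
```

The poison pill is just one way to signal "no more entries" to the worker; an ExecutorService with shutdown() and awaitTermination() would serve the same purpose.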
>
>>
>> On 25 May 2012, at 12:02, Sanne Grinovero wrote:
>>
>>> On 25 May 2012 11:33, Manik Surtani <manik at jboss.org> wrote:
>>>> Yes, as a one-off, but there should be a mechanism to set up internal structures and clean up/send finalisation messages to Hibernate Search or completion RPCs, etc.
>>>
>>> Ah, got it. Maybe we need it, but I was initially (maybe naively)
>>> expecting to deal with initialization myself:
>>>
>>> MassIndexingWorkCollector pwc = new MassIndexingWorkCollector(); // implements Processor
>>> pwc.initialize(/* custom stuff */); // not defined on Processor
>>> cacheLoader.processEntriesWith(pwc); // target tentative (the CacheLoader?); blocking! so we know when we have finished loading all entries
>>> pwc.shutdownWorkers(); // not defined on Processor
>>>
>>> minimal API ;-)
>>> But I guess when implementing for real I might need something like that.
>>>
>>> Sanne
>>>
>>>>
>>>> On 25 May 2012, at 11:31, Sanne Grinovero wrote:
>>>>
>>>>> On 25 May 2012 10:57, Manik Surtani <manik at jboss.org> wrote:
>>>>>> #processEntriesWith(Processor p)
>>>>>>
>>>>>> interface Processor extends Lifecycle { // Lifecycle for start() and stop() methods…
>>>>>>     void process(CacheEntry e);
>>>>>>     void process(Collection<CacheEntry> e);
>>>>>>     boolean processMoreEntries();
>>>>>> }
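A sketch of how a loader might drive the Lifecycle-based Processor proposed above. The driving protocol (start once, feed batches while processMoreEntries() returns true, always stop) is an assumption about the intent, not something the proposal specifies; Lifecycle, CacheEntry, Driver and StopAfter are hypothetical stand-ins. Note how the batch overload ends up routing each element to process(CacheEntry), which is the redundancy questioned in the reply below.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class LifecycleDemo {
    public static void main(String[] args) {
        List<CacheEntry> stored = new ArrayList<>();
        for (int i = 0; i < 10; i++) stored.add(new CacheEntry("k" + i, i));

        StopAfter p = new StopAfter(4);
        Driver.processEntriesWith(stored, p, 3);
        // processMoreEntries() is only consulted between batches of 3, so 6 entries are seen.
        System.out.println(p.seen + " " + p.started + " " + p.stopped); // prints 6 true true
    }
}

// Hypothetical stand-ins; the names follow the proposal, not a shipped API.
final class CacheEntry {
    final Object key;
    final Object value;
    CacheEntry(Object key, Object value) { this.key = key; this.value = value; }
}

interface Lifecycle {
    void start();
    void stop();
}

interface Processor extends Lifecycle {
    void process(CacheEntry e);
    void process(Collection<CacheEntry> e);
    boolean processMoreEntries();
}

// Assumed driving protocol: start once, feed batches while the processor
// asks for more, always stop at the end.
final class Driver {
    static void processEntriesWith(List<CacheEntry> entries, Processor p, int batchSize) {
        p.start();
        try {
            for (int i = 0; i < entries.size() && p.processMoreEntries(); i += batchSize) {
                p.process(entries.subList(i, Math.min(i + batchSize, entries.size())));
            }
        } finally {
            p.stop();
        }
    }
}

// Example processor that stops asking for entries after seeing `limit` of them.
final class StopAfter implements Processor {
    final int limit;
    int seen;
    boolean started, stopped;

    StopAfter(int limit) { this.limit = limit; }

    @Override public void start() { started = true; }
    @Override public void stop() { stopped = true; }
    @Override public void process(CacheEntry e) { seen++; }

    // The batch overload just routes each element to process(CacheEntry).
    @Override public void process(Collection<CacheEntry> batch) {
        for (CacheEntry e : batch) process(e);
    }

    @Override public boolean processMoreEntries() { return seen < limit; }
}
```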
>>>>>
>>>>> Why the Lifecycle start/stop?
>>>>> I expect to use it as a one-off, not as something which is
>>>>> "permanently hooked": it looks like you're thinking about a different
>>>>> problem?
>>>>>
>>>>> The use case I'm thinking of is when we need to iterate over all entries
>>>>> in the cache store, such as:
>>>>> - Map/Reduce
>>>>> - evaluating the average value of some attribute
>>>>> - word counting
>>>>> - MassIndexer
>>>>>
>>>>> In all the use cases I have in mind, you want to process all
>>>>> entries, and only once.
>>>>> So #processMoreEntries would be redundant, and I think we should
>>>>> choose just one of CacheEntry or Collection<CacheEntry>. Let's
>>>>> go with the simple CacheEntry?
>>>>> That should let us avoid creating short-lived collections, and when
>>>>> passing collections one would likely need to iterate over each element
>>>>> anyway to route the invocation to some internal process(CacheEntry e).
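For the "average value of some attribute" use case, a single-method Processor that sees each entry once is enough: it keeps a running sum and count, with no intermediate collections. A minimal sketch, assuming a hypothetical CacheEntry whose value is an Integer standing in for the attribute; the plain loop stands in for a blocking processEntriesWith(avg) call.

```java
import java.util.Arrays;
import java.util.List;

class AverageDemo {
    public static void main(String[] args) {
        List<CacheEntry> stored = Arrays.asList(
                new CacheEntry("a", 10), new CacheEntry("b", 20), new CacheEntry("c", 60));
        AverageProcessor avg = new AverageProcessor();
        // Stand-in for cacheLoader.processEntriesWith(avg): each entry visited once.
        for (CacheEntry e : stored) avg.process(e);
        System.out.println(avg.average()); // prints 30.0
    }
}

// Hypothetical stand-in; the "attribute" here is simply an Integer value.
final class CacheEntry {
    final Object key;
    final Object value;
    CacheEntry(Object key, Object value) { this.key = key; this.value = value; }
}

interface Processor {
    void process(CacheEntry e);
}

// Single pass, constant memory, no intermediate collections.
final class AverageProcessor implements Processor {
    private long sum;
    private long count;

    @Override
    public void process(CacheEntry e) {
        sum += (Integer) e.value;
        count++;
    }

    // NaN when no entries were processed, rather than dividing by zero.
    double average() { return count == 0 ? Double.NaN : (double) sum / count; }
}
```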
>>>>>
>>>>> -- Sanne
>>>>>
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>
>>>> --
>>>> Manik Surtani
>>>> manik at jboss.org
>>>> twitter.com/maniksurtani
>>>>
>>>> Lead, Infinispan
>>>> http://www.infinispan.org
>>>
>>
>