[JBoss JIRA] (ISPN-5108) Indexes (aka Filters) for MapReduce

Wednesday, 24 December 2014

     [ https://issues.jboss.org/browse/ISPN-5108?page=com.atlassian.jira.plugin.... ]

Guillermo GARCIA OCHOA updated ISPN-5108:
-----------------------------------------
    Description: 
We are using Infinispan in a multi-tenant environment. In our first implementation we had
a single group of caches for all the tenants, and each object had a _'tenantId'_
(which we also used as part of each object's key).

We had to abandon this approach due to the poor performance of our MapReduce tasks. The
main problem is that each task iterates over every element in the
"shared" cache when we only need to process the elements of tenant
'X'.

To fix this issue we were forced to create caches for each tenant, and now MapReduce
performs as well as it can (Infinispan 7 improved performance a lot).

The problem with our current approach is that it does not scale out: for each tenant we
create several caches, which leads to the creation of thread pools and other resources on
each node.

*PROPOSED SOLUTION*
Allow creating 'indexes' (aka 'filters') that point to a group of elements
in the cache. The idea is to 'register' some indexes/filters on each cache and
update them on every put. Then, when executing a MapReduce task, we can indicate the
'index'/'filter' so that the task runs over the referenced entries only.
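
As an illustration only, the idea can be sketched in plain Java. This is a minimal single-node, single-threaded sketch and not Infinispan API: `IndexedCache`, `registerIndex`, `sumOver`, `Order` and the index name "tenant-X" are all hypothetical names chosen for the example. Each registered index is a named predicate plus the set of keys that currently match it; every put refreshes the key sets, and the MapReduce-like step visits only the keys of the chosen index.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.function.ToLongFunction;

/** Hypothetical cache wrapper; none of these names are Infinispan API. */
class IndexedCache<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final Map<String, BiPredicate<K, V>> filters = new ConcurrentHashMap<>();
    private final Map<String, Set<K>> indexes = new ConcurrentHashMap<>();

    /** Registers a named index and back-fills it from the entries already stored. */
    void registerIndex(String name, BiPredicate<K, V> filter) {
        Set<K> keys = ConcurrentHashMap.newKeySet();
        data.forEach((k, v) -> { if (filter.test(k, v)) keys.add(k); });
        indexes.put(name, keys);
        filters.put(name, filter);
    }

    /** Stores the entry and updates every registered index. */
    void put(K key, V value) {
        data.put(key, value);
        filters.forEach((name, filter) -> {
            Set<K> keys = indexes.get(name);
            if (filter.test(key, value)) keys.add(key); else keys.remove(key);
        });
    }

    /** Number of keys currently referenced by the index (0 if unknown). */
    int indexSize(String name) {
        return indexes.getOrDefault(name, Collections.emptySet()).size();
    }

    /** Map step plus a sum reduction, run over the indexed entries only. */
    long sumOver(String indexName, ToLongFunction<V> mapper) {
        long total = 0;
        for (K key : indexes.getOrDefault(indexName, Collections.emptySet())) {
            V value = data.get(key);
            if (value != null) total += mapper.applyAsLong(value);
        }
        return total;
    }
}

/** Hypothetical value type carrying the tenant identifier. */
class Order {
    final String tenantId;
    final long amount;
    Order(String tenantId, long amount) { this.tenantId = tenantId; this.amount = amount; }
}

class TenantIndexSketch {
    /** Builds a shared cache holding two tenants and one index for tenant X. */
    static IndexedCache<String, Order> sample() {
        IndexedCache<String, Order> cache = new IndexedCache<>();
        cache.put("X:1", new Order("X", 10));
        cache.registerIndex("tenant-X", (k, v) -> "X".equals(v.tenantId));
        cache.put("Y:1", new Order("Y", 5));
        cache.put("X:2", new Order("X", 20));
        return cache;
    }

    public static void main(String[] args) {
        // Only the two tenant-X entries are visited; prints 30.
        System.out.println(sample().sumOver("tenant-X", o -> o.amount));
    }
}
```

The trade-off in this sketch is extra work on every put (one predicate evaluation per registered index) in exchange for tasks that never touch other tenants' entries. How such an index would be maintained across nodes in a distributed cache is left open here.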

This will help in our use case, but it can also improve any MapReduce task executed on
Infinispan if it is correctly tuned.

This is the main feature Oracle Coherence offers to improve MapReduce-like tasks (more info
[here|http://docs.oracle.com/cd/E18686_01/coh.37/e18692/querylang.htm#CEGG...])

We are hoping to get your attention before reaching our scale-up limits.

Thanks in advance and happy holidays!

...
 Indexes (aka Filters) for MapReduce
 -----------------------------------

                 Key: ISPN-5108
                 URL: https://issues.jboss.org/browse/ISPN-5108
             Project: Infinispan
          Issue Type: Feature Request
          Components: Distributed Execution and Map/Reduce
            Reporter: Guillermo GARCIA OCHOA

--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
