[infinispan-issues] [JBoss JIRA] (ISPN-5108) Indexes (aka Filters) for MapReduce

Guillermo GARCIA OCHOA (JIRA) issues at jboss.org
Wed Dec 24 10:08:29 EST 2014


     [ https://issues.jboss.org/browse/ISPN-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guillermo GARCIA OCHOA updated ISPN-5108:
-----------------------------------------
    Description: 
We are using Infinispan in a multi-tenant environment. In our first implementation we had a single group of caches for all the tenants, and each object had a _'tenandId'_ (which we also used as part of each object's key).

We had to abandon this approach due to the poor performance of our MapReduce tasks. The main problem is that each task iterates over every element in the "shared" cache when we only need to process the elements of tenant 'X'.

To fix this issue we were forced to create separate caches for each tenant, and now MapReduce performance is as good as it gets (Infinispan 7 improved performance a lot).

The problem with our current approach is that it does not scale out: for each tenant we create several caches, which leads to the creation of thread pools and other resources on each node.

*PROPOSED SOLUTION*
Allow creating 'indexes' (aka 'filters') that point to a group of elements in the cache. The idea is to 'register' some indexes/filters on each cache and update them on every put. Then, when executing a MapReduce task, we can indicate the 'index'/'filter' so that the task runs over the referenced entries only.
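
To make the proposal concrete, here is a minimal sketch in plain Java of the behaviour we have in mind. It is not Infinispan API: the names {{IndexedCache}}, {{registerIndex}} and {{mapReduce}} are hypothetical, the "cache" is a single in-memory map with no clustering or concurrency control, and null values are assumed not to occur. It only illustrates the two steps above: the index is updated on every put, and the task visits the indexed keys instead of the whole cache.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Illustration only: hypothetical names, not part of Infinispan. */
public class IndexedCacheSketch {

    /** Toy cache that keeps named key indexes up to date on every put. */
    static final class IndexedCache<K, V> {
        private final Map<K, V> data = new HashMap<>();
        // index name -> function assigning an entry to a group (e.g. its tenant)
        private final Map<String, BiFunction<K, V, String>> classifiers = new HashMap<>();
        // index name -> group -> keys belonging to that group
        private final Map<String, Map<String, Set<K>>> indexes = new HashMap<>();

        /** Registers an index and builds it for entries already in the cache. */
        void registerIndex(String name, BiFunction<K, V, String> classifier) {
            Map<String, Set<K>> groups = new HashMap<>();
            classifiers.put(name, classifier);
            indexes.put(name, groups);
            data.forEach((k, v) ->
                groups.computeIfAbsent(classifier.apply(k, v), g -> new HashSet<>()).add(k));
        }

        /** Stores the entry and refreshes every registered index for its key. */
        void put(K key, V value) {
            V old = data.put(key, value);
            for (Map.Entry<String, BiFunction<K, V, String>> e : classifiers.entrySet()) {
                Map<String, Set<K>> groups = indexes.get(e.getKey());
                if (old != null) {
                    // Drop the key from the group the previous value belonged to.
                    Set<K> previous = groups.get(e.getValue().apply(key, old));
                    if (previous != null) {
                        previous.remove(key);
                    }
                }
                groups.computeIfAbsent(e.getValue().apply(key, value), g -> new HashSet<>()).add(key);
            }
        }

        /** Runs map and reduce over the entries of one index group only. */
        <R> R mapReduce(String index, String group,
                        Function<V, R> mapper, BinaryOperator<R> reducer, R identity) {
            Set<K> keys = indexes
                .getOrDefault(index, Collections.emptyMap())
                .getOrDefault(group, Collections.emptySet());
            R result = identity;
            for (K key : keys) {
                result = reducer.apply(result, mapper.apply(data.get(key)));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        IndexedCache<String, Integer> cache = new IndexedCache<>();
        // The tenant id is the key prefix, as in our first implementation.
        cache.registerIndex("byTenant", (k, v) -> k.substring(0, k.indexOf(':')));
        cache.put("X:1", 10);
        cache.put("X:2", 5);
        cache.put("Y:1", 100);
        // Only the two entries of tenant X are visited.
        int total = cache.mapReduce("byTenant", "X", v -> v, Integer::sum, 0);
        System.out.println(total); // prints 15
    }
}
```

In a real distributed cache each node would presumably keep the index for its locally owned entries, so that the task still runs where the data lives; the sketch leaves that part out.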

This will help us in our use case, but it can also improve any MapReduce task executed over Infinispan if it is correctly tuned.

This is the main feature Oracle Coherence offers to improve MapReduce-like tasks (more info [here|http://docs.oracle.com/cd/E18686_01/coh.37/e18692/querylang.htm#CEGGCGCD]).

We are hoping to get your attention before we reach our scale-up limits.

Thanks in advance and happy holidays!




> Indexes (aka Filters) for MapReduce
> -----------------------------------
>
>                 Key: ISPN-5108
>                 URL: https://issues.jboss.org/browse/ISPN-5108
>             Project: Infinispan
>          Issue Type: Feature Request
>          Components: Distributed Execution and Map/Reduce
>            Reporter: Guillermo GARCIA OCHOA


