[ https://issues.jboss.org/browse/ISPN-5108?page=com.atlassian.jira.plugin.... ]
Guillermo GARCIA OCHOA updated ISPN-5108:
-----------------------------------------
Description:
We are using Infinispan in a multi-tenant environment. In our first implementation we had
a single group of caches shared by all tenants, and each object had a _'tenandId'_
(which we also used as part of each object's key).
We had to abandon this approach because of the poor performance of our MapReduce tasks. The
main problem is that each task iterates over every element in the "shared" cache, when we
only need to process the elements of tenant 'X'.
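To make the cost concrete, here is a minimal sketch of the shared-cache approach. It assumes a plain HashMap standing in for the Infinispan cache and a hypothetical "tenantId:objectId" key format; SharedCacheScan, sumForTenant and ScanResult are illustrative names, not Infinispan API. The tenant filter runs inside the task, so every entry is visited even when only a few belong to tenant 'X'.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for the shared cache, and the
// "tenantId:objectId" key format is an assumption made for illustration.
public class SharedCacheScan {

    static String key(String tenantId, String objectId) {
        return tenantId + ":" + objectId;
    }

    // Counts collected by one scan.
    static final class ScanResult {
        final int visited;
        final int matched;
        final long sum;

        ScanResult(int visited, int matched, long sum) {
            this.visited = visited;
            this.matched = matched;
            this.sum = sum;
        }
    }

    // Map/reduce-style sum for one tenant: the filter runs inside the task,
    // so every entry of the shared cache is visited.
    static ScanResult sumForTenant(Map<String, Integer> cache, String tenantId) {
        String prefix = tenantId + ":";
        int visited = 0;
        int matched = 0;
        long sum = 0;
        for (Map.Entry<String, Integer> e : cache.entrySet()) {
            visited++;                              // cost grows with the whole cache
            if (e.getKey().startsWith(prefix)) {    // "map" side: keep tenant entries only
                matched++;
                sum += e.getValue();                // "reduce" side: accumulate
            }
        }
        return new ScanResult(visited, matched, sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = new HashMap<>();
        for (int i = 0; i < 3; i++) cache.put(key("X", "o" + i), 10);
        for (int i = 0; i < 97; i++) cache.put(key("Y", "o" + i), 1);
        ScanResult r = sumForTenant(cache, "X");
        // prints "visited=100, matched=3, sum=30"
        System.out.println("visited=" + r.visited + ", matched=" + r.matched + ", sum=" + r.sum);
    }
}
```

With 3 entries for tenant 'X' among 100, the scan reads all 100 to sum 3 values; the visited count grows with the whole shared cache, not with the tenant's data.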
To fix this issue we were forced to create separate caches for each tenant, and now MapReduce
performance is as good as it gets (Infinispan 7 improved performance a lot).
The problem with our current approach is that it does not scale out: for each tenant we
create several caches, which leads to the creation of thread pools and other resources on
each node.
*PROPOSED SOLUTION*
Allow creating 'indexes' (aka 'filters') that point to a group of elements in the cache.
The idea is to 'register' some indexes/filters on each cache and update them on every put.
Then, when executing a MapReduce task, we could specify the 'index'/'filter' so that the
task runs over the referenced entries only.
This would help in our use case, but it could also improve any MapReduce task executed over
Infinispan if it is correctly 'tuned'.
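A minimal, single-node sketch of the proposed behaviour, assuming a HashMap-backed store; IndexedCache, registerIndex, mapReduce, indexSize and Item are hypothetical names, not an existing Infinispan API. A named filter is registered once, every put re-evaluates it to keep the index's key set current, and the task reads only the keys the index refers to.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical, single-node sketch of the proposed feature; this is NOT an Infinispan API.
// Named indexes (filters) are registered on a cache and updated on every put, so a
// map/reduce-style task can visit only the entries an index refers to.
public class IndexedCache<K, V> {

    private final Map<K, V> data = new HashMap<>();
    private final Map<String, Predicate<V>> filters = new HashMap<>();
    private final Map<String, Set<K>> indexes = new HashMap<>();

    // Register a named index; entries already in the cache are indexed right away.
    public void registerIndex(String name, Predicate<V> filter) {
        Set<K> keys = new HashSet<>();
        for (Map.Entry<K, V> e : data.entrySet()) {
            if (filter.test(e.getValue())) {
                keys.add(e.getKey());
            }
        }
        filters.put(name, filter);
        indexes.put(name, keys);
    }

    // Every put re-evaluates each registered filter and adds or removes the key.
    public void put(K key, V value) {
        data.put(key, value);
        for (Map.Entry<String, Predicate<V>> f : filters.entrySet()) {
            Set<K> keys = indexes.get(f.getKey());
            if (f.getValue().test(value)) {
                keys.add(key);
            } else {
                keys.remove(key);
            }
        }
    }

    // Map each indexed value, then fold the results; non-indexed entries are never read.
    public <R> R mapReduce(String index, Function<V, R> mapper, R identity, BinaryOperator<R> reducer) {
        R acc = identity;
        for (K key : indexes.get(index)) {
            acc = reducer.apply(acc, mapper.apply(data.get(key)));
        }
        return acc;
    }

    // Number of keys an index currently refers to.
    public int indexSize(String index) {
        return indexes.get(index).size();
    }

    // Hypothetical value type carrying the tenant id.
    public static final class Item {
        final String tenantId;
        final int amount;

        public Item(String tenantId, int amount) {
            this.tenantId = tenantId;
            this.amount = amount;
        }
    }

    public static void main(String[] args) {
        IndexedCache<String, Item> cache = new IndexedCache<>();
        cache.registerIndex("tenant-X", item -> "X".equals(item.tenantId));
        for (int i = 0; i < 3; i++) cache.put("X:o" + i, new Item("X", 10));
        for (int i = 0; i < 97; i++) cache.put("Y:o" + i, new Item("Y", 1));
        int total = cache.mapReduce("tenant-X", item -> item.amount, 0, Integer::sum);
        // prints "3 entries, total=30"
        System.out.println(cache.indexSize("tenant-X") + " entries, total=" + total);
    }
}
```

Only the three 'X' entries are read, regardless of how many entries other tenants add. In a distributed cache each node would presumably keep such an index over its own local entries, and removals and expirations would need the same index maintenance as puts; the sketch leaves both out.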
We are hoping to get your attention before we reach our scale-up limits.
Thanks in advance and happy holidays!
(i) This is the main feature Oracle Coherence offers to improve MapReduce-like tasks (more
info [here|http://docs.oracle.com/cd/E18686_01/coh.37/e18692/querylang.htm#CEGG...])
Indexes (aka Filters) for MapReduce
-----------------------------------
Key: ISPN-5108
URL: https://issues.jboss.org/browse/ISPN-5108
Project: Infinispan
Issue Type: Feature Request
Components: Distributed Execution and Map/Reduce
Reporter: Guillermo GARCIA OCHOA
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)