[JBoss JIRA] (ISPN-5103) Inefficient index updates cause high cost merges and increase overall latency
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-5103?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero updated ISPN-5103:
----------------------------------
Description:
Currently every change to the index is performed in Lucene by combining two operations:
* Delete by query, using a boolean query on the id plus the entity class
* Add
Under high load, especially during merges, these numerous deletes cause very long delays, resulting in high latency.
We should instead use a simple Lucene update to add/change documents, since internally it translates to a delete by term plus an add operation, and deletes by term are extremely efficient in Lucene.
Some local tests showed the average latency of index updates dropping by a factor of 4 with this strategy, for both the SYNC and ASYNC backends.
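The cost difference between the two strategies can be sketched with a toy model in plain Java (this is illustrative only, not the Lucene API or its internals): a delete by term is a direct keyed removal, while a delete by query must evaluate every document in the index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of an index: documents keyed by their id term.
// Names and structure are assumptions for this sketch, not Lucene internals.
class UpdateStrategies {

    final Map<String, String> docsById = new LinkedHashMap<>();

    // Delete-by-query + add: every document must be tested against the query.
    void deleteByQueryThenAdd(Predicate<String> query, String id, String doc) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : docsById.entrySet()) {
            if (query.test(e.getValue())) {   // full scan: cost grows with index size
                matches.add(e.getKey());
            }
        }
        matches.forEach(docsById::remove);
        docsById.put(id, doc);
    }

    // Update (delete-by-term + add): one keyed remove-and-replace, no scan.
    void updateByTerm(String id, String doc) {
        docsById.put(id, doc);
    }
}
```

In real Lucene this corresponds to replacing an {{IndexWriter.deleteDocuments(Query)}} followed by {{addDocument(...)}} with a single {{IndexWriter.updateDocument(Term, doc)}} call, which deletes by the id term.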
Regarding sharing the index between entities, which was the original motivation for the delete-by-query-plus-add strategy, we have two scenarios:
* Same cache with multiple entity types: that's a non-issue, since obviously there's no id collision in this case
* Different caches with the same index: this scenario happens when different caches share the same index, for example:
{code}
@Indexed(indexName = "common")
public class Country { ... }
@Indexed(indexName = "common")
public class Currency { ... }
cm.getCache("currencies").put(1, new Currency(...));
cm.getCache("countries").put(1, new Country(...));
{code}
This would require a delete by query in order to persist both a Country and a Currency with id=1.
It would also require setting "default.exclusive_index_use" to "false", with the associated cost of having to reopen the IndexWriter on every operation.
Given that the performance gain of a simple update is considerable, we should either support this corner case through extra configuration or, alternatively, generate a unique @ProvidedId that includes the entity class or the cache name, which would work for all the cases described above.
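A minimal sketch of the alternative mentioned above: derive a collision-free id term from the cache name (or entity class) plus the user key, so a plain delete by term stays unambiguous even when caches share one index. The helper name and the "#" separator are assumptions for this sketch, not an existing Infinispan API.

```java
// Illustrative helper: compose a globally unique id term for a shared index.
// Class name, method name, and separator are hypothetical.
final class SharedIndexIds {

    private SharedIndexIds() {}

    // "countries" + 1 -> "countries#1"; "currencies" + 1 -> "currencies#1"
    static String providedId(String cacheName, Object key) {
        return cacheName + "#" + key;
    }
}
```

With such a scheme, a Country and a Currency both stored under key 1 map to distinct terms, so the fast delete-by-term update remains safe without "default.exclusive_index_use" being disabled.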
was:
Currently every change to the index is performed in Lucene by combining two operations:
* Delete by query, using a boolean query on the id plus the entity class
* Add
Under high load, especially during merges, these numerous deletes cause very long delays, resulting in high latency.
We should instead use a simple Lucene update to add/change documents, since internally it translates to a delete by term plus an add operation, and deletes by term are extremely efficient in Lucene.
Some local tests showed the average latency of index updates dropping by a factor of 4 with this strategy, for both the SYNC and ASYNC backends.
Regarding sharing the index between entities, which was the original motivation for the delete-by-query-plus-add strategy, we have two scenarios:
* Same cache with multiple entity types: that's a non-issue, since obviously there's no id collision in this case
* Different caches with the same index: this scenario happens when different caches share the same index, for example:
{code}
@Indexed(indexName = "common")
public class Country { ... }
@Indexed(indexName = "common")
public class Currency { ... }
cm.getCache("currencies").put(1, new Currency(...));
cm.getCache("countries").put(1, new Country(...));
{code}
This would require a delete by query in order to persist both a Country and a Currency with id=1.
It would also require setting "default.exclusive_index_use" to "false", with the associated cost of having to reopen the IndexWriter on every operation.
Given that the performance gain of a simple update is considerable, we should either support this corner case through extra configuration or, alternatively, generate a unique @ProvidedId that includes the entity class or the cache name, which would work for all the cases described above.
> Inefficient index updates cause high cost merges and increase overall latency
> -----------------------------------------------------------------------------
>
> Key: ISPN-5103
> URL: https://issues.jboss.org/browse/ISPN-5103
> Project: Infinispan
> Issue Type: Enhancement
> Components: Embedded Querying
> Affects Versions: 7.0.2.Final, 7.1.0.Alpha1
> Reporter: Gustavo Fernandes
>
> Currently every change to the index is performed in Lucene by combining two operations:
> * Delete by query, using a boolean query on the id plus the entity class
> * Add
>
> Under high load, especially during merges, these numerous deletes cause very long delays, resulting in high latency.
> We should instead use a simple Lucene update to add/change documents, since internally it translates to a delete by term plus an add operation, and deletes by term are extremely efficient in Lucene.
> Some local tests showed the average latency of index updates dropping by a factor of 4 with this strategy, for both the SYNC and ASYNC backends.
> Regarding sharing the index between entities, which was the original motivation for the delete-by-query-plus-add strategy, we have two scenarios:
> * Same cache with multiple entity types: that's a non-issue, since obviously there's no id collision in this case
> * Different caches with the same index: this scenario happens when different caches share the same index, for example:
> {code}
> @Indexed(indexName = "common")
> public class Country { ... }
> @Indexed(indexName = "common")
> public class Currency { ... }
> cm.getCache("currencies").put(1, new Currency(...));
> cm.getCache("countries").put(1, new Country(...));
> {code}
> This would require a delete by query in order to persist both a Country and a Currency with id=1.
> It would also require setting "default.exclusive_index_use" to "false", with the associated cost of having to reopen the IndexWriter on every operation.
> Given that the performance gain of a simple update is considerable, we should either support this corner case through extra configuration or, alternatively, generate a unique @ProvidedId that includes the entity class or the cache name, which would work for all the cases described above.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
[JBoss JIRA] (ISPN-5064) Tests from org.infinispan.lucene fail on JDK6
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-5064?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-5064:
-----------------------------------------------
Martin Gencur <mgencur(a)redhat.com> changed the Status of [bug 1172106|https://bugzilla.redhat.com/show_bug.cgi?id=1172106] from ON_QA to VERIFIED
> Tests from org.infinispan.lucene fail on JDK6
> ---------------------------------------------
>
> Key: ISPN-5064
> URL: https://issues.jboss.org/browse/ISPN-5064
> Project: Infinispan
> Issue Type: Bug
> Components: Lucene Directory
> Reporter: Vojtech Juranek
>
> Several tests from {{org.infinispan.lucene}} fail on JDK6:
> * {{org.infinispan.lucene.cacheloader.WarmCacheTest.createBeforeClass}} fails with (see e.g. [this Jenkins job|https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/JDG/view/FUNC/j...])
> {noformat}
> java.lang.NoClassDefFoundError: java/nio/file/attribute/FileAttribute
> [...]
> Caused by: java.lang.ClassNotFoundException: java.nio.file.attribute.FileAttribute
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> ... 24 more
> {noformat}
> * {{org.infinispan.lucene.cacheloader.WarmCacheTest.clearContent}} fails with
> {noformat}
> java.lang.IllegalStateException: No caches registered! Use registerCacheManager(Cache... caches) to do that!
> at org.infinispan.test.MultipleCacheManagersTest.clearContent(MultipleCacheManagersTest.java:96)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
> at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:564)
> at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:213)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:786)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:37)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:368)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
> * {{org.infinispan.lucene.cacheloader.WarmCacheTest.clearTempDir}} fails with NPE:
> {noformat}
> java.lang.NullPointerException
> at org.infinispan.test.TestingUtil.recursiveFileRemove(TestingUtil.java:573)
> at org.infinispan.lucene.cacheloader.WarmCacheTest.clearTempDir(WarmCacheTest.java:78)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
> at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:564)
> at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:213)
> at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:138)
> at org.testng.internal.TestMethodWorker.invokeAfterClassMethods(TestMethodWorker.java:225)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:114)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:37)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:368)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
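The first failure above happens because {{java.nio.file.attribute.FileAttribute}} only exists from JDK 7 onwards, so one common mitigation is to skip NIO.2-dependent tests on older runtimes. A sketch of such a guard (the class and method names are illustrative, not part of the Infinispan test suite):

```java
// Illustrative guard: detect whether the runtime provides NIO.2 (JDK 7+),
// so tests depending on java.nio.file can be skipped on JDK 6.
final class JdkGuard {

    private JdkGuard() {}

    static boolean hasNio2() {
        try {
            // Probe for a class introduced in Java 7 without initializing it.
            Class.forName("java.nio.file.attribute.FileAttribute");
            return true;   // JDK 7+: NIO.2 present
        } catch (ClassNotFoundException e) {
            return false;  // JDK 6: skip tests that need NIO.2
        }
    }
}
```

A test class could then call this from a configuration method and throw TestNG's SkipException when the probe fails.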
--
[JBoss JIRA] (ISPN-5121) Distributed executor fails when CDI is not on classpath
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5121?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5121:
----------------------------------
Status: Open (was: New)
> Distributed executor fails when CDI is not on classpath
> -------------------------------------------------------
>
> Key: ISPN-5121
> URL: https://issues.jboss.org/browse/ISPN-5121
> Project: Infinispan
> Issue Type: Bug
> Components: CDI Integration, Distributed Execution and Map/Reduce
> Affects Versions: 7.1.0.Alpha1, 7.0.3.Final
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Fix For: 7.1.0.Final, 7.0.4.Final
>
>
> When running a distributed executor using the infinispan-embedded uber jar without the CDI API classes on the classpath, the following exception is thrown:
> java.lang.NoClassDefFoundError: org/infinispan/cdi/util/BeanManagerProvider
> at org.infinispan.cdi.CDIDistributedTaskLifecycle.onPostExecute(CDIDistributedTaskLifecycle.java:35)
> at org.infinispan.distexec.spi.DistributedTaskLifecycleService.onPostExecute(DistributedTaskLifecycleService.java:48)
> at org.infinispan.commands.read.DistributedExecuteCommand.perform(DistributedExecuteCommand.java:100)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:97)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.access$000(InboundInvocationHandlerImpl.java:52)
> at org.infinispan.remoting.InboundInvocationHandlerImpl$2.run(InboundInvocationHandlerImpl.java:193)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
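The usual fix for this kind of failure is to probe for the optional dependency before touching any class that links against it; a hedged sketch of the pattern (class and method names here are illustrative, not Infinispan's actual fix):

```java
// Illustrative pattern: only dispatch to CDI-aware code when the CDI API
// is actually on the classpath, avoiding NoClassDefFoundError.
final class OptionalCdi {

    private OptionalCdi() {}

    // Generic probe: true if the named class is loadable, without initializing it.
    static boolean isPresent(String className, ClassLoader cl) {
        try {
            Class.forName(className, false, cl);
            return true;
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return false;   // dependency absent: fall back to the plain code path
        }
    }

    // The CDI API entry point to probe for before using any CDI integration class.
    static boolean isCdiAvailable(ClassLoader cl) {
        return isPresent("javax.enterprise.inject.spi.BeanManager", cl);
    }
}
```

The lifecycle service would consult such a check once and skip the CDI-specific callbacks when it returns false.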
--
[JBoss JIRA] (ISPN-5121) Distributed executor fails when CDI is not on classpath
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5121?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5121:
----------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/3188
> Distributed executor fails when CDI is not on classpath
> -------------------------------------------------------
>
> Key: ISPN-5121
> URL: https://issues.jboss.org/browse/ISPN-5121
> Project: Infinispan
> Issue Type: Bug
> Components: CDI Integration, Distributed Execution and Map/Reduce
> Affects Versions: 7.1.0.Alpha1, 7.0.3.Final
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Fix For: 7.1.0.Final, 7.0.4.Final
>
>
> When running a distributed executor using the infinispan-embedded uber jar without the CDI API classes on the classpath, the following exception is thrown:
> java.lang.NoClassDefFoundError: org/infinispan/cdi/util/BeanManagerProvider
> at org.infinispan.cdi.CDIDistributedTaskLifecycle.onPostExecute(CDIDistributedTaskLifecycle.java:35)
> at org.infinispan.distexec.spi.DistributedTaskLifecycleService.onPostExecute(DistributedTaskLifecycleService.java:48)
> at org.infinispan.commands.read.DistributedExecuteCommand.perform(DistributedExecuteCommand.java:100)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:97)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.access$000(InboundInvocationHandlerImpl.java:52)
> at org.infinispan.remoting.InboundInvocationHandlerImpl$2.run(InboundInvocationHandlerImpl.java:193)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
--
[JBoss JIRA] (ISPN-5121) Distributed executor fails when CDI is not on classpath
by Tristan Tarrant (JIRA)
Tristan Tarrant created ISPN-5121:
-------------------------------------
Summary: Distributed executor fails when CDI is not on classpath
Key: ISPN-5121
URL: https://issues.jboss.org/browse/ISPN-5121
Project: Infinispan
Issue Type: Bug
Components: CDI Integration, Distributed Execution and Map/Reduce
Affects Versions: 7.0.3.Final, 7.1.0.Alpha1
Reporter: Tristan Tarrant
Assignee: Tristan Tarrant
Fix For: 7.1.0.Final, 7.0.4.Final
When running a distributed executor using the infinispan-embedded uber jar without the CDI API classes on the classpath, the following exception is thrown:
java.lang.NoClassDefFoundError: org/infinispan/cdi/util/BeanManagerProvider
at org.infinispan.cdi.CDIDistributedTaskLifecycle.onPostExecute(CDIDistributedTaskLifecycle.java:35)
at org.infinispan.distexec.spi.DistributedTaskLifecycleService.onPostExecute(DistributedTaskLifecycleService.java:48)
at org.infinispan.commands.read.DistributedExecuteCommand.perform(DistributedExecuteCommand.java:100)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:97)
at org.infinispan.remoting.InboundInvocationHandlerImpl.access$000(InboundInvocationHandlerImpl.java:52)
at org.infinispan.remoting.InboundInvocationHandlerImpl$2.run(InboundInvocationHandlerImpl.java:193)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
--
[JBoss JIRA] (ISPN-5108) Indexes (aka Filters) for MapReduce
by Guillermo GARCIA OCHOA (JIRA)
[ https://issues.jboss.org/browse/ISPN-5108?page=com.atlassian.jira.plugin.... ]
Guillermo GARCIA OCHOA updated ISPN-5108:
-----------------------------------------
Description:
We are using Infinispan in a multi-tenant environment. In our first implementation we had a single group of caches for all the tenants, and each object had a _'tenantId'_ (which we also used as part of each object's key).
We had to abandon this approach due to the poor performance of our MapReduce tasks. The main problem is that each task iterates over every element in the "shared" cache when we only need to process the elements of tenant 'X'.
To fix this issue we were forced to create caches for each tenant, and now MapReduce performance is as good as it gets (Infinispan 7 improved performance a lot).
The problem with our current approach is that it does not scale out: for each tenant we create several caches, which leads to the creation of thread pools and other resources on each node.
*PROPOSED SOLUTION*
Allow creating 'indexes' (aka 'filters') that point to a group of elements in the cache. The idea is to 'register' some indexes/filters on each cache and update them on every put. Then, when executing a MapReduce task, we can indicate the 'index'/'filter' so that the task executes over the referred entries only.
This will help us in our use case, but it can also improve any MapReduce task executed over Infinispan if it is correctly 'tuned'. We are hoping to get your attention before reaching our scale-up limits :)
Thanks in advance and happy holidays!
(i) This is the main feature Oracle Coherence offers to improve MapReduce-like tasks (more info [here|http://docs.oracle.com/cd/E18686_01/coh.37/e18692/querylang.htm#CEGG...])
was:
We are using Infinispan in a multi-tenant environment. In our first implementation we had a single group of caches for all the tenants, and each object had a _'tenantId'_ (which we also used as part of each object's key).
We had to abandon this approach due to the poor performance of our MapReduce tasks. The main problem is that each task iterates over every element in the "shared" cache when we only need to process the elements of tenant 'X'.
To fix this issue we were forced to create caches for each tenant, and now MapReduce performance is as good as it gets (Infinispan 7 improved performance a lot).
The problem with our current approach is that it does not scale out: for each tenant we create several caches, which leads to the creation of thread pools and other resources on each node.
*PROPOSED SOLUTION*
Allow creating 'indexes' (aka 'filters') that point to a group of elements in the cache. The idea is to 'register' some indexes/filters on each cache and update them on every put. Then, when executing a MapReduce task, we can indicate the 'index'/'filter' so that the task executes over the referred entries only.
This will help us in our use case, but it can also improve any MapReduce task executed over Infinispan if it is correctly 'tuned'. We are hoping to get your attention before reaching our scale-up limits :)
Thanks in advance and happy holidays!
(i) This is the main feature Oracle Coherence offers to improve MapReduce-like tasks (more info [here|http://docs.oracle.com/cd/E18686_01/coh.37/e18692/querylang.htm#CEGG...])
> Indexes (aka Filters) for MapReduce
> -----------------------------------
>
> Key: ISPN-5108
> URL: https://issues.jboss.org/browse/ISPN-5108
> Project: Infinispan
> Issue Type: Feature Request
> Components: Distributed Execution and Map/Reduce
> Reporter: Guillermo GARCIA OCHOA
>
> We are using Infinispan in a multi-tenant environment. In our first implementation we had a single group of caches for all the tenants, and each object had a _'tenantId'_ (which we also used as part of each object's key).
> We had to abandon this approach due to the poor performance of our MapReduce tasks. The main problem is that each task iterates over every element in the "shared" cache when we only need to process the elements of tenant 'X'.
> To fix this issue we were forced to create caches for each tenant, and now MapReduce performance is as good as it gets (Infinispan 7 improved performance a lot).
> The problem with our current approach is that it does not scale out: for each tenant we create several caches, which leads to the creation of thread pools and other resources on each node.
> *PROPOSED SOLUTION*
> Allow creating 'indexes' (aka 'filters') that point to a group of elements in the cache. The idea is to 'register' some indexes/filters on each cache and update them on every put. Then, when executing a MapReduce task, we can indicate the 'index'/'filter' so that the task executes over the referred entries only.
> This will help us in our use case, but it can also improve any MapReduce task executed over Infinispan if it is correctly 'tuned'. We are hoping to get your attention before reaching our scale-up limits :)
> Thanks in advance and happy holidays!
> (i) This is the main feature Oracle Coherence offers to improve MapReduce-like tasks (more info [here|http://docs.oracle.com/cd/E18686_01/coh.37/e18692/querylang.htm#CEGG...])
--
[JBoss JIRA] (ISPN-4884) Deployment scanner should be enabled to allow filter/converter jar deployment
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4884?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4884:
-----------------------------------------------
Vitalii Chepeliuk <vchepeli(a)redhat.com> changed the Status of [bug 1156397|https://bugzilla.redhat.com/show_bug.cgi?id=1156397] from ON_QA to VERIFIED
> Deployment scanner should be enabled to allow filter/converter jar deployment
> -----------------------------------------------------------------------------
>
> Key: ISPN-4884
> URL: https://issues.jboss.org/browse/ISPN-4884
> Project: Infinispan
> Issue Type: Enhancement
> Components: Server
> Affects Versions: 7.0.0.CR2
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Priority: Critical
> Fix For: 7.0.0.Final
>
>
> When we want to copy JARs and deploy them on the JDG server, there is no deployment-scanner element defined in standalone/clustered.xml. It should be added when the server is built.
> Steps to Reproduce:
> 1. cp filter-converter.jar infinispan-server-7.0.0.CR2/standalone/deployments/
> 2. look at console output of server
> 3. check if there is no output from deployment scanner
> Current results: no output from the deployment scanner, because it is not enabled
> Expected results: we should see the following output in the console log
> JBAS015012: Started FileSystemDeploymentService for directory /home/infinispan-server-7.0.0.CR2/standalone/deployments
> JBAS015876: Starting deployment of "filter-converter.jar" (runtime-name: "filter-converter.jar")
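For reference, enabling the scanner amounts to adding the deployment-scanner subsystem to the server profile; a sketch of the typical JBoss AS configuration (the subsystem schema version in the xmlns varies by server release, and the values shown are the usual defaults, not taken from this issue):

```xml
<!-- Illustrative fragment for standalone/clustered.xml -->
<subsystem xmlns="urn:jboss:domain:deployment-scanner:1.1">
    <deployment-scanner path="deployments"
                        relative-to="jboss.server.base.dir"
                        scan-interval="5000"/>
</subsystem>
```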
--