[JBoss JIRA] (ISPN-4639) IndexingConfigurationBuilder query module validation is wrong
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-4639?page=com.atlassian.jira.plugin.... ]
Work on ISPN-4639 started by William Burns.
> IndexingConfigurationBuilder query module validation is wrong
> -------------------------------------------------------------
>
> Key: ISPN-4639
> URL: https://issues.jboss.org/browse/ISPN-4639
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Configuration
> Affects Versions: 7.0.0.Beta1
> Reporter: Paul Ferraro
> Assignee: William Burns
> Priority: Critical
>
> The query module is ultimately loaded by the ComponentRegistry, which will load its modules via the classloader returned by GlobalConfiguration.classLoader(). However, the logic in IndexingConfigurationBuilder.validate(...) tests that the query module classes are accessible from the classloader that loaded the IndexingConfigurationBuilder itself.
> In WildFly, to use querying, one would use the following configuration:
> <cache-container module="org.infinispan.query"/>
> Internally, this configures Infinispan's GlobalConfiguration with the classloader of the "org.infinispan.query" module.
> However, the IndexingConfigurationBuilder class is contained in the "org.infinispan" module, which does not depend on the "org.infinispan.query" module. Consequently, IndexingConfigurationBuilder validation fails, even though the query module would have been loaded successfully at runtime.
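The core of the problem is that the same class name can resolve through one classloader and fail through another. The following is a minimal sketch, not Infinispan code: `ClassVisibility` and `isVisible` are hypothetical names, a class on the application classpath stands in for the query module's classes, and an isolated `URLClassLoader` stands in for the "org.infinispan" module loader. A validation along the lines the report suggests would pass it the loader returned by GlobalConfiguration.classLoader() instead of the loader that loaded IndexingConfigurationBuilder.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical sketch, not Infinispan code: checks whether a class is
 * visible from a given ClassLoader without initializing it.
 */
public class ClassVisibility {

    /** Returns true if className can be loaded through loader. */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader); // false = do not run static initializers
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = ClassVisibility.class.getName();
        // Stands in for the "org.infinispan.query" module loader: it can see the class.
        ClassLoader configured = ClassVisibility.class.getClassLoader();
        // Stands in for the "org.infinispan" module loader: no URLs and a bootstrap parent,
        // so application classes are invisible to it.
        try (URLClassLoader builderLoader = new URLClassLoader(new URL[0], null)) {
            System.out.println("configured loader: " + isVisible(name, configured));
            System.out.println("builder's loader:  " + isVisible(name, builderLoader));
        }
    }
}
```

Running `main` reports the class as visible from the configured loader and invisible from the isolated one, which mirrors the WildFly case: whether validation passes depends only on which loader is asked.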
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes commented on ISPN-4650:
-----------------------------------------
Good point, I wasn't expecting writes while the MassIndexer is running, especially given that the MassIndexer wipes the indexes as its first step.
I think automatically switching from the optimistic to the pessimistic approach is a good compromise; I will go that route.
> MassIndexer should not use UpdateDocument when adding to Lucene
> ---------------------------------------------------------------
>
> Key: ISPN-4650
> URL: https://issues.jboss.org/browse/ISPN-4650
> Project: Infinispan
> Issue Type: Enhancement
> Security Level: Public (Everyone can see)
> Components: Embedded Querying
> Affects Versions: 7.0.0.Beta1
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Fix For: 7.0.0.Beta2
>
>
> The MassIndexer currently issues an Update operation to the Hibernate Search backend, which in turn becomes a delete plus an add in the index.
> Lucene buffers those delete queries and, during merges, tries to 'apply' them, wasting a massive amount of time on unnecessary seeks and queries.
> Since the mass indexer wipes the index at the beginning, it should simply issue an add operation. Performance-wise this makes a huge difference:
> * indexing 50k documents brings the indexing time down from 195s to 33s
> * indexing 200k documents brings the indexing time down from 600s to 55s
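Why an update costs more than an add even when it deletes nothing can be shown with a toy model. This is a hypothetical sketch, not Lucene: `ToyIndex` is an invented class, and where Lucene performs seeks and queries to resolve each buffered delete, a linear scan counted in `lookups` stands in for that work. Each buffered delete only affects documents indexed before it, as with an update's delete-then-add.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Toy model (not Lucene): update(id) buffers a delete-by-id and then adds,
 * while add(id) only appends. Buffered deletes are resolved at merge time.
 */
public class ToyIndex {
    private record Doc(String id, long seq) {}
    private record Delete(String id, long seq) {}

    private final List<Doc> docs = new ArrayList<>();
    private final List<Delete> bufferedDeletes = new ArrayList<>();
    private long seq = 0;
    private long lookups = 0; // stands in for the seeks/queries spent resolving deletes

    public void add(String id) {
        docs.add(new Doc(id, seq++));
    }

    /** Deletes any older document with this id, then adds the new one. */
    public void update(String id) {
        bufferedDeletes.add(new Delete(id, seq++));
        add(id);
    }

    /** Applies buffered deletes; each one only affects documents indexed before it. */
    public void merge() {
        for (Delete d : bufferedDeletes) {
            Iterator<Doc> it = docs.iterator();
            while (it.hasNext()) {
                Doc doc = it.next();
                lookups++;
                if (doc.seq() < d.seq() && doc.id().equals(d.id())) {
                    it.remove();
                }
            }
        }
        bufferedDeletes.clear();
    }

    public int size() { return docs.size(); }
    public long lookups() { return lookups; }

    public static void main(String[] args) {
        int n = 1_000;
        ToyIndex viaUpdate = new ToyIndex(); // freshly wiped index: every delete is a no-op
        ToyIndex viaAdd = new ToyIndex();
        for (int i = 0; i < n; i++) {
            viaUpdate.update("k" + i);
            viaAdd.add("k" + i);
        }
        viaUpdate.merge();
        viaAdd.merge();
        System.out.println("update: " + viaUpdate.size() + " docs, " + viaUpdate.lookups() + " lookups");
        System.out.println("add:    " + viaAdd.size() + " docs, " + viaAdd.lookups() + " lookups");
    }
}
```

Both indexes end up with 1,000 documents, but in this model the update path performs 1,000,000 lookups to remove nothing, while the add path performs none. The absolute numbers say nothing about Lucene's real cost; only the shape of the argument carries over.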
[JBoss JIRA] (ISPN-4651) LevelDB crashes JVM when stop() is called concurrently with write()
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-4651?page=com.atlassian.jira.plugin.... ]
Radim Vansa updated ISPN-4651:
------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/2807
> LevelDB crashes JVM when stop() is called concurrently with write()
> -------------------------------------------------------------------
>
> Key: ISPN-4651
> URL: https://issues.jboss.org/browse/ISPN-4651
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Loaders and Stores
> Affects Versions: 7.0.0.Beta1
> Reporter: Radim Vansa
> Assignee: Radim Vansa
>
> This test reproduces the issue:
> {code}
> public void testConcurrentWriteAndRestart() {
>     final int THREADS = 4;
>     final AtomicBoolean run = new AtomicBoolean(true);
>     final CountDownLatch started = new CountDownLatch(THREADS);
>     ExecutorService executor = Executors.newFixedThreadPool(THREADS);
>     // Start THREADS writers that keep writing entries to the store under test (cl) until told to stop
>     for (int i = 0; i < THREADS; ++i) {
>         executor.execute(new Runnable() {
>             @Override
>             public void run() {
>                 started.countDown();
>                 int i = 0;
>                 while (run.get()) {
>                     InternalCacheEntry entry = TestInternalCacheEntryFactory.create("k" + i, "v" + i);
>                     MarshalledEntry me = TestingUtil.marshalledEntry(entry, getMarshaller());
>                     cl.write(me);
>                     ++i;
>                 }
>             }
>         });
>     }
>     try {
>         started.await();
>         Thread.sleep(1000);
>         // Stop and restart the store while the writers are still running
>         cl.stop();
>         Thread.sleep(1000);
>         cl.start();
>         Thread.sleep(1000);
>     } catch (InterruptedException e) {
>         throw new IllegalStateException(e);
>     } finally {
>         run.set(false);
>         executor.shutdown();
>     }
> }
> {code}
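One common way to make stop() safe against in-flight write() calls is a read/write lock: writers share the read lock, while stop() takes the exclusive write lock, so it waits for in-flight writes to finish and later writes see the store as stopped. The following is a minimal sketch assuming that approach; it is not necessarily what pull request 2807 does. `GuardedStore` is a hypothetical class, and a ConcurrentHashMap stands in for the native LevelDB handle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch: a store whose underlying handle must never be
 * released while a write is still using it.
 */
public class GuardedStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private Map<String, String> db; // null while stopped; guarded by lock

    public void start() {
        lock.writeLock().lock();
        try {
            // A real store would (re)open the native DB here; this stand-in starts empty.
            db = new ConcurrentHashMap<>();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Blocks until in-flight writes finish, then releases the handle. */
    public void stop() {
        lock.writeLock().lock();
        try {
            db = null; // a real store would close the native handle here
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Many writers may hold the read lock at once; returns false if the store is stopped. */
    public boolean write(String key, String value) {
        lock.readLock().lock();
        try {
            if (db == null) {
                return false;
            }
            db.put(key, value); // the map itself must tolerate concurrent writers
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedStore store = new GuardedStore();
        store.start();
        System.out.println(store.write("k", "v")); // true
        store.stop();
        System.out.println(store.write("k", "v")); // false: rejected instead of touching a closed handle
    }
}
```

The trade-off is that every write now acquires the read lock, which adds some overhead on the hot path in exchange for never racing with stop().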
[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero edited comment on ISPN-4650 at 8/19/14 7:51 AM:
----------------------------------------------------------------
The finding is certainly correct; I have also witnessed significant slowdowns caused by delete operations.
But having the MassIndexer ignore put operations that could be triggered "in parallel" carries some risks: you'd need to test carefully whether update operations triggered by normal transactional operations while the MassIndexer is running need to be queued up.
I suspect you'll need to make sure that any update operation triggered by cache writes is applied after the MassIndexer is done.
For the sake of simplicity, you could make this an option that is disabled by default and whose documentation properly explains the drawback of enabling it: it would be perfectly fine for a batch upload, if the user knows, because of how the application is designed, that no concurrent updates will be applied to the cache while the indexer is running.
Alternatively, rather than using a queue (which raises questions about where to store it and how to keep it from growing without bound), you could go with the optimistic approach and automatically switch to the safer approach if any other indexing event is generated.
[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero commented on ISPN-4650:
---------------------------------------
The finding is certainly correct; I have also witnessed significant slowdowns caused by delete operations.
But having the MassIndexer ignore put operations that could be triggered "in parallel" carries some risks: you'd need to test carefully whether update operations triggered by normal transactional operations while the MassIndexer is running need to be queued up.
I suspect you'll need to make sure that any update operation triggered by cache writes is applied after the MassIndexer is done.
For the sake of simplicity, you could make this an option that is disabled by default and whose documentation properly explains the drawback of enabling it: it would be perfectly fine for a batch upload, if the user knows, because of how the application is designed, that no concurrent updates will be applied to the cache while the indexer is running.
Alternatively, rather than using a queue (which raises questions about where to store it and how to keep it from growing without bound), you could go with the optimistic approach and automatically switch to the safer approach if any other indexing event is generated.
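The optimistic approach with an automatic fallback can be sketched as a one-way switch. This is a hypothetical illustration, not the Infinispan MassIndexer: `AdaptiveIndexWriter`, `onCacheIndexingEvent` and `nextOperation` are invented names, and it assumes some listener calls `onCacheIndexingEvent` for every regular (non-MassIndexer) indexing event. It deliberately leaves out the window between reading the flag and issuing an add, which a real implementation would still have to close.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch: start in the fast "add only" mode and permanently
 * fall back to "update" (delete + add) once any concurrent indexing event is seen.
 */
public class AdaptiveIndexWriter {
    public enum Op { ADD, UPDATE }

    private final AtomicBoolean concurrentWriteSeen = new AtomicBoolean(false);

    /** Called for every regular cache-write indexing event while the mass indexer runs. */
    public void onCacheIndexingEvent() {
        concurrentWriteSeen.set(true); // one-way switch: never reset during this run
    }

    /** The operation the mass indexer should issue for its next document. */
    public Op nextOperation() {
        return concurrentWriteSeen.get() ? Op.UPDATE : Op.ADD;
    }

    public static void main(String[] args) {
        AdaptiveIndexWriter writer = new AdaptiveIndexWriter();
        System.out.println(writer.nextOperation()); // ADD: no concurrent writes seen yet
        writer.onCacheIndexingEvent();              // a regular cache write was indexed
        System.out.println(writer.nextOperation()); // UPDATE from now on
    }
}
```

Because the switch is one-way, a single concurrent write costs the rest of the run the fast path, but it never needs an unbounded queue, which is the trade-off the comment describes.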
[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes updated ISPN-4650:
------------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/2806
[JBoss JIRA] (ISPN-4651) LevelDB crashes JVM when stop() is called concurrently with write()
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4651?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration updated ISPN-4651:
------------------------------------------
Bugzilla Update: Perform
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1113585
[JBoss JIRA] (ISPN-4651) LevelDB crashes JVM when stop() is called concurrently with write()
by Radim Vansa (JIRA)
Radim Vansa created ISPN-4651:
---------------------------------
Summary: LevelDB crashes JVM when stop() is called concurrently with write()
Key: ISPN-4651
URL: https://issues.jboss.org/browse/ISPN-4651
Project: Infinispan
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: Loaders and Stores
Affects Versions: 7.0.0.Beta1
Reporter: Radim Vansa
Assignee: Radim Vansa
This test reproduces the issue:
{code}
public void testConcurrentWriteAndRestart() {
    final int THREADS = 4;
    final AtomicBoolean run = new AtomicBoolean(true);
    final CountDownLatch started = new CountDownLatch(THREADS);
    ExecutorService executor = Executors.newFixedThreadPool(THREADS);
    // Start THREADS writers that keep writing entries to the store under test (cl) until told to stop
    for (int i = 0; i < THREADS; ++i) {
        executor.execute(new Runnable() {
            @Override
            public void run() {
                started.countDown();
                int i = 0;
                while (run.get()) {
                    InternalCacheEntry entry = TestInternalCacheEntryFactory.create("k" + i, "v" + i);
                    MarshalledEntry me = TestingUtil.marshalledEntry(entry, getMarshaller());
                    cl.write(me);
                    ++i;
                }
            }
        });
    }
    try {
        started.await();
        Thread.sleep(1000);
        // Stop and restart the store while the writers are still running
        cl.stop();
        Thread.sleep(1000);
        cl.start();
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        throw new IllegalStateException(e);
    } finally {
        run.set(false);
        executor.shutdown();
    }
}
{code}
[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes updated ISPN-4650:
------------------------------------
Issue Type: Enhancement (was: Feature Request)