August 2014 - infinispan-issues - JBoss List Archives

[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene

by Gustavo Fernandes (JIRA)

[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ] Gustavo Fernandes updated ISPN-4650:
------------------------------------
Description: The MassIndexer currently issues a Delete plus an Add operation to the Hibernate Search backend. Lucene buffers those delete queries and, during merge, tries to 'apply' them, wasting a massive amount of time on unnecessary seeks and queries. Since the mass indexer wipes the index at the beginning, it should simply issue an add operation. Performance-wise this makes a huge difference:
* indexing 50k documents brings the indexing time down from 195s to 33s
* indexing 200k documents brings the indexing time down from 600s to 55s

was: The MassIndexer currently issues an Update operation to the Hibernate Search backend, which in turn becomes a delete plus an add in the index. Lucene buffers those delete queries and, during merge, tries to 'apply' them, wasting a massive amount of time on unnecessary seeks and queries. Since the mass indexer wipes the index at the beginning, it should simply issue an add operation. Performance-wise this makes a huge difference:
* indexing 50k documents brings the indexing time down from 195s to 33s
* indexing 200k documents brings the indexing time down from 600s to 55s

> MassIndexer should not use UpdateDocument when adding to Lucene
> ---------------------------------------------------------------
>
> Key: ISPN-4650
> URL: https://issues.jboss.org/browse/ISPN-4650
> Project: Infinispan
> Issue Type: Enhancement
> Security Level: Public (Everyone can see)
> Components: Embedded Querying
> Affects Versions: 7.0.0.Beta1
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Fix For: 7.0.0.Beta2
>
>
> The MassIndexer currently issues a Delete plus an Add operation to the Hibernate Search backend.
> Lucene buffers those delete queries and, during merge, tries to 'apply' them, wasting a massive amount of time on unnecessary seeks and queries.
> Since the mass indexer wipes the index at the beginning, it should simply issue an add operation. Performance-wise this makes a huge difference:
> * indexing 50k documents brings the indexing time down from 195s to 33s
> * indexing 200k documents brings the indexing time down from 600s to 55s

--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
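
The argument above rests on one behaviour: an update is a delete-by-term plus an add, and the delete is buffered and resolved later (during merge, per the description). What follows is a minimal toy model of that idea in plain Java; it is not Lucene or Hibernate Search code, and the names `ToyIndex`, `update`, `add` and `flush` are hypothetical. Starting from an empty (wiped) index, both paths end with the same documents, but only the update path leaves buffered deletes to resolve.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model only (not Lucene or Hibernate Search): an update is
 * "buffered delete of older copies of the key + add", and buffered
 * deletes are resolved later, when flush() is called.
 */
public class BufferedDeleteDemo {

    /** A buffered delete of older documents carrying the given key. */
    static final class PendingDelete {
        final String key;
        final int upTo; // only documents added before this position are affected

        PendingDelete(String key, int upTo) {
            this.key = key;
            this.upTo = upTo;
        }
    }

    static final class ToyIndex {
        final List<String> docs = new ArrayList<>();
        final List<PendingDelete> pending = new ArrayList<>();
        int deleteLookups = 0; // how many buffered deletes had to be resolved

        /** Update: buffer a delete for older copies of key, then add the new document. */
        void update(String key) {
            pending.add(new PendingDelete(key, docs.size()));
            docs.add(key);
        }

        /** Plain add: nothing is left to resolve later. */
        void add(String key) {
            docs.add(key);
        }

        /** Resolves buffered deletes by scanning the documents each one may affect. */
        void flush() {
            boolean[] deleted = new boolean[docs.size()];
            for (PendingDelete pd : pending) {
                deleteLookups++;
                for (int i = 0; i < pd.upTo; i++) {
                    if (docs.get(i).equals(pd.key)) {
                        deleted[i] = true;
                    }
                }
            }
            List<String> kept = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++) {
                if (!deleted[i]) {
                    kept.add(docs.get(i));
                }
            }
            docs.clear();
            docs.addAll(kept);
            pending.clear();
        }
    }

    public static void main(String[] args) {
        int n = 1_000;
        ToyIndex viaUpdate = new ToyIndex(); // both start empty, like a wiped index
        ToyIndex viaAdd = new ToyIndex();
        for (int i = 0; i < n; i++) {
            viaUpdate.update("key-" + i);
            viaAdd.add("key-" + i);
        }
        viaUpdate.flush();
        viaAdd.flush();
        System.out.println(viaUpdate.docs.equals(viaAdd.docs));               // true
        System.out.println(viaUpdate.deleteLookups + " vs " + viaAdd.deleteLookups); // 1000 vs 0
    }
}
```

In the toy, the resolution work grows with the number of buffered deletes even though, on a freshly wiped index with unique keys, none of them can match anything; that is the work a plain add avoids.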

[JBoss JIRA] (ISPN-4654) AND over range queries does not work (indexless query)

by RH Bugzilla Integration (JIRA)

[ https://issues.jboss.org/browse/ISPN-4654?page=com.atlassian.jira.plugin.... ] RH Bugzilla Integration updated ISPN-4654:
------------------------------------------
Bugzilla Update: Perform
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1132121

> AND over range queries does not work (indexless query)
> ------------------------------------------------------
>
> Key: ISPN-4654
> URL: https://issues.jboss.org/browse/ISPN-4654
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Embedded Querying, Remote Querying
> Affects Versions: 6.0.2.Final, 7.0.0.Beta1
> Reporter: Radim Vansa
> Assignee: Adrian Nistor
>
> Check this in QueryDslConditionsTest:
> {code}
> public void testAnd5() throws Exception {
>    QueryFactory qf = getQueryFactory();
>
>    // range queries use different code
>    Query q = qf.from(getModelFactory().getUserImplClass())
>          .having("id").lt(1000)
>          .and().having("age").lt(1000)
>          .toBuilder().build();
>
>    List<User> list = q.list();
>    assertEquals(3, list.size());
> }
> {code}
> The problem is that some subscription gets suspended, so the second LT does not fire the second predicate update (and therefore the AND is not reevaluated either).
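
For reference, the query in `testAnd5` is a conjunction of two range predicates, and both must be evaluated for every entry. The sketch below evaluates that conjunction eagerly in memory with `java.util.function.Predicate`. It does not model Infinispan's predicate/subscription engine; the `User` class, the sample data (which is not the test's data) and the rule that a null `age` never matches are assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class AndRangeDemo {

    /** Hypothetical stand-in for the test model's User; age may be null. */
    static final class User {
        final int id;
        final Integer age;

        User(int id, Integer age) {
            this.id = id;
            this.age = age;
        }
    }

    /** Reference semantics for having("id").lt(limit). */
    static Predicate<User> idLt(int limit) {
        return u -> u.id < limit;
    }

    /** Reference semantics for having("age").lt(limit); assumption: a null age never matches. */
    static Predicate<User> ageLt(int limit) {
        return u -> u.age != null && u.age < limit;
    }

    /** Keeps only the users matching p; every predicate is evaluated per user. */
    static List<User> filter(List<User> users, Predicate<User> p) {
        return users.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User(1, 22), new User(2, null), new User(3, 2000), new User(4, 45));
        List<User> hits = filter(users, idLt(1000).and(ageLt(1000)));
        System.out.println(hits.size()); // 2 (users 1 and 4)
    }
}
```

Because `Predicate.and` re-evaluates both sides for each entry, there is no state that can be "suspended"; an incremental engine has to reach the same result while only re-evaluating what changed.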

[JBoss JIRA] (ISPN-4654) AND over range queries does not work (indexless query)

by Adrian Nistor (JIRA)

[ https://issues.jboss.org/browse/ISPN-4654?page=com.atlassian.jira.plugin.... ] Adrian Nistor updated ISPN-4654:
--------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/2811

[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene

by Gustavo Fernandes (JIRA)

[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ] Gustavo Fernandes edited comment on ISPN-4650 at 8/20/14 1:27 PM:
------------------------------------------------------------------
The auto switch ended up being trickier than it appears. Considering that IndexMappers are sent over the wire to all nodes, that each node has its own query interceptor, that indexing operations can be async depending on the stack used, and that the mass indexer can be started from any node (the same applies to regular writes), there are plenty of opportunities for race conditions to produce duplicate documents when the MassIndexer is used at the same time as normal operations. I'd like to explore a third alternative besides the auto switch and queues: relying on UpdateExtWorkDelegate. Some local tests showed this kind of delegate to be very close to the AddWorkDelegate performance-wise. I'm aware of the implications of its usage (it is recommended when keys are unique in an index), and I think Infinispan ticks the boxes, doesn't it?

was (Author: gustavonalle):
The auto switch ended up being trickier than it appears. Considering that IndexMappers are sent over the wire to all nodes, that each node has its own query interceptor, that indexing operations can be async depending on the stack used, and that the mass indexer can be started from any node (the same applies to regular writes), there are plenty of opportunities for race conditions to produce duplicate documents when the MassIndexer is used at the same time as normal operations. I'd like to explore a third alternative besides the auto switch and queues: relying on UpdateExtWorkDelegate. Some local tests showed this kind of delegate to be very close to the AddWorkDelegate performance-wise. I'm aware of the implications of its usage (it is recommended when keys are unique in an index), and I think Infinispan ticks the boxes, doesn't it?
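
To make the duplicate-document risk concrete, here is a toy model of one interleaving in which the MassIndexer and a regular write index the same key. It assumes, for illustration, that an update keyed on a unique identifier replaces any existing document with that key, which is the property that makes a single-key update safe when keys are unique in an index. `AppendOnlyIndex` and `KeyedIndex` are hypothetical names, not Hibernate Search or Infinispan classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: blind adds versus update-by-unique-key for the same entry. */
public class DuplicateDocsDemo {

    /** Blind adds: every add appends, so the same key can appear twice. */
    static final class AppendOnlyIndex {
        final List<String> docs = new ArrayList<>();

        void add(String key, String value) {
            docs.add(key + "=" + value);
        }

        int count() {
            return docs.size();
        }
    }

    /** Update by unique key: the new document replaces any existing one with that key. */
    static final class KeyedIndex {
        final Map<String, String> docs = new LinkedHashMap<>();

        void update(String key, String value) {
            docs.put(key, value);
        }

        int count() {
            return docs.size();
        }
    }

    public static void main(String[] args) {
        // One possible interleaving: the MassIndexer re-indexes "k1" while a
        // regular write (possibly on another node) indexes a new value of "k1".
        AppendOnlyIndex blind = new AppendOnlyIndex();
        blind.add("k1", "v1"); // from the MassIndexer
        blind.add("k1", "v2"); // from the regular write

        KeyedIndex keyed = new KeyedIndex();
        keyed.update("k1", "v1");
        keyed.update("k1", "v2");

        System.out.println(blind.count()); // 2 (duplicate document)
        System.out.println(keyed.count()); // 1
    }
}
```

The keyed variant only removes duplicates: in this toy, whichever write is applied last still wins, so the final value depends on the order of the two updates.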

[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene

by Gustavo Fernandes (JIRA)

[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ] Gustavo Fernandes edited comment on ISPN-4650 at 8/20/14 1:27 PM:
------------------------------------------------------------------
The auto switch ended up being trickier than it appears. Considering that IndexMappers are sent over the wire to all nodes, that each node has its own query interceptor, that indexing operations can be async depending on the stack used, and that the mass indexer can be started from any node (the same applies to regular writes), there are plenty of opportunities for race conditions to produce duplicate documents when the MassIndexer is used at the same time as normal operations. I'd like to explore a third alternative besides the auto switch and queues: relying on UpdateExtWorkDelegate. Some local tests showed this kind of delegate to be closer to the AddWorkDelegate performance-wise. I'm aware of the implications of its usage (it is recommended when keys are unique in an index), and I think Infinispan ticks the boxes, doesn't it?

was (Author: gustavonalle):
The auto switch ended up being trickier than it appears. Considering that IndexMappers are sent over the wire to all nodes, that each node has its own query interceptor, that indexing operations can be async depending on the stack used, and that the mass indexer can be started from any node (the same applies to regular writes), there are plenty of opportunities for race conditions to produce duplicate documents when the MassIndexer is used at the same time as normal operations. I'd like to explore a third alternative besides the auto switch and queues: relying on UpdateExtWorkDelegate. Some local tests showed this kind of delegate to be very close to the AddWorkDelegate performance-wise. I'm aware of the implications of its usage (it is recommended when keys are unique in an index), and I think Infinispan ticks the boxes, doesn't it?

[JBoss JIRA] (ISPN-4650) MassIndexer should not use UpdateDocument when adding to Lucene

by Gustavo Fernandes (JIRA)

[ https://issues.jboss.org/browse/ISPN-4650?page=com.atlassian.jira.plugin.... ] Gustavo Fernandes commented on ISPN-4650:
-----------------------------------------
The auto switch ended up being trickier than it appears. Considering that IndexMappers are sent over the wire to all nodes, that each node has its own query interceptor, that indexing operations can be async depending on the stack used, and that the mass indexer can be started from any node (the same applies to regular writes), there are plenty of opportunities for race conditions to produce duplicate documents when the MassIndexer is used at the same time as normal operations. I'd like to explore a third alternative besides the auto switch and queues: relying on UpdateExtWorkDelegate. Some local tests showed this kind of delegate to be very close to the AddWorkDelegate performance-wise. I'm aware of the implications of its usage (it is recommended when keys are unique in an index), and I think Infinispan ticks the boxes, doesn't it?

[JBoss JIRA] (ISPN-4654) AND over range queries does not work (indexless query)

by Adrian Nistor (JIRA)

[ https://issues.jboss.org/browse/ISPN-4654?page=com.atlassian.jira.plugin.... ] Adrian Nistor commented on ISPN-4654:
-------------------------------------
The assert should actually be {quote}assertEquals(3, list.size());{quote}; otherwise the test is fine for the non-indexed case and correctly demonstrates the problem. The test should pass in the Lucene case as well but, unfortunately and surprisingly, it does not. There it fails for different reasons than in the non-indexed case:
1. the 'id' field is not a @NumericField;
2. 'age' is nullable, which basically cancels the fact that it is a numeric field (nullable fields are actually represented as strings; Lucene weirdness...).
So the comparisons happen on strings and the end result is incorrect. I'll fix the non-indexed case ASAP, but the Lucene case needs to be addressed separately.
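
The string-comparison problem is easy to reproduce in isolation: once numbers are compared as strings, "less than" follows character order instead of numeric order. This sketch assumes plain, unpadded decimal strings; it does not reproduce how Hibernate Search or Lucene actually encode nullable fields.

```java
public class StringVsNumericRange {

    /** Numeric semantics: what lt(limit) means for an int field. */
    static boolean numericLt(int value, int limit) {
        return value < limit;
    }

    /** Lexicographic semantics: both sides compared as plain decimal strings. */
    static boolean stringLt(int value, int limit) {
        return Integer.toString(value).compareTo(Integer.toString(limit)) < 0;
    }

    public static void main(String[] args) {
        int[] ages = {5, 25, 100, 999, 1500};
        for (int age : ages) {
            System.out.println(age + ": numeric=" + numericLt(age, 1000)
                    + " string=" + stringLt(age, 1000));
        }
        // 5: numeric=true string=false
        // 25: numeric=true string=false
        // 100: numeric=true string=true
        // 999: numeric=true string=false
        // 1500: numeric=false string=false
    }
}
```

For the limit 1000 used in `testAnd5`, values such as 5, 25 and 999 are numerically below the limit but compare as greater than "1000", because their first character sorts after '1'; only values that are a prefix of "1000", such as 100, still compare as smaller.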

[JBoss JIRA] (ISPN-4654) AND over range queries does not work (indexless query)

by Adrian Nistor (JIRA)

[ https://issues.jboss.org/browse/ISPN-4654?page=com.atlassian.jira.plugin.... ] Adrian Nistor edited comment on ISPN-4654 at 8/20/14 1:03 PM:
--------------------------------------------------------------
The assert should actually be {quote}assertEquals(1, list.size());{quote}; otherwise the test is fine for the non-indexed case and correctly demonstrates the problem. The test should pass in the Lucene case as well but, unfortunately and surprisingly, it does not. There it fails for different reasons than in the non-indexed case:
1. the 'id' field is not a @NumericField;
2. 'age' is nullable, which basically cancels the fact that it is a numeric field (nullable fields are actually represented as strings; Lucene weirdness...).
So the comparisons happen on strings and the end result is incorrect. I'll fix the non-indexed case ASAP, but the Lucene case needs to be addressed separately.

was (Author: anistor):
The assert should actually be {quote}assertEquals(3, list.size());{quote}; otherwise the test is fine for the non-indexed case and correctly demonstrates the problem. The test should pass in the Lucene case as well but, unfortunately and surprisingly, it does not. There it fails for different reasons than in the non-indexed case:
1. the 'id' field is not a @NumericField;
2. 'age' is nullable, which basically cancels the fact that it is a numeric field (nullable fields are actually represented as strings; Lucene weirdness...).
So the comparisons happen on strings and the end result is incorrect. I'll fix the non-indexed case ASAP, but the Lucene case needs to be addressed separately.

[JBoss JIRA] (ISPN-4618) Map/Reduce jobs should not timeout by default

by RH Bugzilla Integration (JIRA)

[ https://issues.jboss.org/browse/ISPN-4618?page=com.atlassian.jira.plugin.... ] RH Bugzilla Integration commented on ISPN-4618:
-----------------------------------------------
Tristan Tarrant <ttarrant(a)redhat.com> changed the Status of [bug 1128791|https://bugzilla.redhat.com/show_bug.cgi?id=1128791] from ASSIGNED to POST

> Map/Reduce jobs should not timeout by default
> ---------------------------------------------
>
> Key: ISPN-4618
> URL: https://issues.jboss.org/browse/ISPN-4618
> Project: Infinispan
> Issue Type: Enhancement
> Security Level: Public (Everyone can see)
> Components: Distributed Execution and Map/Reduce
> Reporter: Sanne Grinovero
> Assignee: Vladimir Blagojevic
> Fix For: 7.0.0.Beta1
>
>
> It seems the default behaviour of the jobs today is to time out after the same timeout as replication commands.
> The default should be to not time out, and to allow the user to optionally set a timeout.
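
The proposed default (no timeout unless the user sets one) maps naturally onto `java.util.concurrent.Future`: `get()` waits for completion, while `get(timeout, unit)` bounds the wait. The helper below is a hypothetical sketch, not Infinispan's Map/Reduce API, and the convention that a non-positive timeout means "wait indefinitely" is an assumption.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OptionalTimeoutDemo {

    /** Hypothetical helper: timeout <= 0 means "wait until the job finishes" (the proposed default). */
    static <T> T awaitJob(Future<T> job, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (timeout <= 0) {
            return job.get(); // default: no timeout
        }
        return job.get(timeout, unit); // the user explicitly asked for a bound
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> slowJob = pool.submit(() -> {
                Thread.sleep(200); // longer than the 50 ms bound used below
                return 42;
            });
            try {
                awaitJob(slowJob, 50, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                System.out.println("bounded wait timed out");
            }
            // Same job, default behaviour: keep waiting until it completes.
            System.out.println(awaitJob(slowJob, 0, TimeUnit.MILLISECONDS)); // 42
        } finally {
            pool.shutdown();
        }
    }
}
```

A timed-out `get(timeout, unit)` does not cancel the task, which is why the second, unbounded call in `main` still returns the result.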

[JBoss JIRA] (ISPN-4655) Make org.infinispan.client.hotrod.event package public for javadocs

by Galder Zamarreño (JIRA)

Galder Zamarreño created ISPN-4655:
--------------------------------------
Summary: Make org.infinispan.client.hotrod.event package public for javadocs
Key: ISPN-4655
URL: https://issues.jboss.org/browse/ISPN-4655
Project: Infinispan
Issue Type: Task
Security Level: Public (Everyone can see)
Reporter: Galder Zamarreño
Assignee: Galder Zamarreño
Fix For: 7.0.0.Beta2
