Author: epbernard
Date: 2009-06-10 20:43:09 -0400 (Wed, 10 Jun 2009)
New Revision: 16755
Added:
search/trunk/src/main/java/org/hibernate/search/filter/FullTextFilterImplementor.java
search/trunk/src/main/java/org/hibernate/search/filter/ShardSensitiveOnlyFilter.java
search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategy.java
search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategyTest.java
Modified:
search/trunk/src/main/docbook/en-US/modules/configuration.xml
search/trunk/src/main/docbook/en-US/modules/query.xml
search/trunk/src/main/java/org/hibernate/search/filter/ChainedFilter.java
search/trunk/src/main/java/org/hibernate/search/query/FullTextFilterImpl.java
search/trunk/src/main/java/org/hibernate/search/query/FullTextQueryImpl.java
search/trunk/src/main/java/org/hibernate/search/store/IdHashShardingStrategy.java
search/trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java
search/trunk/src/main/java/org/hibernate/search/store/NotShardedStrategy.java
search/trunk/src/test/java/org/hibernate/search/test/configuration/UselessShardingStrategy.java
Log:
HSEARCH-251 Query on a shard subset based on a filter activation
Modified: search/trunk/src/main/docbook/en-US/modules/configuration.xml
===================================================================
--- search/trunk/src/main/docbook/en-US/modules/configuration.xml 2009-06-10 21:48:21 UTC
(rev 16754)
+++ search/trunk/src/main/docbook/en-US/modules/configuration.xml 2009-06-11 00:43:09 UTC
(rev 16755)
@@ -206,21 +206,33 @@
<section id="search-configuration-directory-sharding"
revision="1">
<title>Sharding indexes</title>
- <para>In some extreme cases involving huge indexes (in size), it is
- necessary to split (shard) the indexing data of a given entity type into
- several Lucene indexes. This solution is not recommended until you reach
- significant index sizes and index update times are slowing the application
- down. The main drawback of index sharding is that searches will end up
- being slower since more files have to be opened for a single search. In
+ <para>In some cases, it is necessary to split (shard) the indexing data of
+ a given entity type into several Lucene indexes. This solution is not
+ recommended unless there is a pressing need because by default, searches
+ will be slower as all shards have to be opened for a single search. In
other words don't do it until you have problems :)</para>
- <para>Despite this strong warning, Hibernate Search allows you to index a
- given entity type into several sub indexes. Data is sharded into the
- different sub indexes thanks to an
- <classname>IndexShardingStrategy</classname>. By default, no sharding
- strategy is enabled, unless the number of shards is configured. To
- configure the number of shards use the following property</para>
+ <para>For example, sharding may be desirable if:</para>
+ <itemizedlist>
+ <listitem>
+ <para>A single index is so huge that index update times are slowing
+ the application down.</para>
+ </listitem>
+
+ <listitem>
+ <para>A typical search will only hit a sub-set of the index, such as
+ when data is naturally segmented by customer, region or
+ application.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Hibernate Search allows you to index a given entity type into
+ several sub indexes. Data is sharded into the different sub indexes thanks
+ to an <classname>IndexShardingStrategy</classname>. By default, no
+ sharding strategy is enabled, unless the number of shards is configured.
+ To configure the number of shards use the following property</para>
+
<example>
<title>Enabling index sharding by specifying nbr_of_shards for a
specific index</title>
@@ -243,10 +255,21 @@
<programlisting>hibernate.search.<indexName>.sharding_strategy
my.shardingstrategy.Implementation</programlisting>
</example>
+ <para>Using a custom <classname>IndexShardingStrategy</classname>
+ implementation, it's possible to define what shard a given entity is
+ indexed to. </para>
+
+ <para>It also allows for optimizing searches by selecting which shards to
+ run the query on. By activating a filter (see <xref
+ linkend="query-filter-shard" />), a sharding strategy can select a subset
+ of the shards used to answer a query
+ (<classname>IndexShardingStrategy.getDirectoryProvidersForQuery</classname>)
+ and thus speed up the query execution.</para>
+
<para>Each shard has an independent directory provider configuration as
described in <xref linkend="search-configuration-directory" />. The
- DirectoryProvider default name for the previous example are
- <literal><indexName>.0</literal> to
+ <classname>DirectoryProvider</classname> default name for the previous
+ example are <literal><indexName>.0</literal> to
<literal><indexName>.4</literal>. In other words, each shard has the
name of its owning index followed by <constant>.</constant> (dot) and its
index number.</para>
@@ -367,14 +390,15 @@
<entry>Out of the box support for the Apache Lucene back end and
the JMS back end. Default to <literal>lucene</literal>. Supports
- also <literal>jms</literal> and
<literal>blackhole</literal>.</entry>
+ also <literal>jms</literal> and
+ <literal>blackhole</literal>.</entry>
</row>
<row>
<entry><literal>hibernate.search.worker.execution</literal></entry>
- <entry>Supports synchronous and asynchronous execution. Default
- to <literal><literal>sync</literal></literal>.
Supports also
+ <entry>Supports synchronous and asynchronous execution. Default to
+ <literal><literal>sync</literal></literal>. Supports
also
<literal>async</literal>.</entry>
</row>
@@ -445,8 +469,8 @@
<section>
<title>Slave nodes</title>
- <para>Every index update operation is sent to a JMS queue. Index querying
- operations are executed on a local index copy.</para>
+ <para>Every index update operation is sent to a JMS queue. Index
+ querying operations are executed on a local index copy.</para>
<example>
<title>JMS Slave configuration</title>
@@ -605,8 +629,9 @@
<para>To enable Hibernate Search in Hibernate Core (ie. if you don't use
Hibernate Annotations), add the
<literal>FullTextIndexEventListener</literal> for the following six
- Hibernate events and also add it after the default
- <literal>DefaultFlushEventListener</literal>, as in the following
example.</para>
+ Hibernate events and also add it after the default
+ <literal>DefaultFlushEventListener</literal>, as in the following
+ example.</para>
<example>
<title>Explicitly enabling Hibernate Search by configuring the
@@ -768,13 +793,13 @@
terms.</para> <para>This silently truncates large documents,
excluding from the index all terms that occur further in the
document. If you know your source documents are large, be sure to
- set this value high enough to accommodate the expected size. If you
- set it to Integer.MAX_VALUE, then the only limit is your memory,
- but you should anticipate an OutOfMemoryError. </para> <para>If
- setting this value in <literal>batch</literal> differently than
in
- <literal>transaction</literal> you may get different data (and
- results) in your index depending on the indexing
- mode.</para></entry>
+ set this value high enough to accommodate the expected size. If
+ you set it to Integer.MAX_VALUE, then the only limit is your
+ memory, but you should anticipate an OutOfMemoryError. </para>
+ <para>If setting this value in <literal>batch</literal>
+ differently than in <literal>transaction</literal> you may get
+ different data (and results) in your index depending on the
+ indexing mode.</para></entry>
<entry>10000</entry>
</row>
@@ -852,24 +877,26 @@
</tbody>
</tgroup>
</table>
-
- <para>To tune the indexing speed it might be useful to time the
- object loading from database in isolation from the writes to the index.
- To achieve this set the <literal>blackhole</literal> as worker backend
and start
- you indexing routines.
- This backend does not disable Hibernate Search: it will still generate the needed
- changesets to the index, but will discard them instead of flushing them to the
index.
- As opposite to setting the
<literal>hibernate.search.indexing_strategy</literal>
- to <literal>manual</literal> when using
<literal>blackhole</literal> it will possibly load
- more data to rebuild the index from associated entities.</para>
-
+
+ <para>To tune the indexing speed it might be useful to time the object
+ loading from database in isolation from the writes to the index. To
+ achieve this set the <literal>blackhole</literal> as worker backend and
+ start your indexing routines. This backend does not disable Hibernate
+ Search: it will still generate the needed changesets to the index, but
+ will discard them instead of flushing them to the index. As opposed to
+ setting the <literal>hibernate.search.indexing_strategy</literal> to
+ <literal>manual</literal>, when using <literal>blackhole</literal> it will
+ possibly load more data to rebuild the index from associated
+ entities.</para>
+
<programlisting>hibernate.search.worker.backend
blackhole</programlisting>
-
- <para>The recommended approach is to focus first on optimizing the object
loading, and then
- use the timings you achieve as a baseline to tune the indexing process.</para>
- <para>The <literal>blackhole</literal> backend is not meant to be
used in production, only
- as a tool to identify indexing bottlenecks.</para>
-
+
+ <para>The recommended approach is to focus first on optimizing the object
+ loading, and then use the timings you achieve as a baseline to tune the
+ indexing process.</para>
+
+ <para>The <literal>blackhole</literal> backend is not meant to be used in
+ production, only as a tool to identify indexing bottlenecks.</para>
</section>
<section id="search-configuration-directory-lockfactories"
revision="1">
@@ -883,6 +910,8 @@
for most cases, but it's possible to specify for each index managed by
Hibernate Search which LockingFactory you want to use.</para>
+
+
<para>Some of these locking strategies require a filesystem level lock and
may be used even on RAM based indexes, but this is not recommended and of
no practical use.</para>
@@ -976,7 +1005,7 @@
</tgroup>
</table></para>
- Configuration example:
+ Configuration example:
<programlisting>hibernate.search.default.locking_strategy simple
hibernate.search.Animals.locking_strategy native
@@ -988,4 +1017,4 @@
</section>
-</chapter>
+</chapter>
\ No newline at end of file
Modified: search/trunk/src/main/docbook/en-US/modules/query.xml
===================================================================
--- search/trunk/src/main/docbook/en-US/modules/query.xml 2009-06-10 21:48:21 UTC (rev
16754)
+++ search/trunk/src/main/docbook/en-US/modules/query.xml 2009-06-11 00:43:09 UTC (rev
16755)
@@ -345,8 +345,8 @@
</listitem>
<listitem>
- <para>FullTextQuery.OBJECT_CLASS: returns the class of the
- indexed entity.</para>
+ <para>FullTextQuery.OBJECT_CLASS: returns the class of the indexed
+ entity.</para>
</listitem>
<listitem>
@@ -545,7 +545,7 @@
</section>
</section>
- <section>
+ <section id="query-filter">
<title>Filters</title>
<para>Apache Lucene has a powerful feature that allows to filter query
@@ -833,6 +833,105 @@
time spent to execute the query)</para>
</listitem>
</itemizedlist>
+
+ <section id="query-filter-shard">
+ <title>Using filters in a sharded environment</title>
+
+ <para>It is possible, in a sharded environment, to execute queries on a
+ subset of the available shards. This can be done in two steps:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>create a sharding strategy that selects a subset of
+ <classname>DirectoryProvider</classname>s depending on some filter
+ configuration</para>
+ </listitem>
+
+ <listitem>
+ <para>activate the proper filter at query time</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Let's first look at an example of a sharding strategy that queries
+ a specific customer shard if the customer filter is activated.</para>
+
+ <programlisting>public class CustomerShardingStrategy implements
IndexShardingStrategy {
+
+ // DirectoryProviders stored in an array indexed by customerID
+ private DirectoryProvider<?>[] providers;
+
+ public void initialize(Properties properties, DirectoryProvider<?>[]
providers) {
+ this.providers = providers;
+ }
+
+ public DirectoryProvider<?>[] getDirectoryProvidersForAllShards() {
+ return providers;
+ }
+
+ public DirectoryProvider<?>
getDirectoryProviderForAddition(Class<?> entity, Serializable id, String
idInString, Document document) {
+ Integer customerID =
Integer.parseInt(document.getField("customerID").stringValue());
+ return providers[customerID];
+ }
+
+ public DirectoryProvider<?>[]
getDirectoryProvidersForDeletion(Class<?> entity, Serializable id, String
idInString) {
+ return getDirectoryProvidersForAllShards();
+ }
+
+<emphasis role="bold"> /**
+ * Optimization; don't search ALL shards and union the results; in this case, we
+ * can be certain that all the data for a particular customer Filter is in a single
+ * shard; simply return that shard by customerID.
+ */
+ public DirectoryProvider<?>[]
getDirectoryProvidersForQuery(FullTextFilterImplementor[] filters) {
+ FullTextFilter filter = getFilter(filters, "customer");
+ if (filter == null) {
+ return getDirectoryProvidersForAllShards();
+ }
+ else {
+ return new DirectoryProvider[] {
providers[Integer.parseInt(filter.getParameter("customerID").toString())] };
+ }
+ }
+
+ private FullTextFilter getFilter(FullTextFilterImplementor[] filters, String name) {
+ for (FullTextFilterImplementor filter: filters) {
+ if (filter.getName().equals(name)) return filter;
+ }
+ return null;
+ }</emphasis>
+
+}</programlisting>
+
+ <para>In this example, if the filter named <literal>customer</literal>
+ is present, we make sure to only use the shard dedicated to this
+ customer. Otherwise, we return all shards. A given sharding strategy can
+ react to one or more filters and depend on their parameters.</para>
+
+ <para>The second step is simply to activate the filter at query time.
+ While the filter can be a regular filter (as defined in <xref
+ linkend="query-filter" />) which also filters Lucene results after the
+ query, you can make use of a special filter that will only be passed to
+ the sharding strategy and otherwise ignored for the rest of the query.
+ Simply use the <classname>ShardSensitiveOnlyFilter</classname> class
+ when declaring your filter.</para>
+
+ <programlisting>@Entity @Indexed
+<emphasis role="bold">@FullTextFilterDef(name="customer",
impl=ShardSensitiveOnlyFilter.class)</emphasis>
+public class Customer {
+ ...
+}
+
+
+FullTextQuery query = ftEm.createFullTextQuery(luceneQuery, Customer.class);
+<emphasis role="bold">query.enableFullTextFilter("customer").setParameter("customerID", 5);</emphasis>
+@SuppressWarnings("unchecked")
+List<Customer> results = query.getResultList();</programlisting>
+
+ <para>Note that by using the
+ <classname>ShardSensitiveOnlyFilter</classname>, you do not have to
+ implement any Lucene filter. Using filters and sharding strategy
+ reacting to these filters is recommended to speed up queries in a
+ sharded environment.</para>
+ </section>
</section>
<section>
@@ -866,4 +965,4 @@
run Lucene specific queries. Check <xref linkend="search-lucene-native"
/>
for more information.</para>
</section>
-</chapter>
+</chapter>
\ No newline at end of file
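Note on the documentation example above: the shard-selection idea can be sketched without any Hibernate Search or Lucene dependency. In this minimal sketch, ActiveFilter and ShardSelectionSketch are hypothetical stand-ins (assumptions for illustration, not classes from this commit), and shards are modeled as plain strings rather than DirectoryProviders. It also adds a range check that the documented example leaves out, falling back to all shards when the parameter is missing, malformed, or out of range.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an activated full-text filter: a name plus its parameters.
final class ActiveFilter {
    final String name;
    final Map<String, Object> parameters;

    ActiveFilter(String name, Map<String, Object> parameters) {
        this.name = name;
        this.parameters = parameters;
    }
}

// Hypothetical sketch of filter-driven shard selection; shards are plain strings here.
final class ShardSelectionSketch {
    private final String[] shards;

    ShardSelectionSketch(String[] shards) {
        this.shards = shards.clone();
    }

    // Returns the single shard named by the "customer" filter, or every shard otherwise.
    String[] shardsForQuery(List<ActiveFilter> filters) {
        for (ActiveFilter filter : filters) {
            if ("customer".equals(filter.name)) {
                Integer id = parseShardIndex(filter.parameters.get("customerID"));
                if (id != null && id >= 0 && id < shards.length) {
                    return new String[] { shards[id] };
                }
            }
        }
        return shards.clone();
    }

    // Parses the parameter defensively; null means "no usable shard index".
    private static Integer parseShardIndex(Object raw) {
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(raw.toString());
        }
        catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Falling back to all shards keeps the query correct, only slower, when the filter cannot be mapped to a shard. Whether that is preferable to failing fast is a design choice this sketch does not settle.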
Modified: search/trunk/src/main/java/org/hibernate/search/filter/ChainedFilter.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/filter/ChainedFilter.java 2009-06-10
21:48:21 UTC (rev 16754)
+++ search/trunk/src/main/java/org/hibernate/search/filter/ChainedFilter.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -30,6 +30,10 @@
this.chainedFilters.add( filter );
}
+ public boolean isEmpty() {
+ return chainedFilters.size() == 0;
+ }
+
public BitSet bits(IndexReader reader) throws IOException {
throw new UnsupportedOperationException();
}
Added:
search/trunk/src/main/java/org/hibernate/search/filter/FullTextFilterImplementor.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/filter/FullTextFilterImplementor.java
(rev 0)
+++
search/trunk/src/main/java/org/hibernate/search/filter/FullTextFilterImplementor.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -0,0 +1,15 @@
+package org.hibernate.search.filter;
+
+import org.hibernate.search.FullTextFilter;
+
+/**
+ * @author Emmanuel Bernard
+ */
+public interface FullTextFilterImplementor extends FullTextFilter {
+ /**
+ * Returns the Filter name
+ */
+ String getName();
+
+ //TODO should we expose Map<String, Object> getParameters()
+}
Added:
search/trunk/src/main/java/org/hibernate/search/filter/ShardSensitiveOnlyFilter.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/filter/ShardSensitiveOnlyFilter.java
(rev 0)
+++
search/trunk/src/main/java/org/hibernate/search/filter/ShardSensitiveOnlyFilter.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -0,0 +1,12 @@
+package org.hibernate.search.filter;
+
+/**
+ * When using this class in @FullTextFilterDef.impl, Hibernate Search
+ * considers the filter to be only influencing the sharding strategy.
+ *
+ * This filter is not applied on the results of the Lucene query.
+ *
+ * @author Emmanuel Bernard
+ */
+public interface ShardSensitiveOnlyFilter {
+}
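Note on the marker type above: the commit detects a shard-only filter by comparing the declared implementation class with ShardSensitiveOnlyFilter.class (see isPreQueryFilterOnly in FullTextQueryImpl). The following is a minimal, dependency-free sketch of that kind of check. ShardOnlyMarker, RealFilter, MarkedFilter and MarkerCheckSketch are hypothetical names introduced only for illustration.

```java
// Hypothetical marker type, playing the role of ShardSensitiveOnlyFilter.
interface ShardOnlyMarker {
}

// Hypothetical ordinary filter implementation.
final class RealFilter {
}

// Hypothetical class that implements the marker rather than being the marker itself.
final class MarkedFilter implements ShardOnlyMarker {
}

final class MarkerCheckSketch {
    // True only when the declared implementation class is exactly the marker type.
    static boolean isShardOnly(Class<?> impl) {
        return ShardOnlyMarker.class.equals(impl);
    }
}
```

Because the comparison uses class equality rather than isAssignableFrom, a class that merely implements the marker is treated as a regular filter in this sketch. The equals() call in the commit suggests the same behavior there, but that reading is an inference from the diff.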
Modified: search/trunk/src/main/java/org/hibernate/search/query/FullTextFilterImpl.java
===================================================================
---
search/trunk/src/main/java/org/hibernate/search/query/FullTextFilterImpl.java 2009-06-10
21:48:21 UTC (rev 16754)
+++
search/trunk/src/main/java/org/hibernate/search/query/FullTextFilterImpl.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -5,11 +5,12 @@
import java.util.Map;
import org.hibernate.search.FullTextFilter;
+import org.hibernate.search.filter.FullTextFilterImplementor;
/**
* @author Emmanuel Bernard
*/
-public class FullTextFilterImpl implements FullTextFilter {
+public class FullTextFilterImpl implements FullTextFilterImplementor {
private final Map<String, Object> parameters = new HashMap<String,
Object>();
private String name;
Modified: search/trunk/src/main/java/org/hibernate/search/query/FullTextQueryImpl.java
===================================================================
---
search/trunk/src/main/java/org/hibernate/search/query/FullTextQueryImpl.java 2009-06-10
21:48:21 UTC (rev 16754)
+++
search/trunk/src/main/java/org/hibernate/search/query/FullTextQueryImpl.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -54,9 +54,12 @@
import org.hibernate.search.filter.ChainedFilter;
import org.hibernate.search.filter.FilterKey;
import org.hibernate.search.filter.StandardFilterKey;
+import org.hibernate.search.filter.FullTextFilterImplementor;
+import org.hibernate.search.filter.ShardSensitiveOnlyFilter;
import org.hibernate.search.reader.ReaderProvider;
import static org.hibernate.search.reader.ReaderProviderHelper.getIndexReaders;
import org.hibernate.search.store.DirectoryProvider;
+import org.hibernate.search.store.IndexShardingStrategy;
import org.hibernate.search.util.ContextHelper;
import static org.hibernate.search.util.FilterCacheModeTypeHelper.cacheInstance;
import static org.hibernate.search.util.FilterCacheModeTypeHelper.cacheResults;
@@ -84,6 +87,7 @@
private Integer resultSize;
private Sort sort;
private Filter filter;
+ private Filter userFilter;
private Criteria criteria;
private String[] indexProjection;
private Set<String> idFieldNames;
@@ -92,7 +96,9 @@
private SearchFactoryImplementor searchFactoryImplementor;
private Map<String, FullTextFilterImpl> filterDefinitions;
private int fetchSize = 1;
+ private static final FullTextFilterImplementor[] EMPTY_FULL_TEXT_FILTER_IMPLEMENTOR =
new FullTextFilterImplementor[0];
+
/**
* Constructs a <code>FullTextQueryImpl</code> instance.
*
@@ -127,7 +133,7 @@
* {@inheritDoc}
*/
public FullTextQuery setFilter(Filter filter) {
- this.filter = filter;
+ this.userFilter = filter;
return this;
}
@@ -399,20 +405,27 @@
}
private void buildFilters() {
- if ( filterDefinitions == null || filterDefinitions.size() == 0 ) {
- return; // there is nothing to do if we don't have any filter definitions
+ ChainedFilter chainedFilter = null;
+ if ( ! ( filterDefinitions == null || filterDefinitions.size() == 0 ) ) {
+ chainedFilter = new ChainedFilter();
+ for ( FullTextFilterImpl fullTextFilter : filterDefinitions.values() ) {
+ Filter filter = buildLuceneFilter( fullTextFilter );
+ if (filter != null) chainedFilter.addFilter( filter );
+ }
}
- ChainedFilter chainedFilter = new ChainedFilter();
- for ( FullTextFilterImpl fullTextFilter : filterDefinitions.values() ) {
- Filter filter = buildLuceneFilter( fullTextFilter );
- chainedFilter.addFilter( filter );
+ if ( userFilter != null ) {
+ //chainedFilter is not always necessary here but the code is easier to read
+ if (chainedFilter == null) chainedFilter = new ChainedFilter();
+ chainedFilter.addFilter( userFilter );
}
- if ( filter != null ) {
- chainedFilter.addFilter( filter );
+ if ( chainedFilter == null || chainedFilter.isEmpty() ) {
+ filter = null;
}
- filter = chainedFilter;
+ else {
+ filter = chainedFilter;
+ }
}
/**
@@ -430,6 +443,10 @@
* as FilterCachingStrategy ensure a memory barrier between concurrent thread calls
*/
FilterDef def = searchFactoryImplementor.getFilterDefinition( fullTextFilter.getName()
);
+ //def can never be null, it's guarded by enableFullTextFilter(String)
+
+ if ( isPreQueryFilterOnly(def) ) return null;
+
Object instance = createFilterInstance( fullTextFilter, def );
FilterKey key = createFilterKey( def, instance );
@@ -449,6 +466,10 @@
return filter;
}
+ private boolean isPreQueryFilterOnly(FilterDef def) {
+ return def.getImpl().equals( ShardSensitiveOnlyFilter.class );
+ }
+
private Filter createFilter(FilterDef def, Object instance) {
Filter filter;
if ( def.getFactoryMethod() != null ) {
@@ -633,7 +654,7 @@
*/
private IndexSearcher buildSearcher(SearchFactoryImplementor searchFactoryImplementor)
{
Map<Class<?>, DocumentBuilderIndexedEntity<?>> builders =
searchFactoryImplementor.getDocumentBuildersIndexedEntities();
- List<DirectoryProvider> directories = new ArrayList<DirectoryProvider>();
+ List<DirectoryProvider> targetedDirectories = new
ArrayList<DirectoryProvider>();
Set<String> idFieldNames = new HashSet<String>();
Similarity searcherSimilarity = null;
@@ -653,9 +674,7 @@
idFieldNames.add( builder.getIdKeywordName() );
allowFieldSelectionInProjection = allowFieldSelectionInProjection &&
builder.allowFieldSelectionInProjection();
}
- final DirectoryProvider[] directoryProviders =
builder.getDirectoryProviderSelectionStrategy()
- .getDirectoryProvidersForAllShards();
- populateDirectories( directories, directoryProviders );
+ populateDirectories( targetedDirectories, builder );
}
classesAndSubclasses = null;
}
@@ -679,10 +698,8 @@
idFieldNames.add( builder.getIdKeywordName() );
allowFieldSelectionInProjection = allowFieldSelectionInProjection &&
builder.allowFieldSelectionInProjection();
}
- final DirectoryProvider[] directoryProviders =
builder.getDirectoryProviderSelectionStrategy()
- .getDirectoryProvidersForAllShards();
searcherSimilarity = checkSimilarity( searcherSimilarity, builder );
- populateDirectories( directories, directoryProviders );
+ populateDirectories( targetedDirectories, builder );
}
this.classesAndSubclasses = involvedClasses;
}
@@ -691,7 +708,7 @@
//compute optimization needClassFilterClause
//if at least one DP contains one class that is not part of the targeted
classesAndSubclasses we can't optimize
if ( classesAndSubclasses != null ) {
- for ( DirectoryProvider dp : directories ) {
+ for ( DirectoryProvider dp : targetedDirectories ) {
final Set<Class<?>> classesInDirectoryProvider =
searchFactoryImplementor.getClassesInDirectoryProvider(
dp
);
@@ -712,7 +729,7 @@
}
//set up the searcher
- final DirectoryProvider[] directoryProviders = directories.toArray( new
DirectoryProvider[directories.size()] );
+ final DirectoryProvider[] directoryProviders = targetedDirectories.toArray( new
DirectoryProvider[targetedDirectories.size()] );
IndexSearcher is = new IndexSearcher(
searchFactoryImplementor.getReaderProvider().openReader(
directoryProviders
@@ -722,7 +739,19 @@
return is;
}
- private void populateDirectories(List<DirectoryProvider> directories,
DirectoryProvider[] directoryProviders) {
+ private void populateDirectories(List<DirectoryProvider> directories,
DocumentBuilderIndexedEntity builder) {
+ final IndexShardingStrategy indexShardingStrategy =
builder.getDirectoryProviderSelectionStrategy();
+ final DirectoryProvider[] directoryProviders;
+ if ( filterDefinitions != null && !filterDefinitions.isEmpty() ) {
+ directoryProviders = indexShardingStrategy.getDirectoryProvidersForQuery(
+ filterDefinitions.values().toArray( new
FullTextFilterImplementor[filterDefinitions.size()] )
+ );
+ }
+ else {
+ //no filter get all shards
+ directoryProviders = indexShardingStrategy.getDirectoryProvidersForQuery(
EMPTY_FULL_TEXT_FILTER_IMPLEMENTOR );
+ }
+
for ( DirectoryProvider provider : directoryProviders ) {
if ( !directories.contains( provider ) ) {
directories.add( provider );
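Note on the buildFilters() change above: the new logic skips filter definitions that yield no Lucene filter, appends the user-supplied filter if any, and leaves the effective filter null when nothing remains. A minimal sketch of that combination rule follows. FilterChainSketch is a hypothetical name, and filters are modeled as plain strings (an assumption made to keep the example self-contained).

```java
import java.util.ArrayList;
import java.util.List;

final class FilterChainSketch {
    // Combines definition-based filters with an optional user filter.
    // A null entry stands for a shard-only definition that produced no filter.
    // Returns null when no filter remains, mirroring "filter = null" in the diff.
    static List<String> combine(List<String> definitionFilters, String userFilter) {
        List<String> chain = new ArrayList<String>();
        if (definitionFilters != null) {
            for (String filter : definitionFilters) {
                if (filter != null) {
                    chain.add(filter);
                }
            }
        }
        if (userFilter != null) {
            chain.add(userFilter);
        }
        return chain.isEmpty() ? null : chain;
    }
}
```

Returning null rather than an empty chain lets the caller run the query with no filter at all, which appears to be the intent of the isEmpty() check added to ChainedFilter.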
Modified:
search/trunk/src/main/java/org/hibernate/search/store/IdHashShardingStrategy.java
===================================================================
---
search/trunk/src/main/java/org/hibernate/search/store/IdHashShardingStrategy.java 2009-06-10
21:48:21 UTC (rev 16754)
+++
search/trunk/src/main/java/org/hibernate/search/store/IdHashShardingStrategy.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -6,6 +6,8 @@
import org.apache.lucene.document.Document;
+import org.hibernate.search.filter.FullTextFilterImplementor;
+
/**
* This implementation uses idInString as the hashKey.
*
@@ -31,6 +33,10 @@
return new DirectoryProvider[] { providers[hashKey( idInString )] };
}
+ public DirectoryProvider<?>[]
getDirectoryProvidersForQuery(FullTextFilterImplementor[] fullTextFilters) {
+ return getDirectoryProvidersForAllShards();
+ }
+
private int hashKey(String key) {
// reproduce the hashCode implementation of String as documented in the javadoc
// to be safe cross Java version (in case it changes some day)
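Note on hashKey above: the body of the method is not part of this hunk, so the following is only a sketch of the approach its comments describe, namely recomputing the documented String.hashCode() formula by hand and mapping the result to a shard. IdHashSketch is a hypothetical name, and the use of Math.floorMod to obtain a non-negative index is an assumption of this sketch, not something shown in the diff.

```java
final class IdHashSketch {
    // Recomputes s[0]*31^(n-1) + ... + s[n-1], the formula documented for String.hashCode().
    static int hash(String key) {
        int hash = 0;
        for (int i = 0; i < key.length(); i++) {
            hash = 31 * hash + key.charAt(i);
        }
        return hash;
    }

    // Maps an id to a shard index in [0, shardCount); floorMod handles negative hashes.
    static int shardFor(String idInString, int shardCount) {
        return Math.floorMod(hash(idInString), shardCount);
    }
}
```

Since the index depends only on the id string, every addition for a given id lands on the same shard. A query has no such key, which is consistent with getDirectoryProvidersForQuery returning all shards for this strategy.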
Modified:
search/trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java
===================================================================
---
search/trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java 2009-06-10
21:48:21 UTC (rev 16754)
+++
search/trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -6,6 +6,8 @@
import org.apache.lucene.document.Document;
+import org.hibernate.search.filter.FullTextFilterImplementor;
+
/**
* Defines how a given virtual index shards data into different DirectoryProviders
*
@@ -32,4 +34,13 @@
* id and idInString can be null. If null, all the directory providers containing entity
types should be returned
*/
DirectoryProvider<?>[] getDirectoryProvidersForDeletion(Class<?> entity,
Serializable id, String idInString);
+
+ /**
+ * return the set of DirectoryProvider(s) where the entities matching the filters are stored
+ * this optional optimization allows queries to hit a subset of all shards, which may be useful for some datasets
+ * if this optimization is not needed, return getDirectoryProvidersForAllShards()
+ *
+ * fullTextFilters can be empty if no filter is applied
+ */
+ DirectoryProvider<?>[] getDirectoryProvidersForQuery(FullTextFilterImplementor[]
fullTextFilters);
}
Modified: search/trunk/src/main/java/org/hibernate/search/store/NotShardedStrategy.java
===================================================================
---
search/trunk/src/main/java/org/hibernate/search/store/NotShardedStrategy.java 2009-06-10
21:48:21 UTC (rev 16754)
+++
search/trunk/src/main/java/org/hibernate/search/store/NotShardedStrategy.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -6,6 +6,7 @@
import org.apache.lucene.document.Document;
import org.hibernate.annotations.common.AssertionFailure;
+import org.hibernate.search.filter.FullTextFilterImplementor;
/**
* @author Emmanuel Bernard
@@ -31,4 +32,8 @@
return directoryProvider;
}
+ public DirectoryProvider<?>[]
getDirectoryProvidersForQuery(FullTextFilterImplementor[] fullTextFilters) {
+ return directoryProvider;
+ }
+
}
Modified:
search/trunk/src/test/java/org/hibernate/search/test/configuration/UselessShardingStrategy.java
===================================================================
---
search/trunk/src/test/java/org/hibernate/search/test/configuration/UselessShardingStrategy.java 2009-06-10
21:48:21 UTC (rev 16754)
+++
search/trunk/src/test/java/org/hibernate/search/test/configuration/UselessShardingStrategy.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -8,6 +8,7 @@
import org.apache.lucene.document.Document;
import org.hibernate.search.store.DirectoryProvider;
import org.hibernate.search.store.IndexShardingStrategy;
+import org.hibernate.search.filter.FullTextFilterImplementor;
/**
* Used to test the configuration of a third-party strategy
@@ -27,6 +28,10 @@
return null;
}
+ public DirectoryProvider<?>[]
getDirectoryProvidersForQuery(FullTextFilterImplementor[] fullTextFilters) {
+ return null;
+ }
+
public void initialize(Properties properties, DirectoryProvider<?>[] providers) {
Enumeration<?> propertyNames = properties.propertyNames();
int counter;
Added:
search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategy.java
===================================================================
---
search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategy.java
(rev 0)
+++
search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategy.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -0,0 +1,65 @@
+package org.hibernate.search.test.shards;
+
+import java.io.Serializable;
+import java.util.Properties;
+
+import org.apache.lucene.document.Document;
+
+import org.hibernate.search.FullTextFilter;
+import org.hibernate.search.filter.FullTextFilterImplementor;
+import org.hibernate.search.store.DirectoryProvider;
+import org.hibernate.search.store.IndexShardingStrategy;
+
+/**
+ * Shards an index containing data for multiple customers by customerID. customerID is
+ * provided as a property on all indexed entities, and is also defined as a Filter.
+ *
+ * The number of shards should be configured to be MAX(customerID).
+ *
+ * @author Chase Seibert
+ */
+public class CustomerShardingStrategy implements IndexShardingStrategy {
+
+ // DirectoryProviders stored in an array indexed by customerID
+ private DirectoryProvider<?>[] providers;
+
+ public void initialize(Properties properties, DirectoryProvider<?>[] providers) {
+ this.providers = providers;
+ }
+
+ public DirectoryProvider<?>[] getDirectoryProvidersForAllShards() {
+ return providers;
+ }
+
+ public DirectoryProvider<?> getDirectoryProviderForAddition(Class<?> entity,
Serializable id, String idInString, Document document) {
+ Integer customerID =
Integer.parseInt(document.getField("customerID").stringValue());
+ return providers[customerID];
+ }
+
+ public DirectoryProvider<?>[] getDirectoryProvidersForDeletion(Class<?>
entity, Serializable id, String idInString) {
+ return getDirectoryProvidersForAllShards();
+ }
+
+ /**
+ * Optimization; don't search ALL shards and union the results; in this case, we
+ * can be certain that all the data for a particular customer Filter is in a single
+ * shard; simply return that shard by customerID.
+ */
+ public DirectoryProvider<?>[]
getDirectoryProvidersForQuery(FullTextFilterImplementor[] filters) {
+ FullTextFilter filter = getCustomerFilter(filters, "customer");
+ if (filter == null) {
+ return getDirectoryProvidersForAllShards();
+ }
+ else {
+ return new DirectoryProvider[] {
providers[Integer.parseInt(filter.getParameter("customerID").toString())] };
+ }
+ }
+
+ private FullTextFilter getCustomerFilter(FullTextFilterImplementor[] filters, String
name) {
+ for (FullTextFilterImplementor filter: filters) {
+ if (filter.getName().equals(name)) return filter;
+ }
+ return null;
+ }
+
+}
Added:
search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategyTest.java
===================================================================
---
search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategyTest.java
(rev 0)
+++
search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategyTest.java 2009-06-11
00:43:09 UTC (rev 16755)
@@ -0,0 +1,56 @@
+package org.hibernate.search.test.shards;
+
+import junit.framework.TestCase;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+
+import org.hibernate.search.query.FullTextFilterImpl;
+import org.hibernate.search.store.DirectoryProvider;
+import org.hibernate.search.store.RAMDirectoryProvider;
+
+/**
+ * @author Chase Seibert
+ */
+public class CustomerShardingStrategyTest extends TestCase {
+
+ private CustomerShardingStrategy shardStrategy;
+
+ protected void setUp() throws Exception {
+ shardStrategy = new CustomerShardingStrategy();
+
+ // initialize with 10 shards
+ shardStrategy.initialize( null, new DirectoryProvider[] {
+ new RAMDirectoryProvider(),
+ new RAMDirectoryProvider(),
+ new RAMDirectoryProvider(),
+ new RAMDirectoryProvider(),
+ new RAMDirectoryProvider(),
+ new RAMDirectoryProvider(),
+ new RAMDirectoryProvider(),
+ new RAMDirectoryProvider(),
+ new RAMDirectoryProvider(),
+ new RAMDirectoryProvider()
+ } );
+ }
+
+ public void testGetDirectoryProvidersForQuery() {
+
+ FullTextFilterImpl filter = new FullTextFilterImpl();
+ filter.setName("customer");
+ filter.setParameter("customerID", 5);
+
+ // customerID == 5 should correspond to just a single shard instance
+ DirectoryProvider[] providers = shardStrategy.getDirectoryProvidersForQuery(new
FullTextFilterImpl[] { filter });
+ assertTrue(providers.length == 1);
+
+ // create a dummy document for the same customerID, and make sure the shard it would be
+ // indexed on matches the shard returned by getDirectoryProvidersForQuery()
+ Document document = new Document();
+ document.add(new Field("customerID", "5", Field.Store.NO,
Field.Index.UN_TOKENIZED));
+
+ assertTrue(providers[0].equals(
+ shardStrategy.getDirectoryProviderForAddition(null, null, null, document)
+ ));
+ }
+
+}
\ No newline at end of file