[hibernate-commits] Hibernate SVN: r16755 - in search/trunk/src: main/java/org/hibernate/search/filter and 4 other directories.

hibernate-commits at lists.jboss.org hibernate-commits at lists.jboss.org
Wed Jun 10 20:43:10 EDT 2009


Author: epbernard
Date: 2009-06-10 20:43:09 -0400 (Wed, 10 Jun 2009)
New Revision: 16755

Added:
   search/trunk/src/main/java/org/hibernate/search/filter/FullTextFilterImplementor.java
   search/trunk/src/main/java/org/hibernate/search/filter/ShardSensitiveOnlyFilter.java
   search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategy.java
   search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategyTest.java
Modified:
   search/trunk/src/main/docbook/en-US/modules/configuration.xml
   search/trunk/src/main/docbook/en-US/modules/query.xml
   search/trunk/src/main/java/org/hibernate/search/filter/ChainedFilter.java
   search/trunk/src/main/java/org/hibernate/search/query/FullTextFilterImpl.java
   search/trunk/src/main/java/org/hibernate/search/query/FullTextQueryImpl.java
   search/trunk/src/main/java/org/hibernate/search/store/IdHashShardingStrategy.java
   search/trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java
   search/trunk/src/main/java/org/hibernate/search/store/NotShardedStrategy.java
   search/trunk/src/test/java/org/hibernate/search/test/configuration/UselessShardingStrategy.java
Log:
HSEARCH-251 Query on a shard subset based on a filter activation

Modified: search/trunk/src/main/docbook/en-US/modules/configuration.xml
===================================================================
--- search/trunk/src/main/docbook/en-US/modules/configuration.xml	2009-06-10 21:48:21 UTC (rev 16754)
+++ search/trunk/src/main/docbook/en-US/modules/configuration.xml	2009-06-11 00:43:09 UTC (rev 16755)
@@ -206,21 +206,33 @@
   <section id="search-configuration-directory-sharding" revision="1">
     <title>Sharding indexes</title>
 
-    <para>In some extreme cases involving huge indexes (in size), it is
-    necessary to split (shard) the indexing data of a given entity type into
-    several Lucene indexes. This solution is not recommended until you reach
-    significant index sizes and index update times are slowing the application
-    down. The main drawback of index sharding is that searches will end up
-    being slower since more files have to be opened for a single search. In
+    <para>In some cases, it is necessary to split (shard) the indexing data of
+    a given entity type into several Lucene indexes. This solution is not
+    recommended unless there is a pressing need because by default, searches
+    will be slower as all shards have to be opened for a single search. In
     other words don't do it until you have problems :)</para>
 
-    <para>Despite this strong warning, Hibernate Search allows you to index a
-    given entity type into several sub indexes. Data is sharded into the
-    different sub indexes thanks to an
-    <classname>IndexShardingStrategy</classname>. By default, no sharding
-    strategy is enabled, unless the number of shards is configured. To
-    configure the number of shards use the following property</para>
+    <para>For example, sharding may be desirable if:</para>
 
+    <itemizedlist>
+      <listitem>
+        <para>A single index is so huge that index update times are slowing
+        the application down.</para>
+      </listitem>
+
+      <listitem>
+        <para>A typical search will only hit a sub-set of the index, such as
+        when data is naturally segmented by customer, region or
+        application.</para>
+      </listitem>
+    </itemizedlist>
+
+    <para>Hibernate Search allows you to index a given entity type into
+    several sub indexes. Data is sharded into the different sub indexes thanks
+    to an <classname>IndexShardingStrategy</classname>. By default, no
+    sharding strategy is enabled, unless the number of shards is configured.
+    To configure the number of shards use the following property</para>
+
     <example>
       <title>Enabling index sharding by specifying nbr_of_shards for a
       specific index</title>
@@ -243,10 +255,21 @@
       <programlisting>hibernate.search.&lt;indexName&gt;.sharding_strategy my.shardingstrategy.Implementation</programlisting>
     </example>
 
+    <para>Using a custom <classname>IndexShardingStrategy</classname>
+    implementation, it's possible to define what shard a given entity is
+    indexed to. </para>
+
+    <para>It also allows for optimizing searches by selecting which shards to
+    run the query on. By activating a filter (see <xref
+    linkend="query-filter-shard" />), a sharding strategy can select a subset
+    of the shards used to answer a query
+    (<classname>IndexShardingStrategy.getDirectoryProvidersForQuery</classname>)
+    and thus speed up the query execution.</para>
+
     <para>Each shard has an independent directory provider configuration as
     described in <xref linkend="search-configuration-directory" />. The
-    DirectoryProvider default name for the previous example are
-    <literal>&lt;indexName&gt;.0</literal> to
+    <classname>DirectoryProvider</classname> default names for the previous
+    example are <literal>&lt;indexName&gt;.0</literal> to
     <literal>&lt;indexName&gt;.4</literal>. In other words, each shard has the
     name of it's owning index followed by <constant>.</constant> (dot) and its
     index number.</para>
@@ -367,14 +390,15 @@
 
             <entry>Out of the box support for the Apache Lucene back end and
             the JMS back end. Default to <literal>lucene</literal>. Supports
-            also <literal>jms</literal> and <literal>blackhole</literal>.</entry>
+            also <literal>jms</literal> and
+            <literal>blackhole</literal>.</entry>
           </row>
 
           <row>
             <entry><literal>hibernate.search.worker.execution</literal></entry>
 
-            <entry>Supports synchronous and asynchronous execution. Default
-            to <literal><literal>sync</literal></literal>. Supports also
+            <entry>Supports synchronous and asynchronous execution. Default to
+            <literal><literal>sync</literal></literal>. Supports also
             <literal>async</literal>.</entry>
           </row>
 
@@ -445,8 +469,8 @@
     <section>
       <title>Slave nodes</title>
 
-      <para>Every index update operation is sent to a JMS queue. Index querying
-      operations are executed on a local index copy.</para>
+      <para>Every index update operation is sent to a JMS queue. Index
+      querying operations are executed on a local index copy.</para>
 
       <example>
         <title>JMS Slave configuration</title>
@@ -605,8 +629,9 @@
       <para>To enable Hibernate Search in Hibernate Core (ie. if you don't use
       Hibernate Annotations), add the
       <literal>FullTextIndexEventListener</literal> for the following six
-      Hibernate events and also add it after the default 
-      <literal>DefaultFlushEventListener</literal>, as in the following example.</para>
+      Hibernate events and also add it after the default
+      <literal>DefaultFlushEventListener</literal>, as in the following
+      example.</para>
 
       <example>
         <title>Explicitly enabling Hibernate Search by configuring the
@@ -768,13 +793,13 @@
             terms.</para> <para>This silently truncates large documents,
             excluding from the index all terms that occur further in the
             document. If you know your source documents are large, be sure to
-            set this value high enough to accommodate the expected size. If you
-            set it to Integer.MAX_VALUE, then the only limit is your memory,
-            but you should anticipate an OutOfMemoryError. </para> <para>If
-            setting this value in <literal>batch</literal> differently than in
-            <literal>transaction</literal> you may get different data (and
-            results) in your index depending on the indexing
-            mode.</para></entry>
+            set this value high enough to accommodate the expected size. If
+            you set it to Integer.MAX_VALUE, then the only limit is your
+            memory, but you should anticipate an OutOfMemoryError. </para>
+            <para>If setting this value in <literal>batch</literal>
+            differently than in <literal>transaction</literal> you may get
+            different data (and results) in your index depending on the
+            indexing mode.</para></entry>
 
             <entry>10000</entry>
           </row>
@@ -852,24 +877,26 @@
         </tbody>
       </tgroup>
     </table>
-    
-    <para>To tune the indexing speed it might be useful to time the
-    object loading from database in isolation from the writes to the index.
-    To achieve this set the <literal>blackhole</literal> as worker backend and start
-    you indexing routines.
-    This backend does not disable Hibernate Search: it will still generate the needed
-    changesets to the index, but will discard them instead of flushing them to the index.
-    As opposite to setting the <literal>hibernate.search.indexing_strategy</literal>
-    to <literal>manual</literal> when using <literal>blackhole</literal> it will possibly load
-    more data to rebuild the index from associated entities.</para>
-    
+
+    <para>To tune the indexing speed it might be useful to time the object
+    loading from database in isolation from the writes to the index. To
+    achieve this set the <literal>blackhole</literal> as worker backend and
+    start your indexing routines. This backend does not disable Hibernate
+    Search: it will still generate the needed changesets to the index, but
+    will discard them instead of flushing them to the index. As opposed to
+    setting the <literal>hibernate.search.indexing_strategy</literal> to
+    <literal>manual</literal> when using <literal>blackhole</literal> it will
+    possibly load more data to rebuild the index from associated
+    entities.</para>
+
     <programlisting>hibernate.search.worker.backend blackhole</programlisting>
-    
-    <para>The recommended approach is to focus first on optimizing the object loading, and then
-    use the timings you achieve as a baseline to tune the indexing process.</para>
-    <para>The <literal>blackhole</literal> backend is not meant to be used in production, only
-    as a tool to identify indexing bottlenecks.</para>
-    
+
+    <para>The recommended approach is to focus first on optimizing the object
+    loading, and then use the timings you achieve as a baseline to tune the
+    indexing process.</para>
+
+    <para>The <literal>blackhole</literal> backend is not meant to be used in
+    production, only as a tool to identify indexing bottlenecks.</para>
   </section>
 
   <section id="search-configuration-directory-lockfactories" revision="1">
@@ -883,6 +910,8 @@
     for most cases, but it's possible to specify for each index managed by
     Hibernate Search which LockingFactory you want to use.</para>
 
+     
+
     <para>Some of these locking strategies require a filesystem level lock and
     may be used even on RAM based indexes, but this is not recommended and of
     no practical use.</para>
@@ -976,7 +1005,7 @@
         </tgroup>
       </table></para>
 
-    Configuration example: 
+     Configuration example: 
 
     <programlisting>hibernate.search.default.locking_strategy simple
 hibernate.search.Animals.locking_strategy native
@@ -988,4 +1017,4 @@
 
      
   </section>
-</chapter>
+</chapter>
\ No newline at end of file

Modified: search/trunk/src/main/docbook/en-US/modules/query.xml
===================================================================
--- search/trunk/src/main/docbook/en-US/modules/query.xml	2009-06-10 21:48:21 UTC (rev 16754)
+++ search/trunk/src/main/docbook/en-US/modules/query.xml	2009-06-11 00:43:09 UTC (rev 16755)
@@ -345,8 +345,8 @@
           </listitem>
 
           <listitem>
-            <para>FullTextQuery.OBJECT_CLASS: returns the class of the
-            indexed entity.</para>
+            <para>FullTextQuery.OBJECT_CLASS: returns the class of the indexed
+            entity.</para>
           </listitem>
 
           <listitem>
@@ -545,7 +545,7 @@
     </section>
   </section>
 
-  <section>
+  <section id="query-filter">
     <title>Filters</title>
 
     <para>Apache Lucene has a powerful feature that allows to filter query
@@ -833,6 +833,105 @@
         time spent to execute the query)</para>
       </listitem>
     </itemizedlist>
+
+    <section id="query-filter-shard">
+      <title>Using filters in a sharded environment</title>
+
+      <para>In a sharded environment, it is possible to execute queries on a
+      subset of the available shards. This can be done in two steps:</para>
+
+      <itemizedlist>
+        <listitem>
+          <para>create a sharding strategy that selects a subset of
+          <classname>DirectoryProvider</classname>s depending on some filter
+          configuration</para>
+        </listitem>
+
+        <listitem>
+          <para>activate the proper filter at query time</para>
+        </listitem>
+      </itemizedlist>
+
+      <para>Let's first look at an example of a sharding strategy that queries
+      a specific customer shard if the customer filter is activated.</para>
+
+      <programlisting>public class CustomerShardingStrategy implements IndexShardingStrategy {
+
+	// DirectoryProviders stored in an array indexed by customerID
+	private DirectoryProvider&lt;?&gt;[] providers;
+	
+	public void initialize(Properties properties, DirectoryProvider&lt;?&gt;[] providers) {
+		this.providers = providers;
+	}
+
+	public DirectoryProvider&lt;?&gt;[] getDirectoryProvidersForAllShards() {
+		return providers;
+	}
+
+	public DirectoryProvider&lt;?&gt; getDirectoryProviderForAddition(Class&lt;?&gt; entity, Serializable id, String idInString, Document document) {
+		Integer customerID = Integer.parseInt(document.getField("customerID").stringValue());
+		return providers[customerID];
+	}
+
+	public DirectoryProvider&lt;?&gt;[] getDirectoryProvidersForDeletion(Class&lt;?&gt; entity, Serializable id, String idInString) {
+		return getDirectoryProvidersForAllShards();
+	}
+
+<emphasis role="bold">	/**
+	 * Optimization; don't search ALL shards and union the results; in this case, we 
+	 * can be certain that all the data for a particular customer Filter is in a single
+	 * shard; simply return that shard by customerID.
+	 */
+	public DirectoryProvider&lt;?&gt;[] getDirectoryProvidersForQuery(FullTextFilterImplementor[] filters) {
+		FullTextFilter filter = getFilter(filters, "customer");
+		if (filter == null) {
+			return getDirectoryProvidersForAllShards();
+		}
+		else {
+			return new DirectoryProvider[] { providers[Integer.parseInt(filter.getParameter("customerID").toString())] };
+		}
+	}
+
+	private FullTextFilter getFilter(FullTextFilterImplementor[] filters, String name) {
+		for (FullTextFilterImplementor filter: filters) {
+			if (filter.getName().equals(name)) return filter;
+		}
+		return null;
+	}</emphasis>
+
+}</programlisting>
+
+      <para>In this example, if the filter named <literal>customer</literal>
+      is present, we make sure to only use the shard dedicated to this
+      customer. Otherwise, we return all shards. A given sharding strategy can
+      react to one or more filters and depend on their parameters.</para>
+
+      <para>The second step is simply to activate the filter at query time.
+      While the filter can be a regular filter (as defined in <xref
+      linkend="query-filter" />) which also filters Lucene results after the
+      query, you can make use of a special filter that will only be passed to
+      the sharding strategy and otherwise ignored for the rest of the query.
+      Simply use the <classname>ShardSensitiveOnlyFilter</classname> class
+      when declaring your filter.</para>
+
+      <programlisting>@Entity @Indexed
+<emphasis role="bold">@FullTextFilterDef(name="customer", impl=ShardSensitiveOnlyFilter.class)</emphasis>
+public class Customer {
+   ...
+}
+
+
+FullTextQuery query = ftEm.createFullTextQuery(luceneQuery, Customer.class);
+<emphasis role="bold">query.enableFullTextFilter("customer").setParameter("customerID", 5);</emphasis>
+@SuppressWarnings("unchecked")
+List&lt;Customer&gt; results = query.getResultList();</programlisting>
+
+      <para>Note that by using the
+      <classname>ShardSensitiveOnlyFilter</classname>, you do not have to
+      implement any Lucene filter. Using filters and sharding strategy
+      reacting to these filters is recommended to speed up queries in a
+      sharded environment.</para>
+    </section>
   </section>
 
   <section>
@@ -866,4 +965,4 @@
     run Lucene specific queries. Check <xref linkend="search-lucene-native" />
     for more information.</para>
   </section>
-</chapter>
+</chapter>
\ No newline at end of file
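
The shard selection described in the query.xml change above can be sketched in isolation. This is a minimal illustration only, not the committed code: EnabledFilter and shardsForQuery are hypothetical stand-ins, and shards are plain strings where the real strategy returns DirectoryProvider instances. It assumes the filter is named "customer" and carries a "customerID" parameter, as in the documentation example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ShardSelectionSketch {

    /** Hypothetical stand-in for an enabled full-text filter: a name plus parameters. */
    static final class EnabledFilter {
        final String name;
        final Map<String, Object> parameters = new HashMap<>();

        EnabledFilter(String name) {
            this.name = name;
        }

        EnabledFilter setParameter(String key, Object value) {
            parameters.put(key, value);
            return this;
        }
    }

    /**
     * Returns the single shard owned by the customer when the "customer" filter
     * is enabled, and every shard otherwise. Out-of-range IDs are not handled.
     */
    static String[] shardsForQuery(String[] allShards, EnabledFilter[] filters) {
        for (EnabledFilter filter : filters) {
            if ("customer".equals(filter.name)) {
                Object id = filter.parameters.get("customerID");
                if (id == null) {
                    break; // assumption: a filter without the parameter falls back to all shards
                }
                int customerId = Integer.parseInt(id.toString());
                return new String[] { allShards[customerId] };
            }
        }
        return allShards;
    }

    public static void main(String[] args) {
        String[] shards = { "Customer.0", "Customer.1", "Customer.2" };
        EnabledFilter customer = new EnabledFilter("customer").setParameter("customerID", 1);

        // prints [Customer.1]
        System.out.println(Arrays.toString(shardsForQuery(shards, new EnabledFilter[] { customer })));
        // prints [Customer.0, Customer.1, Customer.2]
        System.out.println(Arrays.toString(shardsForQuery(shards, new EnabledFilter[0])));
    }
}
```

The parameter key is compared case-sensitively, so the name used when enabling the filter has to match the name the strategy reads.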

Modified: search/trunk/src/main/java/org/hibernate/search/filter/ChainedFilter.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/filter/ChainedFilter.java	2009-06-10 21:48:21 UTC (rev 16754)
+++ search/trunk/src/main/java/org/hibernate/search/filter/ChainedFilter.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -30,6 +30,10 @@
 		this.chainedFilters.add( filter );
 	}
 
+	public boolean isEmpty() {
+		return chainedFilters.size() == 0;
+	}
+
 	public BitSet bits(IndexReader reader) throws IOException {
 		throw new UnsupportedOperationException();
 	}

Added: search/trunk/src/main/java/org/hibernate/search/filter/FullTextFilterImplementor.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/filter/FullTextFilterImplementor.java	                        (rev 0)
+++ search/trunk/src/main/java/org/hibernate/search/filter/FullTextFilterImplementor.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -0,0 +1,15 @@
+package org.hibernate.search.filter;
+
+import org.hibernate.search.FullTextFilter;
+
+/**
+ * @author Emmanuel Bernard
+ */
+public interface FullTextFilterImplementor extends FullTextFilter {
+	/**
+	 * Returns the Filter name
+	 */
+	String getName();
+
+	//TODO should we expose Map<String, Object> getParameters()
+}

Added: search/trunk/src/main/java/org/hibernate/search/filter/ShardSensitiveOnlyFilter.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/filter/ShardSensitiveOnlyFilter.java	                        (rev 0)
+++ search/trunk/src/main/java/org/hibernate/search/filter/ShardSensitiveOnlyFilter.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -0,0 +1,12 @@
+package org.hibernate.search.filter;
+
+/**
+ * When using this class in @FullTextFilterDef.impl, Hibernate Search
+ * considers the filter to be only influencing the sharding strategy.
+ *
+ * This filter is not applied on the results of the Lucene query.
+ *
+ * @author Emmanuel Bernard
+ */
+public interface ShardSensitiveOnlyFilter {
+}

Modified: search/trunk/src/main/java/org/hibernate/search/query/FullTextFilterImpl.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/query/FullTextFilterImpl.java	2009-06-10 21:48:21 UTC (rev 16754)
+++ search/trunk/src/main/java/org/hibernate/search/query/FullTextFilterImpl.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -5,11 +5,12 @@
 import java.util.Map;
 
 import org.hibernate.search.FullTextFilter;
+import org.hibernate.search.filter.FullTextFilterImplementor;
 
 /**
  * @author Emmanuel Bernard
  */
-public class FullTextFilterImpl implements FullTextFilter {
+public class FullTextFilterImpl implements FullTextFilterImplementor {
 	private final Map<String, Object> parameters = new HashMap<String, Object>();
 	private String name;
 

Modified: search/trunk/src/main/java/org/hibernate/search/query/FullTextQueryImpl.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/query/FullTextQueryImpl.java	2009-06-10 21:48:21 UTC (rev 16754)
+++ search/trunk/src/main/java/org/hibernate/search/query/FullTextQueryImpl.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -54,9 +54,12 @@
 import org.hibernate.search.filter.ChainedFilter;
 import org.hibernate.search.filter.FilterKey;
 import org.hibernate.search.filter.StandardFilterKey;
+import org.hibernate.search.filter.FullTextFilterImplementor;
+import org.hibernate.search.filter.ShardSensitiveOnlyFilter;
 import org.hibernate.search.reader.ReaderProvider;
 import static org.hibernate.search.reader.ReaderProviderHelper.getIndexReaders;
 import org.hibernate.search.store.DirectoryProvider;
+import org.hibernate.search.store.IndexShardingStrategy;
 import org.hibernate.search.util.ContextHelper;
 import static org.hibernate.search.util.FilterCacheModeTypeHelper.cacheInstance;
 import static org.hibernate.search.util.FilterCacheModeTypeHelper.cacheResults;
@@ -84,6 +87,7 @@
 	private Integer resultSize;
 	private Sort sort;
 	private Filter filter;
+	private Filter userFilter;
 	private Criteria criteria;
 	private String[] indexProjection;
 	private Set<String> idFieldNames;
@@ -92,7 +96,9 @@
 	private SearchFactoryImplementor searchFactoryImplementor;
 	private Map<String, FullTextFilterImpl> filterDefinitions;
 	private int fetchSize = 1;
+	private static final FullTextFilterImplementor[] EMPTY_FULL_TEXT_FILTER_IMPLEMENTOR = new FullTextFilterImplementor[0];
 
+
 	/**
 	 * Constructs a  <code>FullTextQueryImpl</code> instance.
 	 *
@@ -127,7 +133,7 @@
 	 * {@inheritDoc}
 	 */
 	public FullTextQuery setFilter(Filter filter) {
-		this.filter = filter;
+		this.userFilter = filter;
 		return this;
 	}
 
@@ -399,20 +405,27 @@
 	}
 
 	private void buildFilters() {
-		if ( filterDefinitions == null || filterDefinitions.size() == 0 ) {
-			return; // there is nothing to do if we don't have any filter definitions
+		ChainedFilter chainedFilter = null;
+		if ( ! ( filterDefinitions == null || filterDefinitions.size() == 0 ) ) {
+			chainedFilter = new ChainedFilter();
+			for ( FullTextFilterImpl fullTextFilter : filterDefinitions.values() ) {
+				Filter filter = buildLuceneFilter( fullTextFilter );
+				if (filter != null) chainedFilter.addFilter( filter );
+			}
 		}
 
-		ChainedFilter chainedFilter = new ChainedFilter();
-		for ( FullTextFilterImpl fullTextFilter : filterDefinitions.values() ) {
-			Filter filter = buildLuceneFilter( fullTextFilter );
-			chainedFilter.addFilter( filter );
+		if ( userFilter != null ) {
+			//chainedFilter is not always necessary here but the code is easier to read
+			if (chainedFilter == null) chainedFilter = new ChainedFilter();
+			chainedFilter.addFilter( userFilter );
 		}
 
-		if ( filter != null ) {
-			chainedFilter.addFilter( filter );
+		if ( chainedFilter == null || chainedFilter.isEmpty() ) {
+			filter = null;
 		}
-		filter = chainedFilter;
+		else {
+			filter = chainedFilter;
+		}
 	}
 
 	/**
@@ -430,6 +443,10 @@
 		 * as FilterCachingStrategy ensure a memory barrier between concurrent thread calls
 		 */
 		FilterDef def = searchFactoryImplementor.getFilterDefinition( fullTextFilter.getName() );
+		//def can never be null, it's guarded by enableFullTextFilter(String)
+
+		if ( isPreQueryFilterOnly(def) ) return null;
+
 		Object instance = createFilterInstance( fullTextFilter, def );
 		FilterKey key = createFilterKey( def, instance );
 
@@ -449,6 +466,10 @@
 		return filter;
 	}
 
+	private boolean isPreQueryFilterOnly(FilterDef def) {
+		return def.getImpl().equals( ShardSensitiveOnlyFilter.class );
+	}
+
 	private Filter createFilter(FilterDef def, Object instance) {
 		Filter filter;
 		if ( def.getFactoryMethod() != null ) {
@@ -633,7 +654,7 @@
 	 */
 	private IndexSearcher buildSearcher(SearchFactoryImplementor searchFactoryImplementor) {
 		Map<Class<?>, DocumentBuilderIndexedEntity<?>> builders = searchFactoryImplementor.getDocumentBuildersIndexedEntities();
-		List<DirectoryProvider> directories = new ArrayList<DirectoryProvider>();
+		List<DirectoryProvider> targetedDirectories = new ArrayList<DirectoryProvider>();
 		Set<String> idFieldNames = new HashSet<String>();
 
 		Similarity searcherSimilarity = null;
@@ -653,9 +674,7 @@
 					idFieldNames.add( builder.getIdKeywordName() );
 					allowFieldSelectionInProjection = allowFieldSelectionInProjection && builder.allowFieldSelectionInProjection();
 				}
-				final DirectoryProvider[] directoryProviders = builder.getDirectoryProviderSelectionStrategy()
-						.getDirectoryProvidersForAllShards();
-				populateDirectories( directories, directoryProviders );
+				populateDirectories( targetedDirectories, builder );
 			}
 			classesAndSubclasses = null;
 		}
@@ -679,10 +698,8 @@
 					idFieldNames.add( builder.getIdKeywordName() );
 					allowFieldSelectionInProjection = allowFieldSelectionInProjection && builder.allowFieldSelectionInProjection();
 				}
-				final DirectoryProvider[] directoryProviders = builder.getDirectoryProviderSelectionStrategy()
-						.getDirectoryProvidersForAllShards();
 				searcherSimilarity = checkSimilarity( searcherSimilarity, builder );
-				populateDirectories( directories, directoryProviders );
+				populateDirectories( targetedDirectories, builder );
 			}
 			this.classesAndSubclasses = involvedClasses;
 		}
@@ -691,7 +708,7 @@
 		//compute optimization needClassFilterClause
 		//if at least one DP contains one class that is not part of the targeted classesAndSubclasses we can't optimize
 		if ( classesAndSubclasses != null ) {
-			for ( DirectoryProvider dp : directories ) {
+			for ( DirectoryProvider dp : targetedDirectories ) {
 				final Set<Class<?>> classesInDirectoryProvider = searchFactoryImplementor.getClassesInDirectoryProvider(
 						dp
 				);
@@ -712,7 +729,7 @@
 		}
 
 		//set up the searcher
-		final DirectoryProvider[] directoryProviders = directories.toArray( new DirectoryProvider[directories.size()] );
+		final DirectoryProvider[] directoryProviders = targetedDirectories.toArray( new DirectoryProvider[targetedDirectories.size()] );
 		IndexSearcher is = new IndexSearcher(
 				searchFactoryImplementor.getReaderProvider().openReader(
 						directoryProviders
@@ -722,7 +739,19 @@
 		return is;
 	}
 
-	private void populateDirectories(List<DirectoryProvider> directories, DirectoryProvider[] directoryProviders) {
+	private void populateDirectories(List<DirectoryProvider> directories, DocumentBuilderIndexedEntity builder) {
+		final IndexShardingStrategy indexShardingStrategy = builder.getDirectoryProviderSelectionStrategy();
+		final DirectoryProvider[] directoryProviders;
+		if ( filterDefinitions != null && !filterDefinitions.isEmpty() ) {
+			directoryProviders = indexShardingStrategy.getDirectoryProvidersForQuery(
+				filterDefinitions.values().toArray( new FullTextFilterImplementor[filterDefinitions.size()] )
+			);
+		}
+		else {
+			//no filter get all shards
+			directoryProviders = indexShardingStrategy.getDirectoryProvidersForQuery( EMPTY_FULL_TEXT_FILTER_IMPLEMENTOR );
+		}
+		
 		for ( DirectoryProvider provider : directoryProviders ) {
 			if ( !directories.contains( provider ) ) {
 				directories.add( provider );
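
The filter composition introduced in buildFilters() above can be illustrated with a standalone sketch. It is an approximation under stated assumptions, not the Hibernate Search code: DocFilter, Definition and effectiveFilters are hypothetical names, a plain list stands in for ChainedFilter, and the shardOnly flag mimics a definition whose impl is ShardSensitiveOnlyFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterChainSketch {

    /** Hypothetical stand-in for a Lucene Filter. */
    interface DocFilter { }

    /** Hypothetical filter definition; shardOnly mimics impl=ShardSensitiveOnlyFilter.class. */
    static final class Definition {
        final String name;
        final boolean shardOnly;

        Definition(String name, boolean shardOnly) {
            this.name = name;
            this.shardOnly = shardOnly;
        }
    }

    /** Shard-only definitions produce no result filter, so null is returned for them. */
    static DocFilter toDocFilter(Definition def) {
        if (def.shardOnly) {
            return null;
        }
        return new DocFilter() { };
    }

    /**
     * Collects the filters that apply to query results: enabled definitions first,
     * then the user-supplied filter. Returns null when nothing is left to apply.
     */
    static List<DocFilter> effectiveFilters(List<Definition> enabled, DocFilter userFilter) {
        List<DocFilter> chain = new ArrayList<>();
        for (Definition def : enabled) {
            DocFilter filter = toDocFilter(def);
            if (filter != null) {
                chain.add(filter);
            }
        }
        if (userFilter != null) {
            chain.add(userFilter);
        }
        return chain.isEmpty() ? null : chain;
    }

    public static void main(String[] args) {
        List<Definition> shardOnly = Arrays.asList(new Definition("customer", true));
        List<Definition> mixed = Arrays.asList(
                new Definition("customer", true), new Definition("security", false));

        System.out.println(effectiveFilters(shardOnly, null) == null); // prints true
        System.out.println(effectiveFilters(mixed, null).size());      // prints 1
    }
}
```

Returning null rather than an empty chain keeps a query that only enables shard-sensitive filters from paying for a result filter that would match everything.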

Modified: search/trunk/src/main/java/org/hibernate/search/store/IdHashShardingStrategy.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/store/IdHashShardingStrategy.java	2009-06-10 21:48:21 UTC (rev 16754)
+++ search/trunk/src/main/java/org/hibernate/search/store/IdHashShardingStrategy.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -6,6 +6,8 @@
 
 import org.apache.lucene.document.Document;
 
+import org.hibernate.search.filter.FullTextFilterImplementor;
+
 /**
  * This implementation use idInString as the hashKey.
  * 
@@ -31,6 +33,10 @@
 		return new DirectoryProvider[] { providers[hashKey( idInString )] };
 	}
 
+	public DirectoryProvider<?>[] getDirectoryProvidersForQuery(FullTextFilterImplementor[] fullTextFilters) {
+		return getDirectoryProvidersForAllShards();
+	}
+
 	private int hashKey(String key) {
 		// reproduce the hashCode implementation of String as documented in the javadoc
 		// to be safe cross Java version (in case it changes some day)

Modified: search/trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java	2009-06-10 21:48:21 UTC (rev 16754)
+++ search/trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -6,6 +6,8 @@
 
 import org.apache.lucene.document.Document;
 
+import org.hibernate.search.filter.FullTextFilterImplementor;
+
 /**
  * Defines how a given virtual index shards data into different DirectoryProviders
  *
@@ -32,4 +34,13 @@
 	 * id and idInString can be null. If null, all the directory providers containing entity types should be returned
 	 */
 	DirectoryProvider<?>[] getDirectoryProvidersForDeletion(Class<?> entity, Serializable id, String idInString);
+
+	/**
+	 * return the set of DirectoryProvider(s) where the entities matching the filters are stored
+	 * this optional optimization allows queries to hit a subset of all shards, which may be useful for some datasets
+	 * if this optimization is not needed, return getDirectoryProvidersForAllShards()
+	 *
+	 * fullTextFilters can be empty if no filter is applied
+	 */
+	DirectoryProvider<?>[] getDirectoryProvidersForQuery(FullTextFilterImplementor[] fullTextFilters);
 }

Modified: search/trunk/src/main/java/org/hibernate/search/store/NotShardedStrategy.java
===================================================================
--- search/trunk/src/main/java/org/hibernate/search/store/NotShardedStrategy.java	2009-06-10 21:48:21 UTC (rev 16754)
+++ search/trunk/src/main/java/org/hibernate/search/store/NotShardedStrategy.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -6,6 +6,7 @@
 
 import org.apache.lucene.document.Document;
 import org.hibernate.annotations.common.AssertionFailure;
+import org.hibernate.search.filter.FullTextFilterImplementor;
 
 /**
  * @author Emmanuel Bernard
@@ -31,4 +32,8 @@
 		return directoryProvider;
 	}
 
+	public DirectoryProvider<?>[] getDirectoryProvidersForQuery(FullTextFilterImplementor[] fullTextFilters) {
+		return directoryProvider;
+	}
+
 }

Modified: search/trunk/src/test/java/org/hibernate/search/test/configuration/UselessShardingStrategy.java
===================================================================
--- search/trunk/src/test/java/org/hibernate/search/test/configuration/UselessShardingStrategy.java	2009-06-10 21:48:21 UTC (rev 16754)
+++ search/trunk/src/test/java/org/hibernate/search/test/configuration/UselessShardingStrategy.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -8,6 +8,7 @@
 import org.apache.lucene.document.Document;
 import org.hibernate.search.store.DirectoryProvider;
 import org.hibernate.search.store.IndexShardingStrategy;
+import org.hibernate.search.filter.FullTextFilterImplementor;
 
 /**
  * Used to test the configuration of a third-party strategy
@@ -27,6 +28,10 @@
 		return null;
 	}
 
+	public DirectoryProvider<?>[] getDirectoryProvidersForQuery(FullTextFilterImplementor[] fullTextFilters) {
+		return null;
+	}
+
 	public void initialize(Properties properties, DirectoryProvider<?>[] providers) {
 		Enumeration<?> propertyNames = properties.propertyNames();
 		int counter;

Added: search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategy.java
===================================================================
--- search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategy.java	                        (rev 0)
+++ search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategy.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -0,0 +1,65 @@
+package org.hibernate.search.test.shards;
+
+import java.io.Serializable;
+import java.util.Properties;
+
+import org.apache.lucene.document.Document;
+
+import org.hibernate.search.FullTextFilter;
+import org.hibernate.search.filter.FullTextFilterImplementor;
+import org.hibernate.search.store.DirectoryProvider;
+import org.hibernate.search.store.IndexShardingStrategy;
+
+/**
+ * Shards an index containing data for multiple customers by customerID. customerID is
+ * provided as a property on all indexed entities, and is also defined as a Filter.
+ * 
+ * The number of shards must be at least MAX(customerID) + 1, since customerID is used as a zero-based array index.
+ *
+ * @author Chase Seibert
+ */
+public class CustomerShardingStrategy implements IndexShardingStrategy {
+
+	// DirectoryProviders stored in an array indexed by customerID
+	private DirectoryProvider<?>[] providers;
+	
+	public void initialize(Properties properties, DirectoryProvider<?>[] providers) {
+		this.providers = providers;
+	}
+
+	public DirectoryProvider<?>[] getDirectoryProvidersForAllShards() {
+		return providers;
+	}
+
+	public DirectoryProvider<?> getDirectoryProviderForAddition(Class<?> entity, Serializable id, String idInString, Document document) {
+		Integer customerID = Integer.parseInt(document.getField("customerID").stringValue());
+		return providers[customerID];
+	}
+
+	public DirectoryProvider<?>[] getDirectoryProvidersForDeletion(Class<?> entity, Serializable id, String idInString) {
+		return getDirectoryProvidersForAllShards();
+	}
+
+	/**
+	 * Optimization: when the "customer" filter is enabled, all data for that customer
+	 * lives in a single shard, so return only that shard (looked up by customerID)
+	 * instead of searching all shards and merging the results.
+	 */
+	public DirectoryProvider<?>[] getDirectoryProvidersForQuery(FullTextFilterImplementor[] filters) {
+		FullTextFilter filter = getCustomerFilter(filters, "customer");
+		if (filter == null) {
+			return getDirectoryProvidersForAllShards();
+		}
+		else {
+			return new DirectoryProvider[] { providers[Integer.parseInt(filter.getParameter("customerID").toString())] };
+		}
+	}
+
+	private FullTextFilter getCustomerFilter(FullTextFilterImplementor[] filters, String name) {
+		for (FullTextFilterImplementor filter: filters) {
+			if (filter.getName().equals(name)) return filter;
+		}
+		return null;
+	}
+
+}

Added: search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategyTest.java
===================================================================
--- search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategyTest.java	                        (rev 0)
+++ search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategyTest.java	2009-06-11 00:43:09 UTC (rev 16755)
@@ -0,0 +1,56 @@
+package org.hibernate.search.test.shards;
+
+import junit.framework.TestCase;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+
+import org.hibernate.search.query.FullTextFilterImpl;
+import org.hibernate.search.store.DirectoryProvider;
+import org.hibernate.search.store.RAMDirectoryProvider;
+
+/**
+ * @author Chase Seibert
+ */
+public class CustomerShardingStrategyTest extends TestCase {
+
+	private CustomerShardingStrategy shardStrategy;
+
+	protected void setUp() throws Exception {
+		shardStrategy = new CustomerShardingStrategy();
+		
+		// initialize with 10 shards
+		shardStrategy.initialize( null, new DirectoryProvider[] {
+				new RAMDirectoryProvider(), 
+				new RAMDirectoryProvider(),
+				new RAMDirectoryProvider(),
+				new RAMDirectoryProvider(),
+				new RAMDirectoryProvider(),
+				new RAMDirectoryProvider(),
+				new RAMDirectoryProvider(),
+				new RAMDirectoryProvider(),
+				new RAMDirectoryProvider(),
+				new RAMDirectoryProvider() 
+		} );
+	}
+
+	public void testGetDirectoryProvidersForQuery() {
+		
+		FullTextFilterImpl filter = new FullTextFilterImpl();
+		filter.setName("customer");
+		filter.setParameter("customerID", 5);
+		
+		// customerID == 5 should correspond to just a single shard instance
+		DirectoryProvider[] providers = shardStrategy.getDirectoryProvidersForQuery(new FullTextFilterImpl[] { filter });
+		assertTrue(providers.length == 1);
+		
+		// create a dummy document for the same customerID, and make sure the shard it would be
+		// indexed on matches the shard returned by getDirectoryProvidersForQuery()
+		Document document = new Document();
+		document.add(new Field("customerID", "5", Field.Store.NO, Field.Index.UN_TOKENIZED));
+		
+		assertTrue(providers[0].equals(
+			shardStrategy.getDirectoryProviderForAddition(null, null, null, document)
+			));
+	}
+	
+}
\ No newline at end of file
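
The routing idea behind getDirectoryProvidersForQuery() can be sketched without any Hibernate Search or Lucene dependency. In the minimal example below, NamedFilter and shardsForQuery are hypothetical stand-ins introduced only for illustration, and plain strings play the role of DirectoryProvider instances. It is a sketch of the technique under those assumptions, not the code added in this revision.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in types; none of these names exist in Hibernate Search.
class ShardSelectionSketch {

    /** Minimal stand-in for an enabled full-text filter: a name plus its parameters. */
    static final class NamedFilter {
        final String name;
        final Map<String, Object> parameters = new HashMap<String, Object>();

        NamedFilter(String name) {
            this.name = name;
        }

        NamedFilter with(String key, Object value) {
            parameters.put(key, value);
            return this;
        }
    }

    /**
     * Returns only the shard of the filtered customer when a "customer" filter
     * is enabled; otherwise falls back to all shards.
     */
    static String[] shardsForQuery(String[] shards, NamedFilter[] filters) {
        for (NamedFilter filter : filters) {
            if ("customer".equals(filter.name)) {
                // customerID is used directly as a zero-based index into the shard array
                int customerId = Integer.parseInt(filter.parameters.get("customerID").toString());
                return new String[] { shards[customerId] };
            }
        }
        return shards;
    }

    public static void main(String[] args) {
        String[] shards = { "shard0", "shard1", "shard2" };
        NamedFilter customer = new NamedFilter("customer").with("customerID", 2);

        // With the filter enabled, a single shard is selected
        System.out.println(Arrays.toString(shardsForQuery(shards, new NamedFilter[] { customer })));
        // Without it, every shard is searched
        System.out.println(Arrays.toString(shardsForQuery(shards, new NamedFilter[0])));
    }
}
```

Because the customer ID doubles as the array index, the shard array needs at least MAX(customerID) + 1 entries. An ID outside that range fails with an ArrayIndexOutOfBoundsException, in this sketch as in the test strategy above.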



