[exo-jcr-commits] exo-jcr SVN: r1893 - in jcr/trunk/docs/reference/en/src/main/docbook/en-US: modules and 1 other directory.
do-not-reply at jboss.org
do-not-reply at jboss.org
Thu Feb 18 07:58:57 EST 2010
Author: sergiykarpenko
Date: 2010-02-18 07:58:57 -0500 (Thu, 18 Feb 2010)
New Revision: 1893
Modified:
jcr/trunk/docs/reference/en/src/main/docbook/en-US/master.xml
jcr/trunk/docs/reference/en/src/main/docbook/en-US/modules/search-configuration.xml
Log:
EXOJCR-490: search-configuration added
Modified: jcr/trunk/docs/reference/en/src/main/docbook/en-US/master.xml
===================================================================
--- jcr/trunk/docs/reference/en/src/main/docbook/en-US/master.xml 2010-02-18 12:58:07 UTC (rev 1892)
+++ jcr/trunk/docs/reference/en/src/main/docbook/en-US/master.xml 2010-02-18 12:58:57 UTC (rev 1893)
@@ -78,5 +78,8 @@
<xi:include href="modules/external-value-storages.xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
+ <xi:include href="modules/search-configuration.xml"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+
</book>
Modified: jcr/trunk/docs/reference/en/src/main/docbook/en-US/modules/search-configuration.xml
===================================================================
--- jcr/trunk/docs/reference/en/src/main/docbook/en-US/modules/search-configuration.xml 2010-02-18 12:58:07 UTC (rev 1892)
+++ jcr/trunk/docs/reference/en/src/main/docbook/en-US/modules/search-configuration.xml 2010-02-18 12:58:57 UTC (rev 1893)
@@ -1,14 +1,774 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
-"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
-<chapter>
- <?dbhtml filename="search-configuration.html"?>
-
- <title>Search Configuration</title>
-
- <section>
- <title></title>
-
- <para></para>
- </section>
-</chapter>
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
+<chapter>
+ <?dbhtml filename="search-configuration.html"?>
+
+ <title>Search Configuration</title>
+
+ <section>
+ <title>XML Configuration</title>
+
+ <para>JCR index configuration. You can find this file here:
+ <filename>.../portal/WEB-INF/conf/jcr/repository-configuration.xml</filename></para>
+
+ <programlisting><repository-service default-repository="db1">
+ <repositories>
+ <repository name="db1" system-workspace="ws" default-workspace="ws">
+ ....
+ <workspaces>
+ <workspace name="ws">
+ ....
+ <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
+ <properties>
+ <property name="index-dir" value="${java.io.tmpdir}/temp/index/db1/ws" />
+ <property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" />
+ <property name="synonymprovider-config-path" value="/synonyms.properties" />
+ <property name="indexing-config-path" value="/indexing-configuration.xml" />
+ <property name="query-class" value="org.exoplatform.services.jcr.impl.core.query.QueryImpl" />
+ </properties>
+ </query-handler>
+ ...
+ </workspace>
+ </workspaces>
+ </repository>
+ </repositories>
+</repository-service></programlisting>
+ </section>
+
+ <section>
+ <title>Configuration parameters</title>
+
+ <table>
+ <title></title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Parameter</entry>
+
+ <entry>Default</entry>
+
+ <entry>Description</entry>
+
+ <entry>Since</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry>index-dir</entry>
+
+ <entry>none</entry>
+
+ <entry>The location of the index directory. This parameter is
+ mandatory. Up to 1.9 this parameter called "indexDir"</entry>
+
+ <entry>1.0</entry>
+ </row>
+
+ <row>
+ <entry>use-compoundfile</entry>
+
+ <entry>true</entry>
+
+ <entry>Advises lucene to use compound files for the index
+ files.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>min-merge-docs</entry>
+
+ <entry>100</entry>
+
+ <entry>Minimum number of nodes in an index until segments are
+ merged.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>volatile-idle-time</entry>
+
+ <entry>3</entry>
+
+ <entry>Idle time in seconds until the volatile index part is moved
+ to a persistent index even though minMergeDocs is not
+ reached.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>max-merge-docs</entry>
+
+ <entry>Integer.MAX_VALUE</entry>
+
+ <entry>Maximum number of nodes in segments that will be merged.
+ The default value changed in JCR 1.9 to Integer.MAX_VALUE.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>merge-factor</entry>
+
+ <entry>10</entry>
+
+ <entry>Determines how often segment indices are merged.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>max-field-length</entry>
+
+ <entry>10000</entry>
+
+ <entry>The number of words that are fulltext indexed at most per
+ property.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>cache-size</entry>
+
+ <entry>1000</entry>
+
+ <entry>Size of the document number cache. This cache maps uuids to
+ lucene document numbers</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>force-consistencycheck</entry>
+
+ <entry>false</entry>
+
+ <entry>Runs a consistency check on every startup. If false, a
+ consistency check is only performed when the search index detects
+ a prior forced shutdown.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>auto-repair</entry>
+
+ <entry>true</entry>
+
+ <entry>Errors detected by a consistency check are automatically
+ repaired. If false, errors are only written to the log.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>query-class</entry>
+
+ <entry>QueryImpl</entry>
+
+ <entry>Class name that implements the javax.jcr.query.Query
+ interface.This class must also extend from the class:
+ org.exoplatform.services.jcr.impl.core.query.AbstractQueryImpl.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>document-order</entry>
+
+ <entry>true</entry>
+
+ <entry>If true and the query does not contain an 'order by'
+ clause, result nodes will be in document order. For better
+ performance when queries return a lot of nodes set to
+ 'false'.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>result-fetch-size</entry>
+
+ <entry>Integer.MAX_VALUE</entry>
+
+ <entry>The number of results when a query is executed. Default
+ value: Integer.MAX_VALUE (-> all).</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>excerptprovider-class</entry>
+
+ <entry>DefaultXMLExcerpt</entry>
+
+ <entry>The name of the class that implements
+ org.exoplatform.services.jcr.impl.core.query.lucene.ExcerptProvider
+ and should be used for the rep:excerpt() function in a
+ query.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>support-highlighting</entry>
+
+ <entry>false</entry>
+
+ <entry>If set to true additional information is stored in the
+ index to support highlighting using the rep:excerpt()
+ function.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>synonymprovider-class</entry>
+
+ <entry>none</entry>
+
+ <entry>The name of a class that implements
+ org.exoplatform.services.jcr.impl.core.query.lucene.SynonymProvider.
+ The default value is null (-> not set).</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>synonymprovider-config-path</entry>
+
+ <entry>none</entry>
+
+ <entry>The path to the synonym provider configuration file. This
+ path interpreted relative to the path parameter. If there is a
+ path element inside the SearchIndex element, then this path is
+ interpreted relative to the root path of the path. Whether this
+ parameter is mandatory depends on the synonym provider
+ implementation. The default value is null (-> not set).</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>indexing-configuration-path</entry>
+
+ <entry>none</entry>
+
+ <entry>The path to the indexing configuration file.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>indexing-configuration-class</entry>
+
+ <entry>IndexingConfigurationImpl</entry>
+
+ <entry>The name of the class that implements
+ org.exoplatform.services.jcr.impl.core.query.lucene.IndexingConfiguration.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>force-consistencycheck</entry>
+
+ <entry>false</entry>
+
+ <entry>If set to true a consistency check is performed depending
+ on the parameter forceConsistencyCheck. If set to false no
+ consistency check is performed on startup, even if a redo log had
+ been applied.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>spellchecker-class</entry>
+
+ <entry>none</entry>
+
+ <entry>The name of a class that implements
+ org.exoplatform.services.jcr.impl.core.query.lucene.SpellChecker.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>errorlog-size</entry>
+
+ <entry>50(Kb)</entry>
+
+ <entry>The default size of error log file in Kb.</entry>
+
+ <entry>1.9</entry>
+ </row>
+
+ <row>
+ <entry>upgrade-index</entry>
+
+ <entry>false</entry>
+
+ <entry>Allows JCR to convert an existing index into the new
+ format. Also it is possible to set this property via system
+ property, for example: -Dupgrade-index=true Indexes before JCR
+ 1.12 will not run with JCR 1.12. Hence you have to run an
+ automatic migration: Start JCR with -Dupgrade-index=true. The old
+ index format is then converted in the new index format. After the
+ conversion the new format is used. On the next start you don't
+ need this option anymore. The old index is replaced and a back
+ conversion is not possible - therefore better take a backup of the
+ index before. (Only for migrations from JCR 1.9 and
+ later.)</entry>
+
+ <entry>1.12</entry>
+ </row>
+
+ <row>
+ <entry>analyzer</entry>
+
+ <entry>org.apache.lucene.analysis.standard.StandardAnalyzer</entry>
+
+ <entry>Class name of a lucene analyzer to use for fulltext
+ indexing of text.</entry>
+
+ <entry>1.12</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </section>
+
+ <section>
+ <title>Global Search Index</title>
+
+ <section>
+ <title>Global Search Index Configuration</title>
+
+ <para>The global search index is configured in the above-mentioned
+ configuration file
+ (<filename>portal/WEB-INF/conf/jcr/repository-configuration.xml</filename>)
+ in the tag "query-handler".</para>
+
+ <programlisting><query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"></programlisting>
+
+ <para>In fact when using Lucene you always should use the same analyzer
+ for indexing and for querying - otherwise the results are unpredictable.
+ You don't have to worry about this, eXo JCR does this for you
+ automatically. If you don't like the StandardAnalyzer configured by
+ default just replace it by your own.</para>
+
+ <para>If you don't have a handy QueryHandler you will learn how create a
+ customized Handler in 5 minutes.</para>
+ </section>
+
+ <section>
+ <title>Customized Search Indexes and Analyzers</title>
+
+ <para>By default Exo JCR uses the Lucene standard Analyzer to index
+ contents. This analyzer uses some standard filters in the method that
+ analyzes the content:<programlisting>public TokenStream tokenStream(String fieldName, Reader reader) {
+ StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
+ tokenStream.setMaxTokenLength(maxTokenLength);
+ TokenStream result = new StandardFilter(tokenStream);
+ result = new LowerCaseFilter(result);
+ result = new StopFilter(result, stopSet);
+ return result;
+ }</programlisting><itemizedlist>
+ <listitem>
+ <para>The first one (StandardFilter) removes 's (as 's in
+ "Peter's") from the end of words and removes dots from
+ acronyms.</para>
+ </listitem>
+
+ <listitem>
+ <para>The second one (LowerCaseFilter) normalizes token text to
+ lower case.</para>
+ </listitem>
+
+ <listitem>
+ <para>The last one (StopFilter) removes stop words from a token
+ stream. The stop set is defined in the analyzer.</para>
+ </listitem>
+ </itemizedlist></para>
+
+ <para>For specific cases, you may wish to use additional filters like
+ <phrase>ISOLatin1AccentFilter</phrase>, which replaces accented
+ characters in the ISO Latin 1 character set (ISO-8859-1) by their
+ unaccented equivalents.</para>
+
+ <para>In order to use a different filter, you have to create a new
+ analyzer, and a new search index to use the analyzer. You put it in a
+ jar, which is deployed with your application.</para>
+
+ <section>
+ <title>Create the filter</title>
+
+ <para>The ISOLatin1AccentFilter is not present in the current Lucene
+ version used by Exo. You can use the attached file. You can also
+ create your own filter, the relevant method is<programlisting>public final Token next(final Token reusableToken) throws java.io.IOException</programlisting>which
+ defines how chars are read and used by the filter.</para>
+ </section>
+
+ <section>
+ <title>Create the analyzer</title>
+
+ <para>The analyzer have to extends
+ org.apache.lucene.analysis.standard.StandardAnalyzer, and overload the
+ method<programlisting>public TokenStream tokenStream(String fieldName, Reader reader)</programlisting>to
+ put your own filters. You can have a glance at the example analyzer
+ attached to this article.</para>
+ </section>
+
+ <section>
+ <title>Create the search index</title>
+
+ <para>Now, we have the analyzer, we have to write the SearchIndex,
+ which will use the analyzer. Your have to extends
+ org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex. You
+ have to write the constructor, to set the right analyzer, and the
+ method<programlisting>public Analyzer getAnalyzer() {
+ return MyAnalyzer;
+ }</programlisting>to return your analyzer. You can see the attached
+ SearchIndex.</para>
+
+ <note>
+ <para>Since 1.12 version we can set Analyzer directly in
+ configuration. So, creation new SearchIndex only for new Analyzer is
+ redundant.</para>
+ </note>
+ </section>
+
+ <section>
+ <title>Configure your application to use your SearchIndex</title>
+
+ <para>In
+ <filename>portal/WEB-INF/conf/jcr/repository-configuration.xml</filename>,
+ you have to replace each<programlisting><query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"></programlisting>by
+ your own class<programlisting><query-handler class="mypackage.indexation.MySearchIndex"></programlisting></para>
+ </section>
+
+ <section>
+ <title>Configure your application to use your Analyzer</title>
+
+ <para>In
+ <filename>portal/WEB-INF/conf/jcr/repository-configuration.xml</filename>,
+ you have to add parameter "analyzer" to each query-handler
+ config:<programlisting><query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
+ <properties>
+ ...
+ <property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>
+ ...
+ </properties>
+</query-handler></programlisting></para>
+
+ <para>When you start exo, your SearchIndex will start to index
+ contents with the specified filters.</para>
+ </section>
+ </section>
+ </section>
+
+ <section>
+ <title>Index Adjustments</title>
+
+ <section>
+ <title>IndexingConfiguration</title>
+
+ <para>Starting with version 1.9, the default search index implementation
+ in JCR allows you to control which properties of a node are indexed. You
+ also can define different analyzers for different nodes.</para>
+
+ <para>The configuration parameter is called indexingConfiguration and
+ per default is not set. This means all properties of a node are
+ indexed.</para>
+
+ <para>If you wish to configure the indexing behavior you need to add a
+ parameter to the query-handler element in your configuration
+ file.</para>
+
+ <programlisting><param name="indexing-configuration-path" value="/indexing_configuration.xml"/></programlisting>
+ </section>
+
+ <section>
+ <title>Index rules</title>
+
+ <section>
+ <title>Node Scope Limit</title>
+
+ <para>To optimize the index size you can limit the node scope so that
+ <phrase>only certain properties</phrase> of a node type are
+ indexed.</para>
+
+ <para>With the below configuration only properties named Text are
+ indexed for nodes of type nt:unstructured. This configuration also
+ applies to all nodes whose type extends from nt:unstructured.</para>
+
+ <programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType="nt:unstructured">
+ <property>Text</property>
+ </index-rule>
+</configuration></programlisting>
+
+ <para>Please note that you have to declare the <phrase>namespace
+ prefixes</phrase> in the configuration element that you are using
+ throughout the XML file!</para>
+ </section>
+
+ <section>
+ <title>Index Boost Value</title>
+
+ <para>It is also possible to configure a <phrase>boost value</phrase>
+ for the nodes that match the index rule. The default boost value is
+ 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yield
+ a higher score value and appear as more relevant.</para>
+
+ <programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType="nt:unstructured"
+ boost="2.0">
+ <property>Text</property>
+ </index-rule>
+</configuration></programlisting>
+
+ <para>If you do not wish to boost the complete node but only certain
+ properties you can also provide a boost value for the listed
+ properties:<programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType="nt:unstructured">
+ <property boost="3.0">Title</property>
+ <property boost="1.5">Text</property>
+ </index-rule>
+</configuration></programlisting></para>
+ </section>
+
+ <section>
+ <title>Conditional Index Rules</title>
+
+ <para>You may also add a <phrase>condition</phrase> to the index rule
+ and have multiple rules with the same nodeType. The first index rule
+ that matches will apply and all remaining ones are
+ ignored:<programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType="nt:unstructured"
+ boost="2.0"
+ condition="@priority = 'high'">
+ <property>Text</property>
+ </index-rule>
+ <index-rule nodeType="nt:unstructured">
+ <property>Text</property>
+ </index-rule>
+</configuration></programlisting></para>
+
+ <para>In the above example the first rule only applies if the
+ nt:unstructured node has a priority property with a value 'high'. The
+ condition syntax supports only the equals operator and a string
+ literal.</para>
+
+ <para>You may also reference properties in the condition that are not
+ on the current node:<programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType="nt:unstructured"
+ boost="2.0"
+ condition="ancestor::*/@priority = 'high'">
+ <property>Text</property>
+ </index-rule>
+ <index-rule nodeType="nt:unstructured"
+ boost="0.5"
+ condition="parent::foo/@priority = 'low'">
+ <property>Text</property>
+ </index-rule>
+ <index-rule nodeType="nt:unstructured"
+ boost="1.5"
+ condition="bar/@priority = 'medium'">
+ <property>Text</property>
+ </index-rule>
+ <index-rule nodeType="nt:unstructured">
+ <property>Text</property>
+ </index-rule>
+</configuration></programlisting></para>
+
+ <para>The indexing configuration also allows you to specify the type
+ of a node in the condition. Please note however that the type match
+ must be exact. It does not consider sub types of the specified node
+ type.</para>
+
+ <programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType="nt:unstructured"
+ boost="2.0"
+ condition="element(*, nt:unstructured)/@priority = 'high'">
+ <property>Text</property>
+ </index-rule>
+</configuration></programlisting>
+ </section>
+
+ <section>
+ <title>Exclusion from the Node Scope Index</title>
+
+ <para>Per default all configured properties are fulltext indexed if
+ they are of type STRING and included in the node scope index. A node
+ scope search finds normally all nodes of an index. That is, the select
+ jcr:contains(., 'foo') returns all nodes that have a string property
+ containing the word 'foo'. You can exclude explicitly a property from
+ the node scope index:<programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType="nt:unstructured">
+ <property nodeScopeIndex="false">Text</property>
+ </index-rule>
+</configuration></programlisting></para>
+ </section>
+ </section>
+
+ <section>
+ <title>Index Aggregates</title>
+
+ <para>Sometimes it is useful to include the contents of descendant nodes
+ into a single node to easier search on content that is scattered across
+ multiple nodes.</para>
+
+ <para>JCR allows you to define index aggregates based on relative path
+ patterns and primary node types.</para>
+
+ <para>The following example creates an index aggregate on nt:file that
+ includes the content of the jcr:content node:<programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
+ xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <aggregate primaryType="nt:file">
+ <include>jcr:content</include>
+ </aggregate>
+</configuration></programlisting></para>
+
+ <para>You can also restrict the included nodes to a certain
+ type:<programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
+ xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <aggregate primaryType="nt:file">
+ <include primaryType="nt:resource">jcr:content</include>
+ </aggregate>
+</configuration></programlisting></para>
+
+ <para>You may also use the * to match all child nodes:<programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
+ xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <aggregate primaryType="nt:file">http://wiki.exoplatform.com/xwiki/bin/edit/JCR/Search+Configuration
+ <include primaryType="nt:resource">*</include>
+ </aggregate>
+</configuration></programlisting></para>
+
+ <para>If you wish to include nodes up to a certain depth below the
+ current node you can add multiple include elements. E.g. the nt:file
+ node may contain a complete XML document under
+ jcr:content:<programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
+ xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <aggregate primaryType="nt:file">
+ <include>*</include>
+ <include>*/*</include>
+ <include>*/*/*</include>
+ </aggregate>
+</configuration></programlisting></para>
+ </section>
+
+ <section>
+ <title>Property-Level Analyzers</title>
+
+ <section>
+ <title>Example</title>
+
+ <para>In this configuration section you define how a property has to
+ be analyzed. If there is an analyzer configuration for a property,
+ this analyzer is used for indexing and searching of this property. For
+ example:<programlisting><?xml version="1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+ <analyzers>
+ <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
+ <property>mytext</property>
+ </analyzer>
+ <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer">
+ <property>mytext2</property>
+ </analyzer>
+ </analyzers>
+</configuration></programlisting></para>
+
+ <para>The configuration above means that the property "mytext" for the
+ entire workspace is indexed (and searched) with the Lucene
+ KeywordAnalyzer, and property "mytext2" with the WhitespaceAnalyzer.
+ Using different analyzers for different languages is particularly
+ useful.</para>
+
+ <para>The WhitespaceAnalyzer tokenizes a property, the KeywordAnalyzer
+ takes the property as a whole.</para>
+ </section>
+
+ <section>
+ <title>Characteristics of Node Scope Searches</title>
+
+ <para>When using analyzers, you may encounter an unexpected behavior
+ when searching within a property compared to searching within a node
+ scope. The reason is that the node scope always uses the global
+ analyzer.</para>
+
+ <para>Let's suppose that the property "mytext" contains the text :
+ "testing my analyzers" and that you haven't configured any analyzers
+ for the property "mytext" (and not changed the default analyzer in
+ SearchIndex).</para>
+
+ <para>If your query is for example:<programlisting>xpath = "//*[jcr:contains(mytext,'analyzer')]"</programlisting></para>
+
+ <para>This xpath does not return a hit in the node with the property
+ above and default analyzers.</para>
+
+ <para>Also a search on the node scope<programlisting>xpath = "//*[jcr:contains(.,'analyzer')]"</programlisting>won't
+ give a hit. Realize, that you can only set specific analyzers on a
+ node property, and that the node scope indexing/analyzing is always
+ done with the globally defined analyzer in the SearchIndex
+ element.</para>
+
+ <para>Now, if you change the analyzer used to index the "mytext"
+ property above to<programlisting><analyzer class="org.apache.lucene.analysis.Analyzer.GermanAnalyzer">
+ <property>mytext</property>
+</analyzer></programlisting>and you do the same search again, then
+ for<programlisting>xpath = "//*[jcr:contains(mytext,'analyzer')]"</programlisting>you
+ would get a hit because of the word stemming (analyzers -
+ analyzer).</para>
+
+ <para>The other search,<programlisting>xpath = "//*[jcr:contains(.,'analyzer')]"</programlisting>still
+ would not give a result, since the node scope is indexed with the
+ global analyzer, which in this case does not take into account any
+ word stemming.</para>
+
+ <para>In conclusion, be aware that when using analyzers for specific
+ properties, you might find a hit in a property for some search text,
+ and you do not find a hit with the same search text in the node scope
+ of the property!</para>
+
+ <note>
+ <para>Both index rules and index aggregates influence how content is
+ indexed in JCR. If you change the configuration the existing content
+ is not automatically re-indexed according to the new rules. You
+ therefore have to manually re-index the content when you change the
+ configuration!</para>
+ </note>
+ </section>
+ </section>
+ </section>
+</chapter>
More information about the exo-jcr-commits
mailing list