From do-not-reply at jboss.org Tue Jun 15 11:11:58 2010
Content-Type: multipart/mixed; boundary="===============1101572748759254311=="
MIME-Version: 1.0
From: do-not-reply at jboss.org
To: exo-jcr-commits at lists.jboss.org
Subject: [exo-jcr-commits] exo-jcr SVN: r2611 -
jcr/branches/1.12.x/docs/reference/en/src/main/docbook/en-US/modules/jcr.
Date: Tue, 15 Jun 2010 11:11:58 -0400
Message-ID: <201006151511.o5FFBwCK008202@svn01.web.mwc.hst.phx2.redhat.com>
--===============1101572748759254311==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Author: sergiykarpenko
Date: 2010-06-15 11:11:57 -0400 (Tue, 15 Jun 2010)
New Revision: 2611
Modified:
jcr/branches/1.12.x/docs/reference/en/src/main/docbook/en-US/modules/jcr/search-configuration.xml
Log:
EXOJCR-787: new parameters added to search configuration docbook
Modified: jcr/branches/1.12.x/docs/reference/en/src/main/docbook/en-US/modules/jcr/search-configuration.xml
===================================================================
--- jcr/branches/1.12.x/docs/reference/en/src/main/docbook/en-US/modules/jcr/search-configuration.xml 2010-06-15 15:02:36 UTC (rev 2610)
+++ jcr/branches/1.12.x/docs/reference/en/src/main/docbook/en-US/modules/jcr/search-configuration.xml 2010-06-15 15:11:57 UTC (rev 2611)
@@ -1,774 +1,799 @@
-
-
-
-
-
- Search Configuration
-
-
- XML Configuration
-
- JCR index configuration. You can find this file here:
- .../portal/WEB-INF/conf/jcr/repository-configuration.xml
-
- <repository-service default-repository=3D"db1">
- <repositories>
- <repository name=3D"db1" system-workspace=3D"ws" default-workspace=
=3D"ws">
- ....
- <workspaces>
- <workspace name=3D"ws">
- ....
- <query-handler class=3D"org.exoplatform.services.jcr.impl.cor=
e.query.lucene.SearchIndex">
- <properties>
- <property name=3D"index-dir" value=3D"${java.io.tmpdir}/t=
emp/index/db1/ws" />
- <property name=3D"synonymprovider-class" value=3D"org.exo=
platform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" /&g=
t;
- <property name=3D"synonymprovider-config-path" value=3D"/=
synonyms.properties" />
- <property name=3D"indexing-config-path" value=3D"/indexin=
g-configuration.xml" />
- <property name=3D"query-class" value=3D"org.exoplatform.s=
ervices.jcr.impl.core.query.QueryImpl" />
- </properties>
- </query-handler>
- ... =
- </workspace>
- </workspaces>
- </repository> =
- </repositories>
-</repository-service>
-
-
-
- Configuration parameters
-
-
-
-
-
-
-
- Parameter
-
- Default
-
- Description
-
- Since
-
-
-
-
-
- index-dir
-
- none
-
- The location of the index directory. This parameter is
- mandatory. Up to 1.9 this parameter called "indexDir"
-
- 1.0
-
-
-
- use-compoundfile
-
- true
-
- Advises lucene to use compound files for the index
- files.
-
- 1.9
-
-
-
- min-merge-docs
-
- 100
-
- Minimum number of nodes in an index until segments are
- merged.
-
- 1.9
-
-
-
- volatile-idle-time
-
- 3
-
- Idle time in seconds until the volatile index part is m=
oved
- to a persistent index even though minMergeDocs is not
- reached.
-
- 1.9
-
-
-
- max-merge-docs
-
- Integer.MAX_VALUE
-
- Maximum number of nodes in segments that will be merged.
- The default value changed in JCR 1.9 to Integer.MAX_VALUE.
-
- 1.9
-
-
-
- merge-factor
-
- 10
-
- Determines how often segment indices are merged.
-
- 1.9
-
-
-
- max-field-length
-
- 10000
-
- The number of words that are fulltext indexed at most p=
er
- property.
-
- 1.9
-
-
-
- cache-size
-
- 1000
-
- Size of the document number cache. This cache maps uuid=
s to
- lucene document numbers
-
- 1.9
-
-
-
- force-consistencycheck
-
- false
-
- Runs a consistency check on every startup. If false, a
- consistency check is only performed when the search index dete=
cts
- a prior forced shutdown.
-
- 1.9
-
-
-
- auto-repair
-
- true
-
- Errors detected by a consistency check are automatically
- repaired. If false, errors are only written to the log.
-
- 1.9
-
-
-
- query-class
-
- QueryImpl
-
- Class name that implements the javax.jcr.query.Query
- interface.This class must also extend from the class:
- org.exoplatform.services.jcr.impl.core.query.AbstractQueryImpl=
.
-
- 1.9
-
-
-
- document-order
-
- true
-
- If true and the query does not contain an 'order by'
- clause, result nodes will be in document order. For better
- performance when queries return a lot of nodes set to
- 'false'.
-
- 1.9
-
-
-
- result-fetch-size
-
- Integer.MAX_VALUE
-
- The number of results when a query is executed. Default
- value: Integer.MAX_VALUE (-> all).
-
- 1.9
-
-
-
- excerptprovider-class
-
- DefaultXMLExcerpt
-
- The name of the class that implements
- org.exoplatform.services.jcr.impl.core.query.lucene.ExcerptPro=
vider
- and should be used for the rep:excerpt() function in a
- query.
-
- 1.9
-
-
-
- support-highlighting
-
- false
-
- If set to true additional information is stored in the
- index to support highlighting using the rep:excerpt()
- function.
-
- 1.9
-
-
-
- synonymprovider-class
-
- none
-
- The name of a class that implements
- org.exoplatform.services.jcr.impl.core.query.lucene.SynonymPro=
vider.
- The default value is null (-> not set).
-
- 1.9
-
-
-
- synonymprovider-config-path
-
- none
-
- The path to the synonym provider configuration file. Th=
is
- path interpreted relative to the path parameter. If there is a
- path element inside the SearchIndex element, then this path is
- interpreted relative to the root path of the path. Whether this
- parameter is mandatory depends on the synonym provider
- implementation. The default value is null (-> not set).
-
- 1.9
-
-
-
- indexing-configuration-path
-
- none
-
- The path to the indexing configuration file.
-
- 1.9
-
-
-
- indexing-configuration-class
-
- IndexingConfigurationImpl
-
- The name of the class that implements
- org.exoplatform.services.jcr.impl.core.query.lucene.IndexingCo=
nfiguration.
-
- 1.9
-
-
-
- force-consistencycheck
-
- false
-
- If set to true a consistency check is performed dependi=
ng
- on the parameter forceConsistencyCheck. If set to false no
- consistency check is performed on startup, even if a redo log =
had
- been applied.
-
- 1.9
-
-
-
- spellchecker-class
-
- none
-
- The name of a class that implements
- org.exoplatform.services.jcr.impl.core.query.lucene.SpellCheck=
er.
-
- 1.9
-
-
-
- errorlog-size
-
- 50(Kb)
-
- The default size of error log file in Kb.
-
- 1.9
-
-
-
- upgrade-index
-
- false
-
- Allows JCR to convert an existing index into the new
- format. Also it is possible to set this property via system
- property, for example: -Dupgrade-index=3Dtrue Indexes before J=
CR
- 1.12 will not run with JCR 1.12. Hence you have to run an
- automatic migration: Start JCR with -Dupgrade-index=3Dtrue. Th=
e old
- index format is then converted in the new index format. After =
the
- conversion the new format is used. On the next start you don't
- need this option anymore. The old index is replaced and a back
- conversion is not possible - therefore better take a backup of=
the
- index before. (Only for migrations from JCR 1.9 and
- later.)
-
- 1.12
-
-
-
- analyzer
-
- org.apache.lucene.analysis.standard.StandardAnalyzer
-
- Class name of a lucene analyzer to use for fulltext
- indexing of text.
-
- 1.12
-
-
-
-
-
-
-
- Global Search Index
-
-
- Global Search Index Configuration
-
- The global search index is configured in the above-mentioned
- configuration file
- (portal/WEB-INF/conf/jcr/repository-configuration.xml)
- in the tag "query-handler".
-
- <query-handler class=3D"org.exoplatform.services.=
jcr.impl.core.query.lucene.SearchIndex">
-
- In fact when using Lucene you always should use the same analy=
zer
- for indexing and for querying - otherwise the results are unpredicta=
ble.
- You don't have to worry about this, eXo JCR does this for you
- automatically. If you don't like the StandardAnalyzer configured by
- default just replace it by your own.
-
- If you don't have a handy QueryHandler you will learn how crea=
te a
- customized Handler in 5 minutes.
-
-
-
- Customized Search Indexes and Analyzers
-
- By default Exo JCR uses the Lucene standard Analyzer to index
- contents. This analyzer uses some standard filters in the method that
- analyzes the content:public TokenStream tokenStream(=
String fieldName, Reader reader) {
- StandardTokenizer tokenStream =3D new StandardTokenizer(reader, replac=
eInvalidAcronym);
- tokenStream.setMaxTokenLength(maxTokenLength);
- TokenStream result =3D new StandardFilter(tokenStream);
- result =3D new LowerCaseFilter(result);
- result =3D new StopFilter(result, stopSet);
- return result;
- }
-
- The first one (StandardFilter) removes 's (as 's in
- "Peter's") from the end of words and removes dots from
- acronyms.
-
-
-
- The second one (LowerCaseFilter) normalizes token text to
- lower case.
-
-
-
- The last one (StopFilter) removes stop words from a token
- stream. The stop set is defined in the analyzer.
-
-
-
- For specific cases, you may wish to use additional filters like
- ISOLatin1AccentFilter, which replaces accented
- characters in the ISO Latin 1 character set (ISO-8859-1) by their
- unaccented equivalents.
-
- In order to use a different filter, you have to create a new
- analyzer, and a new search index to use the analyzer. You put it in a
- jar, which is deployed with your application.
-
-
- Create the filter
-
- The ISOLatin1AccentFilter is not present in the current Luce=
ne
- version used by Exo. You can use the attached file. You can also
- create your own filter, the relevant method ispubl=
ic final Token next(final Token reusableToken) throws java.io.IOException=
programlisting>which
- defines how chars are read and used by the filter.
-
-
-
- Create the analyzer
-
- The analyzer have to extends
- org.apache.lucene.analysis.standard.StandardAnalyzer, and overload=
the
- methodpublic TokenStream tokenStream(String fieldN=
ame, Reader reader)to
- put your own filters. You can have a glance at the example analyzer
- attached to this article.
-
-
-
- Create the search index
-
- Now, we have the analyzer, we have to write the SearchIndex,
- which will use the analyzer. Your have to extends
- org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex. Y=
ou
- have to write the constructor, to set the right analyzer, and the
- methodpublic Analyzer getAnalyzer() {
- return MyAnalyzer;
- }to return your analyzer. You can see the attached
- SearchIndex.
-
-
- Since 1.12 version we can set Analyzer directly in
- configuration. So, creation new SearchIndex only for new Analyze=
r is
- redundant.
-
-
-
-
- Configure your application to use your SearchIndex
-
- In
- portal/WEB-INF/conf/jcr/repository-configuration.xml,
- you have to replace each<query-handler class=3D=
"org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">by
- your own class<query-handler class=3D"mypackage=
.indexation.MySearchIndex">
-
-
-
- Configure your application to use your Analyzer
-
- In
- portal/WEB-INF/conf/jcr/repository-configuration.xml,
- you have to add parameter "analyzer" to each query-handler
- config:<query-handler class=3D"org.exoplatform.=
services.jcr.impl.core.query.lucene.SearchIndex">
- <properties>
- ...
- <property name=3D"analyzer" value=3D"org.exoplatform.services.jcr=
.impl.core.MyAnalyzer"/>
- ...
- </properties>
-</query-handler>
-
- When you start exo, your SearchIndex will start to index
- contents with the specified filters.
-
-
-
-
-
- Index Adjustments
-
-
- IndexingConfiguration
-
- Starting with version 1.9, the default search index implementa=
tion
- in JCR allows you to control which properties of a node are indexed.=
You
- also can define different analyzers for different nodes.
-
- The configuration parameter is called indexingConfiguration and
- per default is not set. This means all properties of a node are
- indexed.
-
- If you wish to configure the indexing behavior you need to add=
a
- parameter to the query-handler element in your configuration
- file.
-
- <param name=3D"indexing-configuration-path" value=
=3D"/indexing_configuration.xml"/>
-
-
-
- Index rules
-
-
- Node Scope Limit
-
- To optimize the index size you can limit the node scope so t=
hat
- only certain properties of a node type are
- indexed.
-
- With the below configuration only properties named Text are
- indexed for nodes of type nt:unstructured. This configuration also
- applies to all nodes whose type extends from nt:unstructured.
-
- <?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <index-rule nodeType=3D"nt:unstructured">
- <property>Text</property>
- </index-rule>
-</configuration>
-
- Please note that you have to declare the namespace
- prefixes in the configuration element that you are using
- throughout the XML file!
-
-
-
- Index Boost Value
-
- It is also possible to configure a boost value
- for the nodes that match the index rule. The default boost value is
- 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yi=
eld
- a higher score value and appear as more relevant.
-
- <?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <index-rule nodeType=3D"nt:unstructured"
- boost=3D"2.0">
- <property>Text</property>
- </index-rule>
-</configuration>
-
- If you do not wish to boost the complete node but only certa=
in
- properties you can also provide a boost value for the listed
- properties:<?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <index-rule nodeType=3D"nt:unstructured">
- <property boost=3D"3.0">Title</property>
- <property boost=3D"1.5">Text</property>
- </index-rule>
-</configuration>
-
-
-
- Conditional Index Rules
-
- You may also add a condition to the index r=
ule
- and have multiple rules with the same nodeType. The first index ru=
le
- that matches will apply and all remaining ones are
- ignored:<?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <index-rule nodeType=3D"nt:unstructured"
- boost=3D"2.0"
- condition=3D"@priority =3D 'high'">
- <property>Text</property>
- </index-rule>
- <index-rule nodeType=3D"nt:unstructured">
- <property>Text</property>
- </index-rule>
-</configuration>
-
- In the above example the first rule only applies if the
- nt:unstructured node has a priority property with a value 'high'. =
The
- condition syntax supports only the equals operator and a string
- literal.
-
- You may also reference properties in the condition that are =
not
- on the current node:<?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <index-rule nodeType=3D"nt:unstructured"
- boost=3D"2.0"
- condition=3D"ancestor::*/@priority =3D 'high'">
- <property>Text</property>
- </index-rule>
- <index-rule nodeType=3D"nt:unstructured"
- boost=3D"0.5"
- condition=3D"parent::foo/@priority =3D 'low'">
- <property>Text</property>
- </index-rule>
- <index-rule nodeType=3D"nt:unstructured"
- boost=3D"1.5"
- condition=3D"bar/@priority =3D 'medium'">
- <property>Text</property>
- </index-rule>
- <index-rule nodeType=3D"nt:unstructured">
- <property>Text</property>
- </index-rule>
-</configuration>
-
- The indexing configuration also allows you to specify the ty=
pe
- of a node in the condition. Please note however that the type match
- must be exact. It does not consider sub types of the specified node
- type.
-
- <?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <index-rule nodeType=3D"nt:unstructured"
- boost=3D"2.0"
- condition=3D"element(*, nt:unstructured)/@priority =3D 'high=
'">
- <property>Text</property>
- </index-rule>
-</configuration>
-
-
-
- Exclusion from the Node Scope Index
-
- Per default all configured properties are fulltext indexed if
- they are of type STRING and included in the node scope index. A no=
de
- scope search finds normally all nodes of an index. That is, the se=
lect
- jcr:contains(., 'foo') returns all nodes that have a string proper=
ty
- containing the word 'foo'. You can exclude explicitly a property f=
rom
- the node scope index:<?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <index-rule nodeType=3D"nt:unstructured">
- <property nodeScopeIndex=3D"false">Text</property>
- </index-rule>
-</configuration>
-
-
-
-
- Index Aggregates
-
- Sometimes it is useful to include the contents of descendant n=
odes
- into a single node to easier search on content that is scattered acr=
oss
- multiple nodes.
-
- JCR allows you to define index aggregates based on relative pa=
th
- patterns and primary node types.
-
- The following example creates an index aggregate on nt:file th=
at
- includes the content of the jcr:content node:<?xm=
l version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0"
- xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <aggregate primaryType=3D"nt:file">
- <include>jcr:content</include>
- </aggregate>
-</configuration>
-
- You can also restrict the included nodes to a certain
- type:<?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0"
- xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <aggregate primaryType=3D"nt:file">
- <include primaryType=3D"nt:resource">jcr:content</include>
- </aggregate>
-</configuration>
-
- You may also use the * to match all child nodes:<?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0"
- xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <aggregate primaryType=3D"nt:file">http://wiki.exoplatform.com/xwi=
ki/bin/edit/JCR/Search+Configuration
- <include primaryType=3D"nt:resource">*</include>
- </aggregate>
-</configuration>
-
- If you wish to include nodes up to a certain depth below the
- current node you can add multiple include elements. E.g. the nt:file
- node may contain a complete XML document under
- jcr:content:<?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0"
- xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <aggregate primaryType=3D"nt:file">
- <include>*</include>
- <include>*/*</include>
- <include>*/*/*</include>
- </aggregate>
-</configuration>
-
-
-
- Property-Level Analyzers
-
-
- Example
-
- In this configuration section you define how a property has =
to
- be analyzed. If there is an analyzer configuration for a property,
- this analyzer is used for indexing and searching of this property.=
For
- example:<?xml version=3D"1.0"?>
-<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
-<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
- <analyzers> =
- <analyzer class=3D"org.apache.lucene.analysis.KeywordAnalyzer"&=
gt;
- <property>mytext</property>
- </analyzer>
- <analyzer class=3D"org.apache.lucene.analysis.WhitespaceAnalyze=
r">
- <property>mytext2</property>
- </analyzer>
- </analyzers> =
-</configuration>
-
- The configuration above means that the property "mytext" for=
the
- entire workspace is indexed (and searched) with the Lucene
- KeywordAnalyzer, and property "mytext2" with the WhitespaceAnalyze=
r.
- Using different analyzers for different languages is particularly
- useful.
-
- The WhitespaceAnalyzer tokenizes a property, the KeywordAnal=
yzer
- takes the property as a whole.
-
-
-
- Characteristics of Node Scope Searches
-
- When using analyzers, you may encounter an unexpected behavi=
or
- when searching within a property compared to searching within a no=
de
- scope. The reason is that the node scope always uses the global
- analyzer.
-
- Let's suppose that the property "mytext" contains the text :
- "testing my analyzers" and that you haven't configured any analyze=
rs
- for the property "mytext" (and not changed the default analyzer in
- SearchIndex).
-
- If your query is for example:xpath =3D "//*[=
jcr:contains(mytext,'analyzer')]"
-
- This xpath does not return a hit in the node with the proper=
ty
- above and default analyzers.
-
- Also a search on the node scopexpath =3D "//=
*[jcr:contains(.,'analyzer')]"won't
- give a hit. Realize, that you can only set specific analyzers on a
- node property, and that the node scope indexing/analyzing is always
- done with the globally defined analyzer in the SearchIndex
- element.
-
- Now, if you change the analyzer used to index the "mytext"
- property above to<analyzer class=3D"org.apache.=
lucene.analysis.Analyzer.GermanAnalyzer">
- <property>mytext</property>
-</analyzer>and you do the same search again, then
- forxpath =3D "//*[jcr:contains(mytext,'analyzer')]=
"you
- would get a hit because of the word stemming (analyzers -
- analyzer).
-
- The other search,xpath =3D "//*[jcr:contains=
(.,'analyzer')]"still
- would not give a result, since the node scope is indexed with the
- global analyzer, which in this case does not take into account any
- word stemming.
-
- In conclusion, be aware that when using analyzers for specif=
ic
- properties, you might find a hit in a property for some search tex=
t,
- and you do not find a hit with the same search text in the node sc=
ope
- of the property!
-
-
- Both index rules and index aggregates influence how conten=
t is
- indexed in JCR. If you change the configuration the existing con=
tent
- is not automatically re-indexed according to the new rules. You
- therefore have to manually re-index the content when you change =
the
- configuration!
-
-
-
-
-
+
+
+
+
+
+ Search Configuration
+
+
+ XML Configuration
+
+ JCR index configuration. You can find this file here:
+ .../portal/WEB-INF/conf/jcr/repository-configuration.xml
+
+ <repository-service default-repository="db1">
+ <repositories>
+ <repository name="db1" system-workspace="ws" default-workspace="ws">
+ ....
+ <workspaces>
+ <workspace name="ws">
+ ....
+ <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
+ <properties>
+ <property name="index-dir" value="${java.io.tmpdir}/temp/index/db1/ws" />
+ <property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" />
+ <property name="synonymprovider-config-path" value="/synonyms.properties" />
+ <property name="indexing-config-path" value="/indexing-configuration.xml" />
+ <property name="query-class" value="org.exoplatform.services.jcr.impl.core.query.QueryImpl" />
+ </properties>
+ </query-handler>
+ ...
+ </workspace>
+ </workspaces>
+ </repository>
+ </repositories>
+</repository-service>
+
+
+
+ Configuration parameters
+
+
+
+
+
+
+
+ Parameter
+
+ Default
+
+ Description
+
+ Since
+
+
+
+
+
+ index-dir
+
+ none
+
+ The location of the index directory. This parameter is
+ mandatory. Up to 1.9 this parameter was called "indexDir".
+
+ 1.0
+
+
+
+ use-compoundfile
+
+ true
+
+ Advises Lucene to use compound files for the index
+ files.
+
+ 1.9
+
+
+
+ min-merge-docs
+
+ 100
+
+ Minimum number of nodes in an index until segments are
+ merged.
+
+ 1.9
+
+
+
+ volatile-idle-time
+
+ 3
+
+ Idle time in seconds until the volatile index part is moved
+ to a persistent index even though minMergeDocs is not
+ reached.
+
+ 1.9
+
+
+
+ max-merge-docs
+
+ Integer.MAX_VALUE
+
+ Maximum number of nodes in segments that will be merged.
+ The default value changed in JCR 1.9 to Integer.MAX_VALUE.
+
+ 1.9
+
+
+
+ merge-factor
+
+ 10
+
+ Determines how often segment indices are merged.
+
+ 1.9
+
+
+
+ max-field-length
+
+ 10000
+
+ The maximum number of words that are fulltext indexed per
+ property.
+
+ 1.9
+
+
+
+ cache-size
+
+ 1000
+
+ Size of the document number cache. This cache maps UUIDs to
+ Lucene document numbers.
+
+ 1.9
+
+
+
+ force-consistencycheck
+
+ false
+
+ Runs a consistency check on every startup. If false, a
+ consistency check is only performed when the search index detects
+ a prior forced shutdown.
+
+ 1.9
+
+
+
+ auto-repair
+
+ true
+
+ Errors detected by a consistency check are automatically
+ repaired. If false, errors are only written to the log.
+
+ 1.9
+
+
+
+ query-class
+
+ QueryImpl
+
+ Class name that implements the javax.jcr.query.Query
+ interface. This class must also extend the class
+ org.exoplatform.services.jcr.impl.core.query.AbstractQueryImpl.
+
+ 1.9
+
+
+
+ document-order
+
+ true
+
+ If true and the query does not contain an 'order by'
+ clause, result nodes will be in document order. For better
+ performance when queries return many nodes, set this to
+ 'false'.
+
+ 1.9
+
+
+
+ result-fetch-size
+
+ Integer.MAX_VALUE
+
+ The number of results fetched when a query is executed. Default
+ value: Integer.MAX_VALUE (-> all).
+
+ 1.9
+
+
+
+ excerptprovider-class
+
+ DefaultXMLExcerpt
+
+ The name of the class that implements
+ org.exoplatform.services.jcr.impl.core.query.lucene.ExcerptProvider
+ and should be used for the rep:excerpt() function in a
+ query.
+
+ 1.9
+
+
+
+ support-highlighting
+
+ false
+
+ If set to true additional information is stored in the
+ index to support highlighting using the rep:excerpt()
+ function.
+
+ 1.9
+
+
+
+ synonymprovider-class
+
+ none
+
+ The name of a class that implements
+ org.exoplatform.services.jcr.impl.core.query.lucene.SynonymProvider.
+ The default value is null (-> not set).
+
+ 1.9
+
+
+
+ synonymprovider-config-path
+
+ none
+
+ The path to the synonym provider configuration file. This
+ path is interpreted relative to the path parameter. If there is a
+ path element inside the SearchIndex element, then this path is
+ interpreted relative to the root path of that path element. Whether this
+ parameter is mandatory depends on the synonym provider
+ implementation. The default value is null (-> not set).
+
+ 1.9
+
+
+
+ indexing-configuration-path
+
+ none
+
+ The path to the indexing configuration file.
+
+ 1.9
+
+
+
+ indexing-configuration-class
+
+ IndexingConfigurationImpl
+
+ The name of the class that implements
+ org.exoplatform.services.jcr.impl.core.query.lucene.IndexingConfiguration.
+
+ 1.9
+
+
+
+ force-consistencycheck
+
+ false
+
+ If set to true, a consistency check is performed depending
+ on the parameter forceConsistencyCheck. If set to false, no
+ consistency check is performed on startup, even if a redo log had
+ been applied.
+
+ 1.9
+
+
+
+ spellchecker-class
+
+ none
+
+ The name of a class that implements
+ org.exoplatform.services.jcr.impl.core.query.lucene.SpellChecker.
+
+ 1.9
+
+
+
+ spellchecker-more-popular
+
+ true
+
+ If set to true, the spellchecker returns only suggestions
+ that are as frequent as or more frequent than the checked word. If
+ set to false, the spellchecker returns null if the checked word
+ exists in the dictionary; otherwise it returns the closest
+ suggestion.
+
+ 1.10
+
+
+
+ spellchecker-min-distance
+
+ 0.55f
+
+ Minimal distance between the checked word and a proposed
+ suggestion.
+
+ 1.10
+
+
+
+ errorlog-size
+
+ 50 (KB)
+
+ The default size of the error log file, in KB.
+
+ 1.9
+
+
+
+ upgrade-index
+
+ false
+
+ Allows JCR to convert an existing index into the new
+ format. This property can also be set via a system
+ property, for example: -Dupgrade-index=true. Indexes created before JCR
+ 1.12 will not run with JCR 1.12, so you have to run an
+ automatic migration: start JCR with -Dupgrade-index=true. The old
+ index format is then converted into the new index format. After the
+ conversion the new format is used, and on the next start you no longer
+ need this option. The old index is replaced and a back
+ conversion is not possible, so take a backup of the
+ index beforehand. (Only for migrations from JCR 1.9 and
+ later.)
+
+ 1.12
+
+
+
+ analyzer
+
+ org.apache.lucene.analysis.standard.StandardAnalyzer
+
+ Class name of a Lucene analyzer to use for fulltext
+ indexing of text.
+
+ 1.12
+
+
+
+
+
+
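The spellchecker-more-popular and spellchecker-min-distance rows in the table above can be read as a filter over candidate suggestions. The following is a minimal plain-Java sketch of that reading, not the eXo JCR or Lucene implementation. The class SuggestionFilterSketch, its methods, and the word-frequency dictionary are hypothetical, and the sketch assumes that the "distance" is a normalized similarity between 0 and 1 where higher means closer, which the default of 0.55f suggests but the table does not state.

```java
import java.util.Map;

// Hypothetical illustration only: not part of eXo JCR or Lucene.
class SuggestionFilterSketch {

    // Normalized similarity in [0, 1] based on Levenshtein edit distance
    // (assumption: 1.0 means the two words are identical).
    static float similarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0f : 1.0f - (float) d[a.length()][b.length()] / maxLen;
    }

    // Returns the closest acceptable suggestion, or null if there is none.
    // dictionary maps each known word to how often it occurs.
    static String suggest(String word, Map<String, Integer> dictionary,
                          float minDistance, boolean morePopular) {
        Integer known = dictionary.get(word);
        // more-popular = false: a word that exists in the dictionary gets no suggestion.
        if (!morePopular && known != null) {
            return null;
        }
        int wordFrequency = known == null ? 0 : known;
        String best = null;
        float bestScore = -1f;
        for (Map.Entry<String, Integer> entry : dictionary.entrySet()) {
            String candidate = entry.getKey();
            if (candidate.equals(word)) {
                continue;
            }
            float score = similarity(word, candidate);
            if (score < minDistance) {
                continue; // too far from the checked word
            }
            // more-popular = true: keep only candidates at least as frequent as the word.
            if (morePopular && entry.getValue() < wordFrequency) {
                continue;
            }
            if (score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Under these assumptions, raising the minimal distance makes the checker stricter: a threshold of 0.7f rejects a candidate that differs in three of eight characters, while the default of 0.55f accepts it.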
+
+ Global Search Index
+
+
+ Global Search Index Configuration
+
+ The global search index is configured in the above-mentioned
+ configuration file
+ (portal/WEB-INF/conf/jcr/repository-configuration.xml)
+ in the tag "query-handler".
+
+ <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
+
+ When using Lucene you should always use the same analyzer
+ for indexing and for querying, otherwise the results are unpredictable.
+ You don't have to worry about this, eXo JCR does this for you
+ automatically. If you don't like the StandardAnalyzer configured by
+ default, just replace it with your own.
+
+ If you don't have a handy QueryHandler, you will learn how to create a
+ customized Handler in 5 minutes.
+
+
+
+ Customized Search Indexes and Analyzers
+
+ By default eXo JCR uses the Lucene standard Analyzer to index
+ contents. This analyzer uses some standard filters in the method that
+ analyzes the content:
+ public TokenStream tokenStream(String fieldName, Reader reader) {
+ StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
+ tokenStream.setMaxTokenLength(maxTokenLength);
+ TokenStream result = new StandardFilter(tokenStream);
+ result = new LowerCaseFilter(result);
+ result = new StopFilter(result, stopSet);
+ return result;
+ }
+
+ The first one (StandardFilter) removes 's (as 's in
+ "Peter's") from the end of words and removes dots from
+ acronyms.
+
+
+
+ The second one (LowerCaseFilter) normalizes token text to
+ lower case.
+
+
+
+ The last one (StopFilter) removes stop words from a token
+ stream. The stop set is defined in the analyzer.
+
+
+
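To make the effect of the three filters listed above concrete, here is a minimal plain-Java sketch that mimics them on a whitespace-split string. It does not use Lucene: the class FilterChainSketch, its stop-word set, and the crude tokenization are assumptions made for illustration, and the real StandardTokenizer and filters handle many more cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: imitates the filter chain without Lucene.
class FilterChainSketch {

    // Assumed stop set; the analyzer defines its own.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            // Crude tokenization: keep letters, digits, dots and apostrophes.
            String token = raw.replaceAll("[^\\p{L}\\p{N}.']", "");

            // Like StandardFilter: drop a trailing 's and remove dots
            // (simplification: dots are removed from every token, not only acronyms).
            if (token.endsWith("'s") || token.endsWith("'S")) {
                token = token.substring(0, token.length() - 2);
            }
            token = token.replace(".", "");

            // Like LowerCaseFilter: normalize to lower case.
            token = token.toLowerCase(Locale.ROOT);

            // Like StopFilter: skip stop words (and empty tokens).
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With this sketch, analyze("Peter's I.B.M. report is THE best") yields the tokens peter, ibm, report, best. The order of the steps matters: lower-casing has to happen before the stop-word check so that "THE" is recognized as a stop word.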
+ For specific cases, you may wish to use additional filters like
+ ISOLatin1AccentFilter, which replaces accented
+ characters in the ISO Latin 1 character set (ISO-8859-1) by their
+ unaccented equivalents.
+
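The accent replacement described above can be sketched with the JDK alone. This is an illustration of the idea, not the ISOLatin1AccentFilter itself: the class AccentFoldingSketch is hypothetical, and Unicode decomposition does not cover every mapping a dedicated filter may apply (ligatures such as "æ" or the German "ß", for example, are left unchanged here).

```java
import java.text.Normalizer;

// Hypothetical illustration only: accent folding with the JDK, not the Lucene filter.
class AccentFoldingSketch {

    static String stripAccents(String input) {
        // NFD splits an accented character into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Removing the combining marks (Unicode category M) leaves the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

A custom filter could apply such a function to each token so that an indexed "café" and a queried "cafe" produce the same term, provided the same analyzer is used for indexing and querying.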
+ In order to use a different filter, you have to create a new
+ analyzer, and a new search index to use the analyzer. You put it in a
+ jar, which is deployed with your application.
+
+
+ Create the filter
+
+ The ISOLatin1AccentFilter is not present in the current Lucene
+ version used by eXo. You can use the attached file. You can also
+ create your own filter; the relevant method is
+ public final Token next(final Token reusableToken) throws java.io.IOException
+ which defines how chars are read and used by the filter.
+
+
+
+ Create the analyzer
+
+ The analyzer has to extend
+ org.apache.lucene.analysis.standard.StandardAnalyzer and override the
+ method
+ public TokenStream tokenStream(String fieldName, Reader reader)
+ to plug in your own filters. You can have a glance at the example analyzer
+ attached to this article.
+
+
+
+ Create the search index
+
+ Now that we have the analyzer, we have to write the SearchIndex
+ that will use it. You have to extend
+ org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex. You
+ have to write the constructor, to set the right analyzer, and the
+ method
+ public Analyzer getAnalyzer() {
+ return MyAnalyzer;
+ }
+ to return your analyzer. You can see the attached
+ SearchIndex.
+
+
+ Since version 1.12, the analyzer can be set directly in the
+ configuration, so creating a new SearchIndex only to use a new
+ analyzer is redundant.
+
+
+
+
+ Configure your application to use your SearchIndex
+
+ In
+ portal/WEB-INF/conf/jcr/repository-configuration.xml,
+ you have to replace each
+ <query-handler class=3D"org.exoplatform.services.jcr.impl.core.query.lu=
cene.SearchIndex">
+ with your own class
+ <query-handler class=3D"mypackage.indexation.MySearchIndex">
+
+
+
+ Configure your application to use your Analyzer
+
+ In
+ portal/WEB-INF/conf/jcr/repository-configuration.xml,
+ you have to add the "analyzer" parameter to each query-handler
+ configuration:
+<query-handler class=3D"org.exoplatform.services.jcr.impl.core.query.luc=
ene.SearchIndex">
+ <properties>
+ ...
+ <property name=3D"analyzer" value=3D"org.exoplatform.services.jcr=
.impl.core.MyAnalyzer"/>
+ ...
+ </properties>
+</query-handler>
+
+ When you start eXo, your SearchIndex will index content with the
+ specified filters.
+
+
+
+
+
+ Index Adjustments
+
+
+ IndexingConfiguration
+
+ Starting with version 1.9, the default search index implementation
+ in JCR allows you to control which properties of a node are indexed.
+ You can also define different analyzers for different nodes.
+
+ The configuration parameter is called indexingConfiguration and
+ is not set by default. This means all properties of a node are
+ indexed.
+
+ If you wish to configure the indexing behavior, you need to add a
+ parameter to the query-handler element in your configuration
+ file.
+
+ <param name=3D"indexing-configuration-path" value=
=3D"/indexing_configuration.xml"/>
+
+
+
+ Index rules
+
+
+ Node Scope Limit
+
+ To optimize the index size, you can limit the node scope so that
+ only certain properties of a node type are
+ indexed.
+
+ With the configuration below, only properties named Text are
+ indexed for nodes of type nt:unstructured. This configuration also
+ applies to all nodes whose type extends nt:unstructured.
+
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType=3D"nt:unstructured">
+ <property>Text</property>
+ </index-rule>
+</configuration>
+
+ Note that every namespace prefix used in the XML file must be
+ declared in the configuration element.
+
+
+
+ Index Boost Value
+
+ It is also possible to configure a boost value
+ for the nodes that match the index rule. The default boost value is
+ 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) yield a
+ higher score, so matching nodes appear more relevant in search results.
+
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType=3D"nt:unstructured"
+ boost=3D"2.0">
+ <property>Text</property>
+ </index-rule>
+</configuration>
+
+ If you do not wish to boost the complete node but only certain
+ properties, you can also provide a boost value for the listed
+ properties:
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType=3D"nt:unstructured">
+ <property boost=3D"3.0">Title</property>
+ <property boost=3D"1.5">Text</property>
+ </index-rule>
+</configuration>
+
+
+
+ Conditional Index Rules
+
+ You may also add a condition to the index rule
+ and have multiple rules with the same nodeType. The first index rule
+ that matches applies and all remaining ones are
+ ignored:
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType=3D"nt:unstructured"
+ boost=3D"2.0"
+ condition=3D"@priority =3D 'high'">
+ <property>Text</property>
+ </index-rule>
+ <index-rule nodeType=3D"nt:unstructured">
+ <property>Text</property>
+ </index-rule>
+</configuration>
+
+ In the above example, the first rule only applies if the
+ nt:unstructured node has a priority property with the value 'high'.
+ The condition syntax supports only the equals operator and a string
+ literal.
+
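+ The first-match behavior of conditional index rules can be sketched
+ in plain Java. This is only a model under stated assumptions: the
+ class, field and method names are hypothetical and are not the eXo JCR
+ API, node type inheritance is ignored, and a condition is reduced to a
+ single property compared with a string literal.
+
```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of index rules; not the eXo JCR implementation.
class IndexRuleSketch {

    static final class Rule {
        final String nodeType;
        final double boost;
        final String conditionProperty; // null means the rule has no condition
        final String conditionValue;

        Rule(String nodeType, double boost, String conditionProperty, String conditionValue) {
            this.nodeType = nodeType;
            this.boost = boost;
            this.conditionProperty = conditionProperty;
            this.conditionValue = conditionValue;
        }

        // Only the equals operator with a string literal is modeled.
        boolean matches(String type, Map<String, String> properties) {
            if (!nodeType.equals(type)) {
                return false;
            }
            if (conditionProperty == null) {
                return true;
            }
            return conditionValue.equals(properties.get(conditionProperty));
        }
    }

    // Rules are checked in declaration order; the first match wins and
    // the remaining rules are ignored. Returns null if no rule matches.
    static Rule firstMatch(List<Rule> rules, String type, Map<String, String> properties) {
        for (Rule rule : rules) {
            if (rule.matches(type, properties)) {
                return rule;
            }
        }
        return null;
    }

    // Mirrors the two rules of the configuration above.
    static List<Rule> exampleRules() {
        return Arrays.asList(
            new Rule("nt:unstructured", 2.0, "priority", "high"),
            new Rule("nt:unstructured", 1.0, null, null));
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("priority", "high");
        Rule rule = firstMatch(exampleRules(), "nt:unstructured", properties);
        System.out.println("boost applied: " + rule.boost);
    }
}
```
+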
+ You may also reference properties in the condition that are not
+ on the current node:
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType=3D"nt:unstructured"
+ boost=3D"2.0"
+ condition=3D"ancestor::*/@priority =3D 'high'">
+ <property>Text</property>
+ </index-rule>
+ <index-rule nodeType=3D"nt:unstructured"
+ boost=3D"0.5"
+ condition=3D"parent::foo/@priority =3D 'low'">
+ <property>Text</property>
+ </index-rule>
+ <index-rule nodeType=3D"nt:unstructured"
+ boost=3D"1.5"
+ condition=3D"bar/@priority =3D 'medium'">
+ <property>Text</property>
+ </index-rule>
+ <index-rule nodeType=3D"nt:unstructured">
+ <property>Text</property>
+ </index-rule>
+</configuration>
+
+ The indexing configuration also allows you to specify the type
+ of a node in the condition. Note, however, that the type match
+ must be exact. Subtypes of the specified node type are not
+ considered.
+
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType=3D"nt:unstructured"
+ boost=3D"2.0"
+ condition=3D"element(*, nt:unstructured)/@priority =3D 'high=
'">
+ <property>Text</property>
+ </index-rule>
+</configuration>
+
+
+
+ Exclusion from the Node Scope Index
+
+ By default, all configured properties are fulltext indexed if
+ they are of type STRING, and they are included in the node scope index.
+ A node scope search normally finds all nodes of an index. That is, a
+ query with jcr:contains(., 'foo') returns all nodes that have a string
+ property containing the word 'foo'. You can explicitly exclude a
+ property from the node scope index:
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <index-rule nodeType=3D"nt:unstructured">
+ <property nodeScopeIndex=3D"false">Text</property>
+ </index-rule>
+</configuration>
+
+
+
+
+ Index Aggregates
+
+ Sometimes it is useful to include the contents of descendant nodes
+ in a single node, so that content scattered across multiple nodes is
+ easier to search.
+
+ JCR allows you to define index aggregates based on relative pa=
th
+ patterns and primary node types.
+
+ The following example creates an index aggregate on nt:file that
+ includes the content of the jcr:content node:
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0"
+ xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <aggregate primaryType=3D"nt:file">
+ <include>jcr:content</include>
+ </aggregate>
+</configuration>
+
+ You can also restrict the included nodes to a certain
+ type:
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0"
+ xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <aggregate primaryType=3D"nt:file">
+ <include primaryType=3D"nt:resource">jcr:content</include>
+ </aggregate>
+</configuration>
+
+ You may also use the * wildcard to match all child nodes:
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0"
+ xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <aggregate primaryType=3D"nt:file">
+ <include primaryType=3D"nt:resource">*</include>
+ </aggregate>
+</configuration>
+
+ If you wish to include nodes up to a certain depth below the
+ current node, you can add multiple include elements. For example, the
+ nt:file node may contain a complete XML document under
+ jcr:content:
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0"
+ xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <aggregate primaryType=3D"nt:file">
+ <include>*</include>
+ <include>*/*</include>
+ <include>*/*/*</include>
+ </aggregate>
+</configuration>
+
+
+
+ Property-Level Analyzers
+
+
+ Example
+
+ In this configuration section, you define how a property has to
+ be analyzed. If there is an analyzer configuration for a property,
+ this analyzer is used for indexing and searching of this property.
+ For example:
+ <?xml version=3D"1.0"?>
+<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing=
-configuration-1.0.dtd">
+<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0">
+ <analyzers> =
+ <analyzer class=3D"org.apache.lucene.analysis.KeywordAnalyze=
r">
+ <property>mytext</property>
+ </analyzer>
+ <analyzer class=3D"org.apache.lucene.analysis.WhitespaceAnalyze=
r">
+ <property>mytext2</property>
+ </analyzer>
+ </analyzers> =
+</configuration>
+
+ The configuration above means that the property "mytext" for=
the
+ entire workspace is indexed (and searched) with the Lucene
+ KeywordAnalyzer, and property "mytext2" with the WhitespaceAnalyze=
r.
+ Using different analyzers for different languages is particularly
+ useful.
+
+ The WhitespaceAnalyzer splits a property value into tokens at
+ whitespace, whereas the KeywordAnalyzer treats the whole value as a
+ single token.
+
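+ The difference between the two analyzers can be pictured with a
+ small Java sketch. It does not call Lucene: the class and method names
+ are hypothetical and only mimic the tokenization behavior described
+ above.
+
```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; it mimics the behavior, not the Lucene code.
class TokenizationSketch {

    // Whitespace-style analysis: the value is split at runs of whitespace.
    static List<String> whitespaceTokens(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Keyword-style analysis: the whole value is kept as one single token.
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String value = "testing my analyzers";
        System.out.println(whitespaceTokens(value));
        System.out.println(keywordTokens(value));
    }
}
```
+
+ With keyword-style analysis, only a search for the complete value
+ matches the single token. With whitespace-style analysis, each word
+ can be matched on its own.
+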
+
+
+ Characteristics of Node Scope Searches
+
+ When using analyzers, you may encounter unexpected behavior
+ when searching within a property compared to searching within a node
+ scope. The reason is that the node scope always uses the global
+ analyzer.
+
+ Let's suppose that the property "mytext" contains the text
+ "testing my analyzers" and that you haven't configured any analyzers
+ for the property "mytext" (and not changed the default analyzer in
+ SearchIndex).
+
+ If your query is, for example:
+ xpath =3D "//*[jcr:contains(mytext,'analyzer')]"
+
+ With the default analyzers, this XPath query does not return a hit
+ for the node with the property above, because the indexed term
+ "analyzers" is not reduced to "analyzer".
+
+ A search on the node scope
+ xpath =3D "//*[jcr:contains(.,'analyzer')]"
+ won't give a hit either. Keep in mind that you can only set specific
+ analyzers on a node property, and that the node scope
+ indexing/analyzing is always done with the globally defined analyzer in
+ the SearchIndex element.
+
+ Now, if you change the analyzer used to index the "mytext"
+ property above to
+<analyzer class=3D"org.apache.lucene.analysis.de.GermanAnalyzer">
+ <property>mytext</property>
+</analyzer>
+ and you do the same search again, then for
+ xpath =3D "//*[jcr:contains(mytext,'analyzer')]"
+ you would get a hit because of the word stemming (analyzers -
+ analyzer).
+
+ The other search,
+ xpath =3D "//*[jcr:contains(.,'analyzer')]"
+ still would not give a result, since the node scope is indexed with the
+ global analyzer, which in this case does not take into account any
+ word stemming.
+
+ In conclusion, be aware that when using analyzers for specific
+ properties, a search text may produce a hit in a property but no hit
+ in the node scope of that property.
+
+
+ Both index rules and index aggregates influence how content is
+ indexed in JCR. If you change the configuration, the existing content
+ is not automatically re-indexed according to the new rules. You
+ therefore have to manually re-index the content when you change the
+ configuration.
+
+
+
+
+
--===============1101572748759254311==--