From do-not-reply at jboss.org Tue Jun 15 11:02:37 2010
Content-Type: multipart/mixed; boundary="===============3021534417893080192=="
MIME-Version: 1.0
From: do-not-reply at jboss.org
To: exo-jcr-commits at lists.jboss.org
Subject: [exo-jcr-commits] exo-jcr SVN: r2610 - jcr/trunk/docs/reference/en/src/main/docbook/en-US/modules/jcr.
Date: Tue, 15 Jun 2010 11:02:37 -0400
Message-ID: <201006151502.o5FF2bQx003184@svn01.web.mwc.hst.phx2.redhat.com>

--===============3021534417893080192==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

Author: sergiykarpenko
Date: 2010-06-15 11:02:36 -0400 (Tue, 15 Jun 2010)
New Revision: 2610

Modified:
   jcr/trunk/docs/reference/en/src/main/docbook/en-US/modules/jcr/search-configuration.xml
Log:
EXOJCR-787: new prameters added to search configuration docbook

Modified: jcr/trunk/docs/reference/en/src/main/docbook/en-US/modules/jcr/search-configuration.xml
===================================================================
--- jcr/trunk/docs/reference/en/src/main/docbook/en-US/modules/jcr/search-configuration.xml 2010-06-15 14:50:10 UTC (rev 2609)
+++ jcr/trunk/docs/reference/en/src/main/docbook/en-US/modules/jcr/search-configuration.xml 2010-06-15 15:02:36 UTC (rev 2610)
@@ -1,774 +1,799 @@
-
-
-
-
-
- Search Configuration
-
-
- XML Configuration
-
- JCR index configuration. You can find this file here:
- .../portal/WEB-INF/conf/jcr/repository-configuration.xml
-
- &lt;repository-service default-repository="db1"&gt;
-   &lt;repositories&gt;
-     &lt;repository name="db1" system-workspace="ws" default-workspace="ws"&gt;
-       ....
-       &lt;workspaces&gt;
-         &lt;workspace name="ws"&gt;
-           ....
-           &lt;query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"&gt;
-             &lt;properties&gt;
-               &lt;property name="index-dir" value="${java.io.tmpdir}/temp/index/db1/ws" /&gt;
-               &lt;property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" /&gt;
-               &lt;property name="synonymprovider-config-path" value="/synonyms.properties" /&gt;
-               &lt;property name="indexing-config-path" value="/indexing-configuration.xml" /&gt;
-               &lt;property name="query-class" value="org.exoplatform.services.jcr.impl.core.query.QueryImpl" /&gt;
-             &lt;/properties&gt;
-           &lt;/query-handler&gt;
-           ...
-         &lt;/workspace&gt;
-       &lt;/workspaces&gt;
-     &lt;/repository&gt;
-   &lt;/repositories&gt;
-&lt;/repository-service&gt;
-
- -
- Configuration parameters - - - - - - - - Parameter - - Default - - Description - - Since - - - - - - index-dir - - none - - The location of the index directory. This parameter is - mandatory. Up to 1.9 this parameter called "indexDir" - - 1.0 - - - - use-compoundfile - - true - - Advises lucene to use compound files for the index - files. - - 1.9 - - - - min-merge-docs - - 100 - - Minimum number of nodes in an index until segments are - merged. - - 1.9 - - - - volatile-idle-time - - 3 - - Idle time in seconds until the volatile index part is m= oved - to a persistent index even though minMergeDocs is not - reached. - - 1.9 - - - - max-merge-docs - - Integer.MAX_VALUE - - Maximum number of nodes in segments that will be merged. - The default value changed in JCR 1.9 to Integer.MAX_VALUE. - - 1.9 - - - - merge-factor - - 10 - - Determines how often segment indices are merged. - - 1.9 - - - - max-field-length - - 10000 - - The number of words that are fulltext indexed at most p= er - property. - - 1.9 - - - - cache-size - - 1000 - - Size of the document number cache. This cache maps uuid= s to - lucene document numbers - - 1.9 - - - - force-consistencycheck - - false - - Runs a consistency check on every startup. If false, a - consistency check is only performed when the search index dete= cts - a prior forced shutdown. - - 1.9 - - - - auto-repair - - true - - Errors detected by a consistency check are automatically - repaired. If false, errors are only written to the log. - - 1.9 - - - - query-class - - QueryImpl - - Class name that implements the javax.jcr.query.Query - interface.This class must also extend from the class: - org.exoplatform.services.jcr.impl.core.query.AbstractQueryImpl= . - - 1.9 - - - - document-order - - true - - If true and the query does not contain an 'order by' - clause, result nodes will be in document order. For better - performance when queries return a lot of nodes set to - 'false'. 
- - 1.9 - - - - result-fetch-size - - Integer.MAX_VALUE - - The number of results when a query is executed. Default - value: Integer.MAX_VALUE (-> all). - - 1.9 - - - - excerptprovider-class - - DefaultXMLExcerpt - - The name of the class that implements - org.exoplatform.services.jcr.impl.core.query.lucene.ExcerptPro= vider - and should be used for the rep:excerpt() function in a - query. - - 1.9 - - - - support-highlighting - - false - - If set to true additional information is stored in the - index to support highlighting using the rep:excerpt() - function. - - 1.9 - - - - synonymprovider-class - - none - - The name of a class that implements - org.exoplatform.services.jcr.impl.core.query.lucene.SynonymPro= vider. - The default value is null (-> not set). - - 1.9 - - - - synonymprovider-config-path - - none - - The path to the synonym provider configuration file. Th= is - path interpreted relative to the path parameter. If there is a - path element inside the SearchIndex element, then this path is - interpreted relative to the root path of the path. Whether this - parameter is mandatory depends on the synonym provider - implementation. The default value is null (-> not set). - - 1.9 - - - - indexing-configuration-path - - none - - The path to the indexing configuration file. - - 1.9 - - - - indexing-configuration-class - - IndexingConfigurationImpl - - The name of the class that implements - org.exoplatform.services.jcr.impl.core.query.lucene.IndexingCo= nfiguration. - - 1.9 - - - - force-consistencycheck - - false - - If set to true a consistency check is performed dependi= ng - on the parameter forceConsistencyCheck. If set to false no - consistency check is performed on startup, even if a redo log = had - been applied. - - 1.9 - - - - spellchecker-class - - none - - The name of a class that implements - org.exoplatform.services.jcr.impl.core.query.lucene.SpellCheck= er. - - 1.9 - - - - errorlog-size - - 50(Kb) - - The default size of error log file in Kb. 
-
- 1.9
-
-
-
- upgrade-index
-
- false
-
- Allows JCR to convert an existing index into the new
- format. Also it is possible to set this property via system
- property, for example: -Dupgrade-index=true Indexes before JCR
- 1.12 will not run with JCR 1.12. Hence you have to run an
- automatic migration: Start JCR with -Dupgrade-index=true. The old
- index format is then converted in the new index format. After the
- conversion the new format is used. On the next start you don't
- need this option anymore. The old index is replaced and a back
- conversion is not possible - therefore better take a backup of the
- index before. (Only for migrations from JCR 1.9 and
- later.)
-
- 1.12
-
-
-
- analyzer
-
- org.apache.lucene.analysis.standard.StandardAnalyzer
-
- Class name of a lucene analyzer to use for fulltext
- indexing of text.
-
- 1.12
-
-
-
-
-
- -
- Global Search Index - -
- Global Search Index Configuration
-
- The global search index is configured in the above-mentioned
- configuration file
- (portal/WEB-INF/conf/jcr/repository-configuration.xml)
- in the tag "query-handler".
-
- &lt;query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"&gt;
-
- In fact when using Lucene you always should use the same analyzer
- for indexing and for querying - otherwise the results are unpredictable.
- You don't have to worry about this, eXo JCR does this for you
- automatically. If you don't like the StandardAnalyzer configured by
- default just replace it by your own.
-
- If you don't have a handy QueryHandler you will learn how create a
- customized Handler in 5 minutes.
-
- -
- Customized Search Indexes and Analyzers - - By default Exo JCR uses the Lucene standard Analyzer to index - contents. This analyzer uses some standard filters in the method that - analyzes the content:public TokenStream tokenStream(= String fieldName, Reader reader) { - StandardTokenizer tokenStream =3D new StandardTokenizer(reader, replac= eInvalidAcronym); - tokenStream.setMaxTokenLength(maxTokenLength); - TokenStream result =3D new StandardFilter(tokenStream); - result =3D new LowerCaseFilter(result); - result =3D new StopFilter(result, stopSet); - return result; - } - - The first one (StandardFilter) removes 's (as 's in - "Peter's") from the end of words and removes dots from - acronyms. - - - - The second one (LowerCaseFilter) normalizes token text to - lower case. - - - - The last one (StopFilter) removes stop words from a token - stream. The stop set is defined in the analyzer. - - - - For specific cases, you may wish to use additional filters like - ISOLatin1AccentFilter, which replaces accented - characters in the ISO Latin 1 character set (ISO-8859-1) by their - unaccented equivalents. - - In order to use a different filter, you have to create a new - analyzer, and a new search index to use the analyzer. You put it in a - jar, which is deployed with your application. - -
- Create the filter
-
- The ISOLatin1AccentFilter is not present in the current Lucene
- version used by Exo. You can use the attached file. You can also
- create your own filter, the relevant method is public final Token next(final Token reusableToken) throws java.io.IOException which
- defines how chars are read and used by the filter.
-
- -
- Create the analyzer
-
- The analyzer have to extends
- org.apache.lucene.analysis.standard.StandardAnalyzer, and overload the
- method public TokenStream tokenStream(String fieldName, Reader reader) to
- put your own filters. You can have a glance at the example analyzer
- attached to this article.
-
- -
- Create the search index
-
- Now, we have the analyzer, we have to write the SearchIndex,
- which will use the analyzer. Your have to extends
- org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex. You
- have to write the constructor, to set the right analyzer, and the
- method public Analyzer getAnalyzer() {
-   return MyAnalyzer;
- } to return your analyzer. You can see the attached
- SearchIndex.
-
-
- Since 1.12 version we can set Analyzer directly in
- configuration. So, creation new SearchIndex only for new Analyzer is
- redundant.
-
-
- -
- Configure your application to use your SearchIndex
-
- In
- portal/WEB-INF/conf/jcr/repository-configuration.xml,
- you have to replace each &lt;query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"&gt; by
- your own class &lt;query-handler class="mypackage.indexation.MySearchIndex"&gt;
-
- -
- Configure your application to use your Analyzer
-
- In
- portal/WEB-INF/conf/jcr/repository-configuration.xml,
- you have to add parameter "analyzer" to each query-handler
- config: &lt;query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"&gt;
-   &lt;properties&gt;
-     ...
-     &lt;property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/&gt;
-     ...
-   &lt;/properties&gt;
-&lt;/query-handler&gt;
-
- When you start exo, your SearchIndex will start to index
- contents with the specified filters.
-
-
-
- -
- Index Adjustments - -
- IndexingConfiguration
-
- Starting with version 1.9, the default search index implementation
- in JCR allows you to control which properties of a node are indexed. You
- also can define different analyzers for different nodes.
-
- The configuration parameter is called indexingConfiguration and
- per default is not set. This means all properties of a node are
- indexed.
-
- If you wish to configure the indexing behavior you need to add a
- parameter to the query-handler element in your configuration
- file.
-
- &lt;param name="indexing-configuration-path" value="/indexing_configuration.xml"/&gt;
-
- -
- Index rules - -
- Node Scope Limit
-
- To optimize the index size you can limit the node scope so that
- only certain properties of a node type are
- indexed.
-
- With the below configuration only properties named Text are
- indexed for nodes of type nt:unstructured. This configuration also
- applies to all nodes whose type extends from nt:unstructured.
-
- &lt;?xml version="1.0"?&gt;
-&lt;!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd"&gt;
-&lt;configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0"&gt;
-   &lt;index-rule nodeType="nt:unstructured"&gt;
-     &lt;property&gt;Text&lt;/property&gt;
-   &lt;/index-rule&gt;
-&lt;/configuration&gt;
-
- Please note that you have to declare the namespace
- prefixes in the configuration element that you are using
- throughout the XML file!
-
- -
- Index Boost Value - - It is also possible to configure a boost value - for the nodes that match the index rule. The default boost value is - 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yi= eld - a higher score value and appear as more relevant. - - <?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <index-rule nodeType=3D"nt:unstructured" - boost=3D"2.0"> - <property>Text</property> - </index-rule> -</configuration> - - If you do not wish to boost the complete node but only certa= in - properties you can also provide a boost value for the listed - properties:<?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <index-rule nodeType=3D"nt:unstructured"> - <property boost=3D"3.0">Title</property> - <property boost=3D"1.5">Text</property> - </index-rule> -</configuration> -
- -
- Conditional Index Rules - - You may also add a condition to the index r= ule - and have multiple rules with the same nodeType. The first index ru= le - that matches will apply and all remaining ones are - ignored:<?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <index-rule nodeType=3D"nt:unstructured" - boost=3D"2.0" - condition=3D"@priority =3D 'high'"> - <property>Text</property> - </index-rule> - <index-rule nodeType=3D"nt:unstructured"> - <property>Text</property> - </index-rule> -</configuration> - - In the above example the first rule only applies if the - nt:unstructured node has a priority property with a value 'high'. = The - condition syntax supports only the equals operator and a string - literal. - - You may also reference properties in the condition that are = not - on the current node:<?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <index-rule nodeType=3D"nt:unstructured" - boost=3D"2.0" - condition=3D"ancestor::*/@priority =3D 'high'"> - <property>Text</property> - </index-rule> - <index-rule nodeType=3D"nt:unstructured" - boost=3D"0.5" - condition=3D"parent::foo/@priority =3D 'low'"> - <property>Text</property> - </index-rule> - <index-rule nodeType=3D"nt:unstructured" - boost=3D"1.5" - condition=3D"bar/@priority =3D 'medium'"> - <property>Text</property> - </index-rule> - <index-rule nodeType=3D"nt:unstructured"> - <property>Text</property> - </index-rule> -</configuration> - - The indexing configuration also allows you to specify the ty= pe - of a node in the condition. Please note however that the type match - must be exact. It does not consider sub types of the specified node - type. 
- - <?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <index-rule nodeType=3D"nt:unstructured" - boost=3D"2.0" - condition=3D"element(*, nt:unstructured)/@priority =3D 'high= '"> - <property>Text</property> - </index-rule> -</configuration> -
- -
- Exclusion from the Node Scope Index - - Per default all configured properties are fulltext indexed if - they are of type STRING and included in the node scope index. A no= de - scope search finds normally all nodes of an index. That is, the se= lect - jcr:contains(., 'foo') returns all nodes that have a string proper= ty - containing the word 'foo'. You can exclude explicitly a property f= rom - the node scope index:<?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <index-rule nodeType=3D"nt:unstructured"> - <property nodeScopeIndex=3D"false">Text</property> - </index-rule> -</configuration> -
-
- -
- Index Aggregates - - Sometimes it is useful to include the contents of descendant n= odes - into a single node to easier search on content that is scattered acr= oss - multiple nodes. - - JCR allows you to define index aggregates based on relative pa= th - patterns and primary node types. - - The following example creates an index aggregate on nt:file th= at - includes the content of the jcr:content node:<?xm= l version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0" - xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <aggregate primaryType=3D"nt:file"> - <include>jcr:content</include> - </aggregate> -</configuration> - - You can also restrict the included nodes to a certain - type:<?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0" - xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <aggregate primaryType=3D"nt:file"> - <include primaryType=3D"nt:resource">jcr:content</include> - </aggregate> -</configuration> - - You may also use the * to match all child nodes:<?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0" - xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <aggregate primaryType=3D"nt:file">http://wiki.exoplatform.com/xwi= ki/bin/edit/JCR/Search+Configuration - <include primaryType=3D"nt:resource">*</include> - </aggregate> -</configuration> - - If you wish to include nodes up to a certain depth below the - current node you can add multiple include elements. E.g. 
the nt:file - node may contain a complete XML document under - jcr:content:<?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:jcr=3D"http://www.jcp.org/jcr/1.0" - xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <aggregate primaryType=3D"nt:file"> - <include>*</include> - <include>*/*</include> - <include>*/*/*</include> - </aggregate> -</configuration> -
- -
- Property-Level Analyzers - -
- Example - - In this configuration section you define how a property has = to - be analyzed. If there is an analyzer configuration for a property, - this analyzer is used for indexing and searching of this property.= For - example:<?xml version=3D"1.0"?> -<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing= -configuration-1.0.dtd"> -<configuration xmlns:nt=3D"http://www.jcp.org/jcr/nt/1.0"> - <analyzers> = - <analyzer class=3D"org.apache.lucene.analysis.KeywordAnalyzer"&= gt; - <property>mytext</property> - </analyzer> - <analyzer class=3D"org.apache.lucene.analysis.WhitespaceAnalyze= r"> - <property>mytext2</property> - </analyzer> - </analyzers> = -</configuration> - - The configuration above means that the property "mytext" for= the - entire workspace is indexed (and searched) with the Lucene - KeywordAnalyzer, and property "mytext2" with the WhitespaceAnalyze= r. - Using different analyzers for different languages is particularly - useful. - - The WhitespaceAnalyzer tokenizes a property, the KeywordAnal= yzer - takes the property as a whole. -
- -
- Characteristics of Node Scope Searches - - When using analyzers, you may encounter an unexpected behavi= or - when searching within a property compared to searching within a no= de - scope. The reason is that the node scope always uses the global - analyzer. - - Let's suppose that the property "mytext" contains the text : - "testing my analyzers" and that you haven't configured any analyze= rs - for the property "mytext" (and not changed the default analyzer in - SearchIndex). - - If your query is for example:xpath =3D "//*[= jcr:contains(mytext,'analyzer')]" - - This xpath does not return a hit in the node with the proper= ty - above and default analyzers. - - Also a search on the node scopexpath =3D "//= *[jcr:contains(.,'analyzer')]"won't - give a hit. Realize, that you can only set specific analyzers on a - node property, and that the node scope indexing/analyzing is always - done with the globally defined analyzer in the SearchIndex - element. - - Now, if you change the analyzer used to index the "mytext" - property above to<analyzer class=3D"org.apache.= lucene.analysis.Analyzer.GermanAnalyzer"> - <property>mytext</property> -</analyzer>and you do the same search again, then - forxpath =3D "//*[jcr:contains(mytext,'analyzer')]= "you - would get a hit because of the word stemming (analyzers - - analyzer). - - The other search,xpath =3D "//*[jcr:contains= (.,'analyzer')]"still - would not give a result, since the node scope is indexed with the - global analyzer, which in this case does not take into account any - word stemming. - - In conclusion, be aware that when using analyzers for specif= ic - properties, you might find a hit in a property for some search tex= t, - and you do not find a hit with the same search text in the node sc= ope - of the property! - - - Both index rules and index aggregates influence how conten= t is - indexed in JCR. 
If you change the configuration the existing con= tent - is not automatically re-indexed according to the new rules. You - therefore have to manually re-index the content when you change = the - configuration! - -
-
-
-
+ + + + + + Search Configuration + +
+ XML Configuration
+
+ JCR index configuration. You can find this file here:
+ .../portal/WEB-INF/conf/jcr/repository-configuration.xml
+
+ &lt;repository-service default-repository="db1"&gt;
+   &lt;repositories&gt;
+     &lt;repository name="db1" system-workspace="ws" default-workspace="ws"&gt;
+       ....
+       &lt;workspaces&gt;
+         &lt;workspace name="ws"&gt;
+           ....
+           &lt;query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"&gt;
+             &lt;properties&gt;
+               &lt;property name="index-dir" value="${java.io.tmpdir}/temp/index/db1/ws" /&gt;
+               &lt;property name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider" /&gt;
+               &lt;property name="synonymprovider-config-path" value="/synonyms.properties" /&gt;
+               &lt;property name="indexing-config-path" value="/indexing-configuration.xml" /&gt;
+               &lt;property name="query-class" value="org.exoplatform.services.jcr.impl.core.query.QueryImpl" /&gt;
+             &lt;/properties&gt;
+           &lt;/query-handler&gt;
+           ...
+         &lt;/workspace&gt;
+       &lt;/workspaces&gt;
+     &lt;/repository&gt;
+   &lt;/repositories&gt;
+&lt;/repository-service&gt;
+
+ +
+ Configuration parameters + + + + + + + + Parameter + + Default + + Description + + Since + + + + + + index-dir + + none + + The location of the index directory. This parameter is + mandatory. Up to 1.9 this parameter called "indexDir" + + 1.0 + + + + use-compoundfile + + true + + Advises lucene to use compound files for the index + files. + + 1.9 + + + + min-merge-docs + + 100 + + Minimum number of nodes in an index until segments are + merged. + + 1.9 + + + + volatile-idle-time + + 3 + + Idle time in seconds until the volatile index part is m= oved + to a persistent index even though minMergeDocs is not + reached. + + 1.9 + + + + max-merge-docs + + Integer.MAX_VALUE + + Maximum number of nodes in segments that will be merged. + The default value changed in JCR 1.9 to Integer.MAX_VALUE. + + 1.9 + + + + merge-factor + + 10 + + Determines how often segment indices are merged. + + 1.9 + + + + max-field-length + + 10000 + + The number of words that are fulltext indexed at most p= er + property. + + 1.9 + + + + cache-size + + 1000 + + Size of the document number cache. This cache maps uuid= s to + lucene document numbers + + 1.9 + + + + force-consistencycheck + + false + + Runs a consistency check on every startup. If false, a + consistency check is only performed when the search index dete= cts + a prior forced shutdown. + + 1.9 + + + + auto-repair + + true + + Errors detected by a consistency check are automatically + repaired. If false, errors are only written to the log. + + 1.9 + + + + query-class + + QueryImpl + + Class name that implements the javax.jcr.query.Query + interface.This class must also extend from the class: + org.exoplatform.services.jcr.impl.core.query.AbstractQueryImpl= . + + 1.9 + + + + document-order + + true + + If true and the query does not contain an 'order by' + clause, result nodes will be in document order. For better + performance when queries return a lot of nodes set to + 'false'. 
+ + 1.9 + + + + result-fetch-size + + Integer.MAX_VALUE + + The number of results when a query is executed. Default + value: Integer.MAX_VALUE (-> all). + + 1.9 + + + + excerptprovider-class + + DefaultXMLExcerpt + + The name of the class that implements + org.exoplatform.services.jcr.impl.core.query.lucene.ExcerptPro= vider + and should be used for the rep:excerpt() function in a + query. + + 1.9 + + + + support-highlighting + + false + + If set to true additional information is stored in the + index to support highlighting using the rep:excerpt() + function. + + 1.9 + + + + synonymprovider-class + + none + + The name of a class that implements + org.exoplatform.services.jcr.impl.core.query.lucene.SynonymPro= vider. + The default value is null (-> not set). + + 1.9 + + + + synonymprovider-config-path + + none + + The path to the synonym provider configuration file. Th= is + path interpreted relative to the path parameter. If there is a + path element inside the SearchIndex element, then this path is + interpreted relative to the root path of the path. Whether this + parameter is mandatory depends on the synonym provider + implementation. The default value is null (-> not set). + + 1.9 + + + + indexing-configuration-path + + none + + The path to the indexing configuration file. + + 1.9 + + + + indexing-configuration-class + + IndexingConfigurationImpl + + The name of the class that implements + org.exoplatform.services.jcr.impl.core.query.lucene.IndexingCo= nfiguration. + + 1.9 + + + + force-consistencycheck + + false + + If set to true a consistency check is performed dependi= ng + on the parameter forceConsistencyCheck. If set to false no + consistency check is performed on startup, even if a redo log = had + been applied. + + 1.9 + + + + spellchecker-class + + none + + The name of a class that implements + org.exoplatform.services.jcr.impl.core.query.lucene.SpellCheck= er. 
+
+ 1.9
+
+
+
+ spellchecker-more-popular
+
+ true
+
+ If true, the spellchecker returns only suggestions
+ that are as frequent as, or more frequent than, the checked word. If
+ false, the spellchecker returns null when the checked word exists in the
+ dictionary; otherwise it returns the closest
+ suggestion.
+
+ 1.10
+
+
+
+ spellchecker-min-distance
+
+ 0.55f
+
+ Minimum distance between the checked word and a proposed
+ suggestion.
+
+ 1.10
+
+
+
+ errorlog-size
+
+ 50(Kb)
+
+ The default size of the error log file in Kb.
+
+ 1.9
+
+
+
+ upgrade-index
+
+ false
+
+ Allows JCR to convert an existing index into the new
+ format. This property can also be set via a system
+ property, for example: -Dupgrade-index=true. Indexes created before JCR
+ 1.12 will not run with JCR 1.12, so you have to run an
+ automatic migration: start JCR with -Dupgrade-index=true. The old
+ index format is then converted into the new index format. After the
+ conversion the new format is used, and on the next start you no longer
+ need this option. The old index is replaced and converting
+ back is not possible, so take a backup of the
+ index beforehand. (Only for migrations from JCR 1.9 and
+ later.)
+
+ 1.12
+
+
+
+ analyzer
+
+ org.apache.lucene.analysis.standard.StandardAnalyzer
+
+ Class name of the Lucene analyzer to use for fulltext
+ indexing of text.
+
+ 1.12
+
+
+
+
+
+ +
+ Global Search Index + +
+ Global Search Index Configuration
+
+ The global search index is configured in the above-mentioned
+ configuration file
+ (portal/WEB-INF/conf/jcr/repository-configuration.xml)
+ in the "query-handler" tag.
+
+ &lt;query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"&gt;
+
+ With Lucene you should always use the same analyzer
+ for indexing and for querying; otherwise the results are unpredictable.
+ You don't have to worry about this, because eXo JCR does it for you
+ automatically. If the StandardAnalyzer configured by
+ default does not suit you, replace it with your own.
+
+ If you don't have a suitable QueryHandler at hand, you will learn how to create a
+ customized handler in 5 minutes.
+
+ +
+ Customized Search Indexes and Analyzers + + By default Exo JCR uses the Lucene standard Analyzer to index + contents. This analyzer uses some standard filters in the method that + analyzes the content:public TokenStream tokenStream(= String fieldName, Reader reader) { + StandardTokenizer tokenStream =3D new StandardTokenizer(reader, replac= eInvalidAcronym); + tokenStream.setMaxTokenLength(maxTokenLength); + TokenStream result =3D new StandardFilter(tokenStream); + result =3D new LowerCaseFilter(result); + result =3D new StopFilter(result, stopSet); + return result; + } + + The first one (StandardFilter) removes 's (as 's in + "Peter's") from the end of words and removes dots from + acronyms. + + + + The second one (LowerCaseFilter) normalizes token text to + lower case. + + + + The last one (StopFilter) removes stop words from a token + stream. The stop set is defined in the analyzer. + + + + For specific cases, you may wish to use additional filters like + ISOLatin1AccentFilter, which replaces accented + characters in the ISO Latin 1 character set (ISO-8859-1) by their + unaccented equivalents. + + In order to use a different filter, you have to create a new + analyzer, and a new search index to use the analyzer. You put it in a + jar, which is deployed with your application. + +
+ Create the filter
+
+ The ISOLatin1AccentFilter is not present in the Lucene
+ version currently used by eXo. You can use the attached file. You can also
+ create your own filter; the relevant method is public final Token next(final Token reusableToken) throws java.io.IOException which
+ defines how characters are read and used by the filter.
+
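To make the filter chain described above more concrete, here is a minimal sketch of what each step does to a token, ending with an accent-folding step. It deliberately uses only the Java standard library and is not the Lucene API: the class name FilterChainSketch, the stop set, and the use of java.text.Normalizer in place of ISOLatin1AccentFilter are assumptions made for illustration. Normalizer only strips combining accents, so it does not cover everything ISOLatin1AccentFilter does.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Stdlib-only illustration of the filter chain; this is NOT the Lucene API. */
public class FilterChainSketch {

    // Hypothetical stop set; the real one is defined inside the Lucene analyzer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "at", "the", "of", "to"));

    /** Mimics StandardFilter: drop a trailing 's and remove dots from acronyms. */
    static String standardFilter(String token) {
        if (token.endsWith("'s") || token.endsWith("'S")) {
            token = token.substring(0, token.length() - 2);
        }
        return token.replace(".", "");
    }

    /** Stand-in for an accent filter: decompose, then strip combining marks. */
    static String foldAccents(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** Applies the steps in order: standard, lower case, stop words, accents. */
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = standardFilter(raw).toLowerCase(Locale.ROOT);
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // StopFilter step: skip stop words
            }
            result.add(foldAccents(token));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [peter, cafe, ibm, office]
        System.out.println(analyze("Peter's caf\u00e9 at the I.B.M. office"));
    }
}
```

In a real deployment the accent step would be a Lucene TokenFilter added to the chain in tokenStream(); the sketch only shows the effect on the tokens.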
+ +
+ Create the analyzer
+
+ The analyzer has to extend
+ org.apache.lucene.analysis.standard.StandardAnalyzer and override the
+ method public TokenStream tokenStream(String fieldName, Reader reader) to
+ add your own filters. You can look at the example analyzer
+ attached to this article.
+
+ +
+ Create the search index
+
+ Now that we have the analyzer, we have to write the SearchIndex
+ that will use it. You have to extend
+ org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex. You
+ have to write the constructor, to set the right analyzer, and the
+ method public Analyzer getAnalyzer() {
+   return MyAnalyzer;
+ } to return your analyzer. You can see the attached
+ SearchIndex.
+
+
+ Since version 1.12, the Analyzer can be set directly in the
+ configuration, so creating a new SearchIndex only to use a new Analyzer is
+ redundant.
+
+
+ +
+ Configure your application to use your SearchIndex
+
+ In
+ portal/WEB-INF/conf/jcr/repository-configuration.xml,
+ you have to replace each
+
+ <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
+
+ with your own class:
+
+ <query-handler class="mypackage.indexation.MySearchIndex">
+
+ +
+ Configure your application to use your Analyzer
+
+ In
+ portal/WEB-INF/conf/jcr/repository-configuration.xml,
+ you have to add the "analyzer" parameter to each query-handler
+ configuration:
+
+ <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
+   <properties>
+     ...
+     <property name="analyzer" value="org.exoplatform.services.jcr.impl.core.MyAnalyzer"/>
+     ...
+   </properties>
+ </query-handler>
+
+ When you start eXo, the SearchIndex will start to index
+ content with the specified filters.
+
+
+
+ +
+ Index Adjustments + +
+ IndexingConfiguration
+
+ Starting with version 1.9, the default search index implementation
+ in JCR allows you to control which properties of a node are indexed. You
+ can also define different analyzers for different nodes.
+
+ The configuration parameter is called indexingConfiguration and
+ is not set by default. This means all properties of a node are
+ indexed.
+
+ If you wish to configure the indexing behavior, you need to add a
+ parameter to the query-handler element in your configuration
+ file.
+
+ <param name="indexing-configuration-path" value="/indexing_configuration.xml"/>
+
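+ For orientation, the parameter might sit inside a query-handler block as in the
+ sketch below. The parameter name and file path are taken from the text above. The
+ properties/property wrapper is an assumption borrowed from the "analyzer" example
+ in this document, so check the syntax against your eXo JCR version.

```xml
<!-- Sketch only: the <properties>/<property> layout is assumed, not confirmed for this parameter -->
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
  <properties>
    <!-- Path to the indexing configuration file described in this section -->
    <property name="indexing-configuration-path" value="/indexing_configuration.xml"/>
  </properties>
</query-handler>
```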
+ +
+ Index rules + +
+ Node Scope Limit
+
+ To optimize the index size, you can limit the node scope so that
+ only certain properties of a node type are
+ indexed.
+
+ With the configuration below, only properties named Text are
+ indexed for nodes of type nt:unstructured. This configuration also
+ applies to all nodes whose type extends nt:unstructured.
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <index-rule nodeType="nt:unstructured">
+     <property>Text</property>
+   </index-rule>
+ </configuration>
+
+ Note that every namespace prefix used in the XML file must be
+ declared in the configuration element.
+
+ +
+ Index Boost Value
+
+ It is also possible to configure a boost value
+ for the nodes that match the index rule. The default boost value is
+ 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) yield
+ a higher score, so matching nodes appear more relevant.
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <index-rule nodeType="nt:unstructured"
+               boost="2.0">
+     <property>Text</property>
+   </index-rule>
+ </configuration>
+
+ If you do not wish to boost the complete node but only certain
+ properties, you can also provide a boost value for the listed
+ properties:
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <index-rule nodeType="nt:unstructured">
+     <property boost="3.0">Title</property>
+     <property boost="1.5">Text</property>
+   </index-rule>
+ </configuration>
+
+ +
+ Conditional Index Rules
+
+ You may also add a condition to the index rule
+ and have multiple rules with the same nodeType. The first index rule
+ that matches applies, and all remaining ones are
+ ignored:
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <index-rule nodeType="nt:unstructured"
+               boost="2.0"
+               condition="@priority = 'high'">
+     <property>Text</property>
+   </index-rule>
+   <index-rule nodeType="nt:unstructured">
+     <property>Text</property>
+   </index-rule>
+ </configuration>
+
+ In the above example, the first rule only applies if the
+ nt:unstructured node has a priority property with the value 'high'. The
+ condition syntax supports only the equals operator and a string
+ literal.
+
+ You may also reference properties in the condition that are not
+ on the current node:
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <index-rule nodeType="nt:unstructured"
+               boost="2.0"
+               condition="ancestor::*/@priority = 'high'">
+     <property>Text</property>
+   </index-rule>
+   <index-rule nodeType="nt:unstructured"
+               boost="0.5"
+               condition="parent::foo/@priority = 'low'">
+     <property>Text</property>
+   </index-rule>
+   <index-rule nodeType="nt:unstructured"
+               boost="1.5"
+               condition="bar/@priority = 'medium'">
+     <property>Text</property>
+   </index-rule>
+   <index-rule nodeType="nt:unstructured">
+     <property>Text</property>
+   </index-rule>
+ </configuration>
+
+ The indexing configuration also allows you to specify the type
+ of a node in the condition. Note, however, that the type match
+ must be exact. It does not consider subtypes of the specified node
+ type.
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <index-rule nodeType="nt:unstructured"
+               boost="2.0"
+               condition="element(*, nt:unstructured)/@priority = 'high'">
+     <property>Text</property>
+   </index-rule>
+ </configuration>
+
+ +
+ Exclusion from the Node Scope Index
+
+ By default, all configured properties of type STRING are fulltext
+ indexed and included in the node scope index. A node
+ scope search normally finds all nodes of an index. That is, the query
+ jcr:contains(., 'foo') returns all nodes that have a string property
+ containing the word 'foo'. You can explicitly exclude a property from
+ the node scope index:
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <index-rule nodeType="nt:unstructured">
+     <property nodeScopeIndex="false">Text</property>
+   </index-rule>
+ </configuration>
+
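+ The per-property attributes described for index rules can presumably be combined
+ in a single rule. The sketch below assumes that boost and nodeScopeIndex may be
+ used side by side in the same index-rule. The property names Title and Summary are
+ hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <!-- Boosted and included in the node scope index (the default) -->
    <property boost="3.0">Title</property>
    <!-- Indexed as a property, but left out of the node scope index -->
    <property nodeScopeIndex="false">Summary</property>
  </index-rule>
</configuration>
```

+ With such a rule, a property search like //*[jcr:contains(@Summary, 'foo')] would
+ still be expected to match, while the node scope search //*[jcr:contains(., 'foo')]
+ would not match on the Summary value alone.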
+
+ +
+ Index Aggregates
+
+ Sometimes it is useful to include the contents of descendant nodes
+ in a single node, to make it easier to search content that is scattered across
+ multiple nodes.
+
+ JCR allows you to define index aggregates based on relative path
+ patterns and primary node types.
+
+ The following example creates an index aggregate on nt:file that
+ includes the content of the jcr:content node:
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
+                xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <aggregate primaryType="nt:file">
+     <include>jcr:content</include>
+   </aggregate>
+ </configuration>
+
+ You can also restrict the included nodes to a certain
+ type:
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
+                xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <aggregate primaryType="nt:file">
+     <include primaryType="nt:resource">jcr:content</include>
+   </aggregate>
+ </configuration>
+
+ You may also use * to match all child nodes:
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
+                xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <aggregate primaryType="nt:file">
+     <include primaryType="nt:resource">*</include>
+   </aggregate>
+ </configuration>
+
+ If you wish to include nodes up to a certain depth below the
+ current node, you can add multiple include elements. For example, the nt:file
+ node may contain a complete XML document under
+ jcr:content:
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
+                xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <aggregate primaryType="nt:file">
+     <include>*</include>
+     <include>*/*</include>
+     <include>*/*/*</include>
+   </aggregate>
+ </configuration>
+
+ +
+ Property-Level Analyzers + +
+ Example
+
+ In this configuration section, you define how a property has to
+ be analyzed. If there is an analyzer configuration for a property,
+ this analyzer is used for indexing and searching of this property. For
+ example:
+
+ <?xml version="1.0"?>
+ <!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
+ <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
+   <analyzers>
+     <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
+       <property>mytext</property>
+     </analyzer>
+     <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer">
+       <property>mytext2</property>
+     </analyzer>
+   </analyzers>
+ </configuration>
+
+ The configuration above means that, for the entire workspace, the property
+ "mytext" is indexed (and searched) with the Lucene
+ KeywordAnalyzer, and the property "mytext2" with the WhitespaceAnalyzer.
+ Using different analyzers for different languages is particularly
+ useful.
+
+ The WhitespaceAnalyzer tokenizes a property at whitespace, whereas the KeywordAnalyzer
+ treats the whole property value as a single token.
+
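+ As an illustration of the per-language use case, the sketch below assigns
+ language-specific analyzers to two properties. The property names text_de and
+ text_fr are hypothetical, and the example assumes that the Lucene contrib
+ analyzers jar (which provides GermanAnalyzer and FrenchAnalyzer) is on the
+ classpath.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <analyzers>
    <!-- German stemming and stop words for the (hypothetical) German text property -->
    <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
      <property>text_de</property>
    </analyzer>
    <!-- French stemming and stop words for the (hypothetical) French text property -->
    <analyzer class="org.apache.lucene.analysis.fr.FrenchAnalyzer">
      <property>text_fr</property>
    </analyzer>
  </analyzers>
</configuration>
```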
+ +
+ Characteristics of Node Scope Searches
+
+ When using analyzers, you may encounter unexpected behavior
+ when searching within a property compared to searching within a node
+ scope. The reason is that the node scope always uses the global
+ analyzer.
+
+ Let's suppose that the property "mytext" contains the text
+ "testing my analyzers" and that you haven't configured any analyzers
+ for the property "mytext" (and have not changed the default analyzer in
+ SearchIndex).
+
+ If your query is, for example:
+
+ xpath = "//*[jcr:contains(mytext,'analyzer')]"
+
+ This XPath query does not return a hit for the node with the property
+ above when the default analyzers are used.
+
+ A search on the node scope
+
+ xpath = "//*[jcr:contains(.,'analyzer')]"
+
+ won't give a hit either. Keep in mind that you can only set specific analyzers on a
+ node property, and that the node scope indexing/analyzing is always
+ done with the globally defined analyzer in the SearchIndex
+ element.
+
+ Now, if you change the analyzer used to index the "mytext"
+ property above to
+
+ <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
+   <property>mytext</property>
+ </analyzer>
+
+ and you do the same search again, then for
+
+ xpath = "//*[jcr:contains(mytext,'analyzer')]"
+
+ you would get a hit because of the word stemming (analyzers -
+ analyzer).
+
+ The other search,
+
+ xpath = "//*[jcr:contains(.,'analyzer')]"
+
+ still would not give a result, since the node scope is indexed with the
+ global analyzer, which in this case does not take into account any
+ word stemming.
+
+ In conclusion, be aware that when using analyzers for specific
+ properties, you might find a hit in a property for some search text
+ and not find a hit with the same search text in the node scope
+ of the property!
+
+
+ Both index rules and index aggregates influence how content is
+ indexed in JCR. If you change the configuration, the existing content
+ is not automatically re-indexed according to the new rules. You
+ therefore have to manually re-index the content when you change the
+ configuration!
+
+
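+ To show how the pieces of this chapter fit together, the sketch below places an
+ aggregate, an index rule and a property-level analyzer in one indexing
+ configuration file. The element order (aggregate, index-rule, analyzers) and the
+ assumption that the DTD referenced in the examples accepts all three together are
+ not confirmed by this document. The names are reused from the earlier examples.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://www.exoplatform.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- Index the content of jcr:content together with its nt:file parent -->
  <aggregate primaryType="nt:file">
    <include>jcr:content</include>
  </aggregate>
  <!-- Only index the Text property of nt:unstructured nodes -->
  <index-rule nodeType="nt:unstructured">
    <property>Text</property>
  </index-rule>
  <!-- Treat the value of mytext as a single token when indexing and searching -->
  <analyzers>
    <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
      <property>mytext</property>
    </analyzer>
  </analyzers>
</configuration>
```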
+
+
+
--===============3021534417893080192==--