From do-not-reply at jboss.org Wed May 8 01:35:05 2013 Content-Type: multipart/mixed; boundary="===============2260912015684865299==" MIME-Version: 1.0 From: do-not-reply at jboss.org To: gatein-commits at lists.jboss.org Subject: [gatein-commits] gatein SVN: r9273 - in epp/docs/JPP/trunk/Reference_Guide/en-US: modules and 1 other directory. Date: Wed, 08 May 2013 01:35:05 -0400 Message-ID: <201305080535.r485Z5f4004398@svn01.web.mwc.hst.phx2.redhat.com> --===============2260912015684865299== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Author: jaredmorgs Date: 2013-05-08 01:35:04 -0400 (Wed, 08 May 2013) New Revision: 9273 Modified: epp/docs/JPP/trunk/Reference_Guide/en-US/Reference_Guide.xml epp/docs/JPP/trunk/Reference_Guide/en-US/modules/eXoJCR.xml Log: eXo JCR portion commented out of the guide Modified: epp/docs/JPP/trunk/Reference_Guide/en-US/Reference_Guide.xml =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- epp/docs/JPP/trunk/Reference_Guide/en-US/Reference_Guide.xml 2013-05-08= 04:22:10 UTC (rev 9272) +++ epp/docs/JPP/trunk/Reference_Guide/en-US/Reference_Guide.xml 2013-05-08= 05:35:04 UTC (rev 9273) @@ -8,14 +8,11 @@ - - = - + Web Services for Remote Portlets (WSRP) - - + Modified: epp/docs/JPP/trunk/Reference_Guide/en-US/modules/eXoJCR.xml =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- epp/docs/JPP/trunk/Reference_Guide/en-US/modules/eXoJCR.xml 2013-05-08 = 04:22:10 UTC (rev 9272) +++ epp/docs/JPP/trunk/Reference_Guide/en-US/modules/eXoJCR.xml 2013-05-08 = 05:35:04 UTC (rev 9273) @@ -5,29 +5,5629 @@ ]> The Java Content Repository (JCR) - - - - - - - - - - - - - - - - - - + + 
Introduction + + eXo JCR usage + + JBoss Portal Platform uses a JCR API to store its information for internal use. Using the JCR to store application information is not supported. + + + The information below is intended to help users understand particular low-level details of how JBoss Portal Platform works and how it can be fine-tuned. + + + + The term JCR refers to the Java Content Repository. The JCR is the data store of JBoss Portal Platform. All content is stored and managed via the JCR. + + + The eXo JCR included with JBoss Portal Platform &VY; is a JSR-170 compliant implementation of the JCR 1.0 specification. The JCR provides versioning, textual search, access control, and content event monitoring, and is used to store text and binary data for the portal's internal use. The back-end storage of the JCR is configurable and can be a file system or a database. + + 
+ Concepts + + + Repository + + + A repository is a form of data storage device. A 'repository' differs from a 'database' in the nature of the information contained. While a database holds hard data in rigid tables, a repository may access the data on a database by using less rigid meta-data. In this sense a repository operates as an 'interpreter' between the database(s) and the user. + + + + The data model for the interface (the repository) is rarely the same as the data model used by the repository's underlying storage subsystems (such as a database); however, the repository is able to make persistent data changes in the storage subsystem. + + + + + + Workspace + + + The eXo JCR uses 'workspaces' as the main data abstraction in its data model. The content is stored in a workspace as a hierarchy of items and each workspace has its own hierarchy of items. + + + Repositories access one or more workspaces. Persistent JCR workspaces consist of a directed acyclic graph of items where the edges represent the parent-child relation. + + + + + Items + + + An item is either a node or a property. Properties contain the data (either simple values or binary data). The nodes of a workspace give it its structure while the properties hold the data itself. + + + + Nodes + + + Nodes are identified using accepted namespacing conventions. Changed nodes may be versioned through an associated version graph to preserve data integrity. + + + Nodes can have various properties or child nodes associated with them. + + + + + Properties + + + Properties hold data as values of predefined types, such as: String, Binary, Long, Boolean, Double, Date, Reference and Path. + + + + + + + + The Data Model + + + The core of any Content Repository is the data model. The data model defines the 'data elements' (fields, columns, attributes, etc.) that are stored in the CR and the relationships between these elements. 
+ + + Data elements can be singular pieces of information (the value 3.14, for example), or compound values ('pi' = 3.14). A data model uses concepts like 'nodes', 'arrays' and 'links' to define relationships between data elements. + + + The use and structure of these elements form the content repository's 'data model'. + + + + + Data Abstraction + + + Data abstraction describes the separation between abstract and concrete properties of data stored in a repository. The concrete properties of the data refer to its implementation details. + + + The concrete properties of the data implementation may be changed without affecting the abstract properties of the data itself, which are read by the data client. + + + Consider the presentation of data in a list, graph or table. While the information implementation may change, the data itself is unaffected, and readers to whom the data is presented can perform a mental abstraction to interpret it correctly, regardless of the implementation. + + + + + 
+
+ + Multi-language Support + + Whenever a relational database is used to store multilingual text data in the eXo Java Content Repository, the configuration must be adapted to support UTF-8 encoding. The dialect is automatically detected for certified databases. If detection fails, the dialect can still be set explicitly, as described below. + + + The following sections describe enabling UTF-8 support with various databases. + + + + + + + + + + + + + + + + + + + + + + + + + + + NEEDINFO - FILE PATHS - The path needs to be updated with the equivalent path for JBoss Portal Platform instead of gatein, please see below para. New info required? + + The configuration file to be modified for these changes is JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml. + + + + + The datasource jdbcjcr used in the following examples can be configured via the InitialContextInitializer component. + + + + + 
+ Oracle + + To run a multi-language JCR on an Oracle back end, a Unicode character set must be applied to the database. Other Oracle globalization parameters have no effect. The property to modify is NLS_CHARACTERSET. + + + The NLS_CHARACTERSET = AL32UTF8 entry has been successfully tested with many European and Asian languages. + + + Example of database configuration: + +
+ NLS_LANGUAGE AMERICAN
+NLS_TERRITORY AMERICA
+NLS_CURRENCY $
+NLS_ISO_CURRENCY AMERICA
+NLS_NUMERIC_CHARACTERS .,
+NLS_CHARACTERSET AL32UTF8
+NLS_CALENDAR GREGORIAN
+NLS_DATE_FORMAT DD-MON-RR
+NLS_DATE_LANGUAGE AMERICAN
+NLS_SORT BINARY
+NLS_TIME_FORMAT HH.MI.SSXFF AM
+NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
+NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
+NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
+NLS_DUAL_CURRENCY $
+NLS_COMP BINARY
+NLS_LENGTH_SEMANTICS BYTE
+NLS_NCHAR_CONV_EXCP FALSE
+NLS_NCHAR_CHARACTERSET AL16UTF16
+ + Create the database with Unicode encoding and use the Oracle dialect for the Workspace Container: + + 
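The workspace container entry that the sentence above introduces is not reproduced in this message. A minimal sketch of what such an entry might look like in repository-configuration.xml follows. The workspace name ws, the buffer size and the swap directory are illustrative assumptions, and the container class name is the one commonly used by eXo JCR and should be checked against the installed version. The property names (source-name, dialect, multi-db, max-buffer-size, swap-directory) are the ones described in this guide.

```xml
<!-- Illustrative sketch only: workspace name, class name and values are assumptions. -->
<workspace name="ws">
  <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
    <properties>
      <!-- DataSource bound through the InitialContextInitializer component -->
      <property name="source-name" value="jdbcjcr"/>
      <!-- Set the Oracle dialect explicitly instead of relying on autodetection -->
      <property name="dialect" value="oracle"/>
      <property name="multi-db" value="false"/>
      <!-- Threshold in bytes after which value content is swapped to a file -->
      <property name="max-buffer-size" value="204800"/>
      <property name="swap-directory" value="target/temp/swap/ws"/>
    </properties>
  </container>
</workspace>
```

Setting the dialect explicitly is only needed when autodetection fails; otherwise the value auto should give the same result.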
+
+ DB2 + + DB2 Universal Database (DB2 UDB) supports UTF-8 and UTF-16/UCS-2. When a Unicode database is created, CHAR, VARCHAR and LONG VARCHAR data are stored in UTF-8 form. + + + This enables JCR multi-lingual support. + + + Below is an example of creating a UTF-8 database using the db2 dialect for a workspace container with DB2 version 9 and higher: + + DB2 CREATE DATABASE dbname USING CODESET UTF-8 TERRITORY US + + + + + For DB2 version 8.x support change the property "dialect" to db2v8. + + 
+
+ MySQL + + Using JCR with a MySQL back end requires the special dialect MySQL-UTF8 to be used for internationalization support. + + + The database default charset should be latin1 so as to use limited index space effectively (767 bytes for InnoDB). + + + If the database default charset is multibyte, a JCR database initialization error is encountered concerning index creation failure. + + + JCR can work with any single-byte default database charset, with UTF8 supported by the MySQL server. However, it has only been tested using the latin1 charset. + + + An example entry: + + 
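The example entry itself is missing from this message. As a hedged sketch, the part of a workspace container configuration that matters for MySQL might look as follows; only the dialect value mysql-utf8 comes from this guide's dialect list, and the data source name jdbcjcr is the one used in the other examples.

```xml
<!-- Sketch: only the properties relevant to MySQL are shown. -->
<properties>
  <property name="source-name" value="jdbcjcr"/>
  <!-- UTF-8 aware MySQL dialect for internationalization support -->
  <property name="dialect" value="mysql-utf8"/>
</properties>
```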
+
+ PostgreSQL + + Multilingual support can be enabled with a PostgreSQL back end in different ways: + + + + + Using the locale features of the operating system to provide locale-specific collation order, number formatting, translated messages, and other aspects. + + + UTF-8 is widely used on Linux distributions by default, so it can be useful in such cases. + + + + + Providing a number of different character sets defined in the PostgreSQL server, including multiple-byte character sets, to support storing text in any language, and providing character set translation between client and server. + + + Using a UTF-8 database charset is recommended as it will allow any-to-any conversions and make this issue transparent for the JCR. + + + + + Example of a database with UTF-8 encoding using PgSQL dialect for the Workspace Container: + + 
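The example referred to above is not included in this message. A sketch of the relevant container properties, assuming the same jdbcjcr data source as elsewhere in this guide and the pgsql value from the dialect list, could be:

```xml
<!-- Sketch: only the properties relevant to PostgreSQL are shown. -->
<properties>
  <property name="source-name" value="jdbcjcr"/>
  <property name="dialect" value="pgsql"/>
</properties>
```

The UTF-8 encoding itself is a property of the PostgreSQL database and is chosen when the database is created, not in this file.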
+
+ + Configuring Search + + The search function in JCR can be configured to perform in specific ways. This section will discuss configuring the search function to improve search performance and results. + + + Below is an example of the configuration file that governs search behaviors. Refer to for how searching operates in JCR and information about customized searches. + + + The JCR index configuration file is located at JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml. + + + A code example is included below with a list of the configuration parameters shown below that. + + + + The table below outlines some of the Configuration Parameters available, their default setting, which version of eXo JCR they were implemented in and other useful information (further parameters are explained in ): + + + Configuration parameters + + + + + + + + + + Parameter + + + + + Default + + + + + Description + + + Implemented in Version + + + + + + + index-dir + + + + + none + + + + + The location of the index directory. This parameter is mandatory. It is called "indexDir" in versions prior to eXo JCR version 1.9. + + + 1.0 + + + + + use-compoundfile + + + + + true + + + + + Advises lucene to use compound files for the index files. + + + 1.9 + + + + + min-merge-docs + + + + + 100 + + + + + The minimum number of nodes in an index until segments are merged. + + + 1.9 + + + + + volatile-idle-time + + + 3 + + + Idle time in seconds until the volatile index part is moved to a persistent index even though minMergeDocs is not reached. + + + 1.9 + + + + + max-merge-docs + + + + + Integer.MAX_VALUE + + + + + The maximum number of nodes in segments that will be merged. The default value changed to Integer.MAX_VALUE in eXo JCR version 1.9. + + + 1.9 + + + + + merge-factor + + + + + 10 + + + + + Determines how often segment indices are merged. 
+ + + 1.9 + + + + + max-field-length + + + + + 10000 + + + + + The number of words that are full-text indexed at most per property. + + + 1.9 + + + + + cache-size + + + + + 1000 + + + + + Size of the document number cache. This cache maps UUID to lucene document numbers. + + + 1.9 + + + + + force-consistencycheck + + + + + false + + + + + Runs a consistency check on every start up. If false, a consistency check is only performed when the search index detects a prior forced shutdown. + + + 1.9 + + + + + auto-repair + + + + + true + + + + + Errors detected by a consistency check are automatically repaired. If false, errors are only written to the log. + + + 1.9 + + + query-class + QueryImpl + + + Classname that implements the javax.jcr.query.Query interface. + + + This class must also extend from the class: org.exoplatform.services.jcr.impl.core.query.AbstractQueryImpl. + + + 1.9 + + + + + document-order + + + + + true + + + + + If true and the query does not contain an 'order by' clause, result nodes will be in document order. For better performance set to 'false' when queries return many nodes. + + + 1.9 + + + + + result-fetch-size + + + + + Integer.MAX_VALUE + + + + + The number of results fetched when a query is executed. Default value: Integer.MAX_VALUE. + + + 1.9 + + + + + excerptprovider-class + + + + + DefaultXMLExcerpt + + + + + The name of the class that implements org.exoplatform.services.jcr.impl.core.query.lucene.ExcerptProvider. + + + This should be used for the rep:excerpt() function in a query. + + + 1.9 + + + + + support-highlighting + + + + + false + + + + + If set to true additional information is stored in the index to support highlighting using the rep:excerpt() function. + + + 1.9 + + + + + synonymprovider-class + + + + + none + + + + + The name of a class that implements org.exoplatform.services.jcr.impl.core.query.lucene.SynonymProvider. + + + The default value is null. 
+ + + 1.9 + + + + + synonymprovider-config-path + + + + + none + + + + + The path to the synonym provider configuration file. This path is interpreted relative to the path parameter. If there is a path element inside the SearchIndex element, then this path is interpreted relative to the root path of the path. Whether this parameter is mandatory depends on the synonym provider implementation. The default value is null. + + + 1.9 + + + + + indexing-configuration-path + + + + + none + + + + + The path to the indexing configuration file. + + + 1.9 + + + + + indexing-configuration-class + + + + + IndexingConfigurationImpl + + + + + The name of the class that implements org.exoplatform.services.jcr.impl.core.query.lucene.IndexingConfiguration. + + + 1.9 + + + + + enable-consistencycheck + + + + + false + + + + + If set to true a consistency check is performed depending on the parameter forceConsistencyCheck. If set to false no consistency check is performed on start up, even if a redo log had been applied. + + + 1.9 + + + + + spellchecker-class + + + + + none + + + + + The name of a class that implements org.exoplatform.services.jcr.impl.core.query.lucene.SpellChecker. + + + 1.9 + + + + + errorlog-size + + + + + 50 (KB) + + + + + The default size of the error log file in KB. + + + 1.9 + + + + + upgrade-index + + + + + false + + + + + Allows JCR to convert an existing index into the new format. It is also possible to set this property via a system property. + + + Indexes prior to eXo JCR 1.12 will not run with eXo JCR 1.12. You must run an automatic migration. + + + Start eXo JCR with: + + -Dupgrade-index=true + + The old index format is then converted into the new index format. After the conversion the new format is used. + + + On subsequent starts this option is no longer needed. The old index is replaced and a back conversion is not possible. + + + It is recommended that a backup of the index be made before conversion. 
(Only for migrations from JCR 1.9 and later.) + + + 1.12 + + + + + analyzer + + + + + org.apache.lucene.analysis.standard.StandardAnalyzer + + + + + Class name of a lucene analyzer to use for full-text indexing of text. + + + 1.12 + + + + 
+
+ Global Search Index + + By default eXo JCR uses the Lucene standard Analyzer to index contents. This analyzer uses some standard filters in the method that analyzes the content. + + + Standard Analyzed Filters + + + Comment #1: The first filter (StandardFilter) removes possessive apostrophes ('s) from the end of words and removes periods (.) from acronyms. + + + Comment #2: The second filter (LowerCaseFilter) normalizes token text to lower case. + + + Comment #3: The last filter (StopFilter) removes stop words from a token stream. The stop set is defined in the analyzer. + + + + The global search index is configured in the JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml configuration file within the "query-handler" tag. + + <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> + + + The same analyzer should always be used for indexing and for querying in lucene, otherwise results may be unpredictable. eXo JCR does this automatically. The StandardAnalyzer (configured by default) can, however, be replaced with another. + + + A customized QueryHandler can also be easily created. + + + Customized Search Indexes and Analyzers + + By default eXo JCR uses the Lucene standard Analyzer to index contents. This analyzer uses some standard filters in the method that analyzes the content: + + +
+ public TokenStream tokenStream(String fieldName, Reader reader) {
+     StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
+     tokenStream.setMaxTokenLength(maxTokenLength);
+     TokenStream result = new StandardFilter(tokenStream);
+     result = new LowerCaseFilter(result);
+     result = new StopFilter(result, stopSet);
+     return result;
+ }
+ + + + The first one (StandardFilter) removes 's (as 's in "Peter's") from the end of words and removes dots from acronyms. 
+ + + + + The second one (LowerCaseFilter) normalizes token text to lower case. + + + + + The last one (StopFilter) removes stop words from a token stream. The stop set is defined in the analyzer. + + + + + Additional filters can be used in specific cases. One example is the ISOLatin1AccentFilter filter, which replaces accented characters in the ISO Latin 1 character set (ISO-8859-1) with their unaccented equivalents. + + + The ISOLatin1AccentFilter is not present in the current lucene version used by eXo. + + + In order to use a different filter, a new analyzer must be created, as well as a new search index to use the analyzer. These are packaged into a jar file, which is then deployed with the application. + + + Create a new filter, analyzer and search index + + + Create a new filter with the method: + + public final Token next(final Token reusableToken) throws java.io.IOException + + + This defines how characters are read and used by the filter. + + + + + Create the analyzer. + + + The analyzer must extend org.apache.lucene.analysis.standard.StandardAnalyzer and override the tokenStream method. + + + Override the following method to apply the new filters: + + public TokenStream tokenStream(String fieldName, Reader reader) + + + + + To create the new search index, extend org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex and write the constructor to set the correct analyzer. + + + Use the method below to return your analyzer: + +
+ public Analyzer getAnalyzer() {
+     return new MyAnalyzer();
+ }
+ + + + + + In eXo JCR version 1.12 (and later) the analyzer can be directly set in the configuration. For users with this version the creation of a new SearchIndex for new analyzers is redundant. 
+ + + + To configure an application to use a new SearchIndex, replace the following code: + + <query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex"> + + + + in JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml with the new class: + + <query-handler class="mypackage.indexation.MySearchIndex"> + + + + To configure an application to use a new analyzer, add the analyzer parameter to each query-handler configuration in JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/repository-configuration.xml: + + + + The new SearchIndex will start to index contents with the specified filters when the JCR is next started. + + 
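As a sketch of the analyzer parameter described above, a query-handler entry might look like the following. The class mypackage.indexation.MyAnalyzer is a hypothetical analyzer in the same illustrative package as MySearchIndex, the index directory is an arbitrary example path, and the properties/property layout is assumed to match the rest of repository-configuration.xml.

```xml
<!-- Sketch: MyAnalyzer is a hypothetical class and the path is an example. -->
<query-handler class="org.exoplatform.services.jcr.impl.core.query.lucene.SearchIndex">
  <properties>
    <!-- index-dir is mandatory (see the parameter table) -->
    <property name="index-dir" value="target/temp/index/ws"/>
    <!-- analyzer: available from eXo JCR 1.12 -->
    <property name="analyzer" value="mypackage.indexation.MyAnalyzer"/>
  </properties>
</query-handler>
```

With this form the default SearchIndex class stays in place and only the analyzer changes, which is why a custom SearchIndex is no longer needed from version 1.12.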
+
+ IndexingConfiguration + + From version 1.9, the default search index implementation in JCR allows user control over which properties of a node are indexed. Different analyzers can also be set for different nodes. + + + The configuration parameter is called indexingConfiguration and is not set by default. This means all properties of a node are indexed. + + + To configure the indexing behavior add a parameter to the query-handler element in your configuration file. + + <param name="indexing-configuration-path" value="/indexing_configuration.xml"/> + + + + Node Scope Limit + + The node scope can be limited so that only certain properties of a node type are indexed. This can optimize the index size. + + + + With the configuration below only properties named Text are indexed for nt:unstructured node types. This configuration also applies to all nodes whose type extends from nt:unstructured. + + + + Namespace Prefixes + + The namespace prefixes must be declared throughout the XML file in the configuration element that is being used. + + + + Indexing Boost Value + + It is also possible to configure a boost value for the nodes that match the index rule. The default boost value is 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yield a higher score value and appear as more relevant. + + + + + If you do not wish to boost the complete node, but only certain properties, you can also provide a boost value for the listed properties: + + + + Conditional Index Rules + + You may also add a condition to the index rule and have multiple rules with the same nodeType. The first index rule that matches will apply and all remaining ones are ignored: + + + + + In the above example the first rule only applies if the nt:unstructured node has a priority property with a value high. The condition syntax only supports the equals operator and a string literal. 
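The index rule examples that this section refers to are not reproduced in this message. The following is a sketch of what an indexing configuration file combining them might look like. The element and attribute names (configuration, index-rule, nodeType, boost, condition, property) are assumptions based on the Jackrabbit-style indexing configuration and should be verified against the eXo JCR version in use; the boost values are arbitrary.

```xml
<?xml version="1.0"?>
<!-- Sketch under assumptions: element names and boost values are illustrative. -->
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- First matching rule wins: applies only when priority equals 'high' -->
  <index-rule nodeType="nt:unstructured" boost="2.0" condition="@priority = 'high'">
    <property>Text</property>
  </index-rule>
  <!-- Fallback rule for all other nt:unstructured nodes -->
  <index-rule nodeType="nt:unstructured">
    <!-- Boost a single property instead of the complete node -->
    <property boost="3.0">Text</property>
  </index-rule>
</configuration>
```

The nt prefix is declared on the configuration element because the rules use it in nodeType.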
+ + + The condition may also reference properties that are not on the current node: + + + + The indexing configuration allows the type of a node in the condition to be specified. Please note however that the type match must be exact. It does not consider sub types of the specified node type. + + + + Exclusion from the Node Scope Index + + All configured properties are full-text indexed by default (if they are of type STRING and included in the node scope index). + + + + A node scope search normally finds all nodes of an index. That is to say, jcr:contains(., 'foo') returns all nodes that have a string property containing the word 'foo'. + + + Properties can be explicitly excluded from the node scope index with: + + + + Index Aggregates + + Sometimes it is useful to include the contents of descendant nodes into a single node to more easily search on content that is scattered across multiple nodes. + + + + JCR allows the definition of index aggregates based on relative path patterns and primary node types. + + + The following example creates an index aggregate on nt:file that includes the content of the jcr:content node: + + + + Included nodes can also be restricted to a certain type: + + + + The * wild-card can be used to match all child nodes: + + + + Nodes to a certain depth below the current node can be included by adding multiple include elements. The nt:file node may contain a complete XML document under jcr:content for example: + + + + Property-Level Analyzers + + How a property has to be analyzed can be defined in the following configuration section. If there is an analyzer configuration for a property, this analyzer is used for indexing and searching of this property. For example: + + + + + The configuration above sets lucene KeywordAnalyzer to index and search the property "mytext" across the entire workspace while the "mytext2" property is searched with the WhitespaceAnalyzer. 
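The aggregate and analyzer examples mentioned above are missing from this message. A sketch of how they might be written is shown below. The element names (aggregate, primaryType, include, analyzers, analyzer) are assumptions based on the Jackrabbit-style indexing configuration, and the analyzer package names assume a Lucene version in which KeywordAnalyzer and WhitespaceAnalyzer live in org.apache.lucene.analysis.

```xml
<?xml version="1.0"?>
<!-- Sketch under assumptions: element names and class packages should be verified. -->
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- Index the content of jcr:content together with its nt:file parent -->
  <aggregate primaryType="nt:file">
    <include>jcr:content</include>
  </aggregate>
  <analyzers>
    <!-- mytext is taken as a whole -->
    <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
      <property>mytext</property>
    </analyzer>
    <!-- mytext2 is split on whitespace -->
    <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer">
      <property>mytext2</property>
    </analyzer>
  </analyzers>
</configuration>
```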
+ + + The WhitespaceAnalyzer tokenizes a property, the KeywordAnalyzer takes the property as a whole. + + + Using different analyzers for different languages can be particularly useful. + + + Characteristics of Node Scope Searches + + Unexpected behavior may be encountered when using analyzers to search within a property compared to searching within a node scope. This is because the node scope always uses the global analyzer. + + + + For example: the property "mytext" contains the text "testing my analyzers" but no analyzers have been configured for this property (and the default analyzer in SearchIndex has not been changed). + + + If the query is: + + xpath = "//*[jcr:contains(mytext,'analyzer')]" + + + The xpath does not return the node with the property above when the default analyzers are used. + + + Also, if a search is done on the node scope as follows: + + xpath = "//*[jcr:contains(.,'analyzer')]" + + + No result will be returned. + + + Specific analyzers can only be set on a node property; node scope indexing and analysis is always done with the globally defined analyzer in the SearchIndex element. + + + If the analyzer used to index the "mytext" property above is changed to: + +
+ <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
+     <property>mytext</property>
+ </analyzer>
+ + + The search below would return a result because of the word stemming (analyzers - analyzer). + + xpath = "//*[jcr:contains(mytext,'analyzer')]" + + + The second search in the example: + + xpath = "//*[jcr:contains(.,'analyzer')]" + + + Would still not give a result, since the node scope is indexed with the global analyzer, which in this case does not take into account any word stemming. + + + Be aware that when using analyzers for specific properties, a result may be found in a property for certain search text, but the same search text in the node scope of the property may not find a result. 
+ + + Both index rules and index aggregates influence how content is indexed in JCR. If the configuration is changed, the existing content is not automatically re-indexed according to the new rules. + + + Content must be manually re-indexed when the configuration is changed. + + + 
+
+ Advanced features + + eXo JCR supports some advanced features, which are not specified in JSR 170: + + + + + Get a text excerpt with highlighted words that match the query. + + + + + Search a term and its synonyms. + + + + + Search similar nodes. + + + + + Check spelling of a full text query statement. + + + + + Define index aggregates and rules: IndexingConfiguration. + + + + 
+
+ + Configuring the JDBC Data Container +
+ Introduction + + The eXo JCR persistent data container can work in two configuration modes: + + + + + Multi-database: One database for each workspace (used in standalone eXo JCR service mode) + + + + + Single-database: All workspaces persisted in one database (used in embedded eXo JCR service mode, e.g. in eXo portal) + + + + + The data container uses the JDBC driver to communicate with the actual database software, i.e. any JDBC-enabled data storage can be used with the eXo JCR implementation. + + + Currently the data container is tested with the following RDBMS: + + + Supported databases + + + + Database + Driver Version + + + + + IBM DB2 9.7 (FP5) + IBM DB2 JDBC Universal Driver Architecture 4.13.80 + + + Oracle 11g R1 (11.1.0.7.0) + Oracle JDBC Driver 11.1.0.7 + + + Oracle 11g R1 RAC (11.1.0.7.0) + Oracle JDBC Driver 11.1.0.7 + + + Oracle 11g R2 (11.2.0.3.0) + Oracle JDBC Driver v11.2.0.3.0 + + + Oracle 11g R2 RAC (11.2.0.3.0) + Oracle JDBC Driver v11.2.0.3.0 + + + MySQL 5.1 + MySQL Connector/J 5.1.21 + + + MySQL 5.5 + MySQL Connector/J 5.1.21 + + + Microsoft SQL Server 2008 + Microsoft SQL Server JDBC Driver 3.0.1301.101, Microsoft SQL Server JDBC Driver 4.0.2206.100 + + + Microsoft SQL Server 2008 R2 + Microsoft SQL Server JDBC Driver 3.0.1301.101, Microsoft SQL Server JDBC Driver 4.0.2206.100 + + + PostgreSQL 8.4.8 + JDBC4 Postgresql Driver, Version 8.4-703 + + + PostgreSQL 9.1.0 + JDBC4 Postgresql Driver, Version 9.1-903 + + + Sybase ASE 15.7 + Sybase jConnect JDBC driver v7 + + + + 
+ + Isolation Levels + + The JCR requires at least the READ_COMMITTED isolation level. Other RDBMS configurations can cause side effects and issues, so make sure the proper isolation level is configured on the database server side. + + + + + Another mandatory JCR requirement for underlying databases is a case-sensitive collation. Microsoft SQL Server 2005 and 2008 customers must configure their server with a collation that suits their needs and requirements, but it must be case sensitive. For more information, refer to the Microsoft SQL Server documentation page "Selecting a SQL Server Collation". + + + + + Be aware that JCR does not support the MyISAM storage engine for the MySQL relational database management system. + + + + Each database software supports ANSI SQL standards but also has its own specifics. Therefore each database has its own configuration setting in the eXo JCR as a database dialect parameter. More detailed configuration of the database can be set by editing the metadata SQL-script files. + + NEEDINFO - FILE PATHS - The path needs to be updated with the equivalent path for JBoss Portal Platform instead of gatein, please see below para. New info required? + + You can find SQL-scripts in the conf/storage/ directory of the JPP_HOME/modules/org/gatein/lib/main/exo.jcr.component.core-&JCR_VERSION;.jar file. + + + The following tables show the correspondence between the scripts and databases: + + + Single-database + + + + Database + Script + + + + + MySQL DB + + jcr-sjdbc.mysql.sql + + + + MySQL DB with utf-8 + + jcr-sjdbc.mysql-utf8.sql + + + + PostgreSQL + + jcr-sjdbc.pgsql.sql + + + + Oracle DB + + jcr-sjdbc.ora.sql + + + + DB2 9.7 + + jcr-sjdbc.db2.sql + + + + MS SQL Server + + jcr-sjdbc.mssql.sql + + + + Sybase + + jcr-sjdbc.sybase.sql + + + + HSQLDB + + jcr-sjdbc.sql + + + + + 
+ + Multi-database + + + + Database + Script + + + + + MySQL DB + + jcr-mjdbc.mysql.sql + + + + MySQL DB with utf-8 + + jcr-mjdbc.mysql-utf8.sql + + + + PostgreSQL + + jcr-mjdbc.pgsql.sql + + + + Oracle DB + + jcr-mjdbc.ora.sql + + + + DB2 9.7 + + jcr-mjdbc.db2.sql + + + + MS SQL Server + + jcr-mjdbc.mssql.sql + + + + Sybase + + jcr-mjdbc.sybase.sql + + + + HSQLDB + + jcr-mjdbc.sql + + + + + 
+ + If a non-ANSI node name is used, you must use a database with MultiLanguage support. Some JDBC drivers need additional parameters for establishing a Unicode friendly connection. For example under mysql it is necessary to add an additional parameter for the JDBC driver at the end of the JDBC URL: + + + There are preconfigured configuration files for HSQLDB. Look for these files in the /conf/portal and /conf/standalone folders of the jar-file exo.jcr.component.core-&JCR_VERSION;.jar or the source-distribution of the eXo JCR implementation. + + + Example Parameter + jdbc:mysql://exoua.dnsalias.net/portal?characterEncoding=utf8 + + + The configuration files are located in service jars /conf/portal/configuration.xml (eXo services including JCR Repository Service) and exo-jcr-config.xml (repositories configuration) by default. In JBoss Portal Platform, the JCR is configured in portal web application portal/WEB-INF/conf/jcr/jcr-configuration.xml (JCR Repository Service and related services) and repository-configuration.xml (repositories configuration). + + + Read more about . + + 
+
+ Multi-database Configuration + + You need to configure each workspace in a repository as part of multi-database configuration. Databases may reside on remote servers as required. + + + + <step> + <para> + Configure the data containers in the <literal>org.exoplatform.services.naming.InitialContextInitializer</literal> service. It's the JNDI context initializer which registers (binds) naming resources (DataSources) for data containers. + </para> + <para> + For example (two data containers <parameter>jdbcjcr</parameter> - local HSQLDB, <parameter>jdbcjcr1</parameter> - remote MySQL): + </para> + <programlisting language="XML" role="XML"> +<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../../../extras/Advanced_Development_JCR_Configuration/example-1.xml" parse="text"/></programlisting> + <substeps> + <step> + <para> + Configure the database connection parameters: + </para> + <itemizedlist> + <listitem> + <para> + <parameter>driverClassName</parameter>, e.g. "org.hsqldb.jdbcDriver", "com.mysql.jdbc.Driver", "org.postgresql.Driver" + </para> + </listitem> + <listitem> + <para> + <parameter>url</parameter>, e.g. "jdbc:hsqldb:file:target/temp/data/portal", "jdbc:mysql://exoua.dnsalias.net/jcr" + </para> + </listitem> + <listitem> + <para> + <parameter>username</parameter>, e.g. "sa", "exoadmin" + </para> + </listitem> + <listitem> + <para> + <parameter>password</parameter>, e.g. "", "exo12321" + </para> + </listitem> + </itemizedlist> + </step> + </substeps> + <para> + There can be connection pool configuration parameters (org.apache.commons.dbcp.BasicDataSourceFactory): + </para> + <itemizedlist> + <listitem> + <para> + <parameter>maxActive</parameter>, e.g. 50 + </para> + </listitem> + <listitem> + <para> + <parameter>maxIdle</parameter>, e.g. 5 + </para> + </listitem> + <listitem> + <para> + <parameter>initialSize</parameter>, e.g. 
5 + </para> + </listitem> + <listitem> + <para> + and others, as described in the <ulink url="http://jakarta.apache.org/commons/dbcp/configuration.html">Apache DBCP configuration</ulink> + </para> + </listitem> + </itemizedlist> + </step> + <step> + <para> + Configure the repository service. Each workspace will be configured for its own data container. + </para> + <para> + For example (two workspaces <parameter>ws</parameter> - jdbcjcr, <parameter>ws1</parameter> - jdbcjcr1): + </para> + <programlisting language="XML" role="XML"> +<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../../../extras/Advanced_Development_JCR_Configuration/example-2.xml" parse="text"/></programlisting> + <itemizedlist> + <listitem> + <para> + <parameter>source-name</parameter>: A javax.sql.DataSource name configured in the InitialContextInitializer component (was <parameter>sourceName</parameter> prior to JCR 1.9); + </para> + </listitem> + <listitem> + <para> + <parameter>dialect</parameter>: A database dialect, one of <literal>hsqldb</literal>, <literal>mysql</literal>, <literal>mysql-utf8</literal>, <literal>pgsql</literal>, <literal>oracle</literal>, <literal>oracle-oci</literal>, <literal>mssql</literal>, <literal>sybase</literal>, <literal>derby</literal>, <literal>db2</literal>, <literal>db2v8</literal> or <literal>auto</literal> for dialect autodetection; + </para> + </listitem> + <listitem> + <para> + <parameter>multi-db</parameter>: Enables the multi-database container when set to "true"; + </para> + </listitem> + <listitem> + <para> + <parameter>max-buffer-size</parameter>: A threshold (in bytes) after which a <literal>javax.jcr.Value</literal> content will be swapped to a file in temporary storage, for example when swapping pending changes. + </para> + </listitem> + <listitem> + <para> + <parameter>swap-directory</parameter>: A path in the file system used to swap the pending changes. 
+ </para> + </listitem> + </itemizedlist> + </step> + </procedure> + <para> + This procedure configures two workspaces which will be persisted in two different databases (<emphasis>ws</emphasis> in HSQLDB and <emphasis>ws1</emphasis> in MySQL). + </para> + </section> + <section id="sect-Reference_Guide-JDBC_Data_Container_Config-Single_database_Configuration"> + <title>Single-database Configuration + + Configuring a single-database data container is easier than configuring a multi-database data container, as only one naming resource must be configured. + + + <parameter>jdbcjcr</parameter> Data Container + + + + + Configure the repository workspaces to use this one database. The multi-db parameter must be set to false. + + + For example (two workspaces ws - jdbcjcr, ws1 - jdbcjcr): + + + Example + + + + + This configures two persistent workspaces in one database (PostgreSQL). + +
+ Configuration without DataSource + + It is possible to configure the repository without binding javax.sql.DataSource in the JNDI service. This is useful if you have a dedicated JDBC driver implementation with special features such as XA transactions or statement and connection pooling: + + + + <step> + <para> + Remove the configuration in <literal>InitialContextInitializer</literal> for your database and configure a new one directly in the workspace container. + </para> + </step> + <step> + <para> + Remove the <parameter>source-name</parameter> parameter and add the following lines instead, supplying your own values for the JDBC driver, database URL, username and password. + </para> + </step> + </procedure> + <warning> + <title>Connection Pooling + + Ensure the JDBC driver provides connection pooling. Connection pooling is strongly recommended for use with the JCR to prevent a database overload. + + + <workspace name="ws" auto-init-root-nodetype="nt:unstructured"> + <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> + <properties> + <property name="dialect" value="hsqldb"/> + <property name="driverClassName" value="org.hsqldb.jdbcDriver"/> + <property name="url" value="jdbc:hsqldb:file:target/temp/data/portal"/> + <property name="username" value="su"/> + <property name="password" value=""/> + ...... +
+
+ Dynamic Workspace Creation + + Workspaces can be added dynamically at runtime. + + + This is performed in two steps: + + + + <step> + <para> + <literal>ManageableRepository.configWorkspace(WorkspaceEntry wsConfig)</literal>: Registers a new configuration in the RepositoryContainer and creates a WorkspaceContainer. + </para> + </step> + <step> + <para> + <literal>ManageableRepository.createWorkspace(String workspaceName)</literal>: Creates the new workspace. + </para> + </step> + </procedure> + </section> + </section> + <section id="sect-Reference_Guide-JDBC_Data_Container_Config-Simple_and_Complex_queries"> + <title>Simple and Complex queries + + eXo JCR provides two ways to interact with the database: + + + + <varlistentry> + <term> + <literal>JDBCStorageConnection</literal> + </term> + <listitem> + <para> + Uses simple queries. Simple queries do not use subqueries or left or right joins. They are implemented so as to support as many database dialects as possible. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term> + <literal>CQJDBCStorageConnection</literal> + </term> + <listitem> + <para> + Uses complex queries. Complex queries are optimized to reduce the number of database calls. + </para> + </listitem> + </varlistentry> + </variablelist> + <para> + Simple queries are used if you choose <literal>org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer</literal>: + </para> + <programlisting language="XML" role="XML"><workspaces> + <workspace name="ws" auto-init-root-nodetype="nt:unstructured"> + <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> + ... 
+ </workspace> +</workspaces> +</programlisting> + <para> + Complex queries are used if you choose <literal>org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer</literal>: + </para> + <programlisting language="XML" role="XML"><workspaces> + <workspace name="ws" auto-init-root-nodetype="nt:unstructured"> + <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> + ... + </workspace> +</workspaces></programlisting> + </section> + <section id="sect-Reference_Guide-JDBC_Data_Container_Config-Force_Query_Hints"> + <title>Force Query Hints + + Some databases, such as Oracle and MySQL, support hints to increase query performance. The eXo JCR has a separate Complex Query implementation for the Oracle database dialect, which uses query hints to improve the performance of a few important queries. + + + To enable this option, use the following configuration property: + + <workspace name="ws" auto-init-root-nodetype="nt:unstructured"> + <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> + <properties> + <property name="dialect" value="oracle"/> + <property name="force.query.hints" value="true" /> + ...... + + Query hints are only used for Complex Queries with the Oracle dialect. For all other dialects this parameter is ignored. + +
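+ <para>
+ The following is a minimal sketch of a workspace that combines the complex-query container with the Oracle dialect and query hints. It assumes, based on the note above that hints apply only to Complex Queries, that <literal>CQJDBCWorkspaceDataContainer</literal> is the container class that honours the property. The workspace name <literal>ws</literal> and the data source name <literal>jdbcjcr</literal> are illustrative, and the remaining container properties are omitted.
+ </para>
+ <programlisting language="XML" role="XML"><![CDATA[<workspace name="ws" auto-init-root-nodetype="nt:unstructured">
+   <!-- Assumption: the complex-query container is the one that reads force.query.hints -->
+   <container class="org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer">
+     <properties>
+       <property name="source-name" value="jdbcjcr"/>
+       <property name="dialect" value="oracle"/>
+       <property name="force.query.hints" value="true"/>
+       <!-- remaining container properties omitted -->
+     </properties>
+   </container>
+ </workspace>]]></programlisting>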
+
+ Notes for Microsoft Windows users + + The current configuration of eXo JCR uses the Apache DBCP connection pool (org.apache.commons.dbcp.BasicDataSourceFactory). + + + It is possible to set a high value for the maxActive parameter in the configuration.xml file. This causes the pool (the JDBC driver, for example) to use a large number of TCP/IP ports on the client machine. As a result, the data container can throw exceptions such as "Address already in use". + + + To solve this problem, configure the client machine's networking software to use shorter timeouts for open TCP/IP ports. + + + This is done by editing two registry keys within the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters node. Both of these keys are unset by default. To set the keys as required: + + + + <step> + <para> + Set the <parameter>MaxUserPort</parameter> registry key to <parameter>=dword:00001b58</parameter>. This raises the maximum number of open ports to 7000 (the default is 5000). + </para> + </step> + <step> + <para> + Set <parameter>TcpTimedWaitDelay</parameter> to <parameter>=dword:0000001e</parameter>. This sets the <parameter>TIME_WAIT</parameter> parameter to 30 seconds (the default is 240). + </para> + </step> + </procedure> + <example id="exam-Reference_Guide-Notes_for_Microsoft_Windows_users-Sample_Registry_File"> + <title>Sample Registry File + Windows Registry Editor Version 5.00 + +[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters] +"MaxUserPort"=dword:00001b58 +"TcpTimedWaitDelay"=dword:0000001e + +
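+ <para>
+ Keeping <parameter>maxActive</parameter> moderate may also limit how many ports the pool holds open. The following is a minimal sketch of a DBCP data source binding in <literal>configuration.xml</literal>. The plug-in name <literal>bind.datasource</literal>, the <literal>BindReferencePlugin</literal> type, the init-param names and every value shown are assumptions based on the usual <literal>InitialContextInitializer</literal> layout, and should be checked against the configuration shipped with the product.
+ </para>
+ <programlisting language="XML" role="XML"><![CDATA[<component>
+   <key>org.exoplatform.services.naming.InitialContextInitializer</key>
+   <type>org.exoplatform.services.naming.InitialContextInitializer</type>
+   <component-plugins>
+     <component-plugin>
+       <!-- Assumed plug-in layout; verify against the shipped configuration.xml -->
+       <name>bind.datasource</name>
+       <set-method>addPlugin</set-method>
+       <type>org.exoplatform.services.naming.BindReferencePlugin</type>
+       <init-params>
+         <value-param>
+           <name>bind-name</name>
+           <value>jdbcjcr</value>
+         </value-param>
+         <value-param>
+           <name>class-name</name>
+           <value>javax.sql.DataSource</value>
+         </value-param>
+         <value-param>
+           <name>factory</name>
+           <value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
+         </value-param>
+         <properties-param>
+           <name>ref-addresses</name>
+           <property name="driverClassName" value="org.hsqldb.jdbcDriver"/>
+           <property name="url" value="jdbc:hsqldb:file:target/temp/data/portal"/>
+           <property name="username" value="sa"/>
+           <property name="password" value=""/>
+           <!-- Illustrative pool sizes: a moderate maxActive keeps port usage down -->
+           <property name="maxActive" value="20"/>
+           <property name="maxIdle" value="5"/>
+           <property name="initialSize" value="5"/>
+         </properties-param>
+       </init-params>
+     </component-plugin>
+   </component-plugins>
+ </component>]]></programlisting>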
+ + + External Value Storages +
+ Introduction + + JCR values are stored in the Workspace Data container by default. The eXo JCR offers an additional option of storing JCR values separately from the Workspace Data container, which can help keep Binary Large Objects (BLOBs) separate. + + + Tree-based storage is recommended in most cases. + +
+
+ Tree File Value Storage + + Tree File Value Storage holds values in tree-like file system files. The path property points to the root directory in which to store the files. + + + This is the recommended type of external storage because it can contain a large number of files, limited only by the free space on the disk or volume. + + + However, deleting values can take longer with Tree File Value Storage because unused tree nodes must also be removed. + + + Tree File Value Storage Configuration + + + Comment #1: The id is the value storage unique identifier, used for linking with properties stored in a workspace container. + + + Comment #2: The path is a location where value files will be stored. + + + + Each file value storage can have filters for incoming values. A filter can match values by property-type, property-name and ancestor-path. It can also match the size of values stored (min-value-size) in bytes. + + + In the previous example a filter with property-type and min-value-size has been used. This results in storage for binary values larger than 1 MB. + + + It is recommended that properties with large values are stored in file value storage only. + + + The example below shows a value storage with different locations for large files (a filter with a min-value-size of 20 MB). + + + A value storage uses ORed logic in the process of filter selection. This means the first filter in the list is tried first; if it does not match, the next is tried, and so on. + + + In this example a value that matches the 20 MB min-value-size filter will be stored in the path "data/20Mvalues". All other values will be stored in "data/values". + + +
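+ <para>
+ The two-location setup described above can be sketched as follows. This is an illustration only: the storage ids are hypothetical, and the second filter is assumed to match every remaining binary value. The 20 MB storage is listed first so that its filter is tried before the catch-all one.
+ </para>
+ <programlisting language="XML" role="XML"><![CDATA[<value-storages>
+   <!-- Tried first: binary values of 20 MB or more go to data/20Mvalues -->
+   <value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
+     <properties>
+       <property name="path" value="data/20Mvalues"/>
+     </properties>
+     <filters>
+       <filter property-type="Binary" min-value-size="20M"/>
+     </filters>
+   </value-storage>
+   <!-- Tried second: all other binary values go to data/values -->
+   <value-storage id="Storage #2" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
+     <properties>
+       <property name="path" value="data/values"/>
+     </properties>
+     <filters>
+       <filter property-type="Binary"/>
+     </filters>
+   </value-storage>
+ </value-storages>]]></programlisting>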
+
+ Disabling value storage + + The JCR allows you to disable value storage by adding the following property to its configuration. + + <property name="enabled" value="false" /> + + Warning + + It is recommended that this functionality be used for internal and testing purposes only, and with caution, as all stored values will be inaccessible. + + +
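+ <para>
+ As a sketch of where the property might sit, the fragment below places it among the properties of a single value storage. This placement is an assumption (the text above only says the property is added to the value storage configuration), and the storage id and path are illustrative.
+ </para>
+ <programlisting language="XML" role="XML"><![CDATA[<value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage">
+   <properties>
+     <property name="path" value="data/values"/>
+     <!-- Assumed placement: disables this storage; its stored values become inaccessible -->
+     <property name="enabled" value="false"/>
+   </properties>
+   <filters>
+     <filter property-type="Binary"/>
+   </filters>
+ </value-storage>]]></programlisting>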
+
+ + Workspace Data Container + + Each Workspace of the JCR has its own persistent storage to hold that workspace's items data. The eXo JCR can be configured so that it can use one or more workspaces that are logical units of the repository content. + + + The physical data storage mechanism is configured using the mandatory element container. The type of container is described in the attribute class = fully_qualified_name_of_org.exoplatform.services.jcr.storage.WorkspaceDataContainer_subclass. + + + Physical Data Storage Configuration + <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer"> + <properties> + <property name="source-name" value="jdbcjcr1"/> + <property name="dialect" value="hsqldb"/> + <property name="multi-db" value="true"/> + <property name="max-buffer-size" value="200K"/> + <property name="swap-directory" value="target/temp/swap/ws"/> + <property name="lazy-node-iterator-page-size" value="50"/> + <property name="acl-bloomfilter-false-positive-probability" value="0.1d"/> + <property name="acl-bloomfilter-elements-number" value="1000000"/> + </properties> + + source-name: The JDBC data source name which is registered in JNDI by InitialContextInitializer. This was known as sourceName in versions prior to 1.9. + + + dialect: The database dialect. Must be one of the following: hsqldb, mysql, mysql-utf8, pgsql, oracle, oracle-oci, mssql, sybase, derby, db2 or db2v8. + + + multi-db: This parameter, if true, enables the multi-database container. + + + max-buffer-size: A threshold in bytes. If a value size is greater than this setting, then it will be spooled to a temporary file. + + + swap-directory: A location where the value will be spooled if no value storage is configured but the max-buffer-size is exceeded. + + + lazy-node-iterator-page-size: "Lazy" child nodes iterator settings. 
Defines the page size, that is, the number of nodes retrieved from persistent storage at once. + + + acl-bloomfilter-false-positive-probability: ACL Bloom filter setting. The desired false positive probability of the ACL Bloom filter. Range [0..1]. Default value 0.1d. + + + acl-bloomfilter-elements-number: ACL Bloom filter setting. The expected number of ACL elements in the Bloom filter. Default value 1000000. + + + + + Bloom filters are not supported by all cache implementations. So far, only the implementation for Infinispan supports them. + + + A Bloom filter is used to avoid reading nodes that definitely do not have an ACL. The acl-bloomfilter-false-positive-probability and acl-bloomfilter-elements-number parameters configure such filters. + + + For more about Bloom filters, see http://en.wikipedia.org/wiki/Bloom_filter. + + + + The eXo JCR has a JDBC-based, production-ready Workspace Data Container that uses a relational database. + + + The Workspace Data Container may support external storages for javax.jcr.Value (which can be the case for BLOB values, for example) using the optional element value-storages. + + + The Data Container will try to read or write a Value using the underlying value storage plug-in if the filter criteria (see below) match the current property. + + + External Value Storage Configuration + <value-storages> + <value-storage id="Storage #1" class="org.exoplatform.services.jcr.impl.storage.value.fs.TreeFileValueStorage"> + <properties> + <property name="path" value="data/values"/> + </properties> + <filters> + <filter property-type="Binary" min-value-size="1M"/><!-- Values larger than 1 MB --> + </filters> +......... +</value-storages> + + The value-storage class is a subclass of org.exoplatform.services.jcr.storage.value.ValueStoragePlugin, and properties are optional plug-in specific parameters. 
+ + + filters: Each file value storage can have one or more filters for incoming values. If a filter has several criteria, they all have to match (AND condition). + + + + A filter can match values by property type (property-type), property name (property-name), ancestor path (ancestor-path) and/or the size of the values stored (min-value-size, for example 1M, 4.2G or 100 (bytes)). + + + The code sample above uses a filter with property-type and min-value-size only. This means that the storage is used only for binary values larger than 1 MB. + + + It is recommended that you store properties with large values in a file value storage only. + + + + Configuring Cluster +
+ Launching Cluster +
+ Configuring JCR to use external configuration + + + + To manually configure a repository, create a new c= onfiguration file (exo-jcr-configuration.xml for examp= le). For details, see . + + + The configuration file must be formatted as follow= s: + + + External Configuration + <repository= -service default-repository=3D"repository1"> + <repositories> + <repository name=3D"repository1" system-workspace=3D&qu= ot;ws1" default-workspace=3D"ws1"> + <security-domain>exo-domain</security-domain> + <access-control>optional</access-control> + <authentication-policy>org.exoplatform.services.jcr.impl.co= re.access.JAASAuthenticator</authentication-policy> + <workspaces> + <workspace name=3D"ws1"> + <container class=3D"org.exoplatform.services.jcr.im= pl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer"> + <properties> + <property name=3D"source-name" value=3D&= quot;jdbcjcr" /> + <property name=3D"dialect" value=3D"= ;oracle" /> + <property name=3D"multi-db" value=3D&quo= t;false" /> + <property name=3D"update-storage" value= =3D"false" /> + <property name=3D"max-buffer-size" value= =3D"200k" /> + <property name=3D"swap-directory" value= =3D"../temp/swap/production" /> + </properties> + <value-storages> + ]]> + </value-storages> + </container> + <initializer class=3D"org.exoplatform.services.jcr.= impl.core.ScratchWorkspaceInitializer"> + <properties> + <property name=3D"root-nodetype" value= =3D"nt:unstructured" /> + </properties> + </initializer> + <cache enabled=3D"true" class=3D"org.exop= latform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspac= eStorageCache"> + ]]> = + </cache> + <query-handler class=3D"org.exoplatform.services.jc= r.impl.core.query.lucene.SearchIndex"> + ]]> + </query-handler> + <lock-manager class=3D"org.exoplatform.services.jcr= .impl.core.lock.jbosscache.CacheableLockManagerImpl"> + ]]> = + </lock-manager> + </workspace> + <workspace name=3D"ws2"> + ... + </workspace> + <workspace name=3D"wsN"> + ... 
+ </workspace> + </workspaces> + </repository> + </repositories> +</repository-service> + + Comment #1: Refer to . + + + Comment #3: Refer to . + + + Comment #4: Refer to . + + + + + + Then, update RepositoryServiceConfigura= tion configuration in the exo-configuration.xml to reference your file: + + <component> + <key>org.exoplatform.services.jcr.config.RepositoryServiceConfigu= ration</key> + <type>org.exoplatform.services.jcr.impl.config.RepositoryServiceC= onfigurationImpl</type> + <init-params> + <value-param> + <name>conf-path</name> + <description>JCR configuration file</description> + <value>exo-jcr-configuration.xml</value> + </value-param> + </init-params> +</component> + + +
+
+
+ Requirements +
+ Environment requirements + + + + Every node of the cluster must have the same mounted Network File System (NFS) with read and write permissions on it. + + + + + Every node of the cluster must use the same database. + + + + + The same clusters on different nodes must have the same names. + + + Example + + If the Indexer cluster in the production workspace on the first node is named production_indexer_cluster, then the indexer clusters in the production workspace on all other nodes must also be named production_indexer_cluster. + + + + +
+
+ Configuration requirements + + The configuration of every workspace in the repository mus= t contain the following elements: + + + Value Storage configuration + <value-storages= > + <value-storage id=3D"system" class=3D"org.exoplatform= .services.jcr.impl.storage.value.fs.TreeFileValueStorage"> + <properties> + <property name=3D"path" value=3D"/mnt/tornado/t= emp/values/production" /> <!--path within NFS where ValueStor= age will hold it's data--> + </properties> + <filters> + <filter property-type=3D"Binary" /> + </filters> + </value-storage> +</value-storages> + + + Cache configuration + <cache enabled= =3D"true" class=3D"org.exoplatform.services.jcr.impl.dataflo= w.persistent.jbosscache.JBossCacheWorkspaceStorageCache"> + <properties> + <property name=3D"jbosscache-configuration" value=3D&qu= ot;jar:/conf/portal/test-jbosscache-data.xml" /> <!-- pat= h to JBoss Cache configuration for data storage --> + <property name=3D"jgroups-configuration" value=3D"= jar:/conf/portal/udp-mux.xml" /> <!-- pat= h to JGroups configuration --> + <property name=3D"jbosscache-cluster-name" value=3D&quo= t;JCR_Cluster_cache_production" /> <!-- JBo= ss Cache data storage cluster name --> + <property name=3D"jgroups-multiplexer-stack" value=3D&q= uot;true" /> + </properties> +</cache> + + + Indexer configuration + <query-handler = class=3D"org.exoplatform.services.jcr.impl.core.query.lucene.SearchInd= ex"> + <properties> + <property name=3D"changesfilter-class" value=3D"or= g.exoplatform.services.jcr.impl.core.query.jbosscache.JBossCacheIndexChange= sFilter" /> + <property name=3D"index-dir" value=3D"/mnt/tornado= /temp/jcrlucenedb/production" /> <!-- p= ath within NFS where ValueStorage will hold it's data --> + <property name=3D"jbosscache-configuration" value=3D&qu= ot;jar:/conf/portal/test-jbosscache-indexer.xml" /> <!-- p= ath to JBoss Cache configuration for indexer --> + <property name=3D"jgroups-configuration" value=3D"= jar:/conf/portal/udp-mux.xml" /> <!-- p= ath to 
JGroups configuration --> + <property name=3D"jbosscache-cluster-name" value=3D&quo= t;JCR_Cluster_indexer_production" /> <!-- J= Boss Cache indexer cluster name --> + <property name=3D"jgroups-multiplexer-stack" value=3D&q= uot;true" /> + </properties> +</query-handler> + + + Lock Manager configuration + <lock-manager c= lass=3D"org.exoplatform.services.jcr.impl.core.lock.jbosscache.Cacheab= leLockManagerImpl"> + <properties> + <property name=3D"time-out" value=3D"15m" /&g= t; + <property name=3D"jbosscache-configuration" value=3D&qu= ot;jar:/conf/portal/test-jbosscache-lock.xml" /> <!-- p= ath to JBoss Cache configuration for lock manager --> + <property name=3D"jgroups-configuration" value=3D"= jar:/conf/portal/udp-mux.xml" /> <!-- p= ath to JGroups configuration --> + <property name=3D"jgroups-multiplexer-stack" value=3D&q= uot;true" /> + <property name=3D"jbosscache-cluster-name" value=3D&quo= t;JCR_Cluster_lock_production" /> <!-- J= Boss Cache locks cluster name --> + = + <property name=3D"jbosscache-cl-cache.jdbc.table.name" = value=3D"jcrlocks_production"/> <!-- t= he name of the DB table where lock's data will be stored --> + <property name=3D"jbosscache-cl-cache.jdbc.table.create"= ; value=3D"true"/> + <property name=3D"jbosscache-cl-cache.jdbc.table.drop" = value=3D"false"/> + <property name=3D"jbosscache-cl-cache.jdbc.table.primarykey&= quot; value=3D"jcrlocks_production_pk"/> + <property name=3D"jbosscache-cl-cache.jdbc.fqn.column" = value=3D"fqn"/> + <property name=3D"jbosscache-cl-cache.jdbc.node.column"= value=3D"node"/> + <property name=3D"jbosscache-cl-cache.jdbc.parent.column&quo= t; value=3D"parent"/> + <property name=3D"jbosscache-cl-cache.jdbc.datasource" = value=3D"jdbcjcr"/> + </properties> +</lock-manager> + +
+
+
+ + Configuring JBoss Cache +
+ Indexer, lock manager and data container configuration + + Each of these components uses instances of JBoss Cache for caching in a clustered environment, so each has its own transport and has to be configured correctly. Workspaces usually have similar configurations that differ only in cluster names (and, possibly, some other parameters). The simplest way to configure them is to define a separate configuration file for each component in each workspace: + + <property name="jbosscache-configuration" value="conf/standalone/test-jbosscache-lock-db1-ws1.xml" /> + + With more than a few workspaces, however, configuring them in this way becomes tedious and hard to manage. eXo JCR offers a template-based configuration for JBoss Cache instances. You can have one template for the Lock Manager, one for the Indexer and one for the data container and use them in all the workspaces, defining the map of substitution parameters in a main configuration file. Define ${jbosscache-<parameter name>} inside the XML template and list the correct value in the JCR configuration file just below "jbosscache-configuration", as shown: + + + Template: + + ... +<clustering mode="replication" clusterName="${jbosscache-cluster-name}"> + <stateRetrieval timeout="20000" fetchInMemoryState="false" /> +... + + and JCR configuration file: + + ... +<property name="jbosscache-configuration" value="jar:/conf/portal/jbosscache-lock.xml" /> +<property name="jbosscache-cluster-name" value="JCR-cluster-locks-db1-ws" /> +... +
+
+ JGroups configuration + + JGroups is used by JBoss Cache for network communications and transport in a clustered environment. If the jgroups-configuration property is defined in the component configuration, it will be injected into the JBoss Cache instance on start up. + + <property name="jgroups-configuration" value="your/path/to/modified-udp.xml" /> + + As outlined above, each component (lock manager, data container and query handler) for each workspace requires its own clustered environment. In other words, they have their own clusters with unique names. + + + By default, each cluster performs multicasts on a separate port. This configuration adds unnecessary overhead to the cluster. This is why JGroups offers a multiplexer feature, which makes it possible to use a single channel for a set of clusters. + + + The multiplexer reduces network overhead and increases the performance and stability of the application. To enable the multiplexer stack, define an appropriate configuration file (udp-mux.xml is shipped with eXo JCR) and set "jgroups-multiplexer-stack" to "true". + + <property name="jgroups-configuration" value="jar:/conf/portal/udp-mux.xml" /> +<property name="jgroups-multiplexer-stack" value="true" /> +
+
+ Sharing JBoss Cache instances + + A single JBoss Cache instance can be demanding on resources, and the default setup has one instance each for the indexer, the lock manager and the data container in each workspace. An environment that uses multiple workspaces may therefore benefit from sharing a JBoss Cache instance between several components of the same type (the lock managers, for example). + + + This feature is disabled by default and can be enabled at the component configuration level by setting the jbosscache-shareable property to true: + + <property name="jbosscache-shareable" value="true" /> + + Once enabled, this feature allows the JBoss Cache instance used by a component to be re-used by other components of the same type with the same JBoss Cache configuration (with the exception of the eviction configuration, which can differ). + + + This means that all the parameters of type jbosscache-<PARAM_NAME> must be identical between the components of the same type in different workspaces. + + + Therefore, if you can use the same values for the parameters in each workspace, you only need three JBoss Cache instances (one instance each for the indexer, lock manager and data container) running at once. This can relieve resource stress significantly. + +
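+ <para>
+ As an illustration of the rule above, the sketch below shows two workspaces whose lock managers carry identical <literal>jbosscache-*</literal> values, so that they can share one JBoss Cache instance. The workspace names, the cluster name and the template path are hypothetical, and the other workspace elements and lock manager properties are omitted.
+ </para>
+ <programlisting language="XML" role="XML"><![CDATA[<workspace name="ws1">
+   <!-- container, cache and query-handler omitted -->
+   <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
+     <properties>
+       <property name="time-out" value="15m" />
+       <property name="jbosscache-configuration" value="jar:/conf/portal/jbosscache-lock.xml" />
+       <property name="jbosscache-cluster-name" value="JCR-cluster-locks" />
+       <property name="jbosscache-shareable" value="true" />
+     </properties>
+   </lock-manager>
+ </workspace>
+ <workspace name="ws2">
+   <!-- container, cache and query-handler omitted -->
+   <lock-manager class="org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl">
+     <properties>
+       <property name="time-out" value="15m" />
+       <!-- Every jbosscache-* value matches ws1, so the instance can be shared -->
+       <property name="jbosscache-configuration" value="jar:/conf/portal/jbosscache-lock.xml" />
+       <property name="jbosscache-cluster-name" value="JCR-cluster-locks" />
+       <property name="jbosscache-shareable" value="true" />
+     </properties>
+   </lock-manager>
+ </workspace>]]></programlisting>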
+
+ Shipped JBoss Cache configuration templates + + The eXo JCR implementation is shipped with ready-to-use JBoss Cache configuration templates for the JCR's components. They are located in the JPP_HOME/gatein/gatein.ear/portal.war/WEB-INF/conf/jcr/jbosscache directory, inside either the cluster or local directory. + +
+ Data container template + + The data container template is config.xml: + + <?xml version=3D&= quot;1.0" encoding=3D"UTF-8"?> +<jbosscache xmlns:xsi=3D"http://www.w3.org/2001/XMLSchema-instance= " xmlns=3D"urn:jboss:jbosscache-core:config:3.1"> + + <locking useLockStriping=3D"false" concurrencyLevel=3D&quo= t;50000" lockParentForChildInsertRemove=3D"false" + lockAcquisitionTimeout=3D"20000" /> + + <clustering mode=3D"replication" clusterName=3D"${jbo= sscache-cluster-name}"> + <stateRetrieval timeout=3D"20000" fetchInMemoryState=3D= "false" /> + <jgroupsConfig multiplexerStack=3D"jcr.stack" /> + <sync /> + </clustering> + + <!-- Eviction configuration --> + <eviction wakeUpInterval=3D"5000"> + <default algorithmClass=3D"org.jboss.cache.eviction.LRUAlgor= ithm" + actionPolicyClass=3D"org.exoplatform.services.jcr.impl.dataf= low.persistent.jbosscache.ParentNodeEvictionActionPolicy" + eventQueueSize=3D"1000000"> + <property name=3D"maxNodes" value=3D"1000000&qu= ot; /> + <property name=3D"timeToLive" value=3D"120000&q= uot; /> + </default> + </eviction> +</jbosscache> +
+
+ Lock manager template + + The lock manager template is lock-config.xml: + + +
+
+ Query handler (indexer) template + + The query handler template is called indexer-config.xml: + + +
+
+
+ + LockManager + + The LockManager stores lock objects. It can lock or release objects as required. It is also responsible for removing stale locks. + + + The LockManager in JBoss Portal Platform is implemented with org.exoplatform.services.jcr.impl.core.lock.jbosscache.CacheableLockManagerImpl. + + + It is enabled by adding lock-manager-configuration to workspace-configuration. + + + For example: + + +
+ CacheableLockManagerImpl + + CacheableLockManagerImpl stores lock objects in JBoss Cache (which uses JDBCCacheLoader to store locks in a database). This means its locks are replicable and can affect an entire cluster rather than just a single node. + + + The length of time LockManager allows a lock to remain in place can be configured with the "time-out" property. + + + The LockRemover thread periodically polls LockManager for locks that have passed the time-out limit and must be removed. + + + The time-out for LockRemover is set as follows (the default value is 30m): + + + + There are a number of ways to configure CacheableLockManagerImpl. Each involves configuring JBoss Cache and JDBCCacheLoader. + + + + + + + + + + + + + + + Refer to http://community.jboss.org/wiki/JBossCacheJDBCCacheLoader for more information about JBoss Cache and JDBCCacheLoader. + +
+ Simple JBoss Cache Configuration + + One method to configure the LockManager is to put a JBoss Cache configuration file path into CacheableLockManagerImpl. + + + + This is not the most efficient method for configuring the LockManager as it requires a JBoss Cache configuration file for each LockManager configuration in each workspace of each repository. The configuration setup can subsequently become quite difficult to manage. + + + This method is useful, however, if a single, specially configured LockManager is required. + + + + The required configuration is shown in the example below: + + + + Sample content of the jbosscache-lock-config.xml file specified in the jbosscache-configuration property is shown in the code example below. + + + Sample Content of the jbosscache-lock-config.xml File</title> + <programlisting language="XML" role="XML"><xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../../extras/Advanced_Development_JCR_lock-manager-config/default50.xml" parse="text"/></programlisting> + <para> + Comment #1: The cluster name at <parameter>clustering mode="replication" clusterName="JBoss-Cache-Lock-Cluster_Name"</parameter> must be unique; + </para> + <para> + Comment #2: The <parameter>cache.jdbc.table.name</parameter> must be unique per datasource. + </para> + <para> + Comment #3: The <parameter>cache.jdbc.node.type</parameter> and <parameter>cache.jdbc.fqn.type</parameter> parameters must be configured according to the database in use. Refer to the table below for information about data types. 
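+ </para>
+ </example>
+ <table id="tabl-Reference_Guide-Simple_JBoss_Cache_Configuration-Data_Types_in_Different_Databases">
+ <title>Data Types in Different Databases</title>
+ <tgroup cols="3">
+ <thead>
+ <row> <entry>Database name</entry> <entry>Node data type</entry> <entry>FQN data type</entry> </row>
+ </thead>
+ <tbody>
+ <row> <entry>default</entry> <entry>BLOB</entry> <entry>VARCHAR(512)</entry> </row>
+ <row> <entry>HSQLDB</entry> <entry>OBJECT</entry> <entry>VARCHAR(512)</entry> </row>
+ <row> <entry>MySQL</entry> <entry>LONGBLOB</entry> <entry>VARCHAR(512)</entry> </row>
+ <row> <entry>Oracle</entry> <entry>BLOB</entry> <entry>VARCHAR2(512)</entry> </row>
+ <row> <entry>PostgreSQL</entry> <entry>bytea</entry> <entry>VARCHAR(512)</entry> </row>
+ <row> <entry>MSSQL</entry> <entry>VARBINARY(MAX)</entry> <entry>VARCHAR(512)</entry> </row>
+ <row> <entry>DB2</entry> <entry>BLOB</entry> <entry>VARCHAR(512)</entry> </row>
+ <row> <entry>Sybase</entry> <entry>IMAGE</entry> <entry>VARCHAR(512)</entry> </row>
+ </tbody>
+ </tgroup>
+ </table>
+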
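+ <para>
+ To show how the table is applied, the fragment below sketches the cache loader part of a JBoss Cache lock configuration for MySQL, using the <literal>LONGBLOB</literal> and <literal>VARCHAR(512)</literal> types from the table. The element layout follows the JBoss Cache 3.x JDBCCacheLoader configuration as recalled here and should be checked against the shipped template. The table name, primary key name and data source name are illustrative.
+ </para>
+ <programlisting language="XML" role="XML"><![CDATA[<loaders passivation="false" shared="true">
+   <loader class="org.jboss.cache.loader.JDBCCacheLoader" async="false"
+           fetchPersistentState="false" ignoreModifications="false" purgeOnStartup="false">
+     <properties>
+       cache.jdbc.table.name=jcrlocks_ws
+       cache.jdbc.table.create=true
+       cache.jdbc.table.drop=false
+       cache.jdbc.table.primarykey=jcrlocks_ws_pk
+       cache.jdbc.fqn.column=fqn
+       cache.jdbc.fqn.type=VARCHAR(512)
+       cache.jdbc.node.column=node
+       cache.jdbc.node.type=LONGBLOB
+       cache.jdbc.parent.column=parent
+       cache.jdbc.datasource=jdbcjcr
+     </properties>
+   </loader>
+ </loaders>]]></programlisting>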
+
+ Template JBoss Cache Configuration + + Another method to configure the LockManager is to use a single JBoss Cache configuration template for all LockManagers. + + + Below is an example test-jbosscache-lock.xml template file: + + + + The parameters that will populate the above file are shown below: + + + JBoss Cache Configuration Parameters + + + Comment #1: The jgroups-configuration has been moved to a separate configuration file (udp-mux.xml, shown below). In this case, udp-mux.xml is a common configuration for all JGroups components (QueryHandler, cache, LockManager), but this is not a requirement of the configuration method. + + + Comment #2: The jbosscache-cl-cache.jdbc.fqn.column and jbosscache-cl-cache.jdbc.node.type parameters are not explicitly defined because cache.jdbc.fqn.type and cache.jdbc.node.type are defined in the JBoss Cache configuration. + + + + Refer to the Data Types in Different Databases table for information about setting these parameters, or set them to AUTO and the data type will be detected automatically. + + + udp-mux.xml: + + +
+
+ Lock Migration + + There are three options available: + + + Lock Migration Options + + When new Shareable Cache feature is not going to be used= and all locks should be kept after migration. + + + + <step> + <para> + Ensure that the same lock tables are u= sed in configuration + </para> + </step> + <step> + <para> + Start the server + </para> + </step> + </procedure> + </listitem> + </varlistentry> + <varlistentry> + <term>When new Shareable Cache feature is not going to be used= and all locks should be removed after migration.</term> + <listitem> + <procedure> + <title/> + <step> + <para> + Ensure that the same lock tables used = in configuration + </para> + </step> + <step> + <para> + Start the sever WITH system property: + </para> + <programlisting>-Dorg.exoplatform.jcr.locks.force.remove= =3Dtrue +</programlisting> + </step> + <step> + <para> + Stop the server + </para> + </step> + <step> + <para> + Start the server WITHOUT system proper= ty: + </para> + <programlisting>-Dorg.exoplatform.jcr.locks.force.remove +</programlisting> + </step> + </procedure> + </listitem> + </varlistentry> + <varlistentry> + <term>When new Shareable Cache feature will be used (in this c= ase all locks are removed after migration).</term> + <listitem> + <procedure> + <title/> + <step> + <para> + Start the sever WITH system property: + </para> + <programlisting>-Dorg.exoplatform.jcr.locks.force.remove= =3Dtrue +</programlisting> + </step> + <step> + <para> + Stop the server. + </para> + </step> + <step> + <para> + Start the server WITHOUT system proper= ty: + </para> + <programlisting>-Dorg.exoplatform.jcr.locks.force.remove +</programlisting> + </step> + <step> + <title>Optional: + + Manually remove old tables for lock. + + + + + + +
+
+
+ + Configuring QueryHandler +
+ Indexing in clustered environment + + The JCR offers indexing strategies for clustered environments that either use the advantages of running in a single JVM or make the best use of all resources available in the cluster. The JCR uses the Lucene library as its underlying search and indexing engine, but Lucene has several limitations that reduce the benefit a cluster can provide. For this reason, eXo JCR offers two strategies suited to its own use cases: clustered with a shared index, and clustered with local indexes. Each has its pros and cons. + + + The clustered implementation with local indexes combines an in-memory buffer index directory with delayed file-system flushing. This index is called "Volatile", and it is also consulted during searches. Under certain conditions, the volatile index is flushed to persistent storage (the file system) as a new index directory. This gives very good performance for write operations. + +
+ Local Index Diagram + + + + + +
+ + Because this implementation is designed for clustered environments, it has additional mechanisms for data delivery within the cluster. Text extraction jobs run on the same node that performs the content operation (that is, the write operation). Prepared "documents" (the Lucene term for a block of data ready for indexing) are replicated across the cluster nodes and processed by the local indexes, so each cluster instance has the same index content. When a new node joins the cluster it has no initial index, so one must be created. There are several supported ways of doing this. The simplest is to copy the index manually, but this is not the intended approach. If no initial index is found, the JCR uses automated scenarios. They are controlled via configuration (see the "index-recovery-mode" parameter) and offer either full re-indexing from the database or copying the index from another cluster node. + + + Keeping a separate index copy on each instance can be costly. In that case, a shared index can be used instead (see the diagram below). + +
+ Shared Index Diagram + + + + + +
+ + This indexing strategy combines the advantages of an in-memory index with a shared persistent index, offering "near" real-time search capabilities. This means that newly added content is accessible via search almost immediately. The strategy allows nodes to index data in their own volatile (in-memory) indexes, but the persistent indexes are managed by a single "coordinator" node only. Each cluster instance has read access to the shared index and performs queries by combining its results with those found in its own in-memory index. Note that a shared folder must be configured in your system environment (for example, a mounted NFS folder). In some extremely rare cases, the volatile indexes of different cluster instances can differ slightly for a short time. They become consistent again within a few seconds. + + + See more about . + +
+
+ Configuration +
+ Query-handler configuration overview + + Configuration example: + + <workspace name= =3D"ws"> + <query-handler class=3D"org.exoplatform.services.jcr.impl.core.= query.lucene.SearchIndex"> + <properties> + <property name=3D"index-dir" value=3D"shareddir= /index/db1/ws" /> + <property name=3D"changesfilter-class" + value=3D"org.exoplatform.services.jcr.impl.core.query.jbo= sscache.JBossCacheIndexChangesFilter" /> + <property name=3D"jbosscache-configuration" value=3D= "jbosscache-indexer.xml" /> + <property name=3D"jgroups-configuration" value=3D&qu= ot;udp-mux.xml" /> + <property name=3D"jgroups-multiplexer-stack" value= =3D"true" /> + <property name=3D"jbosscache-cluster-name" value=3D&= quot;JCR-cluster-indexer-ws" /> + <property name=3D"max-volatile-time" value=3D"6= 0" /> + <property name=3D"rdbms-reindexing" value=3D"tr= ue" /> + <property name=3D"reindexing-page-size" value=3D&quo= t;1000" /> + <property name=3D"index-recovery-mode" value=3D"= ;from-coordinator" /> + <property name=3D"index-recovery-filter" value=3D&qu= ot;org.exoplatform.services.jcr.impl.core.query.lucene.DocNumberRecoveryFil= ter" /> + </properties> + </query-handler> +</workspace> + + + Configuration properties + + + + Property name + Description + + + + + index-dir + path to index + + + changesfilter-class + template of JBoss-cache configuration for all quer= y-handlers in repository + + + jbosscache-configuration + template of JBoss-cache configuration for all quer= y-handlers in repository + + + jgroups-configuration + jgroups-configuration is template configuration fo= r all components (search, cache, locks) [Add link to document describing te= mplate configurations] + + + jgroups-multiplexer-stack + [TODO about jgroups-multiplexer-stack - add link t= o JBoss doc] + + + jbosscache-cluster-name + cluster name (must be unique) + + + max-volatile-time + max time to live for Volatile Index + + + rdbms-reindexing + indicate that need to use rdbms reindexing mechani= sm if possible, the 
default value is true + + + reindexing-page-size + maximum amount of nodes which can be retrieved fro= m storage for re-indexing purpose, the default value is 100 + + + index-recovery-mode + If the parameter has been set to from-ind= exing, so a full indexing will be automatically launched (default= behavior), if the parameter has been set to from-coordinator, the index will be retrieved from coordinator + + + index-recovery-filter + Defines implementation class or classes of Recover= yFilters, the mechanism of index synchronization for Local Index strategy. = + + + async-reindexing + Controls the process of re-indexing on JCR's = startup. If this flag is set, indexing will be launched asynchronously, wit= hout blocking the JCR. Default is "false". + + + +
+ + Improving Query Performance With <literal>postgreSQL</lit= eral> and <parameter>rdbms-reindexing</parameter> + + If you use postgreSQL and rdbms-reindexing is set to true, the perfo= rmance of the queries used while indexing can be improved by: + + + + + <step> + <para> + Set the parameter "<parameter>enable_seqscan<= /parameter>" to "<literal>off</literal>" + </para> + <para> + <emphasis role=3D"bold">OR</emphasis> + </para> + <para> + Set "<parameter>default_statistics_target</pa= rameter>" to at least "<literal>50</literal>". + </para> + </step> + <step> + <para> + Restart DB server and make analyze of the JCR_SVAL= UE (or JCR_MVALUE) table. + </para> + </step> + </procedure> + <formalpara id=3D"form-Reference_Guide-Query_handler_configuration= _overview-Improving_Query_Performance_With_DB2_and_rdbms_reindexing"> + <title>Improving Query Performance With <literal>DB2</literal> a= nd <parameter>rdbms-reindexing</parameter> + + If you use DB2 and rdbms= -reindexing is set to true, the performance = of the queries used while indexing can be improved by: + + + + + <step> + <para> + Make statistics on tables by running the following= for <literal>JCR_SITEM</literal> (or <literal>JCR_MITEM</literal>) and <li= teral>JCR_SVALUE</literal> (or <literal>JCR_MVALUE</literal>) tables: + </para> + <programlisting><code>RUNSTATS ON TABLE <scheme>.<tab= le> WITH DISTRIBUTION AND INDEXES ALL</code></programlisting> + </step> + </procedure> + </section> + <section id=3D"sect-Reference_Guide-Configuration-Cluster_ready_inde= xing"> + <title>Cluster-ready indexing + + For both cluster-ready implementations JBoss Cache, JGroup= s and Changes Filter values must be defined. Shared index requires some kin= d of remote or shared file system to be attached in a system (i.e. NFS, SMB= or etc). 
Indexing directory ("indexDir" value) must point to it.= Setting "changesfilter-class" to "org.exoplatform.services.= jcr.impl.core.query.jbosscache.JBossCacheIndexChangesFilter" will enab= le shared index implementation. + + <workspace name= =3D"ws"> + <query-handler class=3D"org.exoplatform.services.jcr.impl.core.= query.lucene.SearchIndex"> + <properties> + <property name=3D"index-dir" value=3D"/mnt/nfs_= drive/index/db1/ws" /> + <property name=3D"changesfilter-class" + value=3D"org.exoplatform.services.jcr.impl.core.query.jbo= sscache.JBossCacheIndexChangesFilter" /> + <property name=3D"jbosscache-configuration" value=3D= "jbosscache-indexer.xml" /> + <property name=3D"jgroups-configuration" value=3D&qu= ot;udp-mux.xml" /> + <property name=3D"jgroups-multiplexer-stack" value= =3D"true" /> + <property name=3D"jbosscache-cluster-name" value=3D&= quot;JCR-cluster-indexer-ws" /> + <property name=3D"max-volatile-time" value=3D"6= 0" /> + <property name=3D"rdbms-reindexing" value=3D"tr= ue" /> + <property name=3D"reindexing-page-size" value=3D&quo= t;1000" /> + <property name=3D"index-recovery-mode" value=3D"= ;from-coordinator" /> + </properties> + </query-handler> +</workspace> + + In order to use cluster-ready strategy based on local inde= xes, when each node has own copy of index on local file system, the followi= ng configuration must be applied. Indexing directory must point to any fold= er on local file system and "changesfilter-class" must be set to = "org.exoplatform.services.jcr.impl.core.query.jbosscache.LocalIndexCha= ngesFilter". 
+ + <workspace name= =3D"ws"> + <query-handler class=3D"org.exoplatform.services.jcr.impl.core.= query.lucene.SearchIndex"> + <properties> + <property name=3D"index-dir" value=3D"/mnt/nfs_= drive/index/db1/ws" /> + <property name=3D"changesfilter-class" + value=3D"org.exoplatform.services.jcr.impl.core.query.jbo= sscache.LocalIndexChangesFilter" /> + <property name=3D"jbosscache-configuration" value=3D= "jbosscache-indexer.xml" /> + <property name=3D"jgroups-configuration" value=3D&qu= ot;udp-mux.xml" /> + <property name=3D"jgroups-multiplexer-stack" value= =3D"true" /> + <property name=3D"jbosscache-cluster-name" value=3D&= quot;JCR-cluster-indexer-ws" /> + <property name=3D"max-volatile-time" value=3D"6= 0" /> + <property name=3D"rdbms-reindexing" value=3D"tr= ue" /> + <property name=3D"reindexing-page-size" value=3D&quo= t;1000" /> + <property name=3D"index-recovery-mode" value=3D"= ;from-coordinator" /> + </properties> + </query-handler> +</workspace> + +
+
+ Local Index Recovery Filters + + A common usecase for all cluster-ready applications is a h= ot joining and leaving of processing units. All nodes that are joining a cl= uster for the first time or nodes joining after some downtime, must be in a= synchronized state. + + + When using shared value storages, databases and indexes, c= luster nodes are synchronized at any given time. But is not the case when a= local index strategy is used. + + + If a new node joins a cluster, without an index it is retr= ieved or recreated. Nodes can be also be restarted and thus the index is no= t empty. By default, even though the existing index is thought to be up to = date, it can be outdated. + + + The JBoss Portal Platform JCR offers a mechanism called RecoveryFilters that will automatically retrieve index for= the joining node on start up. This feature is a set of filters that can be= defined via QueryHandler configuration: + + <property name=3D"index-r= ecovery-filter" value=3D"org.exoplatform.services.jcr.impl.core.q= uery.lucene.DocNumberRecoveryFilter" /> + + Filter numbers are not limited so they can be combined: + + <property name=3D"index-r= ecovery-filter" value=3D"org.exoplatform.services.jcr.impl.core.q= uery.lucene.DocNumberRecoveryFilter" /> + <property name=3D"index-recovery-filter" value=3D"or= g.exoplatform.services.jcr.impl.core.query.lucene.SystemPropertyRecoveryFil= ter" /> + + + If any one returns fires, the index is re-synchronized. 
Th= is feature uses standard index recovery mode defined by previously describe= d parameter (can be "from-indexing" (default) or "from-coord= inator") + + <property name=3D"index-r= ecovery-mode" value=3D"from-coordinator" /> + + + There are multiple filter implementations: + + <property name=3D"i= ndex-recovery-filter" value=3D"org.exoplatform.services.jcr.impl.= core.query.lucene.ConfigurationPropertyRecoveryFilter" /> + <property name=3D"index-recovery-filter-forcereindexing" = value=3D"true" /> + + + + + org.exoplatform.services.jcr.impl.core.query.lucene.DocN= umberRecoveryFilter + + + Checks the number of documents in index on coo= rdinator side and self-side. It returns true if the coun= t differs. + + + The advantage of this filter compared to other= s, is that it will skip reindexing for workspaces where the index was not m= odified. + + + For example; if there is ten repositories with= three workspaces in each and only one is heavily used in the cluster, this= filter will only reindex those workspaces that have been changed, without = affecting other indexes. + + + This greatly reduces start up time. + + + + +
+
+ JBoss-Cache template configuration + + The JBoss-Cache template configuration for the query handler is almost the same for both clustered strategies. + + + jbosscache-indexer.xml + <?xml version="1.0" encoding="UTF-8"?> +<jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.1"> + <locking useLockStriping="false" concurrencyLevel="50000" lockParentForChildInsertRemove="false" + lockAcquisitionTimeout="20000" /> + <!-- Configure the TransactionManager --> + <transaction transactionManagerLookupClass="org.jboss.cache.transaction.JBossStandaloneJTAManagerLookup" /> + <clustering mode="replication" clusterName="${jbosscache-cluster-name}"> + <stateRetrieval timeout="20000" fetchInMemoryState="false" /> + <jgroupsConfig multiplexerStack="jcr.stack" /> + <sync /> + </clustering> + <!-- Eviction configuration --> + <eviction wakeUpInterval="5000"> + <default algorithmClass="org.jboss.cache.eviction.FIFOAlgorithm" eventQueueSize="1000000"> + <property name="maxNodes" value="10000" /> + <property name="minTimeToLive" value="60000" /> + </default> + </eviction> +</jbosscache> + + + Read more about template configurations . + +
+
+
+ Asynchronous Re-indexing + + Managing a large data set with a JCR in a production environment sometimes requires special operations on the indexes stored on the file system. One of those maintenance operations is recreating an index, also called "re-indexing". Re-indexing becomes necessary in several situations, including hardware faults, hard restarts, data corruption, migrations, and JCR updates that bring new index-related features. Index re-creation is usually requested either at server startup or at runtime. + +
+ On startup indexing + + A common usecase for updating and re-creating the index is= to stop the server and manually remove indexes for workspaces requiring it= . When the server is re-started, the missing indexes are automatically reco= vered by re-indexing. + + + The eXo JCR Supports direct RDBMS re-indexing, which can = be faster than ordinary and can be configured via QueryHandler parameter rdbms-reindexing set to tr= ue. + + + A new feature is asynchronous indexing on startup. Usually= startup is blocked until the indexing process is finished. This block can = take any period of time, depending on amount of data persisted in repositor= ies. But this can be resolved by using an asynchronous approaches of startu= p indexation. + + + Essentially, all indexing operations are performed in the = background without blocking the repository. This is controlled by the value= of the async-reindexing parameter in Query= Handler configuration. + + + With asynchronous indexation active, the JCR starts with n= o active indexes present. Queries on JCR still can be executed without exce= ptions, but no results will be returned until index creation completed. + + + The index state check is accomplished via QueryMa= nagerImpl: + + + = +boolean online =3D ((QueryManagerImpl)Worksp= ace.getQueryManager()).getQueryHandeler().isOnline(); + + + + The OFFLINE state means= that the index is currently re-creating. When the state is changed, a corr= esponding log event is printed. When the background index task starts the i= ndex is switched to OFFLINE, with follow= ing log event : + + [INFO] Setting index OFFLINE (repository/productio= n[system]). + + When the indexing process is finished, the following two e= vents are logged : + + [INFO] Created initial index for 143018 nodes (rep= ository/production[system]). +[INFO] Setting index ONLINE (repository/production[system]). + + Those two log lines indicates the end of process for works= pace given in brackets. 
Calling isOnline(), as mentioned above, will also return true. + +
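A caller that needs search results may want to wait until the index reports ONLINE before issuing queries. The following is a minimal sketch of such a wait loop, not part of eXo JCR: the class `IndexOnlineWaiter` and its parameters are hypothetical, and the `BooleanSupplier` is assumed to wrap the `isOnline()` call shown above.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls an "is the index online?" check until it succeeds or times out. */
final class IndexOnlineWaiter {

    /**
     * @param isOnline      assumed to wrap the isOnline() call from the text above
     * @param timeoutMillis maximum total time to wait
     * @param pollMillis    pause between two consecutive checks
     * @return true if the check succeeded before the timeout, false otherwise
     */
    static boolean waitUntilOnline(BooleanSupplier isOnline, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (isOnline.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while the index was still OFFLINE
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

A caller would pass a lambda that performs the `isOnline()` check and fall back to "no results yet" handling when the method returns false.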
+
+ Hot Asynchronous Workspace Re-indexing using JMX + + Some hard system faults, errors during upgrades, migration= issues and some other factors may corrupt the index. Current versions of = JCR supports Hot Asynchronous Workspace Reindexing<= /emphasis> feature. It allows Service Administrators to launch the process = in background without stopping or blocking the whole application by using a= ny JMX-compatible console. + +
+ JMX Jconsole + + + + + +
+ + The server can continue working as expected while the inde= x is recreated. + + + This depends on the flag allow queries being passed via JMX interface to the reindex operation invocation. If= the flag is set, the application continues working. + + + However, there is one critical limitation users must be aw= are of; the index is frozen while the background task is running<= /emphasis>. + + + This means that queries are performed on a version of the = index present at the moment the indexing task is started, and that data wri= tten into the repository after startup will not be available through the se= arch until process completes. + + + Data added during re-indexation is also indexed, but will = be available only when reindexing is complete. The JCR makes a snapshot of = indexes at the invocation of the asynchronous indexing task and uses that s= napshot for searches. + + + When the operation is finished, the stale index is replace= d by the newly created index, which included any newly added data. + + + If the allow queries flag is set to= false, then all queries will throw an exception while t= ask is running. The current state can be acquired using the following JMX o= peration: + + + + + getHotReindexingState() - returns information abou= t latest invocation: start time, if in progress or finish time if done. + + + +
+
+ Notices + + Hot re-indexing via JMX cannot be launched if the index is already in offline mode. Offline mode means that the index is currently involved in another operation, such as re-indexing at startup or being copied to another node in the cluster. + + + Hot Asynchronous Reindexing via JMX and on startup reindexing are different features. The state of startup reindexing therefore cannot be obtained with the getHotReindexingState operation in the JMX interface, but some JMX operations are common to both: + + + + + getIOMode - returns the current index IO mode (READ_ONLY / READ_WRITE), which belongs to the clustered configuration states; + + + + + getState - returns the current state: ONLINE / OFFLINE. + + + +
+
+
+ Advanced tuning +
+ Lucene tuning + + As mentioned, JCR indexing uses the Lucene indexing library as its underlying search engine. Lucene uses Directories to store the index and Lock Factories to manage access to it. + + + By default, the JCR implementation uses an optimal combination of Directory implementation and Lock Factory implementation. + + + The SimpleFSDirectory is used in Windows environments and the NIOFSDirectory implementation is used in non-Windows systems. + + + NativeFSLockFactory is an optimal solution for a wide variety of cases, including clustered environments with NFS shared resources. + + + These defaults can be overridden with system properties. + + + Two properties, org.exoplatform.jcr.lucene.store.FSDirectoryLockFactoryClass and org.exoplatform.jcr.lucene.FSDirectory.class, control (and change) the default behavior. + + + The first defines the implementation of the abstract Lucene LockFactory class and the second sets the implementation class for FSDirectory instances. + + + For more information, refer to the Lucene documentation. Be careful: while the JCR allows users to change the implementation classes of Lucene internals, it does not guarantee the stability or functionality of those changes. + +
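The selection rules described above can be sketched as a small resolver. This is an illustration only, not the eXo JCR source: the class `LuceneStoreDefaults` and its methods are hypothetical, the Windows check on `os.name` is an assumption, and treating `NativeFSLockFactory` as the default lock factory is inferred from the text rather than confirmed.

```java
import java.util.Locale;

/** Hypothetical resolver mirroring the defaults and overrides described in the text. */
final class LuceneStoreDefaults {

    // System property names quoted from the section above.
    static final String DIRECTORY_CLASS_PROPERTY = "org.exoplatform.jcr.lucene.FSDirectory.class";
    static final String LOCK_FACTORY_CLASS_PROPERTY =
            "org.exoplatform.jcr.lucene.store.FSDirectoryLockFactoryClass";

    /** Returns the override if one is set, otherwise the OS-dependent default Directory class. */
    static String resolveDirectoryClass(String override, String osName) {
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        // Assumption: Windows is detected from the "os.name" value.
        boolean windows = osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows
                ? "org.apache.lucene.store.SimpleFSDirectory"
                : "org.apache.lucene.store.NIOFSDirectory";
    }

    /** Returns the override if one is set, otherwise the assumed default lock factory. */
    static String resolveLockFactoryClass(String override) {
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        return "org.apache.lucene.store.NativeFSLockFactory";
    }

    public static void main(String[] args) {
        System.out.println(resolveDirectoryClass(
                System.getProperty(DIRECTORY_CLASS_PROPERTY), System.getProperty("os.name")));
        System.out.println(resolveLockFactoryClass(System.getProperty(LOCK_FACTORY_CLASS_PROPERTY)));
    }
}
```

Running the class with `-Dorg.exoplatform.jcr.lucene.FSDirectory.class=...` on the command line prints the overriding class name instead of the default.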
+
+
+ + JBossTransactionsService +
+ Introduction + + JBossTransactionsService implements the eXo TransactionService and provides access to the JBoss Transaction Service (JBossTS) JTA implementation via an eXo container dependency. + + + TransactionService is used by the JCR cache implementation org.exoplatform.services.jcr.impl.dataflow.persistent.jbosscache.JBossCacheWorkspaceStorageCache. + +
+
+ Configuration + + Example configuration: + + <component> + <key>org.exoplatform.services.transaction.TransactionService<= /key> + <type>org.exoplatform.services.transaction.jbosscache.JBossTrans= actionsService</type> + <init-params> + <value-param> + <name>timeout</name> + <value>3000</value> + </value-param> + </init-params> = + </component> + + timeout - XA transaction timeout in seconds + +
+
+ + JCR Query Use-cases +
+ Introduction + + The JCR supports two query languages: SQL and XPath. A query, whether XPath or SQL, specifies a subset of nodes within a workspace, called the result set. The result set consists of all the nodes in the workspace that meet the constraints stated in the query. + +
+
+ Query Lifecycle +
+ Query Creation and Execution + + SQL + // get QueryMana= ger +QueryManager queryManager =3D workspace.getQueryManager();  +// make SQL query +Query query =3D queryManager.createQuery("SELECT * FROM nt:base "= ;, Query.SQL); +// execute query +QueryResult result =3D query.execute(); + + + XPath + // get QueryMana= ger +QueryManager queryManager =3D workspace.getQueryManager(); = +// make XPath query +Query query =3D queryManager.createQuery("//element(*,nt:base)",= Query.XPATH); +// execute query +QueryResult result =3D query.execute(); + +
+
+ Query Result Processing + // fetch query res= ult +QueryResult result =3D query.execute(); + + To fetch the nodes: + + NodeIterator it = =3D result.getNodes(); + + The results can be formatted in a table: + + // get column names +String[] columnNames =3D result.getColumnNames(); +// get column rows +RowIterator rowIterator =3D result.getRows(); +while(rowIterator.hasNext()){ + // get next row + Row row =3D rowIterator.nextRow(); + // get all values of row + Value[] values =3D row.getValues(); +} +
+
+ Scoring + + The result returns a score for each row in the result set. The score is a value that rates how well the result node matches the query. A higher value means a better match than a lower value. This score can be used for ordering the result. + + + eXo JCR scoring is a mapping of Lucene scoring. For a more in-depth understanding, refer to the Lucene documentation. + + + The jcr:score is calculated as: (lucene score)*1000f. + +
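The scoring formula above can be illustrated with a short sketch. The class `JcrScore` and its methods are hypothetical, and truncating the product to a `long` is an assumption about how the value is stored. Only the multiplication by `1000f` comes from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating jcr:score = (lucene score) * 1000f. */
final class JcrScore {

    /** Applies the formula from the text; the cast to long is an assumption. */
    static long fromLuceneScore(float luceneScore) {
        return (long) (luceneScore * 1000f);
    }

    /** Orders node paths so that the best match (highest score) comes first. */
    static List<String> pathsByScoreDescending(Map<String, Float> luceneScores) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(luceneScores.entrySet());
        entries.sort((a, b) -> Long.compare(
                fromLuceneScore(b.getValue()), fromLuceneScore(a.getValue())));
        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, Float> entry : entries) {
            paths.add(entry.getKey());
        }
        return paths;
    }
}
```

In a real query the ordering would normally be requested in the query statement itself. The helper only shows how the score relates to the ordering.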
+
+
+ Tips and tricks +
+ XPath queries containing node names starting with a number<= /title> + <para> + If you execute an XPath request like this... + </para> + <programlisting language=3D"Java" role=3D"Java">// get QueryManager +QueryManager queryManager =3D workspace.getQueryManager(); = +// make XPath query +Query query =3D queryManager.createQuery("/jcr:root/Documents/Publie/= 2010//element(*, exo:article)", Query.XPATH);</programlisting> + <para> + ...you will receive an <code>Invalid request</code> error. This is becau= se XML (and thus XPath) does not allow names starting with a number. + </para> + <para> + Therefore, XPath requests using a node name that starts with a number ar= e invalid. + </para> + <para> + Some possible alternatives are: + </para> + <itemizedlist> + <listitem> + <para> + Use an SQL request. + </para> + </listitem> + <listitem> + <para> + Use escaping: + </para> + <programlisting language=3D"Java" role=3D"Java">// get QueryMa= nager +QueryManager queryManager =3D workspace.getQueryManager(); = +// make XPath query +Query query =3D queryManager.createQuery("/jcr:root/Documents/Publie/= _x0032_010//element(*, exo:article)", Query.XPATH);</programlisting> + </listitem> + </itemizedlist> + </section> + </section> + </chapter> + <chapter xmlns=3D"" id=3D"chap-Reference_Guide-Searching_Repository_Cont= ent"> + <title>Searching Repository Content +
+ Introduction + + You can find the JCR configuration file here: JPP_DIST/gatein/gatein.ear/portal.war/portal/WEB-INF/conf/jcr/repository-configuration.xml. + + + Please refer to for more information about index configuration. + +
+
+ Bi-directional RangeIterator + + QueryResult.getNodes() returns a bi-directional NodeIterator implementation. + + + + The bi-directional NodeIterator is not supported in two cases: + + + + + SQL query: select * from nt:base + + + + + XPath query: //* . + + + + + + TwoWayRangeIterator interface: + + /** + * Skip a number of elements backwards in the iterator. + * + * @param skipNum the non-negative number of elements to skip + * @throws java.util.NoSuchElementException if skipped past the first element + * in the iterator. + */ +public void skipBack(long skipNum); + + Usage: + + NodeIterator iter = queryResult.getNodes(); +while (iter.hasNext()) { + if (skipForward) { + iter.skip(10); // Skip 10 nodes in forward direction + } else if (skipBack) { + TwoWayRangeIterator backIter = (TwoWayRangeIterator) iter; + backIter.skipBack(10); // Skip 10 nodes back + } + // ... +} +
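The `skip` / `skipBack` contract described above can be illustrated with a list-backed iterator. This is a sketch under stated assumptions, not the eXo JCR implementation: `ListTwoWayIterator` is a hypothetical class, and only the `skipBack` behaviour (throwing `NoSuchElementException` when skipping past the first element) is taken from the Javadoc quoted above.

```java
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical list-backed iterator showing forward and backward skipping. */
final class ListTwoWayIterator<T> {

    private final List<T> items;
    private int position; // index of the element that next() will return

    ListTwoWayIterator(List<T> items) {
        this.items = items;
    }

    boolean hasNext() {
        return position < items.size();
    }

    T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items.get(position++);
    }

    /** Skips forward; fails if that would move past the last element. */
    void skip(long skipNum) {
        if (skipNum < 0) {
            throw new IllegalArgumentException("skipNum must be non-negative");
        }
        if (skipNum > items.size() - position) {
            throw new NoSuchElementException("skipped past the last element");
        }
        position += (int) skipNum;
    }

    /** Skips backward; fails if that would move past the first element. */
    void skipBack(long skipNum) {
        if (skipNum < 0) {
            throw new IllegalArgumentException("skipNum must be non-negative");
        }
        if (skipNum > position) {
            throw new NoSuchElementException("skipped past the first element");
        }
        position -= (int) skipNum;
    }
}
```

For a four-element list, `skip(3)` followed by `skipBack(2)` leaves the iterator positioned on the second element.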
+
+ Fuzzy Searches + + The JBoss Portal Platform JCR supports features such as Lucene Fuzzy Searches. To perform a fuzzy search, form your query like the one below: + + QueryManager qman = session.getWorkspace().getQueryManager(); +Query q = qman.createQuery("select * from nt:base where contains(field, 'ccccc~')", Query.SQL); +QueryResult res = q.execute(); +
+
+ SynonymSearch + + Searching with synonyms is integrated in the jcr:contains() function and uses the same syntax as synonym searches in web search engines (Google, for example). If a search term is prefixed by a tilde symbol ( ~ ), synonyms of the search term are taken into consideration. For example: + + SQL: select * from nt:resource where contains(., '~parameter') + +XPath: //element(*, nt:resource)[jcr:contains(., '~parameter')] + + This feature is disabled by default. To enable it, add the following configuration parameters to the query-handler element in your JCR configuration file. + + <param name="synonymprovider-config-path" value="..you path to configuration file....."/> +<param name="synonymprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.PropertiesSynonymProvider"/> + /** + * <code>SynonymProvider</code> defines an interface for a component that + * returns synonyms for a given term. + */ +public interface SynonymProvider { + + /** + * Initializes the synonym provider and passes the file system resource to + * the synonym provider configuration defined by the configuration value of + * the <code>synonymProviderConfigPath</code> parameter. The resource may be + * <code>null</code> if the configuration parameter is not set. + * + * @param fsr the file system resource to the synonym provider + * configuration. + * @throws IOException if an error occurs while initializing the synonym + * provider. + */ + public void initialize(InputStream fsr) throws IOException; + + /** + * Returns an array of terms that are considered synonyms for the given + * <code>term</code>. + * + * @param term a search term. + * @return an array of synonyms for the given <code>term</code> or an empty + * array if no synonyms are known. + */ + public String[] getSynonyms(String term); +} +
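To make the interface contract concrete, here is a minimal sketch of a provider backed by `java.util.Properties`. It is not the shipped `PropertiesSynonymProvider`: the class `SimplePropertiesSynonymProvider` is hypothetical, the interface is repeated only so the example is self-contained, and the file format (`term=synonym1, synonym2`) is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Copy of the interface from the text, repeated here so the sketch compiles on its own. */
interface SynonymProvider {
    void initialize(InputStream fsr) throws IOException;

    String[] getSynonyms(String term);
}

/** Hypothetical provider; assumes lines of the form "term=synonym1, synonym2". */
final class SimplePropertiesSynonymProvider implements SynonymProvider {

    private final Properties synonyms = new Properties();

    @Override
    public void initialize(InputStream fsr) throws IOException {
        synonyms.clear();
        if (fsr != null) { // the resource may be null if no path is configured
            synonyms.load(fsr);
        }
    }

    @Override
    public String[] getSynonyms(String term) {
        String value = term == null ? null : synonyms.getProperty(term);
        if (value == null || value.trim().isEmpty()) {
            return new String[0]; // no synonyms known for this term
        }
        return value.trim().split("\\s*,\\s*");
    }
}
```

With a configuration line such as `quick=fast, rapid`, a search for `~quick` would, under these assumptions, also consider `fast` and `rapid`.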
+
+ Highlighting + + An ExcerptProvider retrieves text excerpts for a node in the query result and marks up the words in the text that match the query terms. + + + By default, match highlighting is disabled because it requires additional information to be written to the search index. + + + To enable this feature, you need to add a configuration parameter to the query-handler element in your JCR configuration file: + + <param name="support-highlighting" value="true"/> + + Additionally, there is a parameter that controls the format of the excerpt created. In JCR 1.9, the default is set to org.exoplatform.services.jcr.impl.core.query.lucene.DefaultHTMLExcerpt. The configuration parameter for this setting is: + + <param name="excerptprovider-class" value="org.exoplatform.services.jcr.impl.core.query.lucene.DefaultXMLExcerpt"/> +
+ DefaultXMLExcerpt + + This excerpt provider creates an XML fragment of the following form: + + <excerpt> + <fragment> + <highlight>exoplatform</highlight> implements both the= mandatory + XPath and optional SQL <highlight>query</highlight> sy= ntax. + </fragment> + <fragment> + Before parsing the XPath <highlight>query</highlight> = in + <highlight>exoplatform</highlight>, the statement is s= urrounded + </fragment> +</excerpt> +
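Because the excerpt is well-formed XML, client code can process it with the standard JDK parser. The sketch below extracts the highlighted terms from a fragment of the form shown above. The class `ExcerptHighlights` is hypothetical and the example assumes the excerpt has already been read into a string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper that lists the text of every highlight element in an excerpt. */
final class ExcerptHighlights {

    static List<String> highlightedTerms(String excerptXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(excerptXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("highlight");
            List<String> terms = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                terms.add(nodes.item(i).getTextContent());
            }
            return terms;
        } catch (Exception e) {
            // Wrap parser failures so callers do not need to handle checked exceptions.
            throw new IllegalArgumentException("not a well-formed excerpt", e);
        }
    }
}
```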
+
+ DefaultHTMLExcerpt + + This excerpt provider creates an HTML fragment of the following form: + + <div> + <span> + <strong>exoplatform</strong> implements both the manda= tory XPath + and optional SQL <strong>query</strong> syntax. + </span> + <span> + Before parsing the XPath <strong>query</strong> in + <strong>exoplatform</strong>, the statement is surroun= ded + </span> +</div> +
+
+ Usage + + If you are using XPath, you must use the rep:excerpt() function in the last location step, just as you would select properties: + + QueryManager qm = session.getWorkspace().getQueryManager(); +Query q = qm.createQuery("//*[jcr:contains(., 'exoplatform')]/(@Title|rep:excerpt(.))", Query.XPATH); +QueryResult result = q.execute(); +for (RowIterator it = result.getRows(); it.hasNext(); ) { + Row r = it.nextRow(); + Value title = r.getValue("Title"); + Value excerpt = r.getValue("rep:excerpt(.)"); +} + + The above code searches for nodes that contain the word exoplatform and then gets the value of the Title property and an excerpt for each resulting node. + + + It is also possible to use a relative path in the call to Row.getValue() while the query statement remains the same. You may also use a relative path to a string property. The returned value is then an excerpt based on the string value of that property. + + + Both available excerpt providers create up to three fragments of about 150 characters each. + + + In SQL, the function is called excerpt() without the rep prefix, but the column in the RowIterator is nonetheless labelled rep:excerpt(.). + + QueryManager qm = session.getWorkspace().getQueryManager(); +Query q = qm.createQuery("select excerpt(.) from nt:resource where contains(., 'exoplatform')", Query.SQL); +QueryResult result = q.execute(); +for (RowIterator it = result.getRows(); it.hasNext(); ) { + Row r = it.nextRow(); + Value excerpt = r.getValue("rep:excerpt(.)"); +} +
+
+
+ SpellChecker + + The Lucene-based query handler implementation supports a pluggable spell-checker mechanism. By default, spell checking is not available; it must be configured first. + + + Information about the spellCheckerClass parameter is available in . + + + The JCR currently provides an implementation class which uses the lucene-spellchecker. + + + The dictionary is derived from the fulltext-indexed content of the workspace and is updated periodically. You can configure the refresh interval by picking one of the available inner classes of org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker: + + + + + OneMinuteRefreshInterval + + + + + FiveMinutesRefreshInterval + + + + + ThirtyMinutesRefreshInterval + + + + + OneHourRefreshInterval + + + + + SixHoursRefreshInterval + + + + + TwelveHoursRefreshInterval + + + + + OneDayRefreshInterval + + + + + For example, if you want a refresh interval of six hours, the class name is org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker$SixHoursRefreshInterval. + + + If you use org.exoplatform.services.jcr.impl.core.query.lucene.spell.LuceneSpellChecker, the refresh interval will be one hour. + + + The spell checker dictionary is stored as a Lucene index under <index-dir>/spellchecker. If this index does not exist, a background thread creates it on start up. Similarly, the dictionary refresh is done in a background thread so as not to block regular queries. + +
+ Usage + + You can spell check a fulltext statement with either an XPath or a SQL query: + + // rep:spellcheck('explatform') will always evaluate to true +Query query = qm.createQuery("/jcr:root[rep:spellcheck('explatform')]/(rep:spellcheck())", Query.XPATH); +RowIterator rows = query.execute().getRows(); +// the above query will always return the root node no matter what string we check +Row r = rows.nextRow(); +// get the result of the spell checking +Value v = r.getValue("rep:spellcheck()"); +if (v == null) { + // no suggestion returned, the spelling is correct or the spell checker + // does not know how to correct it. +} else { + String suggestion = v.getString(); +} + + And the same using SQL: + + // SPELLCHECK('explatform') will always evaluate to true +Query query = qm.createQuery("SELECT rep:spellcheck() FROM nt:base WHERE jcr:path = '/' AND SPELLCHECK('explatform')", Query.SQL); +RowIterator rows = query.execute().getRows(); +// the above query will always return the root node no matter what string we check +Row r = rows.nextRow(); +// get the result of the spell checking +Value v = r.getValue("rep:spellcheck()"); +if (v == null) { + // no suggestion returned, the spelling is correct or the spell checker + // does not know how to correct it. +} else { + String suggestion = v.getString(); +} +
+
+
+ Similarity + + Starting with version 1.12, JCR allows you to search for nodes that are similar to an existing node. + + + Similarity is determined by looking up terms that are common to nodes. A term must meet certain conditions to be considered, which limits the number of possibly relevant terms. + + + To be considered, terms must: + + + + + Be at least four characters long. + + + + + Occur at least twice in the source node. + + + + + Occur in at least five other nodes. + + + + + Note + + The similarity function requires that highlighting support is enabled. Make sure that the following parameter is set for the query handler in your workspace.xml. + + <param name="support-highlighting" value="true"/> + + + The functions (rep:similar() in XPath and similar()</code> in SQL) have two arguments: + + + + <varlistentry> + <term>relativePath</term> + <listitem> + <para> + A relative path to a descendant node or a period (<literal>.</literal>) for the current node. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>absoluteStringPath</term> + <listitem> + <para> + A string literal that contains the path to the node for which to find similar nodes. + </para> + </listitem> + </varlistentry> + </variablelist> + <warning> + <title>Warning + + Relative path is not supported yet. + + + + Example + //element(*, nt:resource)[rep:similar(., '/parentnode/node.txt/jcr:content')] + + Finds nt:resource nodes that are similar to the node at path /parentnode/node.txt/jcr:content. + + +
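The three conditions a term must meet can be sketched as a simple filter. This is not the query handler's code: the class SimilarityTerms, the naive tokenizer, and the otherNodeCounts map (term to number of other nodes containing it) are hypothetical and only illustrate the thresholds listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SimilarityTerms {
    // Returns the terms of sourceText that pass all three conditions, in alphabetical order.
    static List<String> relevantTerms(String sourceText, Map<String, Integer> otherNodeCounts) {
        // Count how often each lower-cased token occurs in the source node.
        Map<String, Integer> frequency = new TreeMap<>();
        for (String token : sourceText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                frequency.merge(token, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : frequency.entrySet()) {
            String term = entry.getKey();
            boolean longEnough = term.length() >= 4;                           // at least four characters
            boolean frequentInSource = entry.getValue() >= 2;                  // at least twice in the source node
            boolean commonElsewhere = otherNodeCounts.getOrDefault(term, 0) >= 5; // in at least five other nodes
            if (longEnough && frequentInSource && commonElsewhere) {
                result.add(term);
            }
        }
        return result;
    }
}
```

For the text "Portal portal content content the the fox", only "portal" survives if it appears in seven other nodes while "content" appears in three: "the" is too short and "fox" occurs only once.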
+ + + Full Text Search And Affecting Settings + + Property content indexing + + Each property of a node (if it is indexable) is processed with the Luce= ne analyzer and stored in the Lucene index. This is called indexing of a pr= operty. It allows fulltext searching of these indexed properties. + + +
+ Lucene Analyzers + + The purpose of analyzers is to transform all strings stored in the inde= x into a well-defined condition. The same analyzer(s) is/are used when sear= ching in order to adapt the query string to the index reality. + + + Therefore, performing the same query using different analyzers can retu= rn different results. + + + The example below illustrates how the same string is transformed by dif= ferent analyzers. + + + "The quick brown fox jumped over the lazy dogs"</= title> + <tgroup cols=3D"2"> + <thead> + <row> + <entry> Analyzer </entry> + <entry> Parsed </entry> + </row> + </thead> + <tbody> + <row> + <entry> org.apache.lucene.analysis.WhitespaceAnalyzer </entr= y> + <entry> [The] [quick] [brown] [fox] [jumped] [over] [the] [l= azy] [dogs] </entry> + </row> + <row> + <entry> org.apache.lucene.analysis.SimpleAnalyzer </entry> + <entry> [the] [quick] [brown] [fox] [jumped] [over] [the] [l= azy] [dogs] </entry> + </row> + <row> + <entry> org.apache.lucene.analysis.StopAnalyzer </entry> + <entry> [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] = </entry> + </row> + <row> + <entry> org.apache.lucene.analysis.standard.StandardAnalyzer= </entry> + <entry> [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] = </entry> + </row> + <row> + <entry> org.apache.lucene.analysis.snowball.SnowballAnalyzer= </entry> + <entry> [quick] [brown] [fox] [jump] [over] [lazi] [dog] </e= ntry> + </row> + <row> + <entry> org.apache.lucene.analysis.standard.StandardAnalyzer= (configured without stop word - jcr default analyzer) </entry> + <entry> [the] [quick] [brown] [fox] [jumped] [over] [the] [l= azy] [dogs] </entry> + </row> + </tbody> + </tgroup> + </table> + <table id=3D"tabl-Reference_Guide-Lucene_Analyzers-XYampZ_Corporatio= n_xyzexample.com"> + <title>"XY&Z Corporation - xyz(a)example.com" + + + + Analyzer + Parsed + + + + + org.apache.lucene.analysis.WhitespaceAnalyzer + [XY&Z] [Corporation] [-] [xyz(a)example.com] + + + 
org.apache.lucene.analysis.SimpleAnalyzer + [xy] [z] [corporation] [xyz] [example] [com] + + + org.apache.lucene.analysis.StopAnalyzer + [xy] [z] [corporation] [xyz] [example] [com] + + + org.apache.lucene.analysis.standard.StandardAnalyzer= + [xy&z] [corporation] [xyz(a)example] [com] + + + org.apache.lucene.analysis.snowball.SnowballAnalyzer= + [xy&z] [corpor] [xyz(a)exampl] [com] + + + org.apache.lucene.analysis.standard.StandardAnalyzer= (configured without stop word - jcr default analyzer) + [xy&z] [corporation] [xyz(a)example] [com] + + + +
+ + + StandardAnalyzer is the default analyzer in the JBoss Portal Platform JCR search engine, but it is configured without stop words. + + + + You can assign your analyzer as described in . + +
+
+ Property Indexing + + Different properties are indexed in different ways, and this affects whether a property can be searched via fulltext by property or not. + + + Only two property types are indexed as fulltext searchable: STRING and BINARY. + + + Fulltext search by different properties + + + + Property Type + Fulltext search by all properties + Fulltext search by exact property + + + + + STRING + YES + YES + + + BINARY + YES + NO + + + +
+ + For example, the jcr:data property (which is BINARY) will not be found with a query structured as: + + SELECT * FROM nt:resource WHERE CONTAINS(jcr:data, 'some string') + + This is because BINARY properties cannot be searched by fulltext search on an exact property. + + + However, the following query will return results (provided, of course, that the node contains the targeted data): + + SELECT * FROM nt:resource WHERE CONTAINS( * , 'some string') +
+
+ Different Analyzers + + First, populate the repository with nodes that have the mixin type 'mix:title' and different values of the 'jcr:description' property. + + root + ├── document1 (mix:title) jcr:description = "The quick brown fox jumped over the lazy dogs" + ├── document2 (mix:title) jcr:description = "Brown fox live in forest." + └── document3 (mix:title) jcr:description = "Fox is a nice animal." + + + The example below shows different analyzers in action. The first instance uses the base JCR settings, so the string "The quick brown fox jumped over the lazy dogs" is transformed to the set {[the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]}. + + // make SQL query +QueryManager queryManager = workspace.getQueryManager(); +String sqlStatement = "SELECT * FROM mix:title WHERE CONTAINS(jcr:description, 'the')"; +// create query +Query query = queryManager.createQuery(sqlStatement, Query.SQL); +// execute query and fetch result +QueryResult result = query.execute(); + + The NodeIterator will return document1. + + + However, if the default analyzer is changed to org.apache.lucene.analysis.StopAnalyzer and the repository is populated again (the new analyzer must process the node properties), the same query returns nothing, because stop words such as "the" are excluded from the parsed string set. + +
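The effect of the two analyzer configurations described above can be imitated in plain Java. The sketch below is not Lucene: the class AnalyzerSketch, its tokenizer and its short stop-word list are assumptions chosen only to show why a query for 'the' matches document1 when stop words are kept and matches nothing when they are removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerSketch {
    // Illustrative stop words only; this is not Lucene's actual list.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "is", "in"));

    // Lower-cases the text, splits on non-alphanumerics and optionally drops stop words.
    static List<String> analyze(String text, boolean removeStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || (removeStopWords && STOP_WORDS.contains(token))) {
                continue;
            }
            tokens.add(token);
        }
        return tokens;
    }

    // The same analysis is applied to the indexed text and to the query term.
    static boolean matches(String document, String queryTerm, boolean removeStopWords) {
        List<String> queryTokens = analyze(queryTerm, removeStopWords);
        return !queryTokens.isEmpty()
                && analyze(document, removeStopWords).containsAll(queryTokens);
    }
}
```

Because the query string goes through the same analysis as the indexed text, removing stop words empties the query 'the' itself, so no document can match it.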
+
+ + WebDAV +
+ Introduction + + The WebDAV protocol enables you to = use third party tools to communicate with hierarchical content servers via = the HTTP protocol. It is possible to add and remove documents or a set of d= ocuments from a path on the server. + + + DeltaV is an extension of the WebDa= v protocol that allows managing document versioning. The Locking<= /emphasis> feature guarantees protection against multiple access when writi= ng resources. The ordering support allows changing the position of the reso= urce in the list and sort the directory to make the directory tree viewed c= onveniently. The full-text search makes it easy to find the necessary docum= ents. You can search by using two languages: SQL and XPATH. + + + In the eXo JCR, the WebDAV layer (based on the code taken from= the extension modules of the reference implementation) is plugged in on to= p of our JCR implementation. This makes it possible to browse a workspace u= sing the third party tools regardless of operating system environments. You= can use a Java WebDAV client, such as DAVExplorer or Internet Explorer using + File + Open as a Web Folder + . + + + WebDav is an extension of the REST service. To get the WebDav = server ready, you must deploy the REST application. Then, you can access an= y workspaces of your repository by using the following URL: + + + + + + When accessing the WebDAV server via , you can substit= ute production with collaboration. + + + You will be asked to enter your login credentials. These will = then be checked by using the organization service that can be implemented t= hanks to an InMemory (dummy) module or a DB module or an LDAP one and the J= CR user session will be created with the correct JCR Credentials. + + + Note: + + If you try the "in ECM" option, add "@ecm&q= uot; to the user's password. 
Alternatively, you may modify jaas.conf by adding the domain=ecm option as follows: + + exo-domain { + org.exoplatform.services.security.jaas.BasicLoginModule required domain=ecm; +}; + +
+
+ WebDAV Configuration + + The WebDAV configuration file: + + <component> + <key>org.exoplatform.services.webdav.WebDavServiceImpl</key> + <type>org.exoplatform.services.webdav.WebDavServiceImpl</type&g= t; + <init-params> + + <!-- this parameter indicates the default login and password values + used as credentials for accessing the repository --> + <!-- value-param> + <name>default-identity</name> + <value>admin:admin</value> = + </value-param --> + + <!-- this is the value of WWW-Authenticate header --> + <value-param> + <name>auth-header</name> + <value>Basic realm=3D"eXo-Platform Webdav Server 1.6.1&qu= ot;</value> + </value-param> + + <!-- default node type which is used for the creation of collection= s --> + <value-param> + <name>def-folder-node-type</name> + <value>nt:folder</value> + </value-param> + + <!-- default node type which is used for the creation of files --&g= t; + <value-param> + <name>def-file-node-type</name> + <value>nt:file</value> + </value-param> + + <!-- if MimeTypeResolver can't find the required mime type, = + which conforms with the file extension, and the mimeType header i= s absent + in the HTTP request header, this parameter is used = + as the default mime type--> + <value-param> + <name>def-file-mimetype</name> + <value>application/octet-stream</value> + </value-param> + + <!-- This parameter indicates one of the three cases when you updat= e the content of the resource by PUT command. + In case of "create-version", PUT command creates the ne= w version of the resource if this resource exists. + In case of "replace" - if the resource exists, PUT comm= and updates the content of the resource and its last modification date. 
+ In case of "add", the PUT command tries to create the n= ew resource with the same name (if the parent node allows same-name sibling= s).--> + + <value-param> + <name>update-policy</name> + <value>create-version</value> + <!--value>replace</value --> + <!-- value>add</value --> + </value-param> + + <!-- + This parameter determines how service responds to a method that at= tempts to modify file content. + In case of "checkout-checkin" value, when a modification= request is applied to a checked-in version-controlled resource, the reques= t is automatically preceded by a checkout and followed by a checkin operati= on. + In case of "checkout" value, when a modification request= is applied to a checked-in version-controlled resource, the request is aut= omatically preceded by a checkout operation. + --> = + <value-param> + <name>auto-version</name> + <value>checkout-checkin</value> + <!--value>checkout</value --> + </value-param> + + <!-- + This parameter is responsible for managing Cache-Control header va= lue which will be returned to the client. + You can use patterns like "text/*", "image/*" = or wildcard to define the type of content. + --> = + <value-param> + <name>cache-control</name> + <value>text/xml,text/html:max-age=3D3600;image/png,image/jpg:m= ax-age=3D1800;*/*:no-cache;</value> + </value-param> + = + <!-- + This parameter determines the absolute path to the folder icon fil= e, which is shown + during WebDAV view of the contents + --> + <value-param> + <name>folder-icon-path</name> + <value>/absolute/path/to/file</value> + </value-param> + + </init-params> +</component> +
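The cache-control value in the configuration above is a list of "media types:directive" rules separated by semicolons. The sketch below shows one plausible way to resolve a header value for a given content type. The class CacheControlSketch and its first-match-wins rule order are assumptions; the WebDAV service's actual matching logic is not described in this guide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CacheControlSketch {
    // Pattern -> directive, kept in configuration order.
    private final Map<String, String> rules = new LinkedHashMap<>();

    CacheControlSketch(String config) {
        for (String rule : config.split(";")) {
            int colon = rule.indexOf(':');
            if (colon < 0) {
                continue; // skip empty or malformed segments
            }
            String directive = rule.substring(colon + 1).trim();
            for (String pattern : rule.substring(0, colon).split(",")) {
                rules.put(pattern.trim(), directive);
            }
        }
    }

    // Returns the directive of the first matching rule, or null if none matches.
    String headerFor(String mimeType) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (matches(rule.getKey(), mimeType)) {
                return rule.getValue();
            }
        }
        return null;
    }

    // Supports patterns such as "text/html", "image/*" and "*/*".
    static boolean matches(String pattern, String mimeType) {
        String[] p = pattern.split("/", 2);
        String[] m = mimeType.split("/", 2);
        if (p.length != 2 || m.length != 2) {
            return false;
        }
        return (p[0].equals("*") || p[0].equalsIgnoreCase(m[0]))
                && (p[1].equals("*") || p[1].equalsIgnoreCase(m[1]));
    }
}
```

With the value from the configuration above, text/html resolves to max-age=3600, image/png to max-age=1800, and any other type falls through to the */* rule and gets no-cache.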
+
+ Corresponding WebDAV and JCR actions + + + <tgroup cols=3D"2"> + <thead> + <row> + <entry> WebDav </entry> + <entry> JCR </entry> + </row> + </thead> + <tbody> + <row> + <entry> COPY </entry> + <entry> Workspace.copy(...) </entry> + </row> + <row> + <entry> DELETE </entry> + <entry> Node.remove() </entry> + </row> + <row> + <entry> GET </entry> + <entry> Node.getProperty(...); Property.getValue() </entry> + </row> + <row> + <entry> HEAD </entry> + <entry> Node.getProperty(...); Property.getLength() </entry> + </row> + <row> + <entry> MKCOL </entry> + <entry> Node.addNode(...) </entry> + </row> + <row> + <entry> MOVE </entry> + <entry> Session.move(...) or Workspace.move(...) </entry> + </row> + <row> + <entry> PROPFIND </entry> + <entry> Session.getNode(...); Node.getNode(...); Node.getNod= es(...); Node.getProperties() </entry> + </row> + <row> + <entry> PROPPATCH </entry> + <entry> Node.setProperty(...); Node.getProperty(...).remove(= ) </entry> + </row> + <row> + <entry> PUT </entry> + <entry> Node.addNode("node","nt:file"); = Node.setProperty("jcr:data", "data") </entry> + </row> + <row> + <entry> CHECKIN </entry> + <entry> Node.checkin() </entry> + </row> + <row> + <entry> CHECKOUT </entry> + <entry> Node.checkout() </entry> + </row> + <row> + <entry> REPORT </entry> + <entry> Node.getVersionHistory(); VersionHistory.getAllVersi= ons(); Version.getProperties() </entry> + </row> + <row> + <entry> RESTORE </entry> + <entry> Node.restore(...) </entry> + </row> + <row> + <entry> UNCHECKOUT </entry> + <entry> Node.restore(...) </entry> + </row> + <row> + <entry> VERSION-CONTROL </entry> + <entry> Node.addMixin("mix:versionable") </entry> + </row> + <row> + <entry> LOCK </entry> + <entry> Node.lock(...) </entry> + </row> + <row> + <entry> UNLOCK </entry> + <entry> Node.unlock() </entry> + </row> + <row> + <entry> ORDERPATCH </entry> + <entry> Node.orderBefore(...) 
</entry> + </row> + <row> + <entry> SEARCH </entry> + <entry> Workspace.getQueryManager(); QueryManager.createQuer= y(); Query.execute() </entry> + </row> + </tbody> + </tgroup> + </table> + </section> + <section id=3D"sect-Reference_Guide-WebDAV-WebDAV_Considerations"> + <title>WebDAV Considerations + + There are some restrictions for WebDAV in different operating = systems. + + + Windows 7 + + When attempting to set up a web folder through A= dd a Network Location or Map a Network Drive through My Computer, an error message stating The folder you entered does not appear to be valid. Please choose anoth= er or Windows cannot access =E2=80=A6 Check the spelli= ng of the name. Otherwise, there might be =E2=80=A6 may be encou= ntered. These errors may appear when you are using SSL or non-SSL. + + + + To fix this, do as follows: + + + + + Go to Windows Registry Editor. + + + + + Find a key: \HKEY_LOCAL_MACHINE\SYSTEM\CurrentControls= et\services\WebClient\Parameters\BasicAuthLevel . + + + + + Change the value to 2. + + + + + Microsoft Office 2010 + + If you have: + + + + + + Microsoft Office 2007/2010 applications installed on a= client computer AND... + + + + + The client computer is connected to a web server confi= gured for Basic authentication VIA... + + + + + A connection that does not use Secure Sockets Layer (S= SL) AND... + + + + + You try to access an Office file that is stored on the= remote server... + + + + + You might experience the following symptoms when you t= ry to open or to download the file: + + + + + The Office file does not open or download. + + + + + You do not receive a Basic authentication pass= word prompt when you try to open or to download the file. + + + + + You do not receive an error message when you t= ry to open the file. The associated Office application starts. However, the= selected file does not open. + + + + + + + These outcomes can be circumvented by enabling Basic authentic= ation on the client machine. 
+ + + To enable Basic authentication on the client computer, follow = these steps: + + + + + Click Start, type regedit in the St= art Search box, and then press Enter. + + + + + Locate and then click the following registry subkey: + + + HKEY_CURRENT_USER\Software\Microsoft\Office\14.= 0\Common\Internet + + + + + On the Edit menu, point to New, and then click DWORD Value. + + + + + Type BasicAuthLevel, and then press= Enter. + + + + + Right-click BasicAuthLevel, and the= n click Modify. + + + + + In the Value data box, type 2, and = then click OK. + + + + + + + FTP +
+ Introduction + + The JCR-FTP Server operates as an FTP server with access to content stored in JCR repositories in the form of nt:file/nt:folder nodes or their successors. Any FTP client can connect to a running server. The FTP server comes with a standard configuration, which can be changed as required. + +
+
+ Configuration Parameters + + Parameters + + command-port: + + <value-param&= gt; + <name>command-port</name> + <value>21</value> +</value-param> + + The value of the command channel port. The value &= apos;21' is default. + + + If you have already other FTP server installed in = your system, this parameter needs to be changed (to 2121= , for example) to avoid conflicts or if the port is protected. + + + + + data-min-port and data-max-port + + <value-param&= gt; + <name>data-min-port</name> + <value>52000</value> +</value-param> + <value-param&= gt; + <name>data-max-port</name> + <value>53000</value> +</value-param> + + These two parameters indicate the minimum and maxi= mum values of the range of ports, used by the server. The usage of the addi= tional data channel is required by the FTP protocol, which is used to trans= fer the contents of files and the listing of catalogues. This range of port= s should be free from listening by other server-programs. + + + + + system + + <value-param&= gt; + <name>system</name> + + <value>Windows_NT</value> + or + <value>UNIX Type: L8</value> +</value-param> + + Types of formats of listing of catalogues which ar= e supported. + + + + + client-side-encoding + + <value-param&= gt; + <name>client-side-encoding</name> + = + <value>windows-1251</value> + or + <value>KOI8-R</value> + = +</value-param> + + This parameter specifies the coding which is used = for dialogue with the client. + + + + + def-folder-node-type + + <value-param&= gt; + <name>def-folder-node-type</name> + <value>nt:folder</value> +</value-param> + + This parameter specifies the type of a node, when = an FTP-folder is created. + + + + + def-file-node-type + + <value-param&= gt; + <name>def-file-node-type</name> + <value>nt:file</value> +</value-param> + + This parameter specifies the type of a node, when = an FTP-file is created. 
+ + + + + def-file-mime-type + + <value-param&= gt; + <name>def-file-mime-type</name> = + <value>application/zip</value> +</value-param> + + The mime type of a created file is chosen by using= its file extension. In case, a server cannot find the corresponding mime t= ype, this value is used. + + + + + cache-folder-name + + <value-param&= gt; + <name>cache-folder-name</name> + <value>../temp/ftp_cache</value> +</value-param> + + The Path of the cache folder. + + + + + upload-speed-limit + + <value-param&= gt; + <name>upload-speed-limit</name> = + <value>20480</value> +</value-param> + + Restriction of the upload speed. It is measured in= bytes. + + + + + download-speed-limit + + <value-param&= gt; + <name>download-speed-limit</name> + <value>20480</value> = +</value-param> + + Restriction of the download speed. It is measured = in bytes. + + + + + timeout + + <value-param&= gt; + <name>timeout</name> + <value>60</value> +</value-param> + + Defines the value of a timeout. + + + + +
+
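The data-min-port and data-max-port parameters bound the ports used for the FTP data channel. A server might pick a port from such a range roughly as sketched below. The class DataPortRange and its linear scan are assumptions made for illustration; the JCR-FTP server's own selection strategy is not documented here.

```java
import java.io.IOException;
import java.net.ServerSocket;

class DataPortRange {
    // Opens a listening socket on the first free port in [minPort, maxPort].
    static ServerSocket openDataSocket(int minPort, int maxPort) throws IOException {
        for (int port = minPort; port <= maxPort; port++) {
            try {
                return new ServerSocket(port);
            } catch (IOException busy) {
                // Port is in use by another program; try the next one.
            }
        }
        throw new IOException("No free port between " + minPort + " and " + maxPort);
    }

    // Returns a port that is currently free in the range, or -1 if there is none.
    static int probe(int minPort, int maxPort) {
        try (ServerSocket socket = openDataSocket(minPort, maxPort)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            return -1;
        }
    }
}
```

This is also why the range should be free of other listening server programs: every port that is already taken shrinks the pool available for transfers and directory listings.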
+ + Use External Backup Tool +
+ Repository Suspending + + To keep the repository content consistent with the search index and value storage, the repository should be suspended before a backup. This means all working threads are suspended until a resume operation is performed. The index will be flushed. + + + The JCR provides the ability to suspend the repository via JMX. + +
+ Repository Suspend Contr= oller + + + + + +
+ + To suspend the repository, invoke the suspend() operation. The returned result will be "suspended" if everything passed successfully. + +
+ Repository = Suspend Controller Suspended + + + + + +
+ + An "undefined" result means that not all components were successfully suspended. Check the console to review the stack traces. + +
+
+ Backup + + You can back up your content manually or by using third-party software. You should back up: + + + + + Database. + + + + + Lucene index. + + + + + Value storage (if configured). + + + +
+
+ Repository Resuming + + Once the backup is done, invoke the resume() operation to switch the repository back on-line. The returned result will be "on-line". + +
+ Repository Sus= pend Controller Online + + + + + +
+
+
+ + eXo JCR statistics +
+ Statistics on the Database Access Layer + + Most of the time spent in eXo JCR is spent in database access, so statistics on this part of the code give a better idea of where that time goes. + + + These statistics allow you to identify, without using a profiler, what is abnormally slow in this layer, which can help diagnose and fix a problem. + + + If you use org.exoplatform.services.jcr.impl.storage.jdbc.optimisation.CQJDBCWorkspaceDataContainer or org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer as WorkspaceDataContainer, you can get statistics on the time spent in the database access layer. + + + The database access layer (in eXo JCR) is represented by the methods of the interface org.exoplatform.services.jcr.storage.WorkspaceStorageConnection, so the following figures are available for every method defined in this interface: + + + + + The minimum time spent in the method. + + + + + The maximum time spent in the method. + + + + + The average time spent in the method. + + + + + The total amount of time spent in the method. + + + + + The total number of times the method has been called. + + + + + These figures are also available globally for all the methods, which shows the global behavior of this layer. + + + To enable the statistics, set the JVM parameter JDBCWorkspaceDataContainer.statistics.enabled to true. The corresponding CSV file is StatisticsJDBCStorageConnection-${creation-timestamp}.csv. For more details about how the CSV files are managed, refer to the section dedicated to the statistics manager. + + + The format of each column header is ${method-alias}-${metric-alias}. The metric aliases are described in the statistics manager section. 
+ + + The name of the category of statistics corresponding to these statistics is JDBCStorageConnection. This name is mostly needed to access the statistics through JMX. + +
+ Method Alias + + + + global + This is the alias for all the methods. + + + getItemDataById + This is the alias for the method getItemDa= ta(String identifier). + + + getItemDataByNodeDataNQPathEntry + This is the alias for the method getItemDa= ta(NodeData parentData, QPathEntry name). + + + getChildNodesData + This is the alias for the method getChildN= odesData(NodeData parent). + + + getChildNodesCount + This is the alias for the method getChildN= odesCount(NodeData parent). + + + getChildPropertiesData + This is the alias for the method getChildP= ropertiesData(NodeData parent). + + + listChildPropertiesData + This is the alias for the method listChild= PropertiesData(NodeData parent). + + + getReferencesData + This is the alias for the method getRefere= ncesData(String nodeIdentifier). + + + commit + This is the alias for the method commit().= + + + addNodeData + This is the alias for the method add(NodeD= ata data). + + + addPropertyData + This is the alias for the method add(Prope= rtyData data). + + + updateNodeData + This is the alias for the method update(No= deData data). + + + updatePropertyData + This is the alias for the method update(Pr= opertyData data). + + + deleteNodeData + This is the alias for the method delete(No= deData data). + + + deletePropertyData + This is the alias for the method delete(Pr= opertyData data). + + + renameNodeData + This is the alias for the method rename(No= deData data). + + + rollback + This is the alias for the method rollback(= ). + + + isOpened + This is the alias for the method isOpened(= ). + + + close + This is the alias for the method close().<= /emphasis> + + + +
+
+
+ Statistics on the JCR API accesses + + In order to know exactly how your application uses eXo JCR, it= can be interesting to register all the JCR API accesses in order to easily= create real life test scenario based on pure JCR calls and also to tune yo= ur JCR to better fit your requirements. + + + In order to allow you to specify the configuration which part = of eXo JCR needs to be monitored without applying any changes in your code = and/or building anything, we choose to rely on the Load-time Weaving propos= ed by AspectJ. + + + To enable this feature, you will have to add in your classpath= the following jar files: + + + + + exo.jcr.component.statistics-X.Y.Z.jar corresponding to your eXo JCR version that you can get from the JBoss= maven repository https://re= pository.jboss.org/nexus/content/groups/public/org/exoplatform/jcr/exo.jcr.= component.statistics/. + + + + + aspectjrt-1.6.8.jar that you can get from the main mav= en repository + http://repo2.maven.org/maven2/org/aspectj/aspectjrt + . + + + + + You will also need to get aspectjweaver-1.6.8.jar from the main maven repository http://repo2.maven.org/maven2/org/aspec= tj/aspectjweaver. + + + At this stage, to enable the statistics on the JCR API accesse= s, you will need to add the JVM parameter -javaagent:${pathto}/a= spectjweaver-1.6.8.jar to your command line, for more details p= lease refer to http://www.eclipse.org/aspectj/doc/released/= devguide/ltw-configuration.html. + + + By default, the configuration will collect statistics on all t= he methods of the internal interfaces org.exoplatform.services.jcr= .core.ExtendedSession and org.exoplatform.services.jcr.c= ore.ExtendedNode, and the JCR API interface javax.jcr.Pr= operty. + + + To add and/or remove some interfaces to monitor, you have two = configuration files to change that are bundled into the jar exo.jc= r.component.statistics-X.Y.Z.jar, which are conf/config= uration.xml and META-INF/aop.xml. 
+ + + The content of conf/configuration.xml is shown below. Modify this file to add or remove the fully qualified names of the interfaces to monitor in the list of values of the init param called targetInterfaces. + + <configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.exoplaform.org/xml/ns/kernel_1_2.xsd http://www.exoplaform.org/xml/ns/kernel_1_2.xsd" + xmlns="http://www.exoplaform.org/xml/ns/kernel_1_2.xsd"> + + <component> + <type>org.exoplatform.services.jcr.statistics.JCRAPIAspectConfig</type> + <init-params> + <values-param> + <name>targetInterfaces</name> + <value>org.exoplatform.services.jcr.core.ExtendedSession</value> + <value>org.exoplatform.services.jcr.core.ExtendedNode</value> + <value>javax.jcr.Property</value> + </values-param> + </init-params> + </component> +</configuration> + + The content of META-INF/aop.xml is shown below. Modify this file to add or remove the fully qualified names of the interfaces to monitor in the expression filter of the pointcut called JCRAPIPointcut. + + + By default, only JCR API calls from the exoplatform packages are taken into account. This filter can be modified to add other package names. 
+ <aspectj>
+  <aspects>
+    <concrete-aspect name="org.exoplatform.services.jcr.statistics.JCRAPIAspectImpl" extends="org.exoplatform.services.jcr.statistics.JCRAPIAspect">
+      <pointcut name="JCRAPIPointcut"
+        expression="(target(org.exoplatform.services.jcr.core.ExtendedSession) || target(org.exoplatform.services.jcr.core.ExtendedNode) || target(javax.jcr.Property)) &amp;&amp; call(public * *(..))" />
+    </concrete-aspect>
+  </aspects>
+  <weaver options="-XnoInline">
+    <include within="org.exoplatform..*" />
+  </weaver>
+</aspectj>
+ The corresponding CSV files are named Statistics${interface-name}-${creation-timestamp}.csv. For more details about how the CSV files are managed, refer to the section dedicated to the statistics manager.
+ The format of each column header is ${method-alias}-${metric-alias}. The method alias has the form ${method-name}(list-of-parameter-types), where the parameter types are delimited by semicolons to remain compatible with the CSV format.
+ The metric aliases are described in the statistics manager section.
+ The name of the category of statistics corresponding to these statistics is the simple name of the monitored interface (for example, ExtendedSession for org.exoplatform.services.jcr.core.ExtendedSession). This name is mostly needed to access the statistics through JMX.
+ Performance Consideration
+ This feature affects the performance of eXo JCR, so it must be used with caution.
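As an illustration of the two changes described in this section, the fragments below sketch how one more interface could be added to the monitored set. The choice of javax.jcr.Node and the com.example package name are assumptions made for this example only; they are not defaults shipped in the jar. The interface is added in both files, since the text above names both as files to change.

```xml
<!-- conf/configuration.xml (fragment).
     javax.jcr.Node is a hypothetical addition used for illustration. -->
<values-param>
  <name>targetInterfaces</name>
  <value>org.exoplatform.services.jcr.core.ExtendedSession</value>
  <value>org.exoplatform.services.jcr.core.ExtendedNode</value>
  <value>javax.jcr.Property</value>
  <value>javax.jcr.Node</value>
</values-param>

<!-- META-INF/aop.xml (fragments).
     The same hypothetical interface is added to the pointcut expression,
     and a hypothetical application package (com.example) is added to the
     weaver filter so that calls made from it are also taken into account. -->
<pointcut name="JCRAPIPointcut"
  expression="(target(org.exoplatform.services.jcr.core.ExtendedSession) || target(org.exoplatform.services.jcr.core.ExtendedNode) || target(javax.jcr.Property) || target(javax.jcr.Node)) &amp;&amp; call(public * *(..))" />

<weaver options="-XnoInline">
  <include within="org.exoplatform..*" />
  <include within="com.example..*" />
</weaver>
```

Each additional interface and package widens the set of woven calls, so the performance caution given for this feature applies all the more to a broader filter.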
+
+ Statistics Manager
+ The statistics manager manages all the statistics provided by eXo JCR. It is responsible for writing the data to the CSV files and for exposing the statistics through JMX and/or REST.
+ The statistics manager creates one CSV file for each category of statistics that it manages. The files are named Statistics${category-name}-${creation-timestamp}.csv. They are created in the user directory if possible, otherwise in the temporary directory. The files use the CSV (Comma-Separated Values) format: a new line is added regularly (every 5 seconds by default) and one last line is added at JVM exit. Each line is composed of the 5 figures described below, for each method and globally for all the methods.
+ Metric Alias
+ Min: The minimum time spent in the method, expressed in milliseconds.
+ Max: The maximum time spent in the method, expressed in milliseconds.
+ Total: The total amount of time spent in the method, expressed in milliseconds.
+ Avg: The average time spent in the method, expressed in milliseconds.
+ Times: The total number of times the method has been called.
+ You can disable the persistence of the statistics by setting the JVM parameter called JCRStatisticsManager.persistence.enabled to false. It is set to true by default.
+ You can also define the period of time between each record (that is, each line of data in the file) by setting the JVM parameter called JCRStatisticsManager.persistence.timeout to the expected value, expressed in milliseconds. It is set to 5000 by default.
+ You can also access the statistics via JMX. The available methods are:
+ JMX Methods
+ getMin: Gives the minimum time spent in the method corresponding to the given category name and statistics name. The expected arguments are the name of the category of statistics (JDBCStorageConnection, for example) and the name of the expected method, or global for the global value.
+ getMax: Gives the maximum time spent in the method corresponding to the given category name and statistics name. The expected arguments are the name of the category of statistics and the name of the expected method, or global for the global value.
+ getTotal: Gives the total amount of time spent in the method corresponding to the given category name and statistics name. The expected arguments are the name of the category of statistics and the name of the expected method, or global for the global value.
+ getAvg: Gives the average time spent in the method corresponding to the given category name and statistics name. The expected arguments are the name of the category of statistics and the name of the expected method, or global for the global value.
+ getTimes: Gives the total number of times the method has been called, for the given category name and statistics name. The expected arguments are the name of the category of statistics (e.g. JDBCStorageConnection) and the name of the expected method, or global for the global value.
+ reset: Resets the statistics for the given category name and statistics name. The expected arguments are the name of the category of statistics and the name of the expected method, or global for the global value.
+ resetAll: Resets all the statistics for the given category name. The expected argument is the name of the category of statistics (e.g. JDBCStorageConnection).
+ The full name of the related MBean is exo:service=statistic, view=jcr.
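The two persistence parameters described earlier in this section can be sketched as a configuration fragment. This is only a sketch under two assumptions that the text does not confirm: that the "JVM parameters" are read as Java system properties, and that they are declared in the system-properties element of the JBoss Enterprise Application Platform 6 standalone.xml file. The value 10000 is an arbitrary example, not a recommendation.

```xml
<!-- standalone.xml (fragment), placed directly under the <server> element.
     Assumption: the statistics manager reads these names as Java system
     properties; passing them as -D options on the command line would be
     the equivalent alternative. -->
<system-properties>
  <!-- Keep writing the CSV files (true is the documented default). -->
  <property name="JCRStatisticsManager.persistence.enabled" value="true"/>
  <!-- Write one line every 10 seconds instead of the default 5000 ms. -->
  <property name="JCRStatisticsManager.persistence.timeout" value="10000"/>
</system-properties>
```

A longer period produces smaller CSV files at the cost of coarser sampling; setting the first property to false switches the file output off while leaving the JMX access in place.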
+
+
+ + Checking repository integrity and consistency +
+ JMX-based consistency tool
+ It is important to check the integrity and consistency of the system regularly, especially if backups are missing or stale. The JBoss Portal Platform JCR implementation offers a JMX-based checking tool for this purpose.
+ During an inspection, the tool checks every major JCR component, such as the persistent data layer and the index. The persistent layer includes the JDBC Data Container and the Value Storages, if they are configured.
+ The database is verified using a set of specialized domain-specific queries. The Value Storage check verifies that each file exists and is accessible.
+ The check tool is exposed through the JMX interface, with the following operations available:
+ Available methods
+ checkRepositoryDataConsistency(): Inspects the full repository data (database, value storage and search index).
+ checkRepositoryDataBaseConsistency(): Inspects only the database.
+ checkRepositoryValueStorageConsistency(): Inspects only the value storage.
+ checkRepositorySearchIndexConsistency(): Inspects only the search index.
+ All inspection activities and details of corrupted data are stored in a file in the app directory, named according to the following convention: report-<repository name>-dd-MMM-yy-HH-mm.txt.
+ The path to the file is also returned in the result message at the end of the inspection.
+ There are three types of inconsistency (Warning, Error and Index), two of which are critical (Error and Index):
+ Index faults are marked as "Reindex" and can be fixed by re-indexing the workspace.
+ Errors can only be fixed manually.
+ Warnings can be a normal situation in some cases, and the production system will usually remain fully functional.
+
- + DOC NOTE: Could possibly be moved to a specific Tuning Guide later= --> + JCR Performance Tuning Guide +
+ Introduction
+ This section will show you various ways of improving JCR performance.
+ It is intended for Administrators and others who want to use the JCR features more efficiently.
+
+ JCR Performance and Scalability +
+ Cluster configuration
+ The table below contains details about the configuration of the cluster used in benchmark testing:
+ EC2 network: 1Gbit
+ Servers hardware: Specification
+ RAM: 7.5 GB
+ Processors: 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
+ Storage: 850 GB (2×420 GB plus 10 GB root partition)
+ Architecture: 64-bit
+ I/O Performance: High
+ API name: m1.large
+ Note:
+ The NFS and statistics (cacti snmp) servers were located on one physical server.
+ JBoss Enterprise Application Platform 6 configuration:
+ JAVA_OPTS: -Dprogram.name=run.sh -server -Xms4g -Xmx4g -XX:MaxPermSize=512m -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -XX:+UseParallelGC -Djava.net.preferIPv4Stack=true
+
+
+ JCR Clustered Performance
+ The benchmark used WebDAV (a complex read/write load test) with the same 20K file. To obtain per-operation results, the test case threads wrote custom output to a CSV file.
+ Read operation:
+ Warm-up iterations: 100
+ Run iterations: 2000
+ Background writing threads: 25
+ Reading threads: 225
+ EC2 Performance Results + + + + + +
+ + + <tgroup cols=3D"4"> + <thead> + <row> + <entry> Nodes count </entry> + <entry> tps </entry> + <entry> Responses >2s </entry> + <entry> Responses >4s </entry> + </row> + </thead> + <tbody> + <row> + <entry> 1 </entry> + <entry> 523 </entry> + <entry> 6.87% </entry> + <entry> 1.27% </entry> + </row> + <row> + <entry> 2 </entry> + <entry> 1754 </entry> + <entry> 0.64% </entry> + <entry> 0.08% </entry> + </row> + <row> + <entry> 3 </entry> + <entry> 2388 </entry> + <entry> 0.49% </entry> + <entry> 0.09% </entry> + </row> + <row> + <entry> 4 </entry> + <entry> 2706 </entry> + <entry> 0.46% </entry> + <entry> 0.1% </entry> + </row> + </tbody> + </tgroup> + </table> + <para> + <citetitle>Read operation with more threads</citetitle>: + </para> + <simplelist> + <member>Warm-up iterations: 100</member> + <member>Run iterations: 2000</member> + <member>Background writing threads: 50</member> + <member>Reading threads: 450</member> + </simplelist> + <figure> + <title id=3D"perf_EC2_result2">EC2 Performance Results 2 + + + + + + +
+ + <tgroup cols=3D"4"> + <thead> + <row> + <entry> Nodes count </entry> + <entry> tps </entry> + <entry> Responses >2s </entry> + <entry> Responses >4s </entry> + </row> + </thead> + <tbody> + <row> + <entry> 1 </entry> + <entry> 116 </entry> + <entry> ? </entry> + <entry> ? </entry> + </row> + <row> + <entry> 2 </entry> + <entry> 1558 </entry> + <entry> 6.1% </entry> + <entry> 0.6% </entry> + </row> + <row> + <entry> 3 </entry> + <entry> 2242 </entry> + <entry> 3.1% </entry> + <entry> 0.38% </entry> + </row> + <row> + <entry> 4 </entry> + <entry> 2756 </entry> + <entry> 2.2% </entry> + <entry> 0.41% </entry> + </row> + </tbody> + </tgroup> + </table> + </section> + </section> + <section id=3D"sect-Reference_Guide-JCR_Performance_Tuning_Guide-Perfo= rmance_Tuning_Guide"> + <title>Performance Tuning Guide +
+ JBoss Enterprise Application Platform 6 Tuning
+ You can use the maxThreads parameter to increase the maximum number of threads that can be launched in the AS instance. This can improve performance if you need a high level of concurrency. You can also use the -XX:+UseParallelGC Java option to use the parallel garbage collector.
+ Note
+ Beware of setting maxThreads too high, as this can cause an OutOfMemoryError. We encountered it with maxThreads=1250 on the following machine:
+ 7.5 GB memory
+ 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
+ 850 GB instance storage (2×420 GB plus 10 GB root partition)
+ 64-bit platform
+ I/O Performance: High
+ API name: m1.large
+ java -Xmx4g
+
+ JCR Cache Tuning
+ Cache size
+ The JCR cluster implementation is built using JBoss Cache as a distributed, replicated cache. One particularity concerns the remove action: the speed of this operation depends on the actual size of the cache. The more nodes are currently in the cache, the more time is needed to remove one particular node (subtree) from it.
+ Eviction
+ Changing the eviction wakeUpInterval value does not affect performance. Performance results with values from 500 up to 3000 are approximately equal.
+ Transaction Timeout
+ Using a short timeout for long transactions, such as Export/Import or the removal of a huge subtree, may cause a TransactionTimeoutException.
+
+ Clustering
+ For performance, it is better to have the load balancer, the DB server and the shared NFS on different computers. If for some reason one node gets more load than the others, you can decrease this load using the load value in the load balancer.
+ JGroups configuration
+ It is recommended to use the "multiplexer stack" feature present in JGroups. It is set by default in eXo JCR and offers higher performance in a cluster, while also using fewer network connections. If there are two or more clusters in your network, check that they use different ports and different cluster names.
+ Write performance in cluster
+ The eXo JCR implementation uses the Lucene indexing engine to provide search capabilities. Lucene brings some limitations for write operations: it can perform indexing in only one thread. That is why write performance in a cluster is not higher than in a singleton environment. Data is indexed on the coordinator node, so increasing the write load on the cluster may lead to a ReplicationTimeout exception. It occurs because writing threads queue in the indexer and, under high load, the timeout for replication to the coordinator is exceeded.
+ Taking this into consideration, it is recommended to increase the replTimeout value in the cache configurations in case of high write load.
+ Replication timeout
+ Some operations may take too much time. If you get a ReplicationTimeoutException, try increasing the replication timeout:
+ <clustering mode="replication" clusterName="${jbosscache-cluster-name}">
+  ...
+  <sync replTimeout="60000" />
+ </clustering>
+ The value is set in milliseconds.
+ + + + eXo JCR with JBoss Portal Platform +
+ How to use a Managed DataSource under JBoss Enterprise Application Platform 6
+ Configurations Steps +
+ Declaring the Datasources in the AS
+ NEEDINFO - FILE PATHS - I know this isn't right. Where do these get deployed again?
+ To declare the datasources using a JBoss application server, deploy a ds file (XXX-ds.xml) into the deploy directory of the appropriate server profile (/server/PROFILE/deploy, for example).
+ This file configures all datasources which JBoss Portal Platform will need (there should be four specifically named: jdbcjcr_portal, jdbcjcr_sample-portal, jdbcidm_portal and jdbcidm_sample-portal).
+ For example:
+ <?xml version="1.0" encoding="UTF-8"?>
+<datasources>
+  <no-tx-datasource>
+    <jndi-name>jdbcjcr_portal</jndi-name>
+    <connection-url>jdbc:hsqldb:${jboss.server.data.dir}/data/jdbcjcr_portal</connection-url>
+    <driver-class>org.hsqldb.jdbcDriver</driver-class>
+    <user-name>sa</user-name>
+    <password></password>
+  </no-tx-datasource>
+
+  <no-tx-datasource>
+    <jndi-name>jdbcjcr_sample-portal</jndi-name>
+    <connection-url>jdbc:hsqldb:${jboss.server.data.dir}/data/jdbcjcr_sample-portal</connection-url>
+    <driver-class>org.hsqldb.jdbcDriver</driver-class>
+    <user-name>sa</user-name>
+    <password></password>
+  </no-tx-datasource>
+
+  <no-tx-datasource>
+    <jndi-name>jdbcidm_portal</jndi-name>
+    <connection-url>jdbc:hsqldb:${jboss.server.data.dir}/data/jdbcidm_portal</connection-url>
+    <driver-class>org.hsqldb.jdbcDriver</driver-class>
+    <user-name>sa</user-name>
+    <password></password>
+  </no-tx-datasource>
+
+  <no-tx-datasource>
+    <jndi-name>jdbcidm_sample-portal</jndi-name>
+    <connection-url>jdbc:hsqldb:${jboss.server.data.dir}/data/jdbcidm_sample-portal</connection-url>
+    <driver-class>org.hsqldb.jdbcDriver</driver-class>
+    <user-name>sa</user-name>
+    <password></password>
+  </no-tx-datasource>
+</datasources>
+ The properties that can be set for a datasource are described here: Configuring JDBC DataSources - The non transactional DataSource configuration schema
+
+ Do not bind datasources explicitly
+ Do not let the portal explicitly bind datasources.
+ NEEDINFO - FILE PATHS - I think some of the values have changed in the referenced file when I look at the new file below. New info required?
+ Edit the JPP_HOME/standalone/configuration/gatein/configuration.properties file and comment out the following lines in the JCR section:
+ #gatein.jcr.datasource.driver=org.hsqldb.jdbcDriver
+#gatein.jcr.datasource.url=jdbc:hsqldb:file:${gatein.db.data.dir}/data/jdbcjcr_${name}
+#gatein.jcr.datasource.username=sa
+#gatein.jcr.datasource.password=
+ Comment out the following lines in the IDM section:
+ #gatein.idm.datasource.driver=org.hsqldb.jdbcDriver
+#gatein.idm.datasource.url=jdbc:hsqldb:file:${gatein.db.data.dir}/data/jdbcidm_${name}
+#gatein.idm.datasource.username=sa
+#gatein.idm.datasource.password=
+ Open the jcr-configuration.xml and idm-configuration.xml files and comment out references to the plug-in InitialContextInitializer.
+ <!-- Commented out because datasources are declared and bound by the AS, not in eXo -->
+<!--
+<external-component-plugins>
+ [...]
+</external-component-plugins>
+-->
+
+
+
--===============2260912015684865299==--