[
https://jira.jboss.org/jira/browse/DNA-468?page=com.atlassian.jira.plugin...
]
Randall Hauch commented on DNA-468:
-----------------------------------
Completed the integration and implementation of the JCR interfaces, meaning the
implementation of the query functionality in the JCR layers should be complete.
HOWEVER, THE UNDERLYING CONNECTORS DO NOT YET SUPPORT QUERIES OR SEARCHES, SO THE JCR
QUERY FUNCTIONALITY DOES NOT YET WORK.
Updated the JcrQueryManager implementation of javax.jcr.query.QueryManager to parse the
queries using one of three languages: SQL, XPath, and free-text search. The SQL language
is really that spelled out in JCR 2.0, with enhancements to support UNION, INTERSECT,
EXCEPT, IN clauses, additional join types, BETWEEN criteria, and the PATH(...) and
DEPTH(...) functions for use on the left-hand side of criteria (e.g., dynamic operands).
The XPath language is that specified by JCR 1.0. And the free-text search language just
allows the client to submit a search expression that complies with Section 6.7.19 of
the JCR 2.0 specification (the same search grammar used in the JCR-JQOM
'FullTextSearch' criteria and the 'CONTAINS(...)' function in JCR-SQL2).
If the submitted query is valid and well-formed, a javax.jcr.query.Query object is created
(using either JcrQuery or JcrSearch, depending upon the language). When executed, the
query/search is pushed down to the graph layer via the new Graph API methods. The Graph
then does its thing (like planning, validating, optimizing, and processing) by pushing
down to the connector (as a single batch) the one or more AccessQueryRequest objects. The
connector then does its thing by computing the results for each access query and
setting the results on the request. The graph's query engine uses these results and
performs any additional operations (like joins, unions, ordering, or application of
additional criteria that couldn't be pushed down) to produce the final results (in the
form of the org.jboss.dna.graph.query.QueryResults). Back in the JCR layer, the graph
results are wrapped by a javax.jcr.query.QueryResult implementation (JcrQueryResult),
where the client can access the javax.jcr.Node object for each row or the Value objects in
each column of each Row.
There are a couple of things to note about the QueryResult implementation. First, because
of the signature of the NodeIterator (specifically that the methods don't throw
RepositoryException), accessing the nodes in the results is done when the NodeIterator is
obtained. In other words, the Nodes are fetched and loaded into the Session immediately
when the 'QueryResult.getNodes()' method is called. However, the RowIterator
method signatures do throw RepositoryException, so the values are NOT loaded when
'QueryResult.getRows()' is called but are instead loaded lazily as the iterator is
used.
Second, the QueryResult always returns the values cached in the Session, meaning that
while transient changes within the Session are not used to evaluate criteria and determine
the rows, the actual values of the rows DO come from the Session's transient state.
This behavior is spelled out in the specification.
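The first point (eager loading behind an iterator that cannot throw a checked exception, lazy loading behind one that can) can be sketched in plain Java. This is only an illustration under stated assumptions: LoadException, Loader, ResultAccess, and LazyCursor are hypothetical stand-ins and are not JCR or DNA types.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// All types below are hypothetical stand-ins, not JCR or DNA classes.
class LoadException extends Exception {
    LoadException(String message) { super(message); }
}

interface Loader {
    String load(String key) throws LoadException;
}

final class ResultAccess {

    // Eager: Iterator.next() cannot throw a checked exception, so every item
    // is loaded here, where a failure can still be reported to the caller.
    static Iterator<String> eager(List<String> keys, Loader loader) throws LoadException {
        List<String> loaded = new ArrayList<>();
        for (String key : keys) {
            loaded.add(loader.load(key));
        }
        return loaded.iterator();
    }

    // Lazy: the access method declares the checked exception, so loading can
    // be deferred until each item is actually requested.
    static LazyCursor lazy(List<String> keys, Loader loader) {
        return new LazyCursor(keys.iterator(), loader);
    }
}

final class LazyCursor {
    private final Iterator<String> keys;
    private final Loader loader;

    LazyCursor(Iterator<String> keys, Loader loader) {
        this.keys = keys;
        this.loader = loader;
    }

    boolean hasNext() { return keys.hasNext(); }

    String next() throws LoadException { return loader.load(keys.next()); }
}
```

The trade-off is that the eager form pays the full loading cost even if the caller reads only the first few items, while the lazy form can fail partway through iteration.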
Submitting queries to the Graph API requires supplying an implementation of the
org.jboss.dna.graph.validate.Schemata interface. The query engine uses it to a)
identify valid tables and columns by their names so that the submitted query can be
resolved; b) identify all selectable columns in a table when SELECT * is used; c)
determine the appropriate datatype for each column appearing in the SELECT or WHERE
clauses; and d) obtain the definition of views so that use of views in a query plan can
be replaced with the view's definition.
This commit includes an implementation of Schemata that is based upon the node types in
the repository. Each node should appear in all tables represented by its primary type,
all supertypes of the primary type, all mixin types, and all supertypes of all mixin
types.
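The table-membership rule just described can be sketched as a small closure computation over the type hierarchy. The TableMembership class, its tablesFor method, and the map of direct supertypes are hypothetical and only illustrate the rule; they are not DNA classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of DNA. Node type names are plain strings here.
final class TableMembership {

    // Returns the names of all tables (node types) in which a node appears:
    // its primary type, its mixin types, and every supertype of those types.
    static Set<String> tablesFor(String primaryType,
                                 List<String> mixinTypes,
                                 Map<String, List<String>> directSupertypes) {
        Set<String> tables = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(primaryType);
        pending.addAll(mixinTypes);
        while (!pending.isEmpty()) {
            String type = pending.remove();
            if (tables.add(type)) {
                // First visit: also walk this type's declared supertypes.
                pending.addAll(directSupertypes.getOrDefault(type, List.of()));
            }
        }
        return tables;
    }
}
```

For example, a node with primary type nt:file and mixin mix:versionable would appear in the tables for nt:file, nt:hierarchyNode, nt:base, mix:versionable, and mix:referenceable.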
Each non-mixin node type is represented as a VIEW defined with a query of the form:
    SELECT <propertyList> FROM __ALLNODES__
    WHERE [jcr:primaryType] IN (<nodeTypeName>,<subtypeNameList>)
Similarly, each mixin node type is represented as a VIEW defined with a query of the
form:
    SELECT <propertyList> FROM __ALLNODES__
    WHERE [jcr:mixinTypes] IN (<nodeTypeName>,<subtypeNameList>)
In these queries, the '<propertyList>' is the comma-separated names of the
single-valued, non-residual properties explicitly defined on the node type (and optionally
all its supertypes). Also, '<subtypeNameList>' is the comma-separated names
of all node types that have the node type in question as a supertype.
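To make the two view forms concrete, here is a minimal sketch that assembles the defining query from a node type's name, its property names, and its subtype names. ViewDefinitions and viewFor are hypothetical names. Writing property names in [brackets] and type names as single-quoted literals is an assumption, since the forms above show only placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not part of DNA.
final class ViewDefinitions {

    // Builds the defining query for a node type's view.
    // Assumption: property names are written in [brackets] and type names as
    // single-quoted literals; the original shows only placeholders.
    static String viewFor(String nodeTypeName,
                          boolean isMixin,
                          List<String> propertyNames,
                          List<String> subtypeNames) {
        String columns = propertyNames.stream()
                .map(name -> "[" + name + "]")
                .collect(Collectors.joining(", "));
        List<String> typeNames = new ArrayList<>();
        typeNames.add(nodeTypeName);
        typeNames.addAll(subtypeNames);
        String types = typeNames.stream()
                .map(name -> "'" + name + "'")
                .collect(Collectors.joining(", "));
        // Mixin types are matched against jcr:mixinTypes, all others against
        // jcr:primaryType, as in the two forms given in the text.
        String typeColumn = isMixin ? "[jcr:mixinTypes]" : "[jcr:primaryType]";
        return "SELECT " + columns + " FROM __ALLNODES__ WHERE "
                + typeColumn + " IN (" + types + ")";
    }
}
```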
It was pretty easy to create a Schemata implementation of the node types in the
RepositoryNodeTypeManager (the master definition of node types for all workspaces in the
repository). As NodeTypeSchemata is immutable, it can always be used to provide an
immutable, consistent schemata for a query. Additionally, it can continue to be reused
until a node type changes in the RNTM, and the RNTM ensures this is so. A
NodeTypeSchemata is only created when needed by the QueryManager, and will be discarded
any time node types are changed in the repository.
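The create-when-needed, discard-on-change lifecycle described above can be sketched as a tiny cache around an immutable snapshot. SchemataCache is a hypothetical class and does not reflect how the RepositoryNodeTypeManager actually holds its NodeTypeSchemata.

```java
import java.util.function.Supplier;

// Hypothetical sketch; not the DNA implementation.
final class SchemataCache<T> {
    private final Supplier<T> builder;
    private T cached;

    SchemataCache(Supplier<T> builder) {
        this.builder = builder;
    }

    // Returns the current snapshot, building it only when first needed.
    synchronized T get() {
        if (cached == null) {
            cached = builder.get();
        }
        return cached;
    }

    // Called whenever node types change; the next get() rebuilds the snapshot.
    synchronized void invalidate() {
        cached = null;
    }
}
```

Because the snapshot itself is immutable, a query that already obtained it keeps a consistent view even if the cache is invalidated while the query runs.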
One twist, however, is that Schemata uses stringified names, not DNA Name objects. And
that means the Schemata is dependent upon the namespace mappings, and the NodeTypeSchemata
is dependent upon the JcrRepository's context. Each Session might have redefined some
namespace mappings, and the queries are to use the session's namespace mappings. So,
NodeTypeSchemata provides a method to obtain a Schemata instance given a JcrSession
instance, and this is what is used by QueryManager. Obtaining a session-specific schemata
could be expensive, so this method does a couple of cool tricks to minimize the time
required to build the session-specific schemata. First, if the JcrSession doesn't
actually redefine _any_ namespace mappings, or it doesn't redefine any of the
namespace mappings for namespaces used in the node types, or if the namespace mappings for
those namespaces used in the node types are unchanged, the NodeTypeSchemata can be used as
is. In all other cases, a session-specific Schemata must be created (and must be thrown
out if any namespace mappings are changed in the session).
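The reuse check described above boils down to comparing prefixes for only those namespaces that the node types use. The sketch below assumes mappings are held as URI-to-prefix maps and that the session map contains only the session's overrides. SchemataReuse and canReuseShared are hypothetical names, not DNA methods.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper; not part of DNA. Mappings are namespace URI -> prefix.
final class SchemataReuse {

    // True when the session maps every namespace used by the node types to the
    // same prefix as the repository does, so the shared schemata can be reused.
    static boolean canReuseShared(Set<String> urisUsedByNodeTypes,
                                  Map<String, String> repositoryPrefixByUri,
                                  Map<String, String> sessionPrefixByUri) {
        for (String uri : urisUsedByNodeTypes) {
            String repositoryPrefix = repositoryPrefixByUri.get(uri);
            // A URI the session did not remap keeps the repository's prefix.
            String sessionPrefix = sessionPrefixByUri.getOrDefault(uri, repositoryPrefix);
            if (!Objects.equals(repositoryPrefix, sessionPrefix)) {
                return false;
            }
        }
        return true;
    }
}
```

Remapping a namespace that no node type uses leaves the result true, which is what lets most sessions share the repository-wide schemata.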
But, unlike the NodeTypeSchemata (which creates views for all node types preemptively),
the session-specific Schemata implementations are lazy. The theory is that a single query
may not involve that many tables, so it's not worth defining all views.
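A lazy, session-specific schemata of this kind might look roughly like the following, where a view definition is built only the first time its table is requested. LazyViews and its definitionBuilder parameter are hypothetical, and the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not the DNA implementation.
final class LazyViews {
    private final Map<String, String> definitionsByName = new HashMap<>();
    private final Function<String, String> definitionBuilder;

    LazyViews(Function<String, String> definitionBuilder) {
        this.definitionBuilder = definitionBuilder;
    }

    // Builds a view definition the first time its table is requested and
    // returns the cached definition on later requests.
    String viewFor(String tableName) {
        return definitionsByName.computeIfAbsent(tableName, definitionBuilder);
    }

    // Number of views defined so far.
    int definedCount() {
        return definitionsByName.size();
    }
}
```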
It should also be noted that the Lucene integration stores properties and paths in a
manner that is dependent upon the namespace URI, and not the prefix. The search engine
uses the ExecutionContext in which each query is being performed to transform the Schemata
names back into the prefix-independent form, prior to working with the indexed content.
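As an illustration of why a prefix-independent form is useful, the sketch below rewrites a prefixed name into a "{uri}local" form using a session's prefix mappings. The ExpandedNames class is hypothetical, and the "{uri}local" layout is an assumption; the exact format the Lucene integration stores is not stated here.

```java
import java.util.Map;

// Hypothetical helper; not part of DNA or Lucene.
final class ExpandedNames {

    // Rewrites "prefix:local" as "{uri}local" using the given prefix -> URI
    // mappings; names without a prefix are returned unchanged.
    static String expand(String prefixedName, Map<String, String> uriByPrefix) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            return prefixedName;
        }
        String prefix = prefixedName.substring(0, colon);
        String uri = uriByPrefix.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("Unknown namespace prefix: " + prefix);
        }
        return "{" + uri + "}" + prefixedName.substring(colon + 1);
    }
}
```

Two sessions that map the same URI to different prefixes (say "jcr" and "j") would both resolve to the same expanded name, so the indexed content does not depend on either session's mappings.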
This whole system appears to work very well so far. Again, work still needs to be done at
the connector-level to support FullTextSearchRequest and AccessQueryRequest types.
That's next.
Add XPath query language support
--------------------------------
Key: DNA-468
URL: https://jira.jboss.org/jira/browse/DNA-468
Project: DNA
Issue Type: Feature Request
Components: JCR, Query, Search
Affects Versions: 0.5
Reporter: Randall Hauch
Assignee: Randall Hauch
Priority: Blocker
Fix For: 0.7
Create an XPath language binding for our graph model, so that we can parse XPath queries
and produce query models that can then be executed.
This can probably be done as a separate project dependent upon dna-graph, perhaps in the
extensions folder (since it'd be a query language extension). Maybe
"extensions/dna-query-xpath"?