Overview
The query engine is designed to let queries be defined in various languages. The engine uses a parser to transform each query into an Abstract Query Model (AQM, equivalent to the abstract syntax tree for a query), and then plans, validates, optimizes and processes the atomic portions of the query plan. The whole system was designed to be easily reused as-is or extended and customized to provide the desired behavior. But because this is a generalized query engine capable of querying over a 'graph', the actual processing of the atomic portions of the queries must be provided when the engine is used. For example, the engine is capable of evaluating criteria and performing all types of joins and set operations (e.g., union, intersect, except), but it does not know how to process the low-level atomic queries at the bottom of the query plan, and instead delegates this processing to a QueryProcessor framework.
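To make the division of labor concrete, here is a minimal sketch of the idea, not the actual DNA code. The engine side handles the set operations itself, while the leaves of the plan are handed off to a processor supplied by whoever uses the engine. All of the names here (QueryDelegationSketch, AtomicQueryProcessor, PlanNode, Atomic, SetOperation) are made up for illustration and are not the real classes in 'dna-graph'; rows are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class QueryDelegationSketch {

    /** Hypothetical stand-in for the user-supplied component that runs atomic queries. */
    public interface AtomicQueryProcessor {
        List<String> execute(String atomicQuery);
    }

    /** A node in a (greatly simplified) query plan. */
    public interface PlanNode {
        List<String> evaluate(AtomicQueryProcessor processor);
    }

    /** Leaf of the plan: the engine does not know how to run it, so it delegates. */
    public static final class Atomic implements PlanNode {
        private final String query;

        public Atomic(String query) {
            this.query = query;
        }

        public List<String> evaluate(AtomicQueryProcessor processor) {
            return processor.execute(query);
        }
    }

    public enum SetOp { UNION, INTERSECT, EXCEPT }

    /** Set operation: handled entirely by the engine, whatever the leaves return. */
    public static final class SetOperation implements PlanNode {
        private final SetOp op;
        private final PlanNode left;
        private final PlanNode right;

        public SetOperation(SetOp op, PlanNode left, PlanNode right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }

        public List<String> evaluate(AtomicQueryProcessor processor) {
            // LinkedHashSet removes duplicates but keeps the left side's order
            Set<String> result = new LinkedHashSet<String>(left.evaluate(processor));
            List<String> rightRows = right.evaluate(processor);
            switch (op) {
                case UNION:     result.addAll(rightRows);    break;
                case INTERSECT: result.retainAll(rightRows); break;
                case EXCEPT:    result.removeAll(rightRows); break;
            }
            return new ArrayList<String>(result);
        }
    }

    public static void main(String[] args) {
        // A toy processor; a real one would query an index or a connector
        AtomicQueryProcessor processor = new AtomicQueryProcessor() {
            public List<String> execute(String atomicQuery) {
                return "A".equals(atomicQuery)
                        ? Arrays.asList("/a", "/b", "/c")
                        : Arrays.asList("/b", "/c", "/d");
            }
        };
        PlanNode plan = new SetOperation(SetOp.EXCEPT, new Atomic("A"), new Atomic("B"));
        System.out.println(plan.evaluate(processor));
    }
}
```

Swapping in a different AtomicQueryProcessor changes where the rows come from without touching the set-operation logic, which is the point of the delegation.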
A number of packages were added to 'dna-graph', including:
- org.jboss.dna.graph.query - The QueryEngine class, the QueryResults interface, and a QueryBuilder class that can be used to programmatically build a query using an internal domain-specific language (DSL) style. This package represents the main public touch points to the engine.
- org.jboss.dna.graph.query.model - The classes and interfaces that make up the Abstract Query Model. These are based on JSR-283's AQM interfaces, though there are some differences (since these are at the Graph API level rather than our JCR implementation, and because the JCR SQL2 representation does not support some common and useful parts of SQL, such as UNION/INTERSECT/EXCEPT). These classes can be instantiated directly (e.g., by a parser), or using a QueryBuilder.
- org.jboss.dna.graph.query.parse - The QueryParser interface that defines how a string representation of a query is to be parsed and converted into an abstract query model representation. Currently there is a single SqlQueryParser implementation, though an XPathQueryParser will be added soon.
- org.jboss.dna.graph.query.plan - A component to accept an AQM representation and create a canonical query plan representation.
- org.jboss.dna.graph.query.optimize - A rule-based optimizer that analyzes and transforms a query plan.
- org.jboss.dna.graph.query.validate - A small framework capable of defining the schemata of 'tables' and 'columns' that can be queried.
- org.jboss.dna.graph.query.process - A framework for defining how the low-level atomic queries at the bottom of the query plans are to be executed, along with the various kinds of processing components for the different kinds of plan elements.
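As a rough illustration of what "rule-based" means for the optimizer package, here is a small sketch in which each rule takes a plan and returns a transformed plan, and the optimizer simply applies the rules in order. This is not the DNA optimizer: the Plan, Rule and optimize names, the single-child plan shape, and the two rules are all assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;

public class RuleBasedOptimizerSketch {

    /** A deliberately tiny plan node: operator name, optional detail, at most one child. */
    public static final class Plan {
        public final String operator;
        public final String detail; // may be null
        public final Plan child;    // null for a leaf

        public Plan(String operator, String detail, Plan child) {
            this.operator = operator;
            this.detail = detail;
            this.child = child;
        }

        @Override
        public String toString() {
            String self = detail == null ? operator : operator + "[" + detail + "]";
            return child == null ? self : self + " -> " + child;
        }
    }

    /** A rule rewrites a plan and returns the (possibly new) plan. */
    public interface Rule {
        Plan apply(Plan plan);
    }

    /** Hypothetical rule: a SELECT whose criteria is always TRUE filters nothing, so drop it. */
    public static final Rule DROP_ALWAYS_TRUE_SELECT = new Rule() {
        public Plan apply(Plan plan) {
            if (plan == null) return null;
            Plan child = apply(plan.child); // rewrite bottom-up
            if ("SELECT".equals(plan.operator) && "TRUE".equals(plan.detail)) return child;
            return new Plan(plan.operator, plan.detail, child);
        }
    };

    /** Hypothetical rule: DISTINCT directly over DISTINCT is redundant, so keep only one. */
    public static final Rule COLLAPSE_ADJACENT_DISTINCT = new Rule() {
        public Plan apply(Plan plan) {
            if (plan == null) return null;
            Plan child = apply(plan.child); // rewrite bottom-up
            if ("DISTINCT".equals(plan.operator) && child != null
                    && "DISTINCT".equals(child.operator)) {
                return child;
            }
            return new Plan(plan.operator, plan.detail, child);
        }
    };

    /** Applies each rule once, in the order given; earlier rules can enable later ones. */
    public static Plan optimize(Plan plan, List<Rule> rules) {
        Plan current = plan;
        for (Rule rule : rules) {
            current = rule.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Plan source = new Plan("SOURCE", "nodes", null);
        Plan plan = new Plan("DISTINCT", null,
                new Plan("SELECT", "TRUE",
                        new Plan("DISTINCT", null, source)));
        System.out.println("before: " + plan);
        System.out.println("after:  " + optimize(plan,
                Arrays.asList(DROP_ALWAYS_TRUE_SELECT, COLLAPSE_ADJACENT_DISTINCT)));
    }
}
```

The order of the rules matters in this sketch: dropping the no-op SELECT first is what makes the two DISTINCT nodes adjacent so that the second rule can collapse them. A specialization of an engine could, under this kind of design, add its own rules to the list.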
We plan to rely mostly upon Lucene for our indexing and searching, so we've created a specialization of the query engine with some custom optimizer rules and a processor module that is able to use the Lucene indexes. We also added a number of methods for creating, managing and updating the Lucene indexes (by reading content or based upon events). Because these were dependent upon the Lucene library, we decided to put them in a new 'dna-search' project.
A lot of tests were included as part of this commit. Also, the JavaDoc should be relatively complete, and should help explain how the various components work and what the expected behavior is.
Status
At this point, the general query engine and the Lucene-based specialization are for the most part complete and thoroughly tested, but these components still need to be integrated into the larger connector framework and JCR implementation. All of the Lucene index generation and management needs to be coordinated and integrated with the administration and lifecycle of the DNA connectors and JCR engine. Additionally, while there are methods to create and update the indexes, the ability to extract text from binary property values still needs to be added. In short, there is still a lot of outstanding work. Over the next few days, I'll be breaking out these tasks into JIRA issues.
Best regards,
Randall