[dna-dev] New query engine added to trunk
Randall Hauch
rhauch at redhat.com
Mon Sep 21 16:28:16 EDT 2009
Today I committed to trunk the initial version of the query engine
in 'dna-graph', along with a new 'dna-search' project that extends
the engine to use Lucene indexes. While the engine is fairly complete,
it has not yet been integrated into the connectors or into our JCR
implementation, so there is still a lot of work to do. However, I
wanted to get this into trunk and solicit feedback.
So, if you're able and interested, please take a look at trunk and let
me know what you think. All of this work is being done under
https://jira.jboss.org/jira/browse/DNA-467, so please add comments
there. Feel free to log new issues if you find any bugs or problems.
Overview
The query engine is designed to accept queries defined in various
languages, use a parser to transform each query into an Abstract Query
Model (the equivalent of an abstract syntax tree for the query), and
then plan, validate and optimize the query and process the atomic
portions of the resulting query plan. The whole system was designed to
be easily reused as-is, or extended and customized to provide the
desired behavior. But because this is a generalized engine capable of
querying a 'graph', the actual processing of the atomic portions of
the queries must be provided when the engine is used. For example, the
engine can evaluate criteria and perform all types of joins and set
operations (e.g., union, intersect, except), but it does not know how
to process the low-level atomic queries at the bottom of the query
plan; instead, it delegates this processing to a QueryProcessor
framework.
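To make that division of labor concrete, here is a minimal Java sketch under stated assumptions. It is not DNA's API: PlanNode, Access, Select, SetOperation and AtomicSource are hypothetical names, rows are simplified to node-path strings, each plan node simply evaluates itself, and AtomicSource stands in for the role the QueryProcessor framework plays. The generic part evaluates criteria and set operations on its own; only the Access leaves call out to the pluggable source.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for the pluggable processor: the only code that knows
// how to answer an atomic query (here, "all rows of a table") against a store.
interface AtomicSource {
    List<String> fetch(String table);
}

// Hypothetical plan node; names are illustrative, not DNA's classes.
abstract class PlanNode {
    // Every evaluate() in this sketch returns a fresh set that callers may mutate.
    abstract Set<String> evaluate(AtomicSource source);
}

// Leaf of the plan: the generic engine delegates this work to the AtomicSource.
final class Access extends PlanNode {
    private final String table;

    Access(String table) { this.table = table; }

    @Override
    Set<String> evaluate(AtomicSource source) {
        return new LinkedHashSet<>(source.fetch(table));
    }
}

// Criteria are evaluated by the generic engine itself, above the leaves.
final class Select extends PlanNode {
    private final PlanNode input;
    private final Predicate<String> criteria;

    Select(PlanNode input, Predicate<String> criteria) {
        this.input = input;
        this.criteria = criteria;
    }

    @Override
    Set<String> evaluate(AtomicSource source) {
        Set<String> result = new LinkedHashSet<>();
        for (String row : input.evaluate(source)) {
            if (criteria.test(row)) {
                result.add(row);
            }
        }
        return result;
    }
}

// Set operations are also handled generically, independent of the store.
final class SetOperation extends PlanNode {
    enum Kind { UNION, INTERSECT, EXCEPT }

    private final Kind kind;
    private final PlanNode left;
    private final PlanNode right;

    SetOperation(Kind kind, PlanNode left, PlanNode right) {
        this.kind = kind;
        this.left = left;
        this.right = right;
    }

    @Override
    Set<String> evaluate(AtomicSource source) {
        Set<String> result = left.evaluate(source); // fresh set, safe to mutate
        Set<String> other = right.evaluate(source);
        switch (kind) {
            case UNION: result.addAll(other); break;
            case INTERSECT: result.retainAll(other); break;
            case EXCEPT: result.removeAll(other); break;
        }
        return result;
    }
}
```

For example, with an AtomicSource backed by a map from table names to paths, evaluating `new SetOperation(SetOperation.Kind.EXCEPT, new Access("cars"), new Access("onSale"))` yields the paths in 'cars' that are not in 'onSale', while none of the plan nodes know how those tables are actually stored.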
A number of packages were added to 'dna-graph', including:
* org.jboss.dna.graph.query - The QueryEngine class, the QueryResults
  interface, and a QueryBuilder class that can be used to build a query
  programmatically in an internal domain-specific language (DSL) style.
  This package holds the engine's main public touch points.
* org.jboss.dna.graph.query.model - The classes and interfaces that make
  up the Abstract Query Model (AQM). These are based on JSR-283's AQM
  interfaces, though there are some differences, since these classes
  are at the Graph API level rather than in our JCR implementation, and
  because the JCR-SQL2 representation does not support some common and
  useful parts of SQL, such as UNION/INTERSECT/EXCEPT. These classes can
  be instantiated directly (e.g., by a parser) or by using a QueryBuilder.
* org.jboss.dna.graph.query.parse - The QueryParser interface, which
  defines how a string representation of a query is parsed and
  converted into an Abstract Query Model representation. Currently
  there is a single SqlQueryParser implementation, though an
  XPathQueryParser will be added soon.
* org.jboss.dna.graph.query.plan - A component that accepts an AQM
  representation and creates a canonical query plan.
* org.jboss.dna.graph.query.optimize - A rule-based optimizer that
  analyzes and transforms a query plan.
* org.jboss.dna.graph.query.validate - A small framework for defining
  the schemata of the 'tables' and 'columns' that can be queried.
* org.jboss.dna.graph.query.process - A framework for defining how the
  low-level atomic queries at the bottom of the query plans are
  executed, along with the various processing components for the
  different kinds of plan elements.
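As an illustration of what a rule-based optimizer can look like, here is a hedged Java sketch. The Node, OptimizerRule and RuleBasedOptimizer classes and the two rules are hypothetical, not the classes or rules in org.jboss.dna.graph.query.optimize; plan nodes are simplified to a single-child chain, and the fixed-point loop (reapply all rules until a full pass changes nothing, bounded by a hypothetical maxPasses) is an assumption about how such an optimizer might be driven.

```java
import java.util.List;

// Hypothetical plan node (not DNA's PlanNode): a type, a detail string
// (criteria for SELECT, table name for ACCESS), and at most one child.
final class Node {
    final String type;
    final String detail;
    final Node child; // null for leaf nodes

    Node(String type, String detail, Node child) {
        this.type = type;
        this.detail = detail;
        this.child = child;
    }

    @Override
    public String toString() {
        return child == null
                ? type + "(" + detail + ")"
                : type + "(" + detail + ", " + child + ")";
    }
}

// A rule rewrites one node, returning the same instance when it does not apply.
interface OptimizerRule {
    Node rewrite(Node node);
}

// Hypothetical rule: SELECT(TRUE, x) filters nothing, so replace it with x.
final class RemoveAlwaysTrueSelect implements OptimizerRule {
    @Override
    public Node rewrite(Node n) {
        if ("SELECT".equals(n.type) && "TRUE".equals(n.detail) && n.child != null) {
            return n.child;
        }
        return n;
    }
}

// Hypothetical rule: SELECT(a, SELECT(b, x)) becomes SELECT(a AND b, x).
final class MergeAdjacentSelects implements OptimizerRule {
    @Override
    public Node rewrite(Node n) {
        if ("SELECT".equals(n.type) && n.child != null && "SELECT".equals(n.child.type)) {
            return new Node("SELECT", n.detail + " AND " + n.child.detail, n.child.child);
        }
        return n;
    }
}

final class RuleBasedOptimizer {
    private final List<OptimizerRule> rules;
    private final int maxPasses;

    RuleBasedOptimizer(List<OptimizerRule> rules, int maxPasses) {
        this.rules = rules;
        this.maxPasses = maxPasses;
    }

    // Apply every rule to the whole plan, repeating until a pass changes nothing.
    Node optimize(Node plan) {
        for (int pass = 0; pass < maxPasses; pass++) {
            Node before = plan;
            for (OptimizerRule rule : rules) {
                plan = applyBottomUp(rule, plan);
            }
            if (plan == before) {
                break; // fixed point: no rule changed the plan in this pass
            }
        }
        return plan;
    }

    // Rewrite the child chain first, then the node itself.
    private static Node applyBottomUp(OptimizerRule rule, Node node) {
        if (node == null) {
            return null;
        }
        Node child = applyBottomUp(rule, node.child);
        Node current = (child == node.child) ? node : new Node(node.type, node.detail, child);
        return rule.rewrite(current);
    }
}
```

Given `SELECT(TRUE, SELECT(year > 2000, SELECT(make = 'Toyota', ACCESS(cars))))`, these two rules drop the trivial criterion and merge the remaining selects into `SELECT(year > 2000 AND make = 'Toyota', ACCESS(cars))`. Keeping each rule small and independent is what makes it easy to add, remove or reorder rules in a design like this.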
We plan to rely mostly on Lucene for our indexing and searching, so
we've created a specialization of the query engine with some custom
optimizer rules and a processor module that uses the Lucene indexes.
We also added a number of methods for creating, managing and updating
the Lucene indexes (by reading content or in response to events).
Because these depend on the Lucene library, we decided to put them in
a new 'dna-search' project.
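Lucene itself is far more capable, but the basic idea of an index that is built by reading content and then kept current from change events can be sketched with a toy in-memory inverted index. This is a stand-in for illustration only: ToyIndex, onNodeChanged, onNodeRemoved and search are hypothetical names, not dna-search or Lucene API, and tokenization is a naive lowercase split on non-alphanumeric characters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for a Lucene index: NOT Lucene and NOT the dna-search API.
final class ToyIndex {
    // term -> paths of the nodes whose text contains that term (the inverted index)
    private final Map<String, Set<String>> postings = new HashMap<>();
    // path -> terms indexed for that node, so stale entries can be removed on update
    private final Map<String, Set<String>> termsByPath = new HashMap<>();

    // (Re)index a node's text, e.g. when reading content or handling a change event.
    void onNodeChanged(String path, String text) {
        onNodeRemoved(path); // drop whatever was indexed for this node before
        Set<String> terms = tokenize(text);
        termsByPath.put(path, terms);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(path);
        }
    }

    // Remove a node from the index, e.g. when handling a deletion event.
    void onNodeRemoved(String path) {
        Set<String> oldTerms = termsByPath.remove(path);
        if (oldTerms == null) {
            return;
        }
        for (String term : oldTerms) {
            Set<String> paths = postings.get(term);
            paths.remove(path);
            if (paths.isEmpty()) {
                postings.remove(term);
            }
        }
    }

    // Paths whose text contains every term in the query (conjunctive search).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> paths = postings.getOrDefault(term, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(paths);
            } else {
                result.retainAll(paths);
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }

    // Naive tokenizer: lowercase, split on anything that is not a letter or digit.
    private static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

The second map (path to terms) is what makes event-driven updates cheap in this sketch: re-indexing a changed node removes only that node's old postings instead of rebuilding the whole index.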
A lot of tests were included as part of this commit. Also, the
JavaDoc should be relatively complete, and should help explain how the
various components work and what the expected behavior is.
Status
At this point, the general query engine and the Lucene-based
specialization are for the most part complete and thoroughly tested,
but these components need to be integrated into the larger connector
framework and JCR implementation. All of the Lucene index generation
and management needs to be coordinated and integrated with the
administration and lifecycle of the DNA connectors and JCR engine.
Additionally, while there are methods to create/update the indexes,
the ability to extract text from binary property values still needs to
be added. In short, there still is a lot of outstanding work. Over
the next few days, I'll be breaking out these tasks into JIRA issues.
Best regards,
Randall