[dna-dev] New query engine added to trunk

Mon Sep 21 16:28:16 EDT 2009

Today I committed to trunk the initial version of the query engine  
functionality into 'dna-graph', and added a new 'dna-search' project  
that extends the engine to use Lucene indexes.  While the engine is  
fairly complete, it has not yet been integrated into the connectors  
nor into our JCR implementation, so there still is a lot of work to  
do.  However, I wanted to get this into trunk and solicit feedback.

So, if you're able and interested, please take a look at trunk and let
me know what you think.  All of this work is being done under
https://jira.jboss.org/jira/browse/DNA-467, so please add comments
there.  Feel free to log new issues if you find any bugs or problems.

Overview

The design of the query engine is to provide a way to define queries
(in various languages) and have the engine use a parser to transform
them into an Abstract Query Model (equivalent to the abstract syntax
tree for a query), and to then plan, validate, optimize and process
the atomic portions of the query plan.  This whole system was
designed to be easily reused as-is or extended and customized to  
provide the desired behavior.  But because this is a generalized query
engine capable of querying over a 'graph', the actual processing of the
atomic portions of the queries must be provided when the engine is  
used.  For example, the engine is capable of evaluating criteria and  
performing all types of joins and set operations (e.g., union,  
intersect, except), but does not know how to process the low-level  
atomic queries at the bottom of the query plan, and instead delegates  
this processing to a QueryProcessor framework.
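
To illustrate the split described above, here is a minimal sketch in which a generic engine evaluates the set operations itself and hands every atomic query to a pluggable processor.  All of the names (DelegationSketch, AtomicQueryProcessor, PlanNode, SetOp) are hypothetical and chosen for illustration; they are not the classes committed to 'dna-graph', and results are reduced to plain sets of path strings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: every name here is hypothetical, not the dna-graph API.
final class DelegationSketch {

    /** The pluggable part: only this knows how to run a low-level atomic query. */
    interface AtomicQueryProcessor {
        Set<String> execute(String sourceName);
    }

    /** Set operations the generic engine can evaluate on its own. */
    enum SetOp { UNION, INTERSECT, EXCEPT }

    /** A plan node: either an atomic source access or a set operation over two child plans. */
    static final class PlanNode {
        final String source;   // non-null only for atomic nodes
        final SetOp op;        // non-null only for set-operation nodes
        final PlanNode left;
        final PlanNode right;

        private PlanNode(String source, SetOp op, PlanNode left, PlanNode right) {
            this.source = source;
            this.op = op;
            this.left = left;
            this.right = right;
        }

        static PlanNode atomic(String source) {
            return new PlanNode(source, null, null, null);
        }

        static PlanNode combine(SetOp op, PlanNode left, PlanNode right) {
            return new PlanNode(null, op, left, right);
        }
    }

    /** Walks the plan: atomic nodes are delegated, set operations are evaluated here. */
    static Set<String> process(PlanNode node, AtomicQueryProcessor processor) {
        if (node.source != null) {
            // Copy so that the processor's own data is never mutated below.
            return new LinkedHashSet<>(processor.execute(node.source));
        }
        Set<String> result = process(node.left, processor);
        Set<String> right = process(node.right, processor);
        switch (node.op) {
            case UNION: result.addAll(right); break;
            case INTERSECT: result.retainAll(right); break;
            case EXCEPT: result.removeAll(right); break;
            default: throw new IllegalStateException("Unknown operation: " + node.op);
        }
        return result;
    }

    /** Runs (a UNION b) EXCEPT c against a small in-memory processor. */
    static Set<String> demo() {
        Map<String, Set<String>> data = new HashMap<>();
        data.put("a", new LinkedHashSet<>(Arrays.asList("/x", "/y")));
        data.put("b", new LinkedHashSet<>(Arrays.asList("/y", "/z")));
        data.put("c", new LinkedHashSet<>(Arrays.asList("/x")));
        AtomicQueryProcessor processor = data::get;

        PlanNode plan = PlanNode.combine(SetOp.EXCEPT,
                PlanNode.combine(SetOp.UNION, PlanNode.atomic("a"), PlanNode.atomic("b")),
                PlanNode.atomic("c"));
        return process(plan, processor);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [/y, /z]
    }
}
```

In this sketch, swapping the in-memory map for an index-backed implementation of the processor interface would leave the set-operation logic untouched, which is the kind of reuse the design is aiming for.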

A number of packages were added to 'dna-graph', including:
- org.jboss.dna.graph.query - The QueryEngine class, the QueryResults
  interface, and a QueryBuilder class that can be used to
  programmatically build a query using an internal domain-specific
  language (DSL) style.  This package represents the main public touch
  points to the engine.
- org.jboss.dna.graph.query.model - The classes and interfaces that make
  up the Abstract Query Model.  These are based on JSR-283's AQM
  interfaces, though there are some differences, since these are at the
  Graph API level rather than our JCR implementation, and because the
  JCR SQL2 representation does not support some common and useful parts
  of SQL, such as UNION/INTERSECT/EXCEPT.  These classes can be
  instantiated directly (e.g., by a parser), or using a QueryBuilder.
- org.jboss.dna.graph.query.parse - The QueryParser interface that
  defines how a string representation of a query is to be parsed and
  converted into an abstract query model representation.  Currently
  there is a single SqlQueryParser implementation, though an
  XPathQueryParser will be added soon.
- org.jboss.dna.graph.query.plan - A component to accept an AQM
  representation and create a canonical query plan representation.
- org.jboss.dna.graph.query.optimize - A rule-based optimizer that
  analyzes and transforms a query plan.
- org.jboss.dna.graph.query.validate - A small framework capable of
  defining the schemata of 'tables' and 'columns' that can be queried.
- org.jboss.dna.graph.query.process - A framework for defining how the
  low-level atomic queries at the bottom of the query plans are to be
  executed, along with the various kinds of processing components for
  the different kinds of plan elements.
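
As a rough illustration of what "rule-based" means for the optimizer, the sketch below applies an ordered list of rewrite rules to a simple linear plan.  The names (OptimizerSketch, RewriteRule, PlanNode) and the two example rules are assumptions made up for this example; the actual rules and plan representation in 'dna-graph' are richer and may look quite different.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, not the org.jboss.dna.graph.query.optimize classes.
final class OptimizerSketch {

    /** A node in a linear plan; each node has at most one child. */
    static final class PlanNode {
        final String type;
        final String detail;   // may be null
        final PlanNode child;  // null at the bottom of the plan

        PlanNode(String type, String detail, PlanNode child) {
            this.type = type;
            this.detail = detail;
            this.child = child;
        }

        String describe() {
            String self = detail == null ? type : type + "(" + detail + ")";
            return child == null ? self : self + " > " + child.describe();
        }
    }

    /** A rule takes a plan and returns an equivalent, possibly simpler, plan. */
    interface RewriteRule {
        PlanNode apply(PlanNode plan);
    }

    /** Removes SELECT nodes whose criteria is the constant TRUE. */
    static final class DropAlwaysTrueSelect implements RewriteRule {
        public PlanNode apply(PlanNode plan) {
            if (plan == null) return null;
            PlanNode child = apply(plan.child);
            if (plan.type.equals("SELECT") && "TRUE".equals(plan.detail)) return child;
            return new PlanNode(plan.type, plan.detail, child);
        }
    }

    /** Collapses a DISTINCT that sits directly above another DISTINCT. */
    static final class CollapseDuplicateDistinct implements RewriteRule {
        public PlanNode apply(PlanNode plan) {
            if (plan == null) return null;
            PlanNode child = apply(plan.child);
            if (plan.type.equals("DISTINCT") && child != null && child.type.equals("DISTINCT")) {
                return child;
            }
            return new PlanNode(plan.type, plan.detail, child);
        }
    }

    /** Applies each rule once, in order; earlier rules can expose work for later ones. */
    static PlanNode optimize(PlanNode plan, List<RewriteRule> rules) {
        for (RewriteRule rule : rules) {
            plan = rule.apply(plan);
        }
        return plan;
    }

    /** Optimizes DISTINCT > SELECT(TRUE) > DISTINCT > SELECT(name='x') > ACCESS(nodes). */
    static String demo() {
        PlanNode access = new PlanNode("ACCESS", "nodes", null);
        PlanNode filter = new PlanNode("SELECT", "name='x'", access);
        PlanNode innerDistinct = new PlanNode("DISTINCT", null, filter);
        PlanNode trivial = new PlanNode("SELECT", "TRUE", innerDistinct);
        PlanNode plan = new PlanNode("DISTINCT", null, trivial);

        List<RewriteRule> rules = Arrays.<RewriteRule>asList(
                new DropAlwaysTrueSelect(), new CollapseDuplicateDistinct());
        return optimize(plan, rules).describe();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints DISTINCT > SELECT(name='x') > ACCESS(nodes)
    }
}
```

Note that the rule order matters in this sketch: the two DISTINCT nodes only become adjacent once the always-true SELECT between them has been dropped.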

We plan to rely mostly upon Lucene for our indexing and searching, so  
we've created a specialization of the query engine with some custom  
optimizer rules and a processor module that is able to use the Lucene  
indexes.  We also added a number of methods for creating, managing and  
updating the Lucene indexes (by reading content or based upon  
events).  Because these were dependent upon the Lucene library, we  
decided to put them in a new 'dna-search' project.

A lot of tests were included as part of this commit.  Also, the  
JavaDoc should be relatively complete, and should help explain how the  
various components work and what the expected behavior is.

Status

At this point, the general query engine and the Lucene-based  
specialization are for the most part complete and thoroughly tested,  
but these components need to be integrated into the larger connector  
framework and JCR implementation.  All of the Lucene index generation  
and management needs to be coordinated and integrated with the  
administration and lifecycle of the DNA connectors and JCR engine.   
Additionally, while there are methods to create/update the indexes,  
the ability to extract text from binary property values still needs to  
be added. In short, there still is a lot of outstanding work.  Over  
the next few days, I'll be breaking out these tasks into JIRA issues.

Best regards,

Randall