[dna-issues] [JBoss JIRA] Commented: (DNA-467) Add search/query support to the graph API

Friday, 18 September 2009

    [ https://jira.jboss.org/jira/browse/DNA-467?page=com.atlassian.jira.plugin... ]

Randall Hauch commented on DNA-467:
-----------------------------------

Actually, the work is pretty far along, and I'm actively working on it (now that 0.6
is out the door).  I'm working on a (local) branch, and am hoping to commit
what I have to trunk relatively soon.

The design is based on the approach that JCR 2.0 has taken: multiple query languages are
parsed into a single Abstract Syntax Tree (a "query model"), which is then
executed by a single engine.  This means that 'dna-graph' will have a QueryEngine
component that parses string queries (given the name of the language, and a registered
parser for that language that transforms the string query into an AST representation),
plans and optimizes the query, constructs a processing plan, and delegates to a
'QueryProcessor' component to execute the processing plan and generate the
results.  These results are then returned to the caller of the QueryEngine.
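
To make the flow above concrete, here is a rough Java sketch of the parse / plan /
optimize / process pipeline.  It is only an illustration: the class name
QueryEngineSketch, the method signatures, the string-based plan steps, and the single
optimizer rule are all assumptions made up for this example, not the actual dna-graph
classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names and signatures are assumptions, not the dna-graph API.
final class QueryEngineSketch {

    /** Language-neutral query model (the AST), reduced here to a source and one criterion. */
    static final class QueryModel {
        final String source;
        final String criteria;

        QueryModel(String source, String criteria) {
            this.source = source;
            this.criteria = criteria;
        }
    }

    /** Transforms a query string written in one language into the shared query model. */
    interface QueryParser {
        QueryModel parse(String query);
    }

    /** Executes a processing plan; the engine is independent of how this is done. */
    interface QueryProcessor {
        List<String> execute(List<String> processingPlan);
    }

    private final Map<String, QueryParser> parsersByLanguage = new HashMap<>();
    private final QueryProcessor processor;

    QueryEngineSketch(QueryProcessor processor) {
        this.processor = processor;
    }

    void registerParser(String language, QueryParser parser) {
        parsersByLanguage.put(language, parser);
    }

    /** Parse, plan, optimize, then delegate execution to the processor. */
    List<String> execute(String language, String query) {
        QueryParser parser = parsersByLanguage.get(language);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for language: " + language);
        }
        QueryModel model = parser.parse(query);
        List<String> plan = plan(model);
        List<String> processingPlan = optimize(plan);
        return processor.execute(processingPlan);
    }

    /** Toy planner: one access step followed by one filter step. */
    static List<String> plan(QueryModel model) {
        List<String> steps = new ArrayList<>();
        steps.add("ACCESS " + model.source);
        steps.add("FILTER " + model.criteria);
        return steps;
    }

    /** Toy rule-based optimizer with a single rule: drop a filter that matches everything. */
    static List<String> optimize(List<String> plan) {
        List<String> optimized = new ArrayList<>(plan);
        optimized.remove("FILTER *");
        return optimized;
    }
}
```

Two parsers for two different languages can be registered against the same engine, and
any QueryProcessor (Lucene-based or otherwise) can be passed to the constructor without
the engine changing.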

This design allows us to do a couple of different things.  First, we can create multiple
language parsers, yet have one engine.  Second, the planning and optimization components
can either be customized or replaced, so the whole query engine framework can be reused. 
Third, the query engine is completely independent of the execution, though the
capabilities of the QueryProcessor can be taken into account during planning/optimization.
We're actually providing a QueryProcessor implementation that uses Lucene indexes,
and with this a way to populate the Lucene indexes by scanning the graph content and/or
updating the indexes based upon events.

We'll actually plug this query engine into the DNA JCR system in a couple of ways. 
Each connector will optionally be able to support querying its content (either by pushing
down the queries to the underlying system, or by having a Lucene index and just
instantiating the query engine within the connector).  The federation connector will
support queries by pushing down queries to the connectors being federated, and it will be
able to manage a Lucene index for each connector that doesn't support queries.  Then,
since the JCR implementation will basically just be implemented on top of the federated
connector, it just has to delegate the queries down to the federated connector.
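
The federation behavior described above might look roughly like the following Java
sketch.  Again, this is only an illustration under assumptions: the Connector and Index
interfaces, the supportsQueries() check, and the helper method are hypothetical names
invented for this example (Index is just a stand-in for a Lucene index, not a Lucene
class).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are assumptions, not the DNA connector API.
final class FederatedQuerySketch {

    /** A federated source that may or may not be able to answer queries itself. */
    interface Connector {
        String name();
        boolean supportsQueries();
        List<String> query(String query);
    }

    /** Stand-in for a Lucene index maintained on behalf of one connector. */
    interface Index {
        List<String> search(String query);
    }

    /** Push the query down where possible; otherwise use the index kept for that connector. */
    static List<String> query(String query, List<Connector> connectors, Map<String, Index> indexes) {
        List<String> results = new ArrayList<>();
        for (Connector connector : connectors) {
            if (connector.supportsQueries()) {
                results.addAll(connector.query(query));
            } else {
                Index index = indexes.get(connector.name());
                if (index == null) {
                    throw new IllegalStateException("No index for connector: " + connector.name());
                }
                results.addAll(index.search(query));
            }
        }
        return results;
    }

    /** Example helper: a connector with fixed results, or no query support if results is null. */
    static Connector connector(String name, List<String> pushDownResults) {
        return new Connector() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public boolean supportsQueries() {
                return pushDownResults != null;
            }

            @Override
            public List<String> query(String query) {
                if (pushDownResults == null) {
                    throw new UnsupportedOperationException(name + " does not support queries");
                }
                return pushDownResults;
            }
        };
    }
}
```

The caller (for example, the JCR layer) only ever talks to the federated query method and
does not need to know which connectors answered directly and which were served from an
index.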

Currently the following components are pretty complete (and have a lot of unit tests,
though the tests are probably not sufficient): the QueryEngine, the AST classes, the query
planner, the rule-based optimizer, and the QueryProcessor framework.  The Lucene-based
QueryProcessor implementation is at least 50% done and what remains is mostly integration.
I'll be done with the JCR SQL2 parser in another day or so (the parser itself is complete
and I have many unit tests, but so far they cover only 50% of it).

What's left is:
- commit the current state (I've been waiting to make sure all the pieces look like
they'll fit together, and I'm probably no more than a few days away)
- write an XPath parser
- start integration testing of the query engine with the Lucene-based processor
- add query support to the connector API
- change some of our connectors to support queries (mostly just integrating the query
engine and Lucene processor), though the federation connector will require some extra
customization
- add support for text extraction
- expose through our JCR implementation
- make sure the events work and are wired correctly, as well as set up mechanisms to
manage the scanning and re-indexing of all content

Another approach is to not yet add query support to the connector API, but instead to
just have our JCR implementation own the QueryEngine and Lucene-based processor.  I'm
not sure if this will save much time or effort (mostly because of federation), but it is
an option.

This is still a fair amount of work, but as you can see I think this design will be pretty
powerful and will allow the connectors to really control how they handle queries.  When
it's committed to trunk, we'll be able to divvy up the tasks.

WDYT so far?  Yeah, this is a really rough description, but I do hope to document this
more and blog about some of the details.  

...
 Add search/query support to the graph API
 -----------------------------------------

                 Key: DNA-467
                 URL: https://jira.jboss.org/jira/browse/DNA-467
             Project: DNA
          Issue Type: Feature Request
          Components: API, Connectors, Graph
    Affects Versions: 0.5
            Reporter: Randall Hauch
            Assignee: Randall Hauch
            Priority: Critical
             Fix For: 0.7

 The graph system needs to be able to push a search/query down to a connector, and we need
to develop a query model (in the form of concrete classes, probably immutable). 
Connectors can optionally support queries, and if they do they will process the query and
return the results.
 Particular language bindings would be put on top of this query model.  For example, JCR
1.0 defines an XPath language.  Support for each language would parse a query in that
language and produce a query model. 
-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
