[hibernate-dev] Query handling : Antlr 3 versus Antlr 4

Tue Jun 9 10:51:21 EDT 2015

On Tue, Jun 9, 2015 at 5:50 AM Sanne Grinovero <sanne at hibernate.org> wrote:

> Do you mean that you would be ok to "navigate" all the [DOT] nodes to
> get to the decorated attachments?
> In that case while you might be fine to translate each fragment into a
> different fragment, it's not straight forward to transform it into a
> different structure, say with sub-trees in different orders or nodes
> which don't have a 1:1 match.
> It's of course doable if you are filling in your own builder while
> navigating these (like we do with the Lucene DSL output), but it
> doesn't help you with multiple phases which is what Steve is pointing
> out.
>
> I would highly prefer to feed the semantic representation of the tree
> to our query generating backends, especially so if we could all share
> the same initial smart phases to do some basic validations and
> optimisations DRY. But then the consuming backends will likely have
> some additional validations and optimisations which need to be
> backend-specific (dialect-specific or technology specific in case of
> OGM).
>
> Steve, you mentioned that ANTLR4 handles transformations but only
> when it's the last step.  What prevents us to chain multiple such
> transformations, applying the "last step" approach multiple times?
> I didn't look at it at all, so take this just as a high level,
> conceptual question. I guess one would need to clearly define all
> intermediate data types rather than have ANTLR generate them like it
> does with tokens, but that could be the lesser trouble?
>

The "problem" is that the Antlr listeners/visitors are always based on the
original parse tree.  The transformation is not the concern.  The concern
is how you match up the listener/visitor calls based on the original parse
tree into actions on the semantic tree.

To make that concrete, go back to the Customer-headquarters query and its
original parse and semantic trees.  Given a call to process the "dot node"
that represents the root of the select expression, how do you "map" that to
the attributeReference node in the semantic tree?

Once the trees start to deviate you have basically lost the ability to
drive processing of that "subsequent tree structure" based on Antlr.  At
least easily.  There are probably some approaches we could use to allow
that.  Off the top of my head, I could see assigning each and every node in
the parse tree a uid and then maintaining a "node replacement map" based on
those uids.  But that all seems like a lot of work.
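
A minimal sketch of that "node replacement map" idea, in plain Java with no
Antlr dependency.  Every type here (ParseNode, SemanticNode,
NodeReplacementMap) is a hypothetical stand-in invented for illustration;
none of them is an Antlr or Hibernate class, and the uid scheme is only one
possible way to do it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a node of the original parse tree.
final class ParseNode {
    private static int nextUid = 0;

    final int uid = nextUid++;   // assigned once, never reused
    final String text;

    ParseNode(String text) {
        this.text = text;
    }
}

// Hypothetical stand-in for a node of the semantic tree.
final class SemanticNode {
    final String description;

    SemanticNode(String description) {
        this.description = description;
    }
}

// Remembers which semantic node replaced which parse-tree node, keyed by uid.
final class NodeReplacementMap {
    private final Map<Integer, SemanticNode> replacements = new HashMap<>();

    void replace(ParseNode original, SemanticNode replacement) {
        replacements.put(original.uid, replacement);
    }

    // Returns null when the parse node was never replaced.
    SemanticNode resolve(ParseNode original) {
        return replacements.get(original.uid);
    }
}
```

A listener/visitor callback that is handed the "dot node" would then call
resolve(dotNode) to get at the attributeReference that replaced it.  The
bookkeeping has to be kept correct across every rewrite, which is where the
"lot of work" comes from.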

Another option I have seen Antlr folks mention is to write a second grammar
defined based on your semantic tree.  It would produce listeners/visitors
based on the structure we ultimately expect in the semantic tree.  Antlr
would just not provide us the tree re-writing anymore; we'd do that
manually.  Other than that, everything else (10,000 foot view) should
remain the same.  Of course, the devil's in the details :)

We could also, which I think is what you are suggesting Sanne, have the
query parser project produce the semantic tree and then it would just be up
to the consumers of that semantic tree to do with it what they want.
Combining this with the idea of a second grammar for the semantic tree, we
could say that the query parser project provides:
1) Antlr 4 listeners and visitors based on that semantic tree grammar
2) An API for converting HQL to such semantic trees.
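
A rough sketch of the shape those two pieces could take from a consumer's
point of view.  It is hand-written Java, not Antlr-generated code, and every
name in it (QueryNode, QueryVisitor, SelectNode, AttributeReferenceNode,
SemanticQueryParser, DebugRenderer) is an assumption made up for
illustration, as are the Customer/headquarters values used with it.

```java
// (1) Visitor contract over the semantic tree (hypothetical node types).
interface QueryVisitor<T> {
    T visitSelect(SelectNode node);
    T visitAttributeReference(AttributeReferenceNode node);
}

interface QueryNode {
    <T> T accept(QueryVisitor<T> visitor);
}

final class AttributeReferenceNode implements QueryNode {
    final String sourceAlias;
    final String attribute;

    AttributeReferenceNode(String sourceAlias, String attribute) {
        this.sourceAlias = sourceAlias;
        this.attribute = attribute;
    }

    @Override
    public <T> T accept(QueryVisitor<T> visitor) {
        return visitor.visitAttributeReference(this);
    }
}

final class SelectNode implements QueryNode {
    final QueryNode selection;
    final String rootEntity;
    final String alias;

    SelectNode(QueryNode selection, String rootEntity, String alias) {
        this.selection = selection;
        this.rootEntity = rootEntity;
        this.alias = alias;
    }

    @Override
    public <T> T accept(QueryVisitor<T> visitor) {
        return visitor.visitSelect(this);
    }
}

// (2) Entry point for converting HQL to a semantic tree; left unimplemented
// here because the conversion itself is the parser project's job.
interface SemanticQueryParser {
    QueryNode parse(String hql);
}

// A consumer-side backend: walks the semantic tree and renders a string.
final class DebugRenderer implements QueryVisitor<String> {
    @Override
    public String visitSelect(SelectNode node) {
        return "select " + node.selection.accept(this)
                + " from " + node.rootEntity + " " + node.alias;
    }

    @Override
    public String visitAttributeReference(AttributeReferenceNode node) {
        return node.sourceAlias + "." + node.attribute;
    }
}
```

Each backend (ORM dialects, OGM, Search) would supply its own QueryVisitor
and never see the original parse tree, so its callbacks line up with the
semantic tree's structure rather than with the dot nodes.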