[hibernate-dev] Query handling : Antlr 3 versus Antlr 4

Tue Jun 9 10:02:50 EDT 2015

On Tue, Jun 9, 2015 at 3:57 AM Gunnar Morling <gunnar at hibernate.org> wrote:

> What I like about the Antlr4 approach is the fact that you don't need a set
> of several quite similar grammars as you'd do with the tree transformation
> approach. Also using the current version of Antlr instead of 3 appears
> attractive to me wrt. bugfixes and future development of the tool.
>

Understand that we would all "like" to use Antlr 4 for many reasons, myself
included.  But it has to work for our needs.  There are just so many open
questions (for me) as to whether that is the case.

>
> Based on what I understand from your discussions on the Antlr mailing
> list, I'd assume the parse tree and the external state it references to
> look roughly like so (---> indicates a reference to state built up during
> subsequent walks, maybe in some external "table", maybe stored within
> the (typed) tree nodes themselves):
>
> [QUERY]
>   [SELECT]
>     [ATTRIBUTE_REF] ---> AttributeReference("<gen:1>", "code")
>       [DOT]
>         [DOT]
>           [DOT]
>             [IDENT, "c"]
>             [IDENT, "headquarters"]
>           [IDENT, "state"]
>         [IDENT, "code"]
>   [FROM]
>     [SPACE]
>       [SPACE_ROOT] ---> InnerJoin( InnerJoin ( PersisterRef( "c",
> "com.acme.Customer" ), TableRef ( "<gen:0>", "headquarters" ) ), TableRef (
> "<gen:1>", "state" ) )
>         [IDENT, "Customer"]
>         [IDENT, "c"]
>
> I.e. instead of transforming the tree itself, the state required for
> output generation would be added as "decorators" to nodes of the original
> parse tree itself. That's just the basic idea as I understand it, surely
> the specific types of the decorator elements (AttributeReference,
> InnerJoin etc.) may look different. During "query rendering" we'd have to
> inspect the decorator state of the parse tree nodes and interpret it
> accordingly.
>

Well, see you do something "tricky" here that is actually one of my
concerns with Antlr 4 :)  You mix a parse tree and a semantic tree.
Specifically this part of your tree:

    [ATTRIBUTE_REF] ---> AttributeReference("<gen:1>", "code")
      [DOT]
        [DOT]
          [DOT]
            [IDENT, "c"]
            [IDENT, "headquarters"]
          [IDENT, "state"]
        [IDENT, "code"]

The idea of "ATTRIBUTE_REF" is a semantic concept.  The DOT-IDENT structure
is your parse tree.  Antlr 4 does allow mixing these based on left
factoring of the rules, *but* there is an assumption there... that the
branches in such a left-factored rule can be resolved unambiguously.  I
am not so sure we can do that.

In simpler terms... Antlr 4 needs you to be able to apply those semantic
resolutions (attributeRef versus javaLiteralRef versus
oraclePackagedProcedure versus ...) up front.

So take the input that produces that tree: select c.headquarters.state.code

Syntactically that dot-ident structure could represent any number of
things.  And semantically we just simply do not have enough information.
We *could* eliminate it being a javaLiteralRef if we
made javaLiteralRef the highest precedence branch in the left-factored rule
that produces this, but that has serious drawbacks:
1) we are checking each and every dot-ident path as a possible
javaLiteralRef first, which means reflection (perf)
2) it is not a fool-proof approach.  The problem is that javaLiteralRef
should really have very low precedence.  There are conceivably cases where
the expression could resolve to either a javaLiteralRef or an attributeRef,
and in those cases the resolution should be routed through attributeRef not
javaLiteralRef
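
To make the precedence point concrete, here is a minimal Java sketch.  It
is not Hibernate or Antlr code; DotIdentPathClassifier, Kind and classify
are names made up for illustration.  It also assumes the set of FROM
aliases is already known when a path is classified, and that an
attributeRef always starts with an alias, which is a simplification.  The
alias check runs first, so reflection is only attempted for paths that are
not rooted in an alias:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hibernate or Antlr.
final class DotIdentPathClassifier {
    enum Kind { ATTRIBUTE_REF, JAVA_LITERAL_REF, UNRESOLVED }

    // Assumption: aliases declared in the FROM clause have already been collected.
    private final Set<String> fromAliases;

    DotIdentPathClassifier(Set<String> fromAliases) {
        this.fromAliases = fromAliases;
    }

    Kind classify(String path) {
        int firstDot = path.indexOf('.');
        String root = firstDot < 0 ? path : path.substring(0, firstDot);

        // attributeRef has the higher precedence: a known alias always wins.
        if (fromAliases.contains(root)) {
            return Kind.ATTRIBUTE_REF;
        }
        // javaLiteralRef is the low-precedence fallback, so the reflection
        // cost is only paid for paths that are not rooted in an alias.
        return isPublicStaticField(path) ? Kind.JAVA_LITERAL_REF : Kind.UNRESOLVED;
    }

    private static boolean isPublicStaticField(String path) {
        int lastDot = path.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        try {
            // Load without initializing the class; we only inspect its fields.
            Class<?> owner = Class.forName(path.substring(0, lastDot), false,
                    DotIdentPathClassifier.class.getClassLoader());
            Field field = owner.getField(path.substring(lastDot + 1));
            return Modifier.isStatic(field.getModifiers());
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }
}
```

With "c" as the only alias, classify("c.headquarters.state.code") goes
through the attributeRef branch without touching reflection, while
classify("java.lang.Integer.MAX_VALUE") falls through to javaLiteralRef.
If an alias happened to be named "java", the same input would resolve as
an attributeRef, which is the routing described in 2).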

The ultimate problem there is that we cannot possibly know much of the
information we need to know for proper semantic analysis until after we
have seen the FROM clause.  We got around that with older Antlr versions
specifically via tree-rewriting: we re-write the tree to "hoist" FROM
before the other clauses.
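
The ordering requirement can be sketched without any tree rewriting, as
delayed processing of the SELECT paths.  Again this is only an
illustration under stated assumptions: FromFirstAnalyzer and ParsedQuery
are hypothetical names, and the "parse result" is reduced to a list of
select paths plus an alias-to-entity map.  The point is simply that FROM
is consumed before any dot-ident path is looked at, even though SELECT
comes first in the query text:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not how Hibernate represents a parsed query.
final class FromFirstAnalyzer {

    // Minimal stand-in for a parse result, kept in source order (SELECT, then FROM).
    static final class ParsedQuery {
        final List<String> selectPaths;
        final Map<String, String> fromAliases; // alias -> entity name

        ParsedQuery(List<String> selectPaths, Map<String, String> fromAliases) {
            this.selectPaths = selectPaths;
            this.fromAliases = fromAliases;
        }
    }

    static List<String> analyze(ParsedQuery query) {
        // Phase 1: process FROM first, regardless of where it appeared in the text.
        Map<String, String> aliasToEntity = new LinkedHashMap<>(query.fromAliases);

        // Phase 2: only now interpret the dot-ident paths from SELECT.
        List<String> resolved = new ArrayList<>();
        for (String path : query.selectPaths) {
            int dot = path.indexOf('.');
            String root = dot < 0 ? path : path.substring(0, dot);
            String entity = aliasToEntity.get(root);
            resolved.add(entity == null
                    ? "unresolved:" + path
                    : entity + path.substring(root.length()));
        }
        return resolved;
    }
}
```

For "select c.headquarters.state.code from Customer c", with "c" mapped to
"com.acme.Customer", the path is resolved against the entity only after
the alias map exists.  A path whose root is not an alias stays marked as
unresolved rather than being guessed at.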

> So I believe the issue of alias resolution and implicit join conversion
> could be handled without tree transformations (at least conceptually, I
> could not code an actual implementation out of my head right away). But
> maybe there are other cases where tree transformations are more strictly
> needed?
>

Well I just illustrated above how that is actually a problem that does need
either tree transformations or at least delayed processing of the sub-tree.

Also get out of your head this idea that we can encode the semantic
resolution of dot-ident paths into the tree.  We simply will not be able to
(I believe).  And I think that starts to show my reservations about Antlr
4.  Basically every pass over this tree we will need to deal with
[[DOT][IDENT]] as opposed to [ATTRIBUTE_REF].