2015-06-09 16:02 GMT+02:00 Steve Ebersole <steve(a)hibernate.org>:
On Tue, Jun 9, 2015 at 3:57 AM Gunnar Morling <gunnar(a)hibernate.org> wrote:
> What I like about the Antlr4 approach is the fact that you don't need a
> set of several quite similar grammars as you'd do with the tree
> transformation approach. Also using the current version of Antlr instead of
> 3 appears attractive to me wrt. bugfixes and future development of the
> tool.
>
Understand that we would all "like" to use Antlr 4 for many reasons,
myself included. But it has to work for our needs. There are just so many
open questions (for me) as to whether that is the case.
Sure, that's what we need to find out.
> Based on what I understand from your discussions on the Antlr mailing
> list, I'd assume the parse tree and the external state it references to
> look roughly like so (---> indicates a reference to state built up during
> subsequent walks, maybe in some external "table", maybe stored within
> the (typed) tree nodes themselves):
>
> [QUERY]
>   [SELECT]
>     [ATTRIBUTE_REF] ---> AttributeReference("<gen:1>", "code")
>       [DOT]
>         [DOT]
>           [DOT]
>             [IDENT, "c"]
>             [IDENT, "headquarters"]
>           [IDENT, "state"]
>         [IDENT, "code"]
>   [FROM]
>     [SPACE]
>       [SPACE_ROOT] ---> InnerJoin( InnerJoin ( PersisterRef( "c",
>           "com.acme.Customer" ), TableRef ( "<gen:0>", "headquarters" ) ),
>           TableRef ( "<gen:1>", "state" ) )
>         [IDENT, "Customer"]
>         [IDENT, "c"]
>
> I.e. instead of transforming the tree itself, the state required for
> output generation would be added as "decorators" to nodes of the original
> parse tree itself. That's just the basic idea as I understand it, surely
> the specific types of the decorator elements (AttributeReference,
> InnerJoin etc.) may look different. During "query rendering" we'd have
> to inspect the decorator state of the parse tree nodes and interpret it
> accordingly.
>
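To make the decorator idea above concrete, here is a minimal sketch in plain Java. It assumes nothing about the real grammar: `Node`, `AttributeReference` and `DecoratorSketch` are hypothetical stand-ins, and the side table is an ordinary `IdentityHashMap` keyed by node identity. Antlr 4 ships a comparable helper, `ParseTreeProperty<T>`, for attaching values to parse tree nodes, but the sketch does not depend on it.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class DecoratorSketch {
    static Node ident(String name) { return new Node("IDENT", name); }
    static Node dot(Node lhs, Node rhs) { return new Node("DOT", null, lhs, rhs); }

    /** Builds the untouched parse tree for c.headquarters.state.code. */
    static Node samplePath() {
        return dot(dot(dot(ident("c"), ident("headquarters")), ident("state")), ident("code"));
    }

    /** Rendering consults the decorator and ignores the DOT/IDENT shape below the node. */
    static String render(Map<Node, Object> decorators, Node node) {
        Object d = decorators.get(node);
        if (d instanceof AttributeReference) {
            AttributeReference ref = (AttributeReference) d;
            return ref.tableAlias + "." + ref.attribute;
        }
        throw new IllegalStateException("no semantic decorator on " + node.type);
    }

    public static void main(String[] args) {
        // Side table keyed by node identity; the parse tree itself is never modified.
        Map<Node, Object> decorators = new IdentityHashMap<>();
        Node path = samplePath();
        // A later pass (not shown) would compute this from the aliases found in FROM.
        decorators.put(path, new AttributeReference("<gen:1>", "code"));
        System.out.println(render(decorators, path)); // prints <gen:1>.code
    }
}

/** Hypothetical, minimal stand-in for a parse tree node; not an Antlr type. */
final class Node {
    final String type;
    final String text; // null for non-leaf nodes
    final List<Node> children = new ArrayList<>();

    Node(String type, String text, Node... kids) {
        this.type = type;
        this.text = text;
        for (Node k : kids) {
            children.add(k);
        }
    }
}

/** Hypothetical decorator carrying the semantic resolution of a dot-ident path. */
final class AttributeReference {
    final String tableAlias;
    final String attribute;

    AttributeReference(String tableAlias, String attribute) {
        this.tableAlias = tableAlias;
        this.attribute = attribute;
    }
}
```

The point of the sketch is only that the tree stays a pure parse tree while the semantics live next to it; whether the decorator sits on the outermost DOT or on some other node is left open here.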
Well, see you do something "tricky" here that is actually one of my
concerns with Antlr 4 :) You mix a parse tree and a semantic tree.
Specifically this part of your tree:
[ATTRIBUTE_REF] ---> AttributeReference("<gen:1>", "code")
  [DOT]
    [DOT]
      [DOT]
        [IDENT, "c"]
        [IDENT, "headquarters"]
      [IDENT, "state"]
    [IDENT, "code"]
The idea of "ATTRIBUTE_REF" is a semantic concept. The DOT-IDENT structure
is your parse tree. Antlr 4 does allow mixing these based on left
factoring of the rules, *but* there is an assumption there... that the
branches in such a left-factored rule can be resolved unambiguously. I
am not so sure we can do that.
Yes, indeed I cheated here a bit. Probably it should be the following
instead:
[DOT] ---> AttributeReference("<gen:1>", "code")
  [DOT]
    [DOT]
      [IDENT, "c"]
      [IDENT, "headquarters"]
    [IDENT, "state"]
  [IDENT, "code"]
Or maybe something like:
[SELECTION_PARTICLE] ---> AttributeReference("<gen:1>", "code")
  [DOT]
    [DOT]
      [DOT]
        [IDENT, "c"]
        [IDENT, "headquarters"]
      [IDENT, "state"]
    [IDENT, "code"]
Where SELECTION_PARTICLE would be an abstract representation of anything
that can be selected (attribute ref, Java literal ref etc.) and the
decorator element added in a later pass would specify its actual semantics
based on the alias definitions etc. discovered before.
The bottom line is that decorators providing semantics are attached to the
nodes of the parse tree based on information gathered in previous passes.
In simpler terms... Antlr 4 needs you to be able to apply those semantic
resolutions (attributeRef versus javaLiteralRef versus
oraclePackagedProcedure versus ...) up front.
So take the input that produces that tree: select c.headquarters.state.code
Syntactically that dot-ident structure could represent any number of
things. And semantically we just simply do not have enough information.
We *could* eliminate it being a javaLiteralRef if we
made javaLiteralRef the highest precedence branch in the left-factored rule
that produces this, but that has serious drawbacks:
1) we are checking each and every dot-ident path as a possible
javaLiteralRef first, which means reflection (perf)
2) it is not a fool-proof approach. The problem is that javaLiteralRef
should really have very low precedence. There are conceivably cases where
the expression could resolve to either a javaLiteralRef or an attributeRef,
and in those cases the resolution should be routed through attributeRef not
javaLiteralRef
The ultimate problem there is that we cannot possibly know much of the
information we need to know for proper semantic analysis until after we
have seen the FROM clause. We got around that with older Antlr versions
specifically via tree-rewriting: we re-write the tree to "hoist" FROM
before the other clauses.
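The ordering problem described above can be sketched as two passes: first collect the aliases declared in FROM, and only then classify the dot-ident paths, trying attributeRef before falling back to a reflection-based javaLiteralRef check. This is an illustration under assumptions, not Hibernate's implementation: the class name, the `Kind` enum and the hard-coded alias table are all hypothetical.

```java
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

final class DelayedResolutionSketch {
    enum Kind { ATTRIBUTE_REF, JAVA_LITERAL_REF, UNRESOLVED }

    /** Pass 1: alias -> entity name as found in the FROM clause (hard-coded for the sketch). */
    static Map<String, String> collectAliases() {
        Map<String, String> aliases = new LinkedHashMap<>();
        aliases.put("c", "com.acme.Customer");
        return aliases;
    }

    /** Pass 2: classify a dot-ident path; attributeRef takes precedence over javaLiteralRef. */
    static Kind classify(String path, Map<String, String> aliases) {
        int firstDot = path.indexOf('.');
        String root = firstDot < 0 ? path : path.substring(0, firstDot);
        if (aliases.containsKey(root)) {
            return Kind.ATTRIBUTE_REF; // no reflection needed on this branch
        }
        return isJavaConstant(path) ? Kind.JAVA_LITERAL_REF : Kind.UNRESOLVED;
    }

    /** Reflection check: does the path look like some.pkg.SomeClass.STATIC_FIELD? */
    static boolean isJavaConstant(String path) {
        int lastDot = path.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        try {
            Class<?> owner = Class.forName(path.substring(0, lastDot));
            int modifiers = owner.getField(path.substring(lastDot + 1)).getModifiers();
            return Modifier.isStatic(modifiers);
        } catch (ReflectiveOperationException e) {
            return false; // unknown class or field: not a Java literal reference
        }
    }

    public static void main(String[] args) {
        Map<String, String> aliases = collectAliases();
        System.out.println(classify("c.headquarters.state.code", aliases));   // prints ATTRIBUTE_REF
        System.out.println(classify("java.lang.Integer.MAX_VALUE", aliases)); // prints JAVA_LITERAL_REF
    }
}
```

Because the alias lookup runs first, the reflection cost is only paid for paths whose root is not a known alias, and a path that could be read either way is routed through attributeRef. That only works once the FROM clause has been processed, which is the delayed processing mentioned above.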
> So I believe the issue of alias resolution and implicit join conversion
> could be handled without tree transformations (at least conceptually, I
> could not code an actual implementation off the top of my head). But
> maybe there are other cases where tree transformations are more strictly
> needed?
>
Well I just illustrated above how that is actually a problem that does
need either tree transformations or at least delayed processing of the
sub-tree.
Also get out of your head this idea that we can encode the semantic
resolution of dot-ident paths into the tree. We simply will not be able to
(I believe).
Not into the tree itself, but we can encode that semantic resolution into
decorators (node attachments).
And I think that starts to show my reservations about Antlr 4. Basically,
in every pass over this tree we will need to deal with [[DOT][IDENT]] as
opposed to [ATTRIBUTE_REFERENCE].
Yes, they would deal with [[DOT][IDENT]] nodes but would benefit from
semantic decorators attached previously. During rendering I would expect
mainly those attachments to be of importance for the query creation.
Admittedly, that's all quite "high level", but so far it seems doable to me
in principle. Of course, it doesn't address actual tree transformations such
as (x + 0) -> x. I am not sure whether there are cases like this.