[hibernate-dev] Query handling : Antlr 3 versus Antlr 4

Wed Jun 17 08:39:39 EDT 2015

Hi,

> resolving possible unqualified attribute references [...]
> The other option is to have such an API.  This would allow the parser to
> ask the consumer whether the given identifier (`ssn`) is a persistent
> attribute of any of the available entities.

Yes, having such an extension point seems reasonable. OGM would probably use
the same implementation as ORM, but other users may plug in another impl.
based on their own type of entity definitions. Would the scope of that
extension point be solely attribute resolution, or would it also cover other
things such as literals? I'd hope the latter could be handled in a unified way
by the parser.
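For illustration, here is a minimal Java sketch of such an extension point. `DomainModelResolver`, `IdentifierClassifier` and the other names are hypothetical and are not the existing `ParserContext` API. The sketch assumes the parser already knows the alias-to-entity mapping of the explicit from-clause elements. ATTRIBUTE_REFERENCE gets the higher precedence and is validated through the consumer-supplied resolver. Only after that does the parser fall back to a simplified constant check (fully qualified static final fields), which it can do on its own without any consumer help.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical SPI: each consumer (ORM, OGM, ...) implements this against
// its own notion of entity definitions.
interface DomainModelResolver {
    boolean hasPersistentAttribute(String entityName, String attributeName);
}

enum IdentifierKind { ATTRIBUTE_REFERENCE, JAVA_CONSTANT }

final class Resolution {
    final IdentifierKind kind;
    final String alias; // owning from-clause alias; null for constants

    Resolution(IdentifierKind kind, String alias) {
        this.kind = kind;
        this.alias = alias;
    }
}

// Parser-side classification: attribute references are checked first and
// validated against the consumer's model rather than merely assumed.
final class IdentifierClassifier {
    private final DomainModelResolver model;

    IdentifierClassifier(DomainModelResolver model) {
        this.model = model;
    }

    // fromElements maps alias -> entity name for the from-clause elements in scope
    Resolution classify(String identifier, Map<String, String> fromElements) {
        List<String> owners = new ArrayList<>();
        for (Map.Entry<String, String> e : fromElements.entrySet()) {
            if (model.hasPersistentAttribute(e.getValue(), identifier)) {
                owners.add(e.getKey());
            }
        }
        if (owners.size() > 1) {
            throw new IllegalArgumentException(
                    "Ambiguous unqualified attribute '" + identifier + "' in " + owners);
        }
        if (owners.size() == 1) {
            return new Resolution(IdentifierKind.ATTRIBUTE_REFERENCE, owners.get(0));
        }
        // Lower precedence: constant interpretations (simplified here).
        if (isJavaConstant(identifier)) {
            return new Resolution(IdentifierKind.JAVA_CONSTANT, null);
        }
        throw new IllegalArgumentException("Cannot resolve identifier '" + identifier + "'");
    }

    // True if the path names a public static final field, e.g. java.lang.Integer.MAX_VALUE
    private static boolean isJavaConstant(String path) {
        int lastDot = path.lastIndexOf('.');
        if (lastDot <= 0) {
            return false;
        }
        try {
            Class<?> owner = Class.forName(path.substring(0, lastDot));
            Field field = owner.getField(path.substring(lastDot + 1));
            int mods = field.getModifiers();
            return Modifier.isStatic(mods) && Modifier.isFinal(mods);
        } catch (ClassNotFoundException | NoSuchFieldException ex) {
            return false;
        }
    }
}
```

Suppose the resolver reports `ssn` as an attribute of `Person`. Then in `select ssn from Person p`, `ssn` resolves to an attribute reference owned by `p`. An identifier matched by more than one entity is reported as ambiguous instead of being silently assigned to one of them.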

> So here it is a question of whether we want the parser to perform
> capability based validations for us.  Again, I'd argue we do as otherwise
> each consumer ends up having to do these validations themselves

+1
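Here is a rough sketch of what such capability-based validation could look like for the `select c.addresses.city from Company c` example. It assumes a hypothetical, simplified descriptor model (`DomainModel`, `AttributeKind`, `PathValidator`) and is not the `org.hibernate.hql.ast.TypeDescriptor` API. Only singular associations can be de-referenced further, so walking past a collection-valued attribute is rejected.

```java
import java.util.HashMap;
import java.util.Map;

enum AttributeKind { BASIC, SINGULAR_ASSOCIATION, COLLECTION }

final class AttributeInfo {
    final AttributeKind kind;
    final String targetType; // null for BASIC attributes

    AttributeInfo(AttributeKind kind, String targetType) {
        this.kind = kind;
        this.targetType = targetType;
    }
}

// Hypothetical stand-in for the type descriptors a consumer would provide.
final class DomainModel {
    private final Map<String, Map<String, AttributeInfo>> types = new HashMap<>();

    DomainModel attribute(String type, String name, AttributeKind kind, String target) {
        types.computeIfAbsent(type, t -> new HashMap<>()).put(name, new AttributeInfo(kind, target));
        return this;
    }

    AttributeInfo find(String type, String name) {
        Map<String, AttributeInfo> attrs = types.get(type);
        return attrs == null ? null : attrs.get(name);
    }
}

final class PathValidator {
    private final DomainModel model;

    PathValidator(DomainModel model) {
        this.model = model;
    }

    // Validates a path such as "c.addresses.city", where rootType is the entity
    // behind the alias "c". Returns the kind of the terminal attribute.
    AttributeKind validate(String path, String rootType) {
        String[] parts = path.split("\\.");
        String currentType = rootType;
        AttributeKind kind = null;
        for (int i = 1; i < parts.length; i++) {
            if (currentType == null) {
                throw new IllegalArgumentException("Cannot de-reference '" + parts[i]
                        + "' from a " + kind + " attribute in '" + path + "'");
            }
            AttributeInfo attr = model.find(currentType, parts[i]);
            if (attr == null) {
                throw new IllegalArgumentException("Unknown attribute '" + parts[i] + "' of " + currentType);
            }
            kind = attr.kind;
            // Only singular associations can be de-referenced further
            // (embeddables would behave the same; omitted for brevity).
            currentType = attr.kind == AttributeKind.SINGULAR_ASSOCIATION ? attr.targetType : null;
        }
        return kind;
    }
}
```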

> We also need to decide how we want to handle polymorphic queries in this
> parser.

I'm not sure about the exact types to be returned, but it'd help if the
returned structure contained information about the actually affected
"tables" (or more generally, "structures" in the query backend), so that
users don't each have to resolve that information w.r.t. the current
mapping strategy. That needs some extension point for specifying the
sub-types of given types. Again, OGM would probably share an impl. with ORM.
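Here is one way the sub-type extension point could look, as a sketch. A hypothetical `EntityCatalog` SPI lists the known entity types and the backend structure (table, document collection, ...) behind each one. A hypothetical `PolymorphicExpander` then expands a polymorphic reference such as `from Object` into the affected structures. The sketch assumes entities are represented by Java classes, which does not hold for every consumer (e.g. dynamic-map entities).

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical consumer SPI: which entity types exist, and which backend
// "structure" backs each of them under the current mapping strategy.
interface EntityCatalog {
    Collection<Class<?>> entityTypes();

    String structureOf(Class<?> entityType);
}

final class PolymorphicExpander {
    private final EntityCatalog catalog;

    PolymorphicExpander(EntityCatalog catalog) {
        this.catalog = catalog;
    }

    // Expands a (possibly non-entity) type such as java.lang.Object into the
    // structures of every known entity type assignable to it.
    Set<String> affectedStructures(Class<?> queriedType) {
        Set<String> structures = new LinkedHashSet<>();
        for (Class<?> entity : catalog.entityTypes()) {
            if (queriedType.isAssignableFrom(entity)) {
                // A Set, since several entities may share one structure
                structures.add(catalog.structureOf(entity));
            }
        }
        return structures;
    }
}
```

The result is a set, so entities that share one structure (as with a single-table inheritance mapping) are reported only once. Consumers would otherwise have to work out this mapping-strategy detail themselves.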

--Gunnar

2015-06-13 0:30 GMT+02:00 Steve Ebersole <steve at hibernate.org>:

> I just pushed my initial work on performing some indexing of explicit from
> clauses.  Essentially it takes the from clauses defined in the query and
> begins the massage process.  From here I will start working on handling
> implicit from-clause-elements.  Part of that, however, will require me to be
> able to know what is an entity and whether an IDENTIFIER in the query
> represents an attribute in one (or more) of those from-clause-element
> entities.  IIRC this needs to be different between consumers of this
> project as they do not always have persisters, etc.  Previously we had
> discussed an API that all consumers could provide.  I had developed org.hibernate.hql.ast.common.ParserContext
> and friends as a means to that end.  However, the parser project is
> currently not using those.  On the bright side that means we have a clean
> slate for how to do this! :)
>
> Part of the equation here is how we want certain things to work in terms
> of an input/output paradigm, specifically what the "parser" should output.
> Let's take some examples...
>
> 1) resolving possible unqualified attribute references.  I have mentioned
> this one before.  Take a query like `select ssn from Person`.  There are 2
> main choices here when it comes to processing the `ssn` select
> expression.  It's a matter of precedence.  The first option is to have
> ATTRIBUTE_REFERENCE have the lowest precedence.  Here we'd try other
> possibilities first.  Mainly that would entail trying it as various forms
> of a constant.  If all those attempts fail we would *assume* that the
> expression is an ATTRIBUTE_REFERENCE.  The assumption aspect is
> important.  It means that it would not be validated here.  This approach would
> not require any form of API, but it has many downsides: it would require
> more expensive resolution and it could potentially hide an ATTRIBUTE_REFERENCE.
> IMO ATTRIBUTE_REFERENCE should have the higher precedence.  The other
> option is to have such an API.  This would allow the parser to ask the
> consumer whether the given identifier (`ssn`) is a persistent attribute of
> any of the available entities.  If you can't tell, I am big-time in favor
> of having such an API :)  But I am open to discussions as to the other side.
>
> 2) Understanding capabilities.  The principle here is understanding what
> is possible in different contexts based on the domain model being queried.
> For example, if we see a query like `select c.addresses.city from Company
> c` and if we know that `c.addresses` resolves to a persistent collection
> then we know that the following de-reference is invalid.  So here it is a
> question of whether we want the parser to perform capability based
> validations for us.  Again, I'd argue we do as otherwise each consumer ends
> up having to do these validations themselves.  This is/was the intent of
> the org.hibernate.hql.ast.TypeDescriptor stuff I had developed there
> originally.  So something like that.
>
> 3) We also need to decide how we want to handle polymorphic queries in
> this parser.  For a query like `from Object` what do we ultimately want
> returned?  Specifically how do we deal with the multi-valued
> java.lang.Object reference in whatever we send back from the parser?
> Because what we send out implies some things we need to send in (API).
>
> Anyway, the from clause parser is looking nice so far.
>
>
> On Thu, Jun 11, 2015 at 9:50 AM Steve Ebersole <steve at hibernate.org>
> wrote:
>
>> In the re-write case we will have to decide between 2 courses.
>>
>> The problem lies in the fact that the listeners and visitors expect a
>> tree specifically generated from the grammar that generated them.  For HQL,
>> say we have one grammar named HqlParser (like in the poc).  The
>> listeners and visitors built from HqlParser specifically expect the tree
>> from HqlParser.  The trees are typed.  Attempting to use trees from one
>> grammar in the listener/visitor from another grammar will not work.
>>
>> As far as re-writing goes, that effectively means 2 options.
>>
>> If we want to have a second grammar for the "semantic query" we are going
>> to have to re-write the entire tree based on that second grammar.  I had
>> thought we might be able to mix them. But that was based on my Antlr 2/3
>> understanding where the trees are de-typed.  That approach will not work in
>> Antlr 4.  It's not a huge deal, but worth mentioning.
>>
>> The other option would be to encode the "semantically correct" rules into
>> the original grammar (HqlParser) with a higher precedence than their parse
>> tree counterparts.  This gets a little fugly.  Consider again the `select
>> c.headquarters.state.code` fragment.  During the parse phase we need to
>> accept any dotIdentifierPath as a selectable item.  We simply do not
>> know during parse what that represents.  So for the parse phase, a
>> `selectItem` rule (overly simplified) might look like:
>>
>> selectItem : dotIdentifierPath;
>>
>> In this approach we would re-write the tree "in place" during semantic
>> analysis.  So at some point we know that the given dotIdentifierPath
>> represents a reference to a persistent attribute.  So we'd alter that rule
>> to contain alternatives for each semantic possibility:
>>
>> selectItem : attributeReference | javaConstant | dotIdentifierPath;
>>
>> The rules attributeReference and javaConstant would never match during
>> the parse phase.
>>
>> Again, this is fugly imo.
>>
>>
>> On Wed, Jun 10, 2015 at 10:49 AM Gunnar Morling <gunnar at hibernate.org>
>> wrote:
>>
>>> 2015-06-09 22:11 GMT+02:00 Steve Ebersole <steve at hibernate.org>:
>>>
>>>> So today I spent some time cleaning up the basic HQL parser.
>>>> Personally I think it would be best if our 2 proof-of-concepts could share
>>>> that first grammar.  IMO that would make the differences between the 2
>>>> approaches more apparent.  I will push those changes soon.
>>>>
>>>
>>> Ok, I can try and work on a PoC for the decorator-based approach. Not
>>> sure what the outcome will be; like yours, my understanding of it is rather
>>> vague and high-level. But if it fails we can maybe settle for the Antlr3
>>> approach with the better feeling of having investigated the alternative.
>>>
>>> Can you let me know when you have pushed your stuff? What does it do,
>>> render the query below as SQL?
>>>
>>>> It is not complete yet.  But it covers most cases.
>>>>
>>>>
>>>> On Tue, Jun 9, 2015 at 10:47 AM Steve Ebersole <steve at hibernate.org>
>>>> wrote:
>>>>
>>>>> On Tue, Jun 9, 2015 at 10:14 AM Gunnar Morling <gunnar at hibernate.org>
>>>>> wrote:
>>>>>
>>>>>> Yes, indeed I cheated here a bit. Probably it should be the following
>>>>>> instead:
>>>>>>
>>>>>>       [DOT] ---> AttributeReference("<gen:1>", "code")
>>>>>>         [DOT]
>>>>>>           [DOT]
>>>>>>             [IDENT, "c"]
>>>>>>             [IDENT, "headquarters"]
>>>>>>           [IDENT, "state"]
>>>>>>         [IDENT, "code"]
>>>>>>
>>>>>
>>>>> How do you identify one DOT as referring to something else versus any
>>>>> of the other DOTs?
>>>>>
>>>>>
>>>>>> Or maybe something like:
>>>>>>
>>>>>>     [SELECTION_PARTICLE] ---> AttributeReference("<gen:1>", "code")
>>>>>>       [DOT]
>>>>>>         [DOT]
>>>>>>           [DOT]
>>>>>>             [IDENT, "c"]
>>>>>>             [IDENT, "headquarters"]
>>>>>>           [IDENT, "state"]
>>>>>>         [IDENT, "code"]
>>>>>>
>>>>>> Where SELECTION_PARTICLE would be an abstract representation of
>>>>>> anything that can be selected (attribute ref, Java literal ref etc.) and
>>>>>> the decorator element added in a later pass would specify its actual
>>>>>> semantics based on the alias definitions etc. discovered before.
>>>>>>
>>>>>> The bottom line being that decorators providing semantics are attached
>>>>>> to the nodes of the parse tree based on information gathered in previous
>>>>>> passes.
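To make the "node attachments" idea a little more concrete, here is a minimal sketch. Semantic information lives in a side map keyed by node identity, so the parse tree itself is never rewritten. `Node`, `AttributeReference`, `NodeAttachments` and `Renderer` are hypothetical stand-ins. With Antlr 4 the generated, typed parse tree contexts would take the place of `Node`, and the runtime's `ParseTreeProperty` class offers a comparable identity-keyed map.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-in for a parse tree node such as [DOT] or [IDENT, "c"].
final class Node {
    final String type;
    final String text;
    final List<Node> children;

    Node(String type, String text, Node... children) {
        this.type = type;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

final class AttributeReference {
    final String alias;
    final String attribute;

    AttributeReference(String alias, String attribute) {
        this.alias = alias;
        this.attribute = attribute;
    }

    @Override
    public String toString() {
        return alias + "." + attribute;
    }
}

// Decorations keyed by node identity (not equals()), attached in a later pass.
final class NodeAttachments<V> {
    private final Map<Node, V> values = new IdentityHashMap<>();

    void put(Node node, V value) {
        values.put(node, value);
    }

    V get(Node node) {
        return values.get(node);
    }
}

final class Renderer {
    private final NodeAttachments<AttributeReference> semantics;

    Renderer(NodeAttachments<AttributeReference> semantics) {
        this.semantics = semantics;
    }

    String render(Node node) {
        AttributeReference ref = semantics.get(node);
        if (ref != null) {
            return ref.toString(); // the semantic decoration takes precedence
        }
        if (node.type.equals("IDENT")) {
            return node.text;
        }
        // Undecorated DOT: fall back to the raw dotted path
        StringBuilder sb = new StringBuilder();
        for (Node child : node.children) {
            if (sb.length() > 0) {
                sb.append('.');
            }
            sb.append(render(child));
        }
        return sb.toString();
    }
}
```

Here the attachment is keyed on the outermost DOT node object itself. That is one possible way of singling out one DOT among the others.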
>>>>>>
>>>>>
>>>>> And what does that look like in real, practical terms?   That's what
>>>>> concerns me :)  I don't know, and you are just speaking in generalities.
>>>>> So what does that look like in practice?
>>>>>
>>>>>
>>>>>> Not into the tree itself, but we can encode that semantic resolution
>>>>>> into decorators (node attachments).
>>>>>>
>>>>>
>>>>> Again, what do these "node attachments" look like in practice?  I
>>>>> have zero clue, and based on my discussions with Antlr folks it's not
>>>>> pretty.  Maybe I misunderstand.  But if you are proposing this approach, I
>>>>> would think you should have an idea of how it would look
>>>>> practically-speaking :)  Maybe this is the way to go, I just need to see
>>>>> what this looks like.
>>>>>
>>>>>
>>>>>> Yes, they would deal with [[DOT][IDENT]] nodes but would benefit from
>>>>>> semantic decorators attached previously. During rendering I would expect
>>>>>> mainly those attachments to be of importance for the query creation.
>>>>>>
>>>>>> Admittedly, that's all quite "high level", but so far it seems doable
>>>>>> to me in principle. Of course, it doesn't address actual tree
>>>>>> transformations such as (x + 0) -> x. I am not sure whether there are cases like this.
>>>>>>
>>>>>
>>>>> Yes it is all extremely high-level.  That is my concern.  Principle
>>>>> and practice are often 2 very different things.
>>>>>
>>>>> I plan on spending some time taking my hibernate-antlr4-poc project
>>>>> and expanding it specifically to try the "second grammar" approach and see
>>>>> what practical difficulties that shakes out.  Would you be willing to do
>>>>> the same for this decorated approach?  Then we'd have concrete stuff to
>>>>> compare and base a decision on.
>>>>>
>>>>> Also, `(x + 0) -> x` is actually a quite simple case.   Ours is much
>>>>> more complicated.  In analyzing `c.headquarters.state.code` in the SELECT
>>>>> clause we need a few things to happen in a few different parts of the
>>>>> tree.  We need:
>>>>> 1) `c.headquarters.state` to be transformed into 2 "implicit joins"
>>>>> in the FROM clause
>>>>> 2) `c.headquarters.state.code` to be replaced with
>>>>> `{implicit-alias}.code` in the SELECT
>>>>> 3) `c.headquarters` and `c.headquarters.state` to be registered as
>>>>> implicit join paths (additional implicit joins using these paths should
>>>>> re-use the same joins).
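The three steps above can be sketched as a small registry. The sketch assumes a hypothetical `ImplicitJoinRegistry` helper, that every intermediate path segment is a singular association (validation omitted), and generated aliases in the `<gen:N>` style used earlier in the thread. It collects the implicit joins for the FROM clause, returns the rewritten select expression, and re-uses the join for any path it has already seen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: creates implicit joins for intermediate paths,
// rewrites the selected path against the generated alias, and re-uses
// joins for paths that were already registered.
final class ImplicitJoinRegistry {
    private final Map<String, String> aliasByPath = new HashMap<>();
    private final List<String> joins = new ArrayList<>();
    private int counter = 0;

    // e.g. "c.headquarters.state.code" -> "<gen:2>.code"
    String resolve(String dottedPath) {
        String[] parts = dottedPath.split("\\.");
        String lhsAlias = parts[0]; // assumed to be an explicit from-clause alias
        String path = parts[0];
        for (int i = 1; i < parts.length - 1; i++) {
            path = path + "." + parts[i];
            String alias = aliasByPath.get(path);
            if (alias == null) {
                // First time we see this path: register a new implicit join
                alias = "<gen:" + (++counter) + ">";
                aliasByPath.put(path, alias);
                joins.add("join " + lhsAlias + "." + parts[i] + " " + alias);
            }
            lhsAlias = alias;
        }
        return lhsAlias + "." + parts[parts.length - 1];
    }

    // Implicit joins in creation order, to be added to the FROM clause
    List<String> implicitJoins() {
        return Collections.unmodifiableList(joins);
    }
}
```

A later reference such as `c.headquarters.name` then resolves to `<gen:1>.name` without adding a third join.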
>>>>>
>>>>