We've had a few discussions about this in the past. As 5.0 is getting
close to Final (next week), it's time to start contemplating our next major
tasks. The consensus pick for that has been the idea of a "unified SQL
generation engine" along with a shared project for the semantic analysis of
HQL/JPQL (and recently it was decided to include JPA Criteria
interpretation here as well).
The central premise is this: take the roughly 6 or 7 different top-level
ways Hibernate generates SQL and combine them into a single "engine" that
takes a "semantic tree" as input. The shared HQL/JPQL/Criteria project
mentioned above will be one producer of such semantic trees. Others would
include persisters (for insert/update/delete requests) and loaders (for
load requests).
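To make the premise concrete, here is a minimal sketch of the "many producers, one engine" shape. Every type in it (`SemanticTree`, `EntityLoad`, `EntityDelete`, `SemanticTreeProducer`, `SqlGenerationEngine`) is hypothetical and exists only for illustration; nothing here is an existing Hibernate API. It assumes a JDK 16+ for records and pattern-matching `instanceof`.

```java
import java.util.List;

// Hypothetical types for illustration only; none of these exist in Hibernate.
// A semantic tree describes *what* is requested, independent of any source syntax.
interface SemanticTree {}

// What a loader might produce: load the given columns of one row by id.
record EntityLoad(String table, List<String> columns, String idColumn) implements SemanticTree {}

// What a persister might produce: delete one row by id.
record EntityDelete(String table, String idColumn) implements SemanticTree {}

// Any source of semantic trees: the HQL/JPQL/Criteria translator, a persister, a loader...
interface SemanticTreeProducer {
    SemanticTree produce();
}

// The single engine every producer feeds; only this class knows how to render SQL.
final class SqlGenerationEngine {
    String generate(SemanticTree tree) {
        if (tree instanceof EntityLoad load) {
            return "select " + String.join(", ", load.columns())
                    + " from " + load.table()
                    + " where " + load.idColumn() + " = ?";
        }
        if (tree instanceof EntityDelete delete) {
            return "delete from " + delete.table() + " where " + delete.idColumn() + " = ?";
        }
        throw new IllegalArgumentException("Unsupported semantic tree: " + tree);
    }
}
```

In this sketch a loader and a persister differ only in the tree they hand over (e.g. `engine.generate(loader.produce())`); the SQL rendering logic lives in one place instead of 6 or 7.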
Many tasks toward this overall goal still remain.
We still have to finalize the design of the HQL/JPQL/Criteria to semantic
tree translator. One option is to proceed with the Antlr 4-based approach
for which I started a PoC; John has been helping me with that lately. The
first task here is to reach consensus on whether Antlr 4 is the way we
want to go. We've been over the pros and cons in detail before.
In summary, there is a lot to love about Antlr 4. Our grammar for HQL
recognition and semantic tree building is very simple and elegant, imo. The
drawback is clearly the lack of tree walking: Antlr 4 dropped the tree
grammars Antlr 3 offered, so we would be responsible for hand-writing the
walker for the semantic tree. In fact, multiple walkers, since each
consumer (ORM, OGM, Search) would need to write its own. And if we decide
to build another AST while walking the semantic tree, we'd end up having to
hand-write yet another walker for that one as well.
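As a rough illustration of what "hand-writing the walker" means, the sketch below defines a tiny semantic tree plus a visitor-style walker contract, then two independent consumers of it. All names (`SemanticExpression`, `SemanticWalker`, `SqlPredicateRenderer`, `ParameterCounter`, the node records) are hypothetical; the point is only that the node types, the `accept` plumbing and every consumer's walker are code we would write and maintain ourselves. Assumes JDK 16+ for records.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical semantic tree node types (illustrative only, not Hibernate classes).
interface SemanticExpression {
    <R> R accept(SemanticWalker<R> walker);
}

record AttributeReference(String alias, String attribute) implements SemanticExpression {
    public <R> R accept(SemanticWalker<R> walker) { return walker.visitAttribute(this); }
}

record Literal(Object value) implements SemanticExpression {
    public <R> R accept(SemanticWalker<R> walker) { return walker.visitLiteral(this); }
}

record Equality(SemanticExpression lhs, SemanticExpression rhs) implements SemanticExpression {
    public <R> R accept(SemanticWalker<R> walker) { return walker.visitEquality(this); }
}

record Conjunction(List<SemanticExpression> terms) implements SemanticExpression {
    public <R> R accept(SemanticWalker<R> walker) { return walker.visitConjunction(this); }
}

// The hand-written walker contract: one method per node type, kept in sync by hand.
interface SemanticWalker<R> {
    R visitAttribute(AttributeReference node);
    R visitLiteral(Literal node);
    R visitEquality(Equality node);
    R visitConjunction(Conjunction node);
}

// One consumer (think ORM) renders a SQL-like predicate...
final class SqlPredicateRenderer implements SemanticWalker<String> {
    public String visitAttribute(AttributeReference node) { return node.alias() + "." + node.attribute(); }
    // Literals become JDBC parameter placeholders.
    public String visitLiteral(Literal node) { return "?"; }
    public String visitEquality(Equality node) {
        return node.lhs().accept(this) + " = " + node.rhs().accept(this);
    }
    public String visitConjunction(Conjunction node) {
        return node.terms().stream().map(t -> t.accept(this)).collect(Collectors.joining(" and "));
    }
}

// ...while another consumer (think OGM or Search) writes its own walker over the same tree.
final class ParameterCounter implements SemanticWalker<Integer> {
    public Integer visitAttribute(AttributeReference node) { return 0; }
    public Integer visitLiteral(Literal node) { return 1; }
    public Integer visitEquality(Equality node) { return node.lhs().accept(this) + node.rhs().accept(this); }
    public Integer visitConjunction(Conjunction node) {
        return node.terms().stream().mapToInt(t -> t.accept(this)).sum();
    }
}
```

Adding a node type means touching the walker interface and every implementation, in every consumer; that is the maintenance cost that generated tree walkers would otherwise absorb.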
To explain that last point about building another AST: there are 2 ways
we might choose to deal with the semantic tree. For the purpose of
discussion, let's look at the ORM case. The first approach is to simply
generate the SQL as we walk the semantic tree; this would be a 2-phase
interpretation approach (input -> semantic tree -> SQL). This is exactly
the approach our existing HQL translator uses, and while it works in many
cases, it breaks down in others. The other approach is to use a 3-phase
translation (input -> semantic tree -> semantic SQL tree(s) -> SQL). The
"(s)" hints at one of the major problems: one source "semantic" query will
often correspond to multiple SQL queries, which is hard to manage in the
2-phase approach, not to mention integrating things like follow-on fetches
and the other enhancements we want to gain from this. My vote is
definitely for 3 or more phases of interpretation. The problem is that
this is exactly where Antlr 4 sort of falls down, since every additional
tree means another hand-written walker.
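A minimal sketch of the 3-phase shape, assuming a deliberately simplified model: the hypothetical `SemanticQuery` record stands in for the semantic tree, `SqlSelectTree` for a semantic SQL tree, and `ThreePhaseTranslator` is invented for illustration. The "owner_id" foreign key column is likewise an assumption. The key point is that phase 2 returns a *list* of SQL trees, so one semantic query can fan out into several statements (here, a follow-on fetch) before anything is rendered. Assumes JDK 16+ for records and `Stream.toList()`.

```java
import java.util.ArrayList;
import java.util.List;

// Phase 1 output (hypothetical): "load this entity by id and also fetch these collections".
record SemanticQuery(String entityTable, String idColumn, List<String> collectionTables) {}

// Phase 2 output (hypothetical): one SQL-level tree per statement to execute.
record SqlSelectTree(String table, String predicate) {}

final class ThreePhaseTranslator {
    // Phase 2: semantic tree -> SQL tree(s). Returning a list is the key point:
    // a single semantic query may fan out into several SQL statements.
    List<SqlSelectTree> toSqlTrees(SemanticQuery query) {
        List<SqlSelectTree> trees = new ArrayList<>();
        trees.add(new SqlSelectTree(query.entityTable(), query.idColumn() + " = ?"));
        for (String collectionTable : query.collectionTables()) {
            // Follow-on fetch; assumes (for illustration) an "owner_id" foreign key column.
            trees.add(new SqlSelectTree(collectionTable, "owner_id = ?"));
        }
        return trees;
    }

    // Phase 3: render each SQL tree to a string, independently of how it was produced.
    String render(SqlSelectTree tree) {
        return "select * from " + tree.table() + " where " + tree.predicate();
    }

    List<String> translate(SemanticQuery query) {
        return toSqlTrees(query).stream().map(this::render).toList();
    }
}
```

In a 2-phase walker that appends SQL to a single buffer as it goes, there is no natural place to emit that second statement. With an intermediate SQL tree list, a later pass could also inspect or rewrite the trees (for instance, folding a fetch into a join) before rendering.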
So first things first... we need to decide on Antlr 3 versus Antlr 4
(versus some other parser solution).
Next, on the ORM side (each "backend" can decide this individually), we
need to decide on the approach for semantic-tree-to-SQL translation, which
somewhat depends on the Antlr 3 versus Antlr 4 decision.
We really need to decide these things soon so we can get moving on them as
soon as ORM 5.0 is finished.
Also, this is a massive undertaking with huge potential gains, and not
just for ORM. As such, we need to understand who will be working on this.
Sanne, Gunnar... I know y'all have a vested interest and a desire to work
on it. John, I know the same is true for you. Andrea? Have you had a chance
to look over the PoC and/or get more familiar with Antlr?