Hi all,
I'm interested in knowing whether some behavior like the following has
been considered for Criteria, and what pitfalls or complexities to
watch out for. I'm considering forking hibernate on github to try
this out, but I'd like to hear if anyone has some advice on this
problem.
It would be nice if the following worked by default:
* create a JPA entity - let's call it a Cat. The Cat has a
one-to-many collection of Kittens with a JPA FetchType=LAZY.
* create a Criteria to retrieve a list of Cat
* add a restriction to the Criteria that filters on some property of
the subcollection - let's use kitten.Name. I'd like to retrieve all
Cats who have at least one kitten older than 2 years.
* page the collection: I have over 1 million cats who have a kitten
older than 2, but I'd just like to return cats 11-15, so I
setFirstResult to 11 and setMaxResults to 5.
* result: out of the 1 million cats who have a kitten older than 2,
I've retrieved cats 11-15.
What actually happens: hibernate generates queries like the following
(simplified, assuming MySQL syntax. The particular syntax for
defining the row limits would vary per database dialect):
select cat.* from Cat join Kitten on Kitten.catId = Cat.id where
Kitten.age > 2 limit 11, 5
If the first Cat returned has 5 kittens older than 2, what I get is 5
identical rows with the same Cat's fields. When this comes back to
hibernate and we use a distinct root entity transformation, we get
only a single cat instead of cats 11-15.
How hard would it be to add an "adaptive paging" mode to Criteria that
would generate a query like:
select cat.* from Cat where exists(select * from Kitten where
Kitten.catId = Cat.id and Kitten.age > 2) limit 11, 5
If we wanted to support "join" fetches or ordering on a property of
the subcollection, we could perform 2 queries: one to get the paged
identifiers of the root entities to be selected, and a second to get
all the data limited by those identifiers:
<paged identifiers> = select distinct cat.id from Cat join Kitten on
Kitten.catId = Cat.id where Kitten.age > 2 order by Kitten.name limit
11, 5
select * from Cat join Kitten on Kitten.catId = Cat.id where Cat.id in
(<paged identifiers>) order by Kitten.name
Depending on the database type, we could optimize further. The first
query might not even need to be returned to the application server
running the hibernate code. In SQL Server 2000+, for instance, we
could store the paged identifiers in a table variable and use it to
limit the second query.
Would this be prohibitively difficult or complicated to implement as a
special mode for Criteria? I have some experience with SQL Server,
MySQL, and Oracle, but I'm not sure if this syntax would be possible
in all databases that Hibernate supports.
Thanks!
Kevin