[hibernate-issues] [Hibernate-JIRA] Commented: (HHH-1615) GROUP BY entity does not work

Wed Feb 6 18:08:56 EST 2008

    [ http://opensource.atlassian.com/projects/hibernate/browse/HHH-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_29463 ] 

Giampaolo Tomassoni commented on HHH-1615:
------------------------------------------

I have the impression your example is a bit misleading. Grouped queries in which you need to get a whole entity are generally ones in which some ORM is involveld, otherwise the "group by" construct is redundant.

In example, the HQL query:

SELECT c, COUNT(*)
FROM Company AS c INNER JOIN c.products AS p
WHERE p.price > 150.00
GROUP BY c
HAVING COUNT(*) > 1
ORDER BY c.income ASC

could be mapped by a patched Hibernate to something like, say, the following SQL select:

SELECT c.id, c.name, c.income, COUNT(*)
FROM companies AS c RIGHT OUTER JOIN products AS p ON c.id=p.idcompany
WHERE p.price > 150.00
GROUP BY c.id, c.name, c.income
HAVING COUNT(*) > 1
ORDER BY c.income ASC

I think that many SQL optimizers should discover the speed-up pattern available on this kind of redundanly-grouped queries: c.id is a primary key and the value of c.name and c.income depend on it. Pheraps MySQL misses a true query optimizer?

Anyway, in order to improve performances when the optimizer isn't smart enough, the trick in SQL is field post-fetching. In example, a patched Hibernate could instead map the above HQL query in the following SQL one:

SELECT c1.id, c1.name, c1.income, c2.cnt
FROM companies AS c1 RIGHT OUTER JOIN (
 SELECT c.id, COUNT(*) AS cnt
 FROM companies AS c INNER JOIN products AS p ON c.id=p.idcompany
 WHERE p.price > 150.00
 GROUP BY c.id
 HAVING cnt > 1
) AS c2 ON c1.id=c.id
ORDER BY c1.income ASC

See? The HQL query is basically mimed by the inner select, which is then wrapped by one meant mostly to dereference entity fields. This needs a grouping index having the same width of the company table's primary key (1 assumed in this example), and results in better performances with respect to fetching all the grouped data in the application and then dereferencing the IDs from there.

It needs subqueries and joins, besides. I don't know if MySQL does support subqueries, but most of the other SQL engines handled by Hibernate do.

So, a general solution to the problem outlined in this issue would be to adopt the subquery+joins mapping when the target SQL engine supports them, falling back to the multi-field grouping when it doesn't.

Please note I didn't run the above HQL and SQL queries, thereby it may be there is something wrong in them.

> GROUP BY entity does not work
> -----------------------------
>
>                 Key: HHH-1615
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HHH-1615
>             Project: Hibernate3
>          Issue Type: Improvement
>          Components: query-hql
>    Affects Versions: 3.2.5
>         Environment: PostgreSQL 8.1.3, Hibernate 3.1.3
>            Reporter: Xavier Bugaud
>            Assignee: Anthony Patricio
>            Priority: Critical
>         Attachments: hibernate-group-by.diff
>
>
> The query : "select cat, count(*) from Cat cat group by cat" does not work.
> ERROR: column "xxxx" must appear in the GROUP BY clause or be used in an aggregate function
> See :
>  - http://forum.hibernate.org/viewtopic.php?t=953716
>  - http://jira.nhibernate.org/browse/NH-528

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://opensource.atlassian.com/projects/hibernate/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira