Re: [hibernate-dev] HHH-11042 Count distinct tuples

Thursday, 1 September 2016

Hi Christian,

I've been following your discussion on HHH-11042 and thinking about your
solution. I am concerned that some dialects do not treat null the way you
describe. SQL Server has a property that changes how null is treated:

SET ANSI_NULLS { ON | OFF } [1]

[1] mentions that in a future release it will not be possible to set
to OFF, but for now, this is a concern.

I also found [2] which says:

"For the DISTINCT keyword, null values are considered to be duplicates of
 each other. When DISTINCT is included in a SELECT statement, only one
NULL is returned in the results, regardless of how many null values are
encountered."

Sybase may be similar. I'm not sure if there are other dialects that
could be affected.

I agree that the dialect should be able to override the behavior.

Regards,

Gail

[1] https://msdn.microsoft.com/en-us/library/ms188048.aspx
[2] https://technet.microsoft.com/en-us/library/ms187831(v=sql.105).aspx

On Thu, Sep 1, 2016 at 3:18 AM, Christian Beikov <christian.beikov(a)gmail.com
...
 wrote: 
...
 Hey,

 I wanted to start a discussion regarding this issue:
 https://hibernate.atlassian.net/browse/HHH-11042

 Although the Dialect class contains the method
 "supportsTupleDistinctCount", it is never used, so when doing a count
 distinct on a tuple, it just renders the tuple instead of doing a
 fallback or throwing an error.

 I suggested the OP to override the count function in the dialect to do
 whatever he thinks is best but then I realized that the count function
 is not even used as the logic is hard coded in some locations. The
 problematic location in this case is
 "org.hibernate.hql.internal.ast.tree.IdentNode.resolveAsAlias" which
 does not consider the function at all but renders the SQL directly.

 After suggesting him to introduce a custom function instead and some
 discussion on how count distinct could be reliably implemented I think I
 found a solution that might work for most databases.

 On stackoverflow and other sites it is often suggested to use a checksum
 to workaround this limitation which obviously is not a good idea. I
 proposed to do concatenation with a separator that doesn't appear in the
 string and apparently the character '\0' is a valid character which
 makes it a good candidate as that should normally not appear in a string.

 The final solution to the problem looks something like the following

 count(distinct case when col1 is null or col2 is null then null else
 col1 || '\0' || col2 end) + count(case when col1 is null or col2 is null
 then 1 end)

 The first count does a count distinct on all columns concatenated with
 '\0' where all values are not null. The second just counts the cases
 where one of the column values was null. Together that emits the proper
 count based on the assumption that '\0' does not appear in the columns.

 What do you think about that solution? I would like to implement it that
 way and do a PR.

 I would also like to make use of the count function registered in the
 dialect to make this overrideable. Hope that's okay?

 Regards,
 Christian
 _______________________________________________
 hibernate-dev mailing list
 hibernate-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/hibernate-dev

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Re: [hibernate-dev] HHH-11042 Count distinct tuples