[teiid-issues] [JBoss JIRA] (TEIID-3743) Multiple Count Distinct Columns Fails for Impala

Steven Hawkins (JIRA) issues at jboss.org
Wed May 25 11:31:01 EDT 2016


    [ https://issues.jboss.org/browse/TEIID-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242721#comment-13242721 ] 

Steven Hawkins commented on TEIID-3743:
---------------------------------------

This case differs slightly from the earlier one in that a group by is present.  In fact, the current code won't attempt the rewrite when there is a group by or having clause.  With a group by, the cross join solution will not work: each aggregation can produce multiple rows, which would then have to be joined together on the grouping columns:

select col3, count(distinct col1), count(distinct col2) from t1 group by col3:

would instead look like:

select v1.col3, v1.c1 result1, v2.c1 result2 from (select col3, count(distinct col1) as c1 from t1 group by col3) v1 inner join (select col3, count(distinct col2) as c1 from t1 group by col3) v2 on v1.col3 = v2.col3 or (v1.col3 is null and v2.col3 is null);

This is even messier and has even worse performance implications, so I would track it as a separate issue.
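
As a rough sanity check of the join-based form, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for Impala purely as a convenience: it accepts multiple count(distinct) expressions in one query, so the original and rewritten forms can be compared directly. The table contents are made up for illustration, and the rewritten query projects the grouping column as v1.col3. It says nothing about how Impala would plan or perform the join.

```python
import sqlite3

# In-memory database with hypothetical sample data; col3 includes NULLs so the
# "is null" branch of the join condition is exercised.
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (col1 integer, col2 integer, col3 text)")
conn.executemany(
    "insert into t1 values (?, ?, ?)",
    [
        (1, 10, "a"),
        (1, 20, "a"),
        (2, 20, "a"),
        (3, 30, "b"),
        (3, 30, "b"),
        (4, 40, None),
        (5, 40, None),
    ],
)

# Original form: two count(distinct ...) expressions in one grouped query.
direct = """
select col3, count(distinct col1), count(distinct col2)
from t1 group by col3
"""

# Rewritten form: one count(distinct ...) per inline view, joined back
# together on the grouping column with a null-safe comparison.
rewritten = """
select v1.col3, v1.c1 result1, v2.c1 result2
from (select col3, count(distinct col1) as c1 from t1 group by col3) v1
inner join (select col3, count(distinct col2) as c1 from t1 group by col3) v2
on v1.col3 = v2.col3 or (v1.col3 is null and v2.col3 is null)
"""

def as_set(sql):
    # Row order is not guaranteed without an order by, so compare as sets.
    return set(conn.execute(sql).fetchall())

direct_rows = as_set(direct)
rewritten_rows = as_set(rewritten)
print(direct_rows == rewritten_rows)
```

Dropping the "is null" branch from the join condition would lose the group where col3 is null, since null = null does not evaluate to true in an inner join.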


> Multiple Count Distinct Columns Fails for Impala
> ------------------------------------------------
>
>                 Key: TEIID-3743
>                 URL: https://issues.jboss.org/browse/TEIID-3743
>             Project: Teiid
>          Issue Type: Bug
>          Components: Misc. Connectors
>    Affects Versions: 8.11.4
>            Reporter: Scott Wallace
>            Assignee: Steven Hawkins
>             Fix For: 8.12
>
>
> Teiid Impala translator incorrectly allows multiple count distinct columns, which fails to execute since it is not supported by Impala.
> Per Cloudera documentation:
> {quote}By default, Impala only allows a single COUNT(DISTINCT columns) expression in each query.
> To produce the same result as multiple COUNT(DISTINCT) expressions, you can use the following technique for queries involving a single table:
> select v1.c1 result1, v2.c1 result2 from (select count(distinct col1) as c1 from t1) v1 cross join (select count(distinct col2) as c1 from t1) v2;{quote}
> The Teiid-Impala translator should ideally rewrite the query as necessary so that the query does not fail executing multiple count distincts against Impala.




