[teiid-issues] [JBoss JIRA] (TEIID-4227) Impala Translator - Support Multiple Count Distinct with Group By

Scott Wallace (JIRA) issues at jboss.org
Tue May 31 17:43:00 EDT 2016


    [ https://issues.jboss.org/browse/TEIID-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245777#comment-13245777 ] 

Scott Wallace commented on TEIID-4227:
--------------------------------------

[~shawkins] I hadn't considered the values conflicting in the join causing incorrect results. :(

> would it be possible to test with a patched or other translator modification that disables group by support?

Would the group by be ignored only in cases where multiple count distincts are being requested? We'd definitely want to limit not pushing down to Impala only in this edge case scenario. Hopefully Impala will address the multiple count distinct limitation at some point. There's a Jira out there that's been open a while: https://issues.cloudera.org/browse/IMPALA-110.

Thanks for all your support.

> Impala Translator - Support Multiple Count Distinct with Group By
> -----------------------------------------------------------------
>
>                 Key: TEIID-4227
>                 URL: https://issues.jboss.org/browse/TEIID-4227
>             Project: Teiid
>          Issue Type: Feature Request
>          Components: Misc. Connectors
>    Affects Versions: 8.13.3
>            Reporter: Scott Wallace
>            Assignee: Steven Hawkins
>             Fix For: 9.1
>
>
> This is somewhat related to TEIID-3743, which added support of multiple count distinct metrics. We have a requirement to allow multiple count distinct metrics grouped by common attributes.
> Currently, we get a failure as follows:
> {noformat}org.teiid.runtime.client.TeiidClientException: java.lang.RuntimeException: Remote org.teiid.core.TeiidProcessingException: TEIID30504 vw_impression_click_transaction_process_date_detail: 500051 TEIID11008:TEIID11004 Error executing statement(s): [Prepared Values: [] SQL: SELECT g_0.process_date_key, SUM(g_0.num_clicks), SUM(g_0.num_impressions), COUNT(DISTINCT g_0.orderid), COUNT(DISTINCT g_0.`hash_tid`), SUM(g_0.sales) FROM detail.vw_impression_click_transaction_process_date_detail g_0 WHERE g_0.process_date_key >= '2016-05-22' AND g_0.process_date_key < '2016-05-23' AND g_0.advertiser_key = 12345 GROUP BY g_0.process_date_key]
> {noformat}
> To allow this request to succeed, we could have translator write that Impala query as:
> {noformat}
> select nvl(v1.c1,v2.c1), nvl(v1.c2,v1.c2), nvl(v1.c3,v1.c3), v1.c4, v2.c5, nvl(v1.c6,v1.c6)  from (
> SELECT g_0.process_date_key as c1,
>          SUM(g_0.num_clicks) as c2,
>          SUM(g_0.num_impressions) as c3,
>          COUNT(DISTINCT g_0.orderid) as c4,
>          SUM(g_0.sales) as c6
> FROM detail.vw_impression_click_transaction_process_date_detail g_0
> WHERE g_0.process_date_key >= '2016-05-22'
>         AND g_0.process_date_key < '2016-05-23'
>         AND g_0.advertiser_key = 12345
> GROUP BY  g_0.process_date_key
>   ) v1 inner join
>   (
>     SELECT g_0.process_date_key as c1,
>          SUM(g_0.num_clicks) as c2,
>          SUM(g_0.num_impressions) as c3,
>          COUNT(DISTINCT g_0.`hash_tid`) as c5,
>          SUM(g_0.sales) as c6
> FROM detail.vw_impression_click_transaction_process_date_detail g_0
> WHERE g_0.process_date_key >= '2016-05-22'
>         AND g_0.process_date_key < '2016-05-23'
>         AND g_0.advertiser_key = 12345
> GROUP BY  g_0.process_date_key
>    ) v2 on v1.c1 = v2.c1;
> {noformat}
> FYI [~shawkins]



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)


More information about the teiid-issues mailing list