[teiid-issues] [JBoss JIRA] (TEIID-3454) Dependent Join optimizations for Netezza and Hive

Wed Oct 17 09:50:12 EDT 2018

     [ https://issues.jboss.org/browse/TEIID-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Hawkins updated TEIID-3454:
----------------------------------
    Fix Version/s: 12.x
                       (was: 11.x)


> Dependent Join optimizations for Netezza and Hive
> -------------------------------------------------
>
>                 Key: TEIID-3454
>                 URL: https://issues.jboss.org/browse/TEIID-3454
>             Project: Teiid
>          Issue Type: Feature Request
>          Components: Query Engine
>    Affects Versions: 8.10
>            Reporter: John Muller
>            Priority: Major
>             Fix For: 12.x
>
>
> Currently, dependent joins generate one or more IN clauses.  Many MPP / NoSQL systems can perform drastically better if the key values are instead loaded into temp tables that match the key distributions.  Two examples I know of would be Netezza and Hive.
> In Netezza, if the incoming dependent join (small dimension; here "Customer" using Northwind data model concepts) has a key that will be joined to a big fact table that is distributed or organized on that key (DISTRIBUTE ON / ORGANIZE ON), then creating a temp table that matches this distribution will result in ~100x better query performance.  Sometimes, if the dimension is small enough, this doesn't make a big difference, as Netezza will perform a broadcast join, but it's never a bad idea to create the temp table.
> Similarly, Hive DDL has both partitions and buckets (optionally pre-sorted), so a temp table there could match the fact table's layout in the same way.
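
To make the contrast in the description concrete, here is a minimal sketch of the two pushdown shapes: the key values inlined as an IN list, versus the same values loaded into a temp table distributed on the join key and then joined. It only builds SQL strings and is not Teiid's implementation. The names `orders`, `customer_id` and `dep_join_keys` are hypothetical, and the DDL assumes Netezza-style `CREATE TEMP TABLE ... DISTRIBUTE ON (...)` syntax.

```python
from typing import List, Sequence


def in_clause_predicate(column: str, keys: Sequence[int]) -> str:
    """Current shape: the dependent key values inlined as an IN list."""
    values = ", ".join(str(int(k)) for k in keys)
    return f"{column} IN ({values})"


def temp_table_statements(temp_table: str, column: str, col_type: str,
                          keys: Sequence[int]) -> List[str]:
    """Proposed shape: a temp table distributed on the join key.

    Assumes Netezza-style DDL; the table and column names are illustrative.
    """
    stmts = [
        f"CREATE TEMP TABLE {temp_table} ({column} {col_type}) "
        f"DISTRIBUTE ON ({column})"
    ]
    # One INSERT per key keeps the sketch simple; a real loader would batch.
    stmts += [f"INSERT INTO {temp_table} VALUES ({int(k)})" for k in keys]
    return stmts


def join_query(fact_table: str, temp_table: str, column: str) -> str:
    """Join the fact table to the temp table on the shared key."""
    return (
        f"SELECT f.* FROM {fact_table} f "
        f"JOIN {temp_table} t ON f.{column} = t.{column}"
    )


if __name__ == "__main__":
    keys = [1, 2, 3]
    print("SELECT * FROM orders WHERE " + in_clause_predicate("customer_id", keys))
    for stmt in temp_table_statements("dep_join_keys", "customer_id", "INTEGER", keys):
        print(stmt)
    print(join_query("orders", "dep_join_keys", "customer_id"))
```

Because the temp table is distributed on the same key as the fact table, the join in the second shape can in principle be co-located on each node, which is the effect the description attributes the speedup to.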


--
This message was sent by Atlassian Jira
(v7.12.1#712002)