[teiid-issues] [JBoss JIRA] (TEIID-3454) Dependent Join optimizations for Netezza and Hive

Wed Jul 8 19:04:02 EDT 2015

     [ https://issues.jboss.org/browse/TEIID-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Hawkins updated TEIID-3454:
----------------------------------
    Fix Version/s: 8.12


> Dependent Join optimizations for Netezza and Hive
> -------------------------------------------------
>
>                 Key: TEIID-3454
>                 URL: https://issues.jboss.org/browse/TEIID-3454
>             Project: Teiid
>          Issue Type: Feature Request
>          Components: Query Engine
>    Affects Versions: 8.10
>            Reporter: John Muller
>            Assignee: Steven Hawkins
>            Priority: Minor
>             Fix For: 8.12
>
>
> Currently, dependent joins create one or more IN clauses.  Many MPP / NoSQL systems can perform drastically better if the join keys are instead loaded into a temp table that matches the key distribution of the target table.  Two examples I know of are Netezza and Hive.
> In Netezza, if the incoming dependent join (a small dimension; here "Customer", using Northwind data model concepts) has a key that will be joined to a big fact table that is distributed or organized on that key (DISTRIBUTE ON / ORGANIZE ON), then creating a temp table that matches this distribution will result in roughly 100x better query performance.  If the dimension is small enough, this sometimes makes little difference because Netezza will perform a broadcast join, but it's never a bad idea to create the temp table.
> Similarly, Hive DDL has both partitions and buckets (optionally pre-sorted) that a temp table could be matched to.
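
To make the two query shapes in the description concrete, here is a minimal Python sketch that only builds the SQL text for each approach. It is not Teiid's implementation: the function names, the temp table name dep_join_keys, the INTEGER key type, and the orders / customer_id example are all assumptions made for illustration. The DISTRIBUTE ON clause follows Netezza's CREATE TABLE syntax.

```python
from typing import List, Sequence


def in_clause_query(fact_table: str, key_col: str, keys: Sequence[int]) -> str:
    """Shape described as current behavior: keys inlined in an IN list."""
    key_list = ", ".join(str(int(k)) for k in keys)
    return f"SELECT * FROM {fact_table} WHERE {key_col} IN ({key_list})"


def temp_table_statements(
    fact_table: str,
    key_col: str,
    keys: Sequence[int],
    temp_name: str = "dep_join_keys",  # hypothetical name, not from the issue
) -> List[str]:
    """Shape proposed in the issue: load keys into a temp table that is
    distributed on the join key, then join it to the fact table."""
    stmts = [
        # Assumes an integer key; a real planner would use the column's type.
        f"CREATE TEMP TABLE {temp_name} ({key_col} INTEGER) "
        f"DISTRIBUTE ON ({key_col})"
    ]
    stmts += [f"INSERT INTO {temp_name} VALUES ({int(k)})" for k in keys]
    stmts.append(
        f"SELECT f.* FROM {fact_table} f "
        f"JOIN {temp_name} t ON f.{key_col} = t.{key_col}"
    )
    return stmts


if __name__ == "__main__":
    print(in_clause_query("orders", "customer_id", [1, 2, 3]))
    for stmt in temp_table_statements("orders", "customer_id", [1, 2, 3]):
        print(stmt)
```

The first function inlines every key into the predicate, while the second lets the database co-locate the key rows with the fact table's distribution before joining, which is the effect the description attributes the speedup to. A Hive variant would need different DDL (partitioning or bucketing clauses) and is not sketched here.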


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)