[teiid-issues] [JBoss JIRA] (TEIID-3454) Dependent Join optimizations for Netezza and Hive

Mon Aug 10 11:16:05 EDT 2020

     [ https://issues.redhat.com/browse/TEIID-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Hawkins resolved TEIID-3454.
-----------------------------------
    Resolution: Out of Date

Bulk resolving older issues as out of date.

> Dependent Join optimizations for Netezza and Hive
> -------------------------------------------------
>
>                 Key: TEIID-3454
>                 URL: https://issues.redhat.com/browse/TEIID-3454
>             Project: Teiid
>          Issue Type: Feature Request
>          Components: Query Engine
>    Affects Versions: 8.10
>            Reporter: John Muller
>            Priority: Major
>             Fix For: Backlog
>
>
> Currently, dependent joins create one or more IN clauses.  Many MPP / NoSQL systems can perform drastically better when the join keys are instead loaded into a temp table that matches the key distribution of the target table.  Two examples I know of would be Netezza and Hive.
> In Netezza, if the incoming dependent join (a small dimension; here "Customer", using Northwind data model concepts) has a key that will be joined to a big fact table that is distributed or organized on that key (DISTRIBUTE ON / ORGANIZE ON), then creating a temp table that matches this distribution can make the query roughly 100x faster.  Sometimes, if the dimension is small enough, this doesn't make a big difference, as Netezza will perform a broadcast join, but it's never a bad idea to create the temp table.
> Similarly, Hive DDL has both partitions and buckets (pre-sorted).

--
This message was sent by Atlassian Jira
(v7.13.8#713008)