[teiid-issues] [JBoss JIRA] (TEIID-3454) Dependent Join optimizations for Netezza and Hive

Tuesday, 6 December 2016

     [
https://issues.jboss.org/browse/TEIID-3454?page=com.atlassian.jira.plugin...
]

Steven Hawkins updated TEIID-3454:
----------------------------------
    Fix Version/s: 10.0
                       (was: 9.2)

...
 Dependent Join optimizations for Netezza and Hive
 -------------------------------------------------

                 Key: TEIID-3454
                 URL: https://issues.jboss.org/browse/TEIID-3454
             Project: Teiid
          Issue Type: Feature Request
          Components: Query Engine
    Affects Versions: 8.10
            Reporter: John Muller
            Assignee: Steven Hawkins
             Fix For: 10.0

 Currently, dependent joins create one or more IN clauses.  Many MPP / NoSQL systems can
perform drastically better if temp tables are created that match the key distributions.
Two examples I know of are Netezza and Hive.
 In Netezza, if the incoming dependent join (a small dimension; here "Customer",
using Northwind data model concepts) has a key that will be joined to a big fact table
that is distributed (DISTRIBUTE ON) or organized (ORGANIZE ON) by that key, then creating
a temp table that matches this distribution can make the query roughly 100x faster.
Sometimes, if the dimension is small enough, this doesn't make a big difference because
Netezza will perform a broadcast join, but it's never a bad idea to create the temp table.
 Similarly, Hive DDL has both partitions and buckets (pre-sorted). 
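
 To make the contrast concrete, here is a rough sketch (not Teiid code) of the two
pushdown shapes described above: the current IN-clause form and a temp-table form for
Netezza. The function names, the temp table name "dj_keys", the chunk size, and the
integer key type are all assumptions made for illustration, and the DDL is only what I
would expect Netezza to accept, so check it against the Netezza documentation.

```python
from typing import List, Sequence


def in_clause_predicate(column: str, keys: Sequence[int], max_per_clause: int = 3) -> str:
    """Current behaviour, sketched: split the keys into one or more IN clauses."""
    chunks = [keys[i:i + max_per_clause] for i in range(0, len(keys), max_per_clause)]
    parts = [f"{column} IN ({', '.join(str(k) for k in chunk)})" for chunk in chunks]
    return " OR ".join(parts)


def temp_table_statements(
    fact_table: str,
    key_column: str,
    keys: Sequence[int],
    temp_table: str = "dj_keys",  # hypothetical name, not a Teiid convention
) -> List[str]:
    """Proposed behaviour, sketched: load the keys into a temp table that is
    distributed on the join key, then join it to the fact table."""
    create = (
        f"CREATE TEMPORARY TABLE {temp_table} ({key_column} INTEGER) "
        f"DISTRIBUTE ON ({key_column})"  # assumed Netezza syntax
    )
    inserts = [f"INSERT INTO {temp_table} VALUES ({k})" for k in keys]
    query = (
        f"SELECT f.* FROM {fact_table} f "
        f"JOIN {temp_table} t ON f.{key_column} = t.{key_column}"
    )
    return [create, *inserts, query]


if __name__ == "__main__":
    customer_ids = [1, 2, 3, 4]
    print(in_clause_predicate("customer_id", customer_ids))
    for statement in temp_table_statements("orders", "customer_id", customer_ids):
        print(statement)
```

 The point of the second form is that the small key set lands on the same data slices as
the matching fact rows, so the join can run locally on each slice instead of filtering
the fact table through a long list of literals.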

--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
