[JBoss JIRA] (TEIID-3454) Dependent Join optimizations for Netezza and Hive

Friday, 24 April 2015

John Muller created TEIID-3454:
----------------------------------

             Summary: Dependent Join optimizations for Netezza and Hive
                 Key: TEIID-3454
                 URL: https://issues.jboss.org/browse/TEIID-3454
             Project: Teiid
          Issue Type: Feature Request
          Components: Query Engine
    Affects Versions: 8.10
            Reporter: John Muller
            Assignee: Steven Hawkins
            Priority: Minor

Currently, dependent joins create one or more IN clauses.  Many MPP / NoSQL systems can
perform drastically better if the key values are instead loaded into a temp table that
matches the key distribution of the table being joined.  Two examples I know of are
Netezza and Hive.
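
For illustration, the current IN-clause behaviour can be sketched roughly as follows.
This is not Teiid's actual implementation; the function name, the `max_in_size`
parameter, and the integer-only keys are assumptions made to keep the example small.

```python
def in_clause_predicate(column, keys, max_in_size=1000):
    """Build a predicate of one or more OR-ed IN clauses for the given keys.

    max_in_size is a hypothetical cap on values per IN clause; keys are
    assumed to be integers so they can be inlined without quoting.
    """
    keys = [int(k) for k in keys]
    if not keys:
        # No key values from the independent side: nothing can match.
        return "1 = 0"
    # Split the keys into chunks of at most max_in_size values.
    chunks = [keys[i:i + max_in_size] for i in range(0, len(keys), max_in_size)]
    clauses = [
        "{} IN ({})".format(column, ", ".join(str(k) for k in chunk))
        for chunk in chunks
    ]
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Hypothetical fact table "orders" keyed by customer_id.
    predicate = in_clause_predicate("o.customer_id", [1, 2, 3], max_in_size=2)
    print("SELECT o.* FROM orders o WHERE " + predicate)
```

With many key values the predicate grows with the size of the dimension, which is the
cost a temp-table join would avoid.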

In Netezza, if the incoming dependent join (a small dimension; here "Customer", using
Northwind data model concepts) has a key that will be joined to a big fact table that is
distributed (DISTRIBUTE ON) or organized (ORGANIZE ON) on that key, then creating a temp
table that matches this distribution can improve query performance by roughly 100x.
If the dimension is small enough, this sometimes doesn't make a big difference, since
Netezza will perform a broadcast join, but it's never a bad idea to create the temp table.
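
A minimal sketch of the statements such a temp-table strategy might push to Netezza
is shown below.  The helper name, the table and column names, and the three-step
shape (create, load, join) are assumptions for illustration, not an existing Teiid
feature; only the DISTRIBUTE ON clause is taken from Netezza's CREATE TABLE syntax.

```python
def netezza_temp_table_plan(temp_table, key_column, key_type, fact_table, fact_key):
    """Return the SQL statements for a temp-table based dependent join.

    All identifiers are supplied by the caller and are hypothetical here.
    """
    # Distribute the temp table on the join key so its rows are placed the
    # same way as a fact table distributed on the matching column.
    create = "CREATE TEMPORARY TABLE {t} ({c} {ty}) DISTRIBUTE ON ({c})".format(
        t=temp_table, c=key_column, ty=key_type
    )
    # Parameterized insert, to be executed once per key value (or batched).
    insert = "INSERT INTO {t} ({c}) VALUES (?)".format(t=temp_table, c=key_column)
    # The join replaces the IN-clause predicate on the fact table.
    query = "SELECT f.* FROM {f} f JOIN {t} k ON f.{fk} = k.{c}".format(
        f=fact_table, t=temp_table, fk=fact_key, c=key_column
    )
    return [create, insert, query]


if __name__ == "__main__":
    for statement in netezza_temp_table_plan(
        "tmp_customer_keys", "customer_id", "INTEGER", "orders", "customer_id"
    ):
        print(statement)
```

The key point is that the temp table's distribution column is the same one the fact
table is distributed on, so the join can run without redistributing the fact table.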

Similarly, Hive DDL has both partitions and buckets (which can be pre-sorted).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
