John Muller created TEIID-3454:
----------------------------------
Summary: Dependent Join optimizations for Netezza and Hive
Key: TEIID-3454
URL: https://issues.jboss.org/browse/TEIID-3454
Project: Teiid
Issue Type: Feature Request
Components: Query Engine
Affects Versions: 8.10
Reporter: John Muller
Assignee: Steven Hawkins
Priority: Minor
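Currently, a dependent join pushes the key values from the independent side down to the source as one or more IN clauses. Many MPP and NoSQL systems can perform drastically better if those values are instead loaded into a temp table whose distribution matches the key distribution of the table being joined. Two examples I know of are Netezza and Hive.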
In Netezza, suppose the independent side of the dependent join is a small dimension (using Northwind data model concepts, "Customer") whose key will be joined to a big fact table that is distributed or organized on that key (DISTRIBUTE ON / ORGANIZE ON). Creating a temp table that matches this distribution can improve query performance by roughly 100x. If the dimension is small enough, the difference is sometimes negligible because Netezza will perform a broadcast join instead, but creating the temp table is never a bad idea.
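The two strategies can be contrasted with a minimal sketch that only generates SQL text. All table, column, and function names below (customer_key, tmp_customer, sales_fact, in_clause_predicates, netezza_temp_table_plan) are hypothetical, and the chunking parameter is an assumption standing in for whatever limit the engine applies; this is not Teiid's implementation.

```python
from typing import List, Sequence


def in_clause_predicates(column: str, keys: Sequence[int], chunk_size: int) -> List[str]:
    # Current approach, in outline: split the independent-side key values
    # into one or more IN predicates of at most chunk_size values each.
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    predicates = []
    for i in range(0, len(keys), chunk_size):
        chunk = ", ".join(str(k) for k in keys[i:i + chunk_size])
        predicates.append(f"{column} IN ({chunk})")
    return predicates


def netezza_temp_table_plan(temp_table: str, fact_table: str, key_column: str,
                            key_type: str, keys: Sequence[int]) -> List[str]:
    # Proposed approach (hypothetical): stage the keys in a temp table that is
    # distributed on the same column as the fact table (assumed), so the join
    # can run co-located on each data slice.
    statements = [
        f"CREATE TEMPORARY TABLE {temp_table} ({key_column} {key_type}) "
        f"DISTRIBUTE ON ({key_column})"
    ]
    # One single-row INSERT per key keeps the sketch simple; a real
    # implementation would load the keys in batches.
    statements += [f"INSERT INTO {temp_table} VALUES ({k})" for k in keys]
    statements.append(
        f"SELECT f.* FROM {fact_table} f "
        f"JOIN {temp_table} t ON f.{key_column} = t.{key_column}"
    )
    return statements


if __name__ == "__main__":
    for sql in in_clause_predicates("customer_key", [1, 2, 3, 4, 5], chunk_size=2):
        print(sql)
    for sql in netezza_temp_table_plan("tmp_customer", "sales_fact",
                                       "customer_key", "INTEGER", [7, 9]):
        print(sql)
```

The design point is that the IN-clause form grows with the number of keys and says nothing about data placement, while the temp-table form lets the source join on matching distributions.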
Similarly, Hive DDL supports both partitions and buckets (which can be pre-sorted), so a temp table could be created to match the fact table's layout.
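For Hive, a matching staging table could be declared with the same bucketing and sort clauses as the fact table. Hive's bucket-based join optimizations generally expect both sides to be bucketed (and, for sort-merge-bucket joins, sorted) on the join key with compatible bucket counts. The helper name, table names, and bucket count below are hypothetical, and the fact table is assumed to be declared with the same CLUSTERED BY / SORTED BY clauses; a minimal sketch:

```python
def hive_bucketed_table_ddl(table: str, key_column: str, key_type: str,
                            buckets: int) -> str:
    # Hypothetical helper: a staging table hashed into `buckets` buckets on the
    # join key and sorted within each bucket, mirroring a fact table assumed
    # to use the same CLUSTERED BY / SORTED BY / bucket count.
    if buckets < 1:
        raise ValueError("buckets must be positive")
    return (
        f"CREATE TABLE {table} ({key_column} {key_type}) "
        f"CLUSTERED BY ({key_column}) SORTED BY ({key_column} ASC) "
        f"INTO {buckets} BUCKETS"
    )


if __name__ == "__main__":
    print(hive_bucketed_table_ddl("tmp_customer", "customer_key", "INT", 32))
```

A regular table is used here rather than a TEMPORARY one to keep the sketch independent of version-specific restrictions on temporary tables; it would be dropped after the query.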