Steven Hawkins reassigned TEIID-3454:
-------------------------------------
Fix Version/s: 10.x
(was: 10.0)
Assignee: (was: Steven Hawkins)
Dependent Join optimizations for Netezza and Hive
-------------------------------------------------
Key: TEIID-3454
URL: https://issues.jboss.org/browse/TEIID-3454
Project: Teiid
Issue Type: Feature Request
Components: Query Engine
Affects Versions: 8.10
Reporter: John Muller
Fix For: 10.x
Currently, dependent joins create one or more IN clauses. Many MPP / NoSQL systems can
perform drastically better if temp tables are created that match key distributions.
Two examples I know of are Netezza and Hive.
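To make the current behaviour concrete, here is a minimal Python sketch of how the key
values of a dependent join might be turned into one or more IN predicates. It is only an
illustration: the function name, the max_in_size limit and the Orders/CustomerID names
are assumptions, not Teiid's actual implementation, and it handles integer keys only.

```python
def build_in_predicates(column, keys, max_in_size=3):
    """Split distinct key values into one or more IN predicates joined by OR.

    max_in_size is a hypothetical per-predicate limit chosen for illustration.
    """
    distinct = sorted(set(keys))
    if not distinct:
        raise ValueError("at least one key value is required")
    # Chunk the keys so that no single IN list exceeds max_in_size entries.
    chunks = [distinct[i:i + max_in_size] for i in range(0, len(distinct), max_in_size)]
    parts = [f"{column} IN ({', '.join(str(k) for k in chunk)})" for chunk in chunks]
    return " OR ".join(parts)


# Hypothetical fact-table query restricted by the dimension's keys.
in_clause_sql = "SELECT * FROM Orders WHERE " + build_in_predicates("CustomerID", [5, 1, 3, 2, 4])
```

With many key values the predicate grows linearly, and the target system has no way to
use the table's distribution when evaluating it.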
In Netezza, if the incoming dependent join (a small dimension; here "Customer", using
Northwind data model concepts) has a key that will be joined to a big fact table that is
distributed or organized on that key (DISTRIBUTE ON / ORGANIZE ON), then creating a temp
table that matches this distribution can make the query roughly 100x faster. Sometimes, if
the dimension is small enough, this doesn't make a big difference, as Netezza will perform
a broadcast join, but it's never a bad idea to create the temp table.
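A rough sketch of the temp-table alternative for Netezza, again in Python and only as an
illustration of the idea. The helper name, the temp table name and the Orders/CustomerID
names are hypothetical; the generated SQL reflects my understanding of Netezza's
CREATE TEMPORARY TABLE ... DISTRIBUTE ON syntax and should be checked against the
Netezza documentation before use.

```python
def netezza_temp_table_plan(temp_table, key_column, key_type, keys, fact_table):
    """Return the SQL statements for a temp-table based dependent join.

    All names are supplied by the caller; integer keys only in this sketch.
    """
    statements = [
        # Distribute the temp table on the join key so that it is co-located
        # with a fact table distributed on the same column.
        f"CREATE TEMPORARY TABLE {temp_table} ({key_column} {key_type}) "
        f"DISTRIBUTE ON ({key_column})"
    ]
    # One single-row INSERT per distinct key value.
    statements += [f"INSERT INTO {temp_table} VALUES ({k})" for k in sorted(set(keys))]
    # Join against the temp table instead of pushing an IN clause.
    statements.append(
        f"SELECT f.* FROM {fact_table} f "
        f"JOIN {temp_table} t ON f.{key_column} = t.{key_column}"
    )
    return statements


plan = netezza_temp_table_plan("dep_join_keys", "CustomerID", "INTEGER", [3, 1, 2], "Orders")
```

The last statement replaces the IN predicate with a join, which lets the database use its
own distribution-aware join strategy.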
Similarly, Hive DDL has both partitions and buckets (pre-sorted).