[teiid-issues] [JBoss JIRA] (TEIID-3454) Dependent Join optimizations for Netezza and Hive
Steven Hawkins (Jira)
issues at jboss.org
Wed Oct 17 09:50:12 EDT 2018
[ https://issues.jboss.org/browse/TEIID-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Hawkins updated TEIID-3454:
----------------------------------
Fix Version/s: 12.x
(was: 11.x)
> Dependent Join optimizations for Netezza and Hive
> -------------------------------------------------
>
> Key: TEIID-3454
> URL: https://issues.jboss.org/browse/TEIID-3454
> Project: Teiid
> Issue Type: Feature Request
> Components: Query Engine
> Affects Versions: 8.10
> Reporter: John Muller
> Priority: Major
> Fix For: 12.x
>
>
> Currently, dependent joins create one or more IN clauses. Many MPP / NoSQL systems can perform drastically better if the join keys are instead loaded into a temp table whose distribution matches that of the target table. Two examples I know of are Netezza and Hive.
> In Netezza, if the incoming dependent join (a small dimension; here "Customer", using Northwind data model concepts) has a key that will be joined to a big fact table that is distributed (DISTRIBUTE ON) or organized (ORGANIZE ON) on that key, then creating a temp table that matches this distribution will result in ~100x better query performance. If the dimension is small enough, this sometimes doesn't make a big difference, since Netezza will perform a broadcast join, but it's never a bad idea to create the temp table.
> Similarly, Hive DDL has both partitions and buckets (pre-sorted).
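
To make the contrast concrete, here is a minimal sketch of the two pushdown shapes for the Netezza case. It is not Teiid's implementation: the function names, the table and column names (tmp_customer_keys, orders, customer_id), and the use of "?" bind placeholders are all assumptions made for illustration. It only renders SQL text and does not connect to a database.

```python
def in_clause_predicate(column, keys):
    """Render the key set as an IN predicate (the current pushdown shape)."""
    placeholders = ", ".join("?" for _ in keys)
    return f"{column} IN ({placeholders})"


def temp_table_statements(temp_table, key_column, key_type, fact_table, fact_key):
    """Render the proposed shape: create a temp table distributed on the
    join key, load the keys into it, then join the fact table to it.

    All names passed in are hypothetical; DISTRIBUTE ON is Netezza syntax.
    """
    create = (
        f"CREATE TEMP TABLE {temp_table} ({key_column} {key_type}) "
        f"DISTRIBUTE ON ({key_column})"
    )
    # One bind placeholder; a real loader would batch the key values.
    load = f"INSERT INTO {temp_table} ({key_column}) VALUES (?)"
    join = (
        f"SELECT f.* FROM {fact_table} f "
        f"JOIN {temp_table} t ON f.{fact_key} = t.{key_column}"
    )
    return [create, load, join]


if __name__ == "__main__":
    print(in_clause_predicate("o.customer_id", [1, 2, 3]))
    for statement in temp_table_statements(
        "tmp_customer_keys", "customer_id", "INTEGER", "orders", "customer_id"
    ):
        print(statement)
```

The temp table only helps if its distribution key is the same column the fact table is distributed on, so that the join can be resolved locally on each node rather than by redistributing rows.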
--
This message was sent by Atlassian Jira
(v7.12.1#712002)