[teiid-issues] [JBoss JIRA] (TEIID-3454) Dependent Join optimizations for Netezza and Hive
Steven Hawkins (Jira)
issues at jboss.org
Mon Aug 10 11:16:05 EDT 2020
[ https://issues.redhat.com/browse/TEIID-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Hawkins resolved TEIID-3454.
-----------------------------------
Resolution: Out of Date
Bulk resolving older issues as out of date.
> Dependent Join optimizations for Netezza and Hive
> -------------------------------------------------
>
> Key: TEIID-3454
> URL: https://issues.redhat.com/browse/TEIID-3454
> Project: Teiid
> Issue Type: Feature Request
> Components: Query Engine
> Affects Versions: 8.10
> Reporter: John Muller
> Priority: Major
> Fix For: Backlog
>
>
> Currently, dependent joins create 1 or more IN clauses. Many MPP / NoSQL systems can have drastically better performance by creating temp tables that match key distributions. Two examples I know of would be Netezza and Hive.
> In Netezza, if the incoming dependent join (a small dimension; here "Customer", using Northwind data model concepts) has a key that will be joined to a big fact table that is distributed or organized on that key (DISTRIBUTE ON / ORGANIZE ON), then creating a temp table that matches this distribution can improve query performance by roughly 100x. Sometimes, if the dimension is small enough, this doesn't make a big difference, as Netezza will perform a broadcast join, but it's never a bad idea to create the temp table.
> Similarly, Hive DDL has both partitions and buckets (pre-sorted).
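
To make the request above concrete, here is a minimal Python sketch contrasting the two pushdown shapes: the IN-predicate form that dependent joins produce today, and a Netezza-style temp table distributed on the join key. It is illustrative only and is not Teiid code. The function names, the `dep_join_keys` table name, the `orders` / `customer_id` identifiers, the INTEGER key type, and the chunk size are all assumptions made for the example.

```python
from typing import List, Sequence


def in_clause_query(table: str, column: str, keys: Sequence[int], max_in: int = 1000) -> str:
    """Current shape (simplified): independent-side keys become one or more IN predicates."""
    if not keys:
        raise ValueError("a dependent join needs at least one key value")
    # Split the keys into chunks so that no single IN list exceeds max_in entries.
    chunks = [keys[i:i + max_in] for i in range(0, len(keys), max_in)]
    predicates = [
        f"{column} IN ({', '.join(str(int(k)) for k in chunk)})" for chunk in chunks
    ]
    return f"SELECT * FROM {table} WHERE " + " OR ".join(predicates)


def temp_table_statements(
    fact: str, column: str, keys: Sequence[int], temp: str = "dep_join_keys"
) -> List[str]:
    """Requested shape (sketch): load the keys into a temp table distributed on the join key.

    Assumes Netezza-style DDL and an INTEGER key column; a real implementation
    would derive the type from metadata and bulk-load rather than insert row by row.
    """
    if not keys:
        raise ValueError("a dependent join needs at least one key value")
    create = f"CREATE TEMP TABLE {temp} ({column} INTEGER) DISTRIBUTE ON ({column})"
    inserts = [f"INSERT INTO {temp} VALUES ({int(k)})" for k in keys]
    join = f"SELECT f.* FROM {fact} f JOIN {temp} t ON f.{column} = t.{column}"
    return [create, *inserts, join]


if __name__ == "__main__":
    customer_ids = [1, 2, 3, 4, 5]
    print(in_clause_query("orders", "customer_id", customer_ids, max_in=2))
    for stmt in temp_table_statements("orders", "customer_id", customer_ids):
        print(stmt)
```

Because the temp table is distributed on the same column as the fact table, the join can run co-located on each node instead of filtering the fact table against a long literal list. Whether that pays off depends on the number of keys and on the cost of loading them.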
--
This message was sent by Atlassian Jira
(v7.13.8#713008)