[teiid-issues] [JBoss JIRA] (TEIID-4498) Generalize subquery batch processing

Mon Dec 10 12:58:00 EST 2018

    [ https://issues.jboss.org/browse/TEIID-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672866#comment-13672866 ] 

Steven Hawkins edited comment on TEIID-4498 at 12/10/18 12:57 PM:
------------------------------------------------------------------

To elaborate further, a correlated subquery using equi-join predicates that is guarenteed return 0 or 1 rows for every outer row:

{code}SELECT pm1.g1.*, (select count(*) from pm1.g2 where e1 = pm1.g1.e1) FROM pm1.g1{code}

If subquery unnest default is true, or the mj hint is used then it will be planned as:

{code}select pm1.g1.* cnt from pm1.g1 left outer join (select count(*) cnt from pm1.g2) v where v.e1 = pm1.g1.e1{code}

If not then he existing RuleMergeCriteria logic can create a plan like:

{code}
join
  access pm1.g1
  access (subplan)
{code}

With some manipulation we get:

{code}
join
  access pm1.g1
  select (e1 in <dependent values>)
    access (subplan)
{code}

However there is little benefit in those plans.  The ultimate goal is to have the dependent values pushed.

This gets more complicated in the case of subqueries that may return multiple rows.  That would either require using the common table plan show above or altering the join node logic under the left out join case to throw an exception when an outer row matches more than 1 inner row.

was (Author: shawkins):
To elaborate further, a correlated subquery using equi-join predicates that is guarenteed return 0 or 1 rows for every outer row:

{code}SELECT pm1.g1.*, (select count(*) from pm1.g2 where e1 = pm1.g1.e1) FROM pm1.g1{code}

If subquery unnest default is true, or the mj hint is used then it will be planned as:

{code}select pm1.g1.* cnt from pm1.g1 left outer join (select count(*) cnt from pm1.g2) v where pm1.g2.e1 = pm1.g1.e1{code}

If not then he existing RuleMergeCriteria logic can create a plan like:

{code}
join
  access pm1.g1
  access (subplan)
{code}

With some manipulation we get:

{code}
join
  access pm1.g1
  select (e1 in <dependent values>)
    access (subplan)
{code}

However there is little benefit in those plans.  The ultimate goal is to have the dependent values pushed.

This gets more complicated in the case of subqueries that may return multiple rows.  That would either require using the common table plan show above or altering the join node logic under the left out join case to throw an exception when an outer row matches more than 1 inner row.

> Generalize subquery batch processing
> ------------------------------------
>
>                 Key: TEIID-4498
>                 URL: https://issues.jboss.org/browse/TEIID-4498
>             Project: Teiid
>          Issue Type: Enhancement
>          Components: Query Engine
>            Reporter: Steven Hawkins
>            Assignee: Steven Hawkins
>            Priority: Major
>             Fix For: 12.0
>
>
> For subqueries that cannot be unnested, but can be modified to project their correlation, we should have at least batch processing rather than the row by row default.

--
This message was sent by Atlassian Jira
(v7.12.1#712002)