...
* Choose Dependent - this rule looks at each join node and determines whether the join should be made dependent and in which direction. Cardinality, the number of distinct values, and primary key information are used in several formulas to determine whether a dependent join is likely to be worthwhile. Ideally the dependent join improves performance because fewer values will be returned from the dependent side. We must also consider the number of values passed from the independent to the dependent side. If that set is larger than the maximum number of values allowed in an IN criteria on the dependent side, then the query must be broken into a set of queries whose results are combined. Executing each query in the connector has some overhead, and that is taken into account. Without costing information, a lot of common cases are missed where the only criteria specified is on a non-unique (but strongly limiting) field. A join is eligible to be dependent if:
** there is at least one equi-join criterion, i.e. tablea.col = tableb.col
** the join is not a full outer join and the dependent side of the join is on the inner side of the join
The join will be made dependent if one of the following conditions, listed in precedence order, holds:
** There is an unsatisfied access pattern that can be satisfied with the dependent join criteria
** The potential dependent side of the join is marked with an option makedep (see the sketch below)
** (4.3.2) If costing was enabled, the estimated cost for the dependent join (5.0+ possibly in each direction in the case of inner joins) is computed and compared to not performing the dependent join. If the costs were all determined (which requires all relevant table cardinality, column ndv, and possibly nnv values to be populated), the lowest cost option is chosen.
** If key metadata information indicates that the potential dependent side is not “small” and the other side is “small”, or (5.0.1) the potential dependent side is the inner side of a left outer join.
Dependent join is the key optimization we use to efficiently process multi\-source joins. |
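For example, a user can force the decision with the makedep option mentioned above. A minimal sketch, assuming hypothetical tables A and B and the OPTION MAKEDEP syntax:

-- Hint that B should be treated as the dependent side of the join
SELECT A.y, B.z
FROM A, B
WHERE A.x = B.x
OPTION MAKEDEP B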
... |
* Merge Virtual - removes view and inline view layers
* Place Access - places access nodes under source nodes. An access node represents the point at which everything below the access node gets pushed to the source or is a plan invocation. Later rules focus on either pushing under the access node or pulling the access node up the tree to move more work down to the sources. This rule is also responsible for placing [Federated Optimizations#Access Patterns].
* Plan Joins - this rule attempts to find an optimal ordering of the joins performed in the plan, while ensuring that [Federated Optimizations#Access Patterns] dependencies are met. This rule has three main steps. First, it must determine an ordering of joins that satisfies the access patterns present. Second, it will heuristically create joins that can be pushed to the source (if a set of joins is pushed to the source, we will not attempt to create an optimal ordering within that set; more than likely it will be sent to the source in the non-ANSI multi-join syntax, as illustrated after this list, and will be optimized by the database). Third, it will use costing information to determine the best left-linear ordering of joins performed in the processing engine. This third step will do an exhaustive search for 6 (7 in later releases) or fewer join sources and is heuristically driven by join selectivity for 7 (8 in later releases) or more sources.
* Plan Outer Joins - reorders outer joins as permitted to improve push down. |
* Plan Procedures - plans procedures that appear in procedural relational queries
* Plan Sorts - optimizations around sorting, such as combining sort operations or moving projection
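As a hypothetical illustration of the second step of Plan Joins (table and column names made up), a set of joins against a single source may be sent as one query in the non-ANSI multi-join syntax:

-- User query written with ANSI join syntax
SELECT a.c1, c.c3
FROM a JOIN b ON a.id = b.id JOIN c ON b.id = c.id

-- Likely shape of the single pushed source query (non-ANSI multi-join syntax),
-- leaving the join ordering to the source database
SELECT a.c1, c.c3
FROM a, b, c
WHERE a.id = b.id AND b.id = c.id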
... |
For each sub-command in the user command, an appropriate kind of sub-planner is used (relational, XML, procedure, etc.).
Each planner has three primary phases:
A relational processing plan is created by the optimizer after the logical plan is manipulated by a series of rules. The application of rules is determined both by the query structure and by the rules themselves. The node structure of the debug plan resembles that of the processing plan, but the node types more logically represent SQL operations.
User SQL statements after rewrite are converted into a canonical plan form. The canonical plan form most closely resembles the initial SQL structure. A SQL select query has the following possible clauses (all but SELECT are optional): WITH, SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT. These clauses are logically executed in the following order: WITH, FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT.
For example, a SQL statement such as SELECT max(pm1.g1.e1) FROM pm1.g1 WHERE e2 = 1 creates a logical plan:
Project(groups=[anon_grp0], props={PROJECT_COLS=[anon_grp0.agg0 AS expr1]})
  Group(groups=[anon_grp0], props={SYMBOL_MAP={anon_grp0.agg0=MAX(pm1.g1.e1)}})
    Select(groups=[pm1.g1], props={SELECT_CRITERIA=e2 = 1})
      Source(groups=[pm1.g1])
Here the Source corresponds to the FROM clause, the Select corresponds to the WHERE clause, the Group corresponds to the implied grouping to create the max aggregate, and the Project corresponds to the SELECT clause.
Note that grouping effectively generates an inline view, anon_grp0, to handle the projection of values created by the grouping.
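Conceptually, the plan above corresponds to SQL along these lines (this rewritten form is only an illustration of the implied inline view, not engine output):

SELECT anon_grp0.agg0 AS expr1
FROM (
    SELECT MAX(pm1.g1.e1) AS agg0
    FROM pm1.g1
    WHERE e2 = 1
) AS anon_grp0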
All Node Types:
Each node has a set of applicable properties that are typically shown on the node.
Relational optimization is based upon rule execution that evolves the initial plan into the execution plan. There are a set of pre-defined rules that are dynamically assembled into a rule stack for every query. The rule stack is assembled based on the contents of the user’s query and the views/procedures accessed. For example, if there are no view layers, then rule Merge Virtual, which merges view layers together, is not needed and will not be added to the stack. This allows the rule stack to reflect the complexity of the query.
Logically the plan node data structure represents a tree of nodes where the source data comes up from the leaf nodes (typically Access nodes in the final plan), flows up through the tree and produces the user’s results out the top. The nodes in the plan structure can have bidirectional links, dynamic properties, and allow any number of child nodes. Processing plans in contrast typically have fixed properties.
Plan rules manipulate the plan tree, fire other rules, and drive the optimization process. Each rule is designed to perform a narrow set of tasks. Some rules can be run multiple times. Some rules require a specific set of precursors to run properly.
Dependent join is the key optimization we use to efficiently process multi-source joins.
Instead of reading all of source A and all of source B and joining them on A.x = B.x, we read all of A, then build a set of A.x values that are passed as criteria when querying B. In cases where A is small and B is large, this can drastically reduce the data retrieved from B, thus greatly speeding the overall query.
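As a sketch with hypothetical tables A and B (the literal values stand in for the collected set of A.x values), the engine effectively issues:

-- 1. Read the independent side
SELECT A.x, A.y FROM A

-- 2. Query the dependent side restricted to the A.x values observed in step 1;
--    this is split into several queries if the set exceeds the source's
--    maximum allowed number of values in an IN predicate
SELECT B.x, B.z FROM B WHERE B.x IN (1, 5, 7)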
One of the most important optimizations related to pushing criteria is how criteria will be pushed through a join. Consider the following plan tree, which represents a subtree of the plan for the query "select ... from A inner join B on (A.x = B.x) where B.y = 3":
        SELECT (B.y = 3)
               |
    JOIN - Inner Join on (A.x = B.x)
              /  \
         SRC (A)  SRC (B)
Note: SELECT nodes represent criteria, and SRC stands for SOURCE.
It is always valid for inner and cross joins to push (single source) criteria that are above the join, below the join. This allows for criteria originating in the user query to eventually be present in source queries below the joins. This result can be represented visually as:
    JOIN - Inner Join on (A.x = B.x)
             /  \
            /   SELECT (B.y = 3)
            |    |
        SRC (A)  SRC (B)
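In terms of the source queries eventually generated, the pushed criteria can then appear directly in the query for B; a rough sketch, assuming A and B live in different sources:

-- Source query for A
SELECT A.x FROM A

-- Source query for B, with the criteria pushed into it
SELECT B.x, B.y FROM B WHERE B.y = 3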
The same optimization is valid for criteria specified against the outer side of an outer join. For example:
        SELECT (B.y = 3)
               |
    JOIN - Right Outer Join on (A.x = B.x)
              /  \
         SRC (A)  SRC (B)
Becomes
    JOIN - Right Outer Join on (A.x = B.x)
             /  \
            /   SELECT (B.y = 3)
            |    |
        SRC (A)  SRC (B)
However, criteria specified against the inner side of an outer join need special consideration. The above scenario with a left or full outer join is not the same. For example:
        SELECT (B.y = 3)
               |
    JOIN - Left Outer Join on (A.x = B.x)
              /  \
         SRC (A)  SRC (B)
Can become (available only after 5.0.2):
    JOIN - Inner Join on (A.x = B.x)
             /  \
            /   SELECT (B.y = 3)
            |    |
        SRC (A)  SRC (B)
Since the criterion is not dependent upon the null values that may be populated from the inner side of the join, the criterion is eligible to be pushed below the join – but only if the join type is also changed to an inner join.
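Written as SQL over the same hypothetical tables, the rewrite relies on this equivalence:

-- Criteria against the inner (nullable) side of the outer join
SELECT A.x, B.y
FROM A LEFT OUTER JOIN B ON A.x = B.x
WHERE B.y = 3

-- Equivalent form: the join is converted to inner, because B.y = 3
-- can never be satisfied by a null-extended B row
SELECT A.x, B.y
FROM A INNER JOIN B ON A.x = B.x
WHERE B.y = 3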
On the other hand, criteria that are dependent upon the presence of null values CANNOT be moved. For example:
        SELECT (B.y is null)
               |
    JOIN - Left Outer Join on (A.x = B.x)
              /  \
         SRC (A)  SRC (B)
This plan tree must have the criteria remain above the join, since the outer join may be introducing null values itself.
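A short sketch of why the criteria cannot move in this case (same hypothetical tables):

-- Original: keeps A rows with no match in B (B.y is null-extended)
-- as well as matches where the stored B.y is null
SELECT A.x, B.y
FROM A LEFT OUTER JOIN B ON A.x = B.x
WHERE B.y IS NULL

-- NOT equivalent: pushing the criteria below the join considers only
-- B rows whose stored B.y is null, so every A row survives (matched or
-- null-extended), which changes the result
SELECT A.x, B.y
FROM A LEFT OUTER JOIN (SELECT B.x, B.y FROM B WHERE B.y IS NULL) AS B
  ON A.x = B.x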
As each relational sub plan is optimized, the plan will show what is being optimized and its canonical form:
OPTIMIZE:
SELECT e1 FROM (SELECT e1 FROM pm1.g1) AS x

----------------------------------------------------------------------------
GENERATE CANONICAL:
SELECT e1 FROM (SELECT e1 FROM pm1.g1) AS x

CANONICAL PLAN:
Project(groups=[x], props={PROJECT_COLS=[e1]})
  Source(groups=[x], props={NESTED_COMMAND=SELECT e1 FROM pm1.g1, SYMBOL_MAP={x.e1=e1}})
    Project(groups=[pm1.g1], props={PROJECT_COLS=[e1]})
      Source(groups=[pm1.g1])
With more complicated user queries, such as a procedure invocation or one containing subqueries, the sub plans may be nested within the overall plan. Each plan ends by showing the final processing plan:
----------------------------------------------------------------------------
OPTIMIZATION COMPLETE:
PROCESSOR PLAN:
AccessNode(0) output=[e1] SELECT g_0.e1 FROM pm1.g1 AS g_0
The effect of rules can be seen by the state of the plan tree before and after the rule fires. For example, the debug log below shows the application of the Merge Virtual rule, which will remove the "x" inline view layer:
EXECUTING AssignOutputElements

AFTER:
Project(groups=[x], props={PROJECT_COLS=[e1], OUTPUT_COLS=[e1]})
  Source(groups=[x], props={NESTED_COMMAND=SELECT e1 FROM pm1.g1, SYMBOL_MAP={x.e1=e1}, OUTPUT_COLS=[e1]})
    Project(groups=[pm1.g1], props={PROJECT_COLS=[e1], OUTPUT_COLS=[e1]})
      Access(groups=[pm1.g1], props={SOURCE_HINT=null, MODEL_ID=Schema name=pm1, nameInSource=null, uuid=3335, OUTPUT_COLS=[e1]})
        Source(groups=[pm1.g1], props={OUTPUT_COLS=[e1]})

============================================================================
EXECUTING MergeVirtual

AFTER:
Project(groups=[pm1.g1], props={PROJECT_COLS=[e1], OUTPUT_COLS=[e1]})
  Access(groups=[pm1.g1], props={SOURCE_HINT=null, MODEL_ID=Schema name=pm1, nameInSource=null, uuid=3335, OUTPUT_COLS=[e1]})
    Source(groups=[pm1.g1])
Some important planning decisions are shown in the plan as they occur as an annotation. For example the snippet below shows that the access node could not be raised as the parent select node contained an unsupported subquery.
Project(groups=[pm1.g1], props={PROJECT_COLS=[e1], OUTPUT_COLS=null})
  Select(groups=[pm1.g1], props={SELECT_CRITERIA=e1 IN /*+ NO_UNNEST */ (SELECT e1 FROM pm2.g1), OUTPUT_COLS=null})
    Access(groups=[pm1.g1], props={SOURCE_HINT=null, MODEL_ID=Schema name=pm1, nameInSource=null, uuid=3341, OUTPUT_COLS=null})
      Source(groups=[pm1.g1], props={OUTPUT_COLS=null})

============================================================================
EXECUTING RaiseAccess
LOW Relational Planner SubqueryIn is not supported by source pm1 - e1 IN /*+ NO_UNNEST */ (SELECT e1 FROM pm2.g1) was not pushed

AFTER:
Project(groups=[pm1.g1])
  Select(groups=[pm1.g1], props={SELECT_CRITERIA=e1 IN /*+ NO_UNNEST */ (SELECT e1 FROM pm2.g1), OUTPUT_COLS=null})
    Access(groups=[pm1.g1], props={SOURCE_HINT=null, MODEL_ID=Schema name=pm1, nameInSource=null, uuid=3341, OUTPUT_COLS=null})
      Source(groups=[pm1.g1])
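The snippet above corresponds to a user query along the lines of the following (reconstructed from the SELECT_CRITERIA property; the NO_UNNEST hint keeps the subquery from being rewritten away):

SELECT e1 FROM pm1.g1 WHERE e1 IN /*+ NO_UNNEST */ (SELECT e1 FROM pm2.g1)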
The procedure planner is fairly simple. It converts the statements in the procedure into instructions in a program that will be run during processing. This is mostly a 1-to-1 mapping and very little optimization is performed.
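For instance, in a hypothetical virtual procedure like the sketch below, each statement maps to roughly one instruction in the resulting program (a declaration, an assignment, a loop over a query cursor, and a statement returning the result):

CREATE VIRTUAL PROCEDURE
BEGIN
    DECLARE integer x;
    x = 0;
    -- Iterate over the rows returned by the query using cursor c
    LOOP ON (SELECT e2 FROM pm1.g1) AS c
    BEGIN
        x = x + c.e2;
    END
    -- Return the accumulated value as the procedure result
    SELECT x;
END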
The XML Planner creates an XML plan that is relatively close to the end result of the Procedure Planner – a program with instructions. Many of the instructions are even similar (while loop, execute SQL, etc). Additional instructions deal with producing the output result document (adding elements and attributes).
The XML planner does several types of planning (not necessarily in this order):
XML programs can also be recursive, which involves using the same document fragment for both the initial fragment and a set of repeated fragments (each a new query) until some termination criteria or limit is met.
XQuery is eligible for specific optimizations. Document projection is the most common optimization. It will be shown in the debug plan as an annotation. For example, with the user query containing "xmltable('/a/b' passing doc columns x string path '@x', val string path '/.')", the debug plan would show a tree of the document that will effectively be used by the context and path XQueries:
MEDIUM XQuery Planning Projection conditions met for /a/b - Document projection will be used
childelement(Q{}a)
  childelement(Q{}b)
    attributeattribute(Q{}x)
      childtext()
    childtext()
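A complete form of that user query might look like the following (the table t and its doc column are hypothetical); only the /a/b elements, the x attribute, and the element text need to be retained, which is what the projected document tree above reflects:

SELECT b.x, b.val
FROM t, XMLTABLE('/a/b' PASSING t.doc
                 COLUMNS x string PATH '@x',
                         val string PATH '/.') AS b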