[JBoss JIRA] (TEIID-4997) Teiid on/with Spark
by Steven Hawkins (Jira)
[ https://issues.jboss.org/browse/TEIID-4997?page=com.atlassian.jira.plugin... ]
Steven Hawkins updated TEIID-4997:
----------------------------------
Fix Version/s: 12.x
(was: 11.x)
> Teiid on/with Spark
> -------------------
>
> Key: TEIID-4997
> URL: https://issues.jboss.org/browse/TEIID-4997
> Project: Teiid
> Issue Type: Feature Request
> Components: Build/Kits, Query Engine
> Reporter: Steven Hawkins
> Assignee: Steven Hawkins
> Priority: Major
> Fix For: 12.x
>
>
> With the availability of Spark on OpenShift, we should provide a cooperative planning/execution mode for Teiid that utilizes the Spark engine.
> Roughly, this would look like a Teiid master running embedded with the Spark master, serving the typical JDBC/ODBC/OData endpoints. On an incoming query, the optimizer would choose to process against Spark or with Teiid; if processing with Teiid, that may still require submitting the job to a worker to avoid burdening the master. Alternatively, the Teiid master could run in a separate pod, at the cost of additional serialization; however, the remote Spark [JDBC/ODBC layer|https://spark.apache.org/docs/latest/sql-programming-guide.html#dis...] will not initially be available in the OpenShift effort.
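The Spark-vs-Teiid routing decision described above could be sketched roughly as follows. This is a hypothetical illustration, not Teiid planner code: `CostEstimate`, `choose_engine`, and the threshold value are all assumed names and numbers.

```python
# Hypothetical sketch: routing a planned query to Spark or to the
# embedded Teiid engine based on a planner cost estimate. The names
# (CostEstimate, SPARK_COST_THRESHOLD) are illustrative, not Teiid APIs.

from dataclasses import dataclass

# Assumed cardinality above which shipping work to Spark pays off.
SPARK_COST_THRESHOLD = 1_000_000

@dataclass
class CostEstimate:
    estimated_rows: int      # planner's row-count estimate
    relational_only: bool    # True if the plan uses only relational constructs

def choose_engine(cost: CostEstimate) -> str:
    """Return 'spark' or 'teiid' for a planned query."""
    # Only relational plans can initially be translated to Spark SQL.
    if cost.relational_only and cost.estimated_rows >= SPARK_COST_THRESHOLD:
        return "spark"
    # Small or non-relational plans stay on Teiid (possibly submitted to
    # a worker, to avoid burdening the master).
    return "teiid"
```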
> If execution against Spark is chosen, then instead of a typical Teiid processor plan, a Spark job will be created. Initially this could be limited to relational plans, but it could be expanded to include procedure language support translated to Python, Scala, etc. The Spark job would represent each source access as a [temporary view|https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc...] accessing the relevant pushdown query. Ideally this would execute against a Teiid Embedded instance running on the worker node; if remote, this would incur an extra hop and have security considerations. This can be thought of as using Teiid for its virtualization and access-layer features. The rest of the processing above the access layers could then be represented as Spark SQL.
> For example a Teiid user query of "select * from hdfs.tbl h, oracle.tbl o where h.id = o.id order by h.col" would become the Spark SQL job:
> CREATE TEMPORARY VIEW h
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:teiid:vdb",
> dbtable "(select col ... from hdfs.tbl)",
> fetchSize '1024',
> ...
> )
> CREATE TEMPORARY VIEW o
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:teiid:vdb",
> dbtable "(select col ... from oracle.tbl)",
> fetchSize '1024',
> ...
> )
> SELECT * FROM h INNER JOIN o ON h.id = o.id ORDER BY h.col
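The temporary-view statements above could be generated mechanically from the pushdown queries. A minimal sketch, assuming a hypothetical helper name; only the `CREATE TEMPORARY VIEW ... USING org.apache.spark.sql.jdbc` shape comes from the Spark SQL documentation:

```python
# Sketch: build the Spark SQL DDL registering one source access as a
# temporary view. temp_view_ddl and its defaults are assumptions.

def temp_view_ddl(view: str, pushdown_sql: str,
                  url: str = "jdbc:teiid:vdb", fetch_size: int = 1024) -> str:
    """Build the CREATE TEMPORARY VIEW statement for one pushdown query."""
    return (
        f"CREATE TEMPORARY VIEW {view}\n"
        f"USING org.apache.spark.sql.jdbc\n"
        f"OPTIONS (\n"
        f"  url \"{url}\",\n"
        f"  dbtable \"({pushdown_sql})\",\n"
        f"  fetchSize '{fetch_size}'\n"
        f")"
    )

ddl = temp_view_ddl("h", "select col from hdfs.tbl")
```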
> The challenges/considerations of this are:
> * Utilizing Teiid Embedded with coordinated VDB management. There's the associated issue of driver management as well.
> * Translating Teiid SQL to Spark SQL. All Teiid functions, UDFs, and aggregate functions would need to be made known to Spark. Table function constructs, such as XMLTABLE, TEXTTABLE, etc., could initially be treated as access-layer concerns. Type-mapping issues would also exist, as xml/clob/json would map to string.
> * No XA support.
> * We'd need to provide reasonable values for fetch size, partition information, etc. in the access-layer queries.
> * We'd have to determine the extent to which federated join optimizations (dependent join and pushdown) need to be conveyed, as that goes beyond simply translating to Spark SQL.
> * There's the potential to use [global temporary views|http://www.gatorsmile.io/globaltempview/], which would be a more convenient way of adding virtualization to Spark.
> * Large internal materialization should be re-targeted to Spark or JDG.
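The fetch-size/partition bullet above could be addressed by deriving Spark's JDBC partitioning options from source statistics. A sketch under assumed inputs; the option keys are the ones Spark's JDBC data source documents, while the helper name and bounds are hypothetical:

```python
# Sketch: derive JDBC partitioning options for one access-layer query.
# The boundary values would really come from source statistics; here
# they are hypothetical inputs supplied by the caller.

def partition_options(column: str, lower: int, upper: int,
                      num_partitions: int, fetch_size: int = 1024) -> dict:
    """Options understood by Spark's JDBC data source for parallel reads."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return {
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),
    }

opts = partition_options("id", 1, 1_000_000, 8)
```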
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
6 years, 2 months
[JBoss JIRA] (TEIID-5321) SAP IQ numeric functions corner case differences
by Steven Hawkins (Jira)
[ https://issues.jboss.org/browse/TEIID-5321?page=com.atlassian.jira.plugin... ]
Steven Hawkins updated TEIID-5321:
----------------------------------
Fix Version/s: 12.x
(was: 11.x)
> SAP IQ numeric functions corner case differences
> ------------------------------------------------
>
> Key: TEIID-5321
> URL: https://issues.jboss.org/browse/TEIID-5321
> Project: Teiid
> Issue Type: Bug
> Components: Misc. Connectors
> Affects Versions: 8.12.13.6_4
> Reporter: Jan Stastny
> Assignee: Steven Hawkins
> Priority: Minor
> Fix For: 12.x
>
>
> The following discrepancies in return values of numeric functions are caused by SAP IQ's behaviour.
> # COT(0)
> #* results in OutOfRange error in Teiid
> #* returns NULL in SAP IQ
> # EXP\(n\) - for n < -15
> #* result is 0 in Teiid
> #* result in SAP IQ is:
> #** 0 for -15 > n >= -745
> #** NULL for n <= -746
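The SAP IQ boundary described above sits roughly where IEEE-754 doubles underflow to zero, and COT(0) is a division by zero that engines surface differently. A quick illustration using Python's `math` module as a stand-in (neither engine's actual code):

```python
import math

# COT(0) = cos(0)/sin(0) divides by zero; engines differ on whether
# that surfaces as an error (Teiid) or NULL (SAP IQ).
def cot(x: float) -> float:
    return math.cos(x) / math.sin(x)   # raises ZeroDivisionError at x = 0

# EXP(n) for very negative n: IEEE-754 doubles underflow to 0.0 around
# n = -745/-746, matching the boundary in the issue description.
tiny = math.exp(-745)   # still a positive subnormal double
zero = math.exp(-746)   # underflows to exactly 0.0
```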
[JBoss JIRA] (TEIID-5318) MaterializationManager probe can be too rapid.
by Steven Hawkins (Jira)
[ https://issues.jboss.org/browse/TEIID-5318?page=com.atlassian.jira.plugin... ]
Steven Hawkins updated TEIID-5318:
----------------------------------
Fix Version/s: 12.x
(was: 11.x)
> MaterializationManager probe can be too rapid.
> ----------------------------------------------
>
> Key: TEIID-5318
> URL: https://issues.jboss.org/browse/TEIID-5318
> Project: Teiid
> Issue Type: Quality Risk
> Components: Server
> Reporter: Steven Hawkins
> Priority: Major
> Fix For: 12.x
>
>
> If a TTL load is in progress, the probe is repeated every minute, regardless of how much time remains. This can happen due to manual loads, or in clusters where members start the materialization jobs with some skew. While this is not harmful, it does lead to a lot of logging and queries against the status table that could be avoided.
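One possible shape of a fix is to back the probe off based on the time remaining in the in-progress load rather than polling on a fixed cadence. A hypothetical sketch; the names and bounds are assumptions, with the one-minute floor taken from the current behavior described above:

```python
# Hypothetical sketch: compute the delay before the next materialization
# status probe from the load's remaining TTL, instead of a fixed minute.

MIN_PROBE_SECONDS = 60        # current fixed cadence, kept as a floor
MAX_PROBE_SECONDS = 15 * 60   # assumed cap so a stalled load is noticed

def next_probe_delay(ttl_remaining_seconds: float) -> float:
    """Delay before re-probing an in-progress TTL load."""
    # Probe roughly when the load should be done, bounded on both sides
    # to avoid both rapid polling and unbounded silence.
    return max(MIN_PROBE_SECONDS,
               min(ttl_remaining_seconds, MAX_PROBE_SECONDS))
```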