[JBoss JIRA] (TEIID-5004) INNER JOINs of two CTEs fail with ASSERTION FAILED error message
by dalex dalex (JIRA)
dalex dalex created TEIID-5004:
----------------------------------
Summary: INNER JOINs of two CTEs fail with ASSERTION FAILED error message
Key: TEIID-5004
URL: https://issues.jboss.org/browse/TEIID-5004
Project: Teiid
Issue Type: Bug
Components: Query Engine
Affects Versions: 9.3.1, 9.3
Environment: teiid-9.3.0.Final on WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final)
and
teiid-9.3.1 on WildFly Full 10.0.0.Final (WildFly Core 2.0.10.Final)
Reporter: dalex dalex
Assignee: Steven Hawkins
Priority: Blocker
When running the following query several times in a row:
{code:sql}
with
CTE1 as (
select a1 from (
-->>
with CTE11 as (select a1 from (
WITH
alias as (SELECT 1 as a),
alias1 as (select alias.a as alias, AAA.a as alias1, AAA.b as a from alias join test.test_a AAA on alias.a!=AAA.a or alias.a!=AAA.a),
alias2 as (with alias as (select alias.a as alias, AAA.a as alias1, AAA.b as a from alias join test.test_a AAA on alias.a!=AAA.a or alias.a!=AAA.a) select * from alias),
alias3 as (with alias as (select alias.a as alias, AAA.a as alias1, AAA.a from alias join alias2 AAA on alias.a!=AAA.a or alias.a!=AAA.a) select * from alias)
SELECT AAA.alias1 as a1 FROM alias join alias1 as AAA on alias.a=AAA.a or alias.a!=AAA.a join alias2 on AAA.a=alias2.a or AAA.a!=alias.a left join alias3 on alias3.a=alias2.a or alias3.a!=alias2.a
) tv2)
--<<
select a1 from CTE11
) as SUBQ1),
CTE2 as (
select a1 from (
-->>
with CTE21 as (select a1 from (
WITH
alias as (SELECT 1 as a),
alias1 as (select alias.a as alias, AAA.a as alias1, AAA.b as a from alias join test.test_a AAA on alias.a!=AAA.a or alias.a!=AAA.a),
alias2 as (with alias as (select alias.a as alias, AAA.a as alias1, AAA.b as a from alias join test.test_a AAA on alias.a!=AAA.a or alias.a!=AAA.a) select * from alias),
alias3 as (with alias as (select alias.a as alias, AAA.a as alias1, AAA.a from alias join alias2 AAA on alias.a!=AAA.a or alias.a!=AAA.a) select * from alias)
SELECT AAA.alias1 as a1 FROM alias join alias1 as AAA on alias.a=AAA.a or alias.a!=AAA.a join alias2 on AAA.a=alias2.a or AAA.a!=alias.a left join alias3 on alias3.a=alias2.a or alias3.a!=alias2.a
) tv2)
--<<
select a1 from CTE21
) as SUBQ2)
select * from CTE1 as T1 join CTE2 as T2 on T1.a1=T2.a1
{code}
Teiid returns the following stack trace (from Teiid 9.3.1):
{code}
2017-07-21 16:52:32,440 ERROR [org.teiid.PROCESSOR] (Worker1_QueryProcessorQueue73) oiBQAQPtTMe+ TEIID30019 Unexpected exception for request oiBQAQPtTMe+.15: java.lang.AssertionError
: ASSERTION FAILED: expected reference to be not null
at org.teiid.core.util.Assertion.failed(Assertion.java:73)
at org.teiid.core.util.Assertion.isNotNull(Assertion.java:100)
at org.teiid.core.util.Assertion.isNotNull(Assertion.java:92)
at org.teiid.common.buffer.TupleBuffer.getBatch(TupleBuffer.java:287)
at org.teiid.dqp.internal.process.TupleSourceCache$CopyOnReadTupleSource.nextTuple(TupleSourceCache.java:103)
at org.teiid.query.processor.relational.AccessNode.nextBatchDirect(AccessNode.java:394)
at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:281)
at org.teiid.query.processor.BatchIterator.finalRow(BatchIterator.java:69)
at org.teiid.common.buffer.AbstractTupleSource.getCurrentTuple(AbstractTupleSource.java:70)
at org.teiid.query.processor.BatchIterator.getCurrentTuple(BatchIterator.java:84)
at org.teiid.common.buffer.AbstractTupleSource.hasNext(AbstractTupleSource.java:92)
at org.teiid.query.processor.relational.MergeJoinStrategy.process(MergeJoinStrategy.java:165)
at org.teiid.query.processor.relational.JoinNode.nextBatchDirect(JoinNode.java:235)
at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:281)
at org.teiid.query.processor.BatchCollector.collectTuples(BatchCollector.java:164)
at org.teiid.query.processor.relational.SourceState.prefetch(SourceState.java:211)
at org.teiid.query.processor.relational.SourceState.rowCountLE(SourceState.java:158)
at org.teiid.query.processor.relational.JoinNode.nextBatchDirect(JoinNode.java:227)
at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:281)
at org.teiid.query.processor.BatchCollector.collectTuples(BatchCollector.java:164)
at org.teiid.query.processor.relational.SourceState.prefetch(SourceState.java:211)
at org.teiid.query.processor.relational.JoinNode.prefetch(JoinNode.java:255)
at org.teiid.query.processor.relational.JoinNode.nextBatchDirect(JoinNode.java:247)
at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:281)
at org.teiid.query.processor.relational.LimitNode.nextBatchDirect(LimitNode.java:102)
at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:281)
at org.teiid.query.processor.relational.ProjectNode.nextBatchDirect(ProjectNode.java:150)
at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:281)
at org.teiid.query.processor.relational.RelationalPlan.nextBatch(RelationalPlan.java:145)
at org.teiid.query.processor.QueryProcessor.nextBatchDirect(QueryProcessor.java:151)
at org.teiid.query.processor.QueryProcessor.nextBatch(QueryProcessor.java:114)
at org.teiid.query.processor.BatchCollector.collectTuples(BatchCollector.java:164)
at org.teiid.query.processor.BatchCollector.collectTuples(BatchCollector.java:146)
at org.teiid.dqp.internal.process.RequestWorkItem.processMore(RequestWorkItem.java:477)
at org.teiid.dqp.internal.process.RequestWorkItem.process(RequestWorkItem.java:349)
at org.teiid.dqp.internal.process.AbstractWorkItem.run(AbstractWorkItem.java:51)
at org.teiid.dqp.internal.process.RequestWorkItem.run(RequestWorkItem.java:275)
at org.teiid.dqp.internal.process.DQPWorkContext.runInContext(DQPWorkContext.java:284)
at org.teiid.dqp.internal.process.ThreadReuseExecutor$RunnableWrapper.run(ThreadReuseExecutor.java:119)
at org.teiid.dqp.internal.process.ThreadReuseExecutor$3.run(ThreadReuseExecutor.java:210)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (TEIID-4385) Nested subquery in an aggregate fails to evaluate
by Johnathon Lee (JIRA)
[ https://issues.jboss.org/browse/TEIID-4385?page=com.atlassian.jira.plugin... ]
Johnathon Lee updated TEIID-4385:
---------------------------------
Fix Version/s: 8.12.12.6_3
> Nested subquery in an aggregate fails to evaluate
> -------------------------------------------------
>
> Key: TEIID-4385
> URL: https://issues.jboss.org/browse/TEIID-4385
> Project: Teiid
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: 8.12
> Reporter: Steven Hawkins
> Assignee: Steven Hawkins
> Fix For: 9.1, 9.0.3, 8.12.12.6_3
>
>
> A correlated subquery that appears in an aggregate value inside of a subquery will not have the correlated values available and will fail with a TEIID30328 exception.
> For example:
> SELECT g0.a, g0.b, (SELECT max((SELECT g2.a FROM m.z AS g2 WHERE g2.b = g1.a)) FROM m.y AS g1 WHERE g0.a = g1.b) FROM m.x AS g0
> will fail on the g0.a reference.
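For reference, this query shape can often be flattened so the doubly-nested correlation disappears. A sketch of an equivalent form (equivalent only under the assumption that m.z.b matches at most one row per m.y.a value):
{code:sql}
SELECT g0.a, g0.b,
       (SELECT max(g2.a)
        FROM m.y AS g1 JOIN m.z AS g2 ON g2.b = g1.a
        WHERE g0.a = g1.b)
FROM m.x AS g0
{code}
Here the correlated reference to g0.a sits only one subquery level deep, avoiding the aggregate-nested correlation described above.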
[JBoss JIRA] (TEIID-4997) Teiid on/with Spark
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-4997?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-4997:
---------------------------------------
Permanent UDFs require some effort - https://stackoverflow.com/questions/43272446/defined-spark-permanent-udf-...
Otherwise they must be added on a per-session basis. There is also the issue that our UDFs and UDAFs can accept the command context; the closest we can get to that is registering those functions with a command context instance on a per-session basis. Another solution is to register UDFs only for Teiid-specific functions and map the rest to equivalent Spark functions.
Lateral joins are generally not supported with Spark either, as it lacks a nested table construct that can use a subquery.
Recursive common table expressions are also not supported.
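The per-session registration mentioned above might look like the following Spark SQL (the function and class names here are hypothetical, for illustration only):
{code:sql}
-- session-scoped: visible only to the current SparkSession, must be re-run per session
CREATE TEMPORARY FUNCTION teiid_formatdate AS 'org.example.TeiidFormatDateUDF';

-- permanent, catalog-backed registration -- the part that "requires some effort":
-- the jar must be reachable by the executors and the metastore must be persistent
CREATE FUNCTION teiid_formatdate AS 'org.example.TeiidFormatDateUDF'
  USING JAR 'hdfs:///udfs/teiid-udfs.jar';
{code}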
> Teiid on/with Spark
> -------------------
>
> Key: TEIID-4997
> URL: https://issues.jboss.org/browse/TEIID-4997
> Project: Teiid
> Issue Type: Feature Request
> Components: Build/Kits, Query Engine
> Reporter: Steven Hawkins
> Assignee: Steven Hawkins
>
> With the availability of Spark on OpenShift, we should provide a cooperative planning/execution mode for Teiid that utilizes the Spark engine.
> Roughly this would look like a Teiid master running embedded with the Spark master, serving the typical JDBC/ODBC/OData endpoints. On an incoming query the optimizer would choose to process against Spark or with Teiid; if processing with Teiid, that may still require submitting the job to a worker to avoid burdening the master. Alternatively the Teiid master could run in a separate pod, with additional serialization costs; however, initially the remote Spark [JDBC/ODBC layer|https://spark.apache.org/docs/latest/sql-programming-guide.html#dis...] will not be available in the OpenShift effort.
> If execution against Spark is chosen, then instead of a typical Teiid processor plan a Spark job will be created. Initially this could be limited to relational plans, but could be expanded to include procedure language support translated to python, scala, etc. The Spark job would represent each source access as a [temporary view|https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc...] accessing the relevant pushdown query. Ideally this would be executed against a Teiid Embedded instance running in the worker node. If remote, this would incur an extra hop and have security considerations. This can be thought of as using Teiid for its virtualization and access layer features. The rest of the processing above the access layer could then be represented as Spark SQL.
> For example a Teiid user query of "select * from hdfs.tbl h, oracle.tbl o where h.id = o.id order by h.col" would become the Spark SQL job:
> CREATE TEMPORARY VIEW h
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:teiid:vdb",
> dbtable "(select col ... from hdfs.tbl)",
> fetchSize '1024',
> ...
> )
> CREATE TEMPORARY VIEW o
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:teiid:vdb",
> dbtable "(select col ... from oracle.tbl)",
> fetchSize '1024',
> ...
> )
> SELECT * FROM h INNER JOIN o ON h.id = o.id ORDER BY h.col
> The challenges/considerations of this are:
> * Utilizing embedded with coordinated VDB management. There's the associated issue of driver management as well.
> * Translating Teiid SQL to Spark SQL. All Teiid functions, UDFs, and aggregate functions would need to be made known to Spark. Table function constructs, such as XMLTABLE, TEXTTABLE, etc., could initially just be treated as access layer concerns. Type issues would exist, as xml/clob/json would map to string.
> * No XA support.
> * We'd need to provide reasonable values for fetch size, partition information, etc. in the access layer queries.
> * We'd have to determine the extent to which federated join optimizations need to be conveyed (dependent join and pushdown) as that would go beyond simply translating to Spark SQL.
> * There's the potential to use [global temporary views|http://www.gatorsmile.io/globaltempview/], which are a more convenient way of adding virtualization to Spark.
> * Large internal materialization should be re-targeted to Spark or JDG
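The global temporary view option from the list above might be sketched as follows (the view name and backing query are illustrative, reusing the "o" view from the example job):
{code:sql}
-- registered once, visible across sessions until the Spark application stops
CREATE GLOBAL TEMPORARY VIEW oracle_tbl AS
  SELECT * FROM o;

-- other sessions resolve it through the reserved global_temp database
SELECT * FROM global_temp.oracle_tbl;
{code}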
[JBoss JIRA] (TEIID-4997) Teiid on/with Spark
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-4997?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-4997:
---------------------------------------
A different initial approach is to use Spark in local master / embedded mode. This sidesteps cluster integration for now and focuses on what's logically needed to create the Spark job.
> Teiid on/with Spark
> -------------------
>
> Key: TEIID-4997
> URL: https://issues.jboss.org/browse/TEIID-4997
> Project: Teiid
> Issue Type: Feature Request
> Components: Build/Kits, Query Engine
> Reporter: Steven Hawkins
> Assignee: Steven Hawkins
[JBoss JIRA] (TEIID-4997) Teiid on/with Spark
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-4997?page=com.atlassian.jira.plugin... ]
Steven Hawkins edited comment on TEIID-4997 at 7/20/17 11:33 AM:
-----------------------------------------------------------------
Collocating Teiid Embedded with the worker will take a lot of specialization. A starting point of a wildfly-swarm uber jar without any translators gives us a convenient way to be loaded through the Spark shell. However, our base jar size is > 122 MB, with a significant runtime footprint as well given the amount of heap reserved by Teiid. This could of course be slimmed further, as we don't need the REST or remote JDBC layers. We'd need to customize the engine startup to be triggered through the driver, customize resource consumption so that the buffer manager reserves only a small amount of memory, prevent any materialization loading (for now), and establish a convention for how VDBs are managed.
In the interest of a POC level of effort, this leads me to believe we should focus instead on remote access back to Teiid and, as an optimization, direct creation of pushdown source queries against JDBC sources by tapping directly into the translator layer.
was (Author: shawkins):
To do the collocation of Teiid embedded with the worker will take a lot of specialization. A starting point of a wildfly-swarm uber jar without any translators, gives us a convenient way to be loaded in through the spark shell. However our base size is > 122 MB with a significant amount of heap usage reserved by Teiid. This of course could be slimmed more as we don't need the rest nor remote jdbc layers. We'd need to customize the engine startup to be triggered through the driver, customize the resource consumption such that the buffermanager is reserving only a small amount of memory, prevent any materialization loading (for now), and have a convention for vdbs are managed.
In the interest of a POC level of effort this leads me to believe that we should focus instead on remote access back to Teiid and as an optimization direct creation of pushdown of source queries to JDBC sources by directly tapping into the translator layer.
> Teiid on/with Spark
> -------------------
>
> Key: TEIID-4997
> URL: https://issues.jboss.org/browse/TEIID-4997
> Project: Teiid
> Issue Type: Feature Request
> Components: Build/Kits, Query Engine
> Reporter: Steven Hawkins
> Assignee: Steven Hawkins
[JBoss JIRA] (TEIID-4997) Teiid on/with Spark
by Steven Hawkins (JIRA)
[ https://issues.jboss.org/browse/TEIID-4997?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-4997:
---------------------------------------
Collocating Teiid Embedded with the worker will take a lot of specialization. A starting point of a wildfly-swarm uber jar without any translators gives us a convenient way to be loaded through the Spark shell. However, our base size is > 122 MB, with a significant amount of heap reserved by Teiid. This could of course be slimmed further, as we don't need the REST or remote JDBC layers. We'd need to customize the engine startup to be triggered through the driver, customize resource consumption so that the buffer manager reserves only a small amount of memory, prevent any materialization loading (for now), and establish a convention for how VDBs are managed.
In the interest of a POC level of effort, this leads me to believe we should focus instead on remote access back to Teiid and, as an optimization, direct creation of pushdown source queries against JDBC sources by tapping directly into the translator layer.
> Teiid on/with Spark
> -------------------
>
> Key: TEIID-4997
> URL: https://issues.jboss.org/browse/TEIID-4997
> Project: Teiid
> Issue Type: Feature Request
> Components: Build/Kits, Query Engine
> Reporter: Steven Hawkins
> Assignee: Steven Hawkins