[JBoss JIRA] (TEIID-4251) Built in support for Postgres DB as materialization target
by Steven Hawkins (Jira)
[ https://issues.redhat.com/browse/TEIID-4251?page=com.atlassian.jira.plugi... ]
Steven Hawkins commented on TEIID-4251:
---------------------------------------
Linking to the multi-pod issue as there is discussion here about the use of local timer jobs vs. centralizing.
> Built in support for Postgres DB as materialization target
> ----------------------------------------------------------
>
> Key: TEIID-4251
> URL: https://issues.redhat.com/browse/TEIID-4251
> Project: Teiid
> Issue Type: Feature Request
> Components: Server
> Reporter: Ramesh Reddy
> Priority: Major
> Fix For: Backlog
>
>
> If a Postgres database is available along with the install, or is assumed to be available, then some of the materialization tasks can be automated, such as:
> - Creation of a common STATUS table
> - Creation of the materialization targets (create views on the dbms)
> - Creation of the load, on-deploy, and on-undeploy scripts for all the materialized views
> We need to devise a way for this to be pluggable, so that based on its success we can provide additional support for other sources.
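> For concreteness, a minimal sketch of the common STATUS table in Postgres syntax (columns follow the status table described in the Teiid external materialization documentation; the exact types and names here are an assumption):
> {code:sql}
> -- shared status table tracking load state for all materialized views
> CREATE TABLE status (
>   VDBName varchar(50) NOT NULL,
>   VDBVersion varchar(50) NOT NULL,
>   SchemaName varchar(50) NOT NULL,
>   Name varchar(256) NOT NULL,
>   TargetSchemaName varchar(50),
>   TargetName varchar(256) NOT NULL,
>   Valid boolean NOT NULL,
>   LoadState varchar(25) NOT NULL,
>   Cardinality bigint,
>   Updated timestamp NOT NULL,
>   LoadNumber bigint NOT NULL,
>   PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
> );
> {code}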
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (TEIID-5318) MaterializationManager probe can be too rapid.
by Steven Hawkins (Jira)
[ https://issues.redhat.com/browse/TEIID-5318?page=com.atlassian.jira.plugi... ]
Steven Hawkins resolved TEIID-5318.
-----------------------------------
Fix Version/s: (was: Backlog)
Resolution: Won't Fix
> MaterializationManager probe can be too rapid.
> ----------------------------------------------
>
> Key: TEIID-5318
> URL: https://issues.redhat.com/browse/TEIID-5318
> Project: Teiid
> Issue Type: Quality Risk
> Components: Server
> Reporter: Steven Hawkins
> Priority: Major
>
> If a ttl load is in progress, the probe is repeated every minute, regardless of how much time is remaining. This can happen due to manual loads, or in clusters where members start the materialization jobs with some skew. While this is not harmful, it does lead to a lot of logging and queries against the status table that could be avoided.
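> For reference, the probe amounts to re-running a status check along these lines every minute (a sketch against the standard status table; the exact column list is an assumption):
> {code:sql}
> SELECT Valid, LoadState, Updated, Cardinality, LoadNumber
> FROM status
> WHERE VDBName = ? AND VDBVersion = ? AND SchemaName = ? AND Name = ?;
> {code}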
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (TEIID-4997) Teiid on/with Spark
by Steven Hawkins (Jira)
[ https://issues.redhat.com/browse/TEIID-4997?page=com.atlassian.jira.plugi... ]
Steven Hawkins resolved TEIID-4997.
-----------------------------------
Fix Version/s: (was: Backlog)
Resolution: Won't Do
Marking as won't do. Will rewrite as a Teiid on/with Presto.
> Teiid on/with Spark
> -------------------
>
> Key: TEIID-4997
> URL: https://issues.redhat.com/browse/TEIID-4997
> Project: Teiid
> Issue Type: Feature Request
> Components: Build/Kits, Query Engine
> Reporter: Steven Hawkins
> Assignee: Steven Hawkins
> Priority: Major
>
> With the availability of Spark on OpenShift, we should provide a cooperative planning/execution mode for Teiid that utilizes the Spark engine.
> Roughly, this would look like a Teiid master running embedded with the Spark master, serving the typical JDBC/ODBC/OData endpoints. On an incoming query the optimizer would choose to process against Spark or with Teiid; if processing with Teiid, that may still require submitting the job to a worker to avoid burdening the master. Alternatively, the Teiid master could run in a separate pod with the additional serialization costs; however, initially the remote Spark [JDBC/ODBC layer|https://spark.apache.org/docs/latest/sql-programming-guide.html#dis...] will not be available in the OpenShift effort.
> If execution against Spark is chosen, then a Spark job will be created instead of a typical Teiid processor plan. Initially this could be limited to relational plans, but could be expanded to include procedure language support translated to Python, Scala, etc. The Spark job would represent each source access as a [temporary view|https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc...] accessing the relevant pushdown query. Ideally this would be executed against a Teiid Embedded instance running in the worker node. If remote, this would incur an extra hop and have security considerations. This can be thought of as using Teiid for its virtualization and access layer features. The rest of the processing above the access layer could then be represented as Spark SQL.
> For example a Teiid user query of "select * from hdfs.tbl h, oracle.tbl o where h.id = o.id order by h.col" would become the Spark SQL job:
> CREATE TEMPORARY VIEW h
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:teiid:vdb",
> dbtable "(select col ... from hdfs.tbl)",
> fetchSize '1024',
> ...
> )
> CREATE TEMPORARY VIEW o
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:teiid:vdb",
> dbtable "(select col ... from oracle.tbl)",
> fetchSize '1024',
> ...
> )
> SELECT * FROM h inner join o on h.id = o.id order by h.col
> The challenges/considerations of this are:
> * Utilizing embedded with coordinated VDB management. There's the associated issue of driver management as well.
> * Translating Teiid SQL to Spark SQL. All Teiid functions, UDFs, and aggregate functions would need to be made known to Spark. Table function constructs, such as XMLTABLE, TEXTTABLE, etc. could initially just be treated as access layer concerns. Type issues would exist, as xml/clob/json would map to string.
> * No XA support.
> * We'd need to provide reasonable values for fetch size, partition information, etc. in the access layer queries.
> * We'd have to determine the extent to which federated join optimizations (dependent join and pushdown) need to be conveyed, as that would go beyond simply translating to Spark SQL.
> * There's the potential to use [global temporary views|http://www.gatorsmile.io/globaltempview/], which are a more convenient way of adding virtualization to Spark - see the sketch below.
> * Large internal materialization should be re-targeted to Spark or JDG.
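> A minimal sketch of that global temporary view variant, mirroring the earlier example (this assumes Spark's CREATE ... USING syntax also accepts GLOBAL, with the same illustrative jdbc options):
> CREATE GLOBAL TEMPORARY VIEW h
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:teiid:vdb",
> dbtable "(select col ... from hdfs.tbl)",
> fetchSize '1024',
> ...
> )
> -- global temp views are registered in the global_temp database and shared across sessions
> SELECT * FROM global_temp.h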
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (TEIID-5802) Row based security and column masking don't work correctly for materialized tables (views)
by Steven Hawkins (Jira)
[ https://issues.redhat.com/browse/TEIID-5802?page=com.atlassian.jira.plugi... ]
Steven Hawkins updated TEIID-5802:
----------------------------------
Fix Version/s: Backlog
Placing in the backlog for now. It's not clear that there is a simple resolution here.
> Row based security and column masking don't work correctly for materialized tables (views)
> ------------------------------------------------------------------------------------------
>
> Key: TEIID-5802
> URL: https://issues.redhat.com/browse/TEIID-5802
> Project: Teiid
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: 12.0
> Environment: teiid-12.0.0 on WildFly Full 14.0.1.Final (WildFly Core 6.0.2.Final)
> Reporter: Dmitrii Pogorelov
> Assignee: Steven Hawkins
> Priority: Major
> Fix For: Backlog
>
>
> Row-based security doesn't work for materialized tables (views). For example, given the following permissions for the materialized views views.test_view1 and views.test_view2, the query plan shows that Teiid applies the ApplySecurity rule only to views.test_view2, but it should also apply it to views.test_view1:
> {code:xml}
> <data-role name="role1" any-authenticated="true" allow-create-temporary-tables="true">
> <description>Allow read only</description>
> <permission>
> <resource-name>dsp</resource-name>
> <allow-read>true</allow-read>
> </permission>
> <permission>
> <resource-name>views.test_view1</resource-name>
> <allow-read>true</allow-read>
> <condition constraint="false">col2 > 1</condition>
> </permission>
> <permission>
> <resource-name>views.test_view2</resource-name>
> <allow-read>true</allow-read>
> <condition constraint="false">col0 = 'sa'</condition>
> </permission>
> </data-role>
> {code}
> The same situation occurs with the column masking feature; see below.
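> For illustration, with the role above the engine should still enforce the condition on top of the materialization target (a sketch of the expected rewritten form, not actual plan output):
> {code:sql}
> -- user query
> SELECT * FROM views.test_view1;
> -- expected: the row-based security condition is applied to the
> -- materialized data as well
> SELECT * FROM views.test_view1 WHERE col2 > 1;
> {code}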
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (TEIID-5965) Allow variables to be used as TextTable delimiters, row delimiters, quote, header, skip rows, and escape characters
by Dmitrii Pogorelov (Jira)
[ https://issues.redhat.com/browse/TEIID-5965?page=com.atlassian.jira.plugi... ]
Dmitrii Pogorelov commented on TEIID-5965:
------------------------------------------
[~shawkins] thanks a lot for your comment.
>In total you are requesting delimiter, header and skip be parameterizable.
I apologize for the inconsistency. I think it would be better to make all of the DELIMITER, HEADER, SKIP, ROW DELIMITER, ESCAPE, and QUOTE values parameterizable. In this case, the parameterization can simplify further development.
> Allow variables to be used as TextTable delimiters, row delimiters, quote, header, skip rows, and escape characters
> -------------------------------------------------------------------------------------------------------------------
>
> Key: TEIID-5965
> URL: https://issues.redhat.com/browse/TEIID-5965
> Project: Teiid
> Issue Type: Enhancement
> Components: Query Engine
> Affects Versions: 13.1
> Reporter: Dmitrii Pogorelov
> Assignee: Steven Hawkins
> Priority: Major
> Fix For: 15.0
>
> Original Estimate: 5 hours
> Remaining Estimate: 5 hours
>
> In the specified example, the delimiter is TAB. Many web APIs allow customization of the delimiter character, and in order to provide a generic parser it is sometimes much easier to define the delimiter as a variable rather than creating a long nested structure of IF-THEN-ELSE-IF-ELSE constructs:
> {code:sql}
> Select * From TextTable (
> 'c1 c2
> 1 2'
> Columns
> c1 integer,
> c2 integer
> Delimiter E'\t'
> Header 1
> )x;
> {code}
> Imagine that, based on the setup on the API side (out of our control), the content can be delivered with a tab or a semicolon delimiter, e.g.
> {code}
> c1;c2
> 1;2
> {code}
> or
> {code}
> c1 c2
> 1 2
> {code}
> Let's save this response into a variable and look at the code we would need to write depending on the setup:
> {code:sql}
> Begin
> ...
> If (delimiter = 'tab')
> Begin
> Select * From TextTable (
> apiResponse
> Columns
> c1 integer,
> c2 integer
> Delimiter E'\t'
> Header 1
> )x;
> End
> Else If (delimiter = 'semicolon')
> Begin
> Select * From TextTable (
> apiResponse
> Columns
> c1 integer,
> c2 integer
> Delimiter ';'
> Header 1
> )x;
> End
> End
> {code}
> The if-else block keeps growing, especially if we also want to customize the quote, escape, delimiter, and row delimiter.
> Thus it would be great if we could make these values configurable. We could then end up with the following code, which is more readable and more easily customizable:
> {code:sql}
> Begin
> ...
> Declare string delimiter = E'\t';
> Select * From TextTable (
> apiResponse
> Columns
> c1 integer,
> c2 integer
> Delimiter delimiter
> Header 1
> )x;
> End
> {code}
> With a bit of tweaking, and certain assumptions about the trust level of the input, we could even read the first line and try to auto-detect the delimiter (e.g. by counting the tabs, commas, and semicolons it contains).
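> A rough sketch of that auto-detection, assuming the parameterizable DELIMITER requested above (the heuristic shown is illustrative only):
> {code:sql}
> Begin
> ...
> -- default to tab, but prefer semicolon when one appears in the response
> Declare string delimiter = E'\t';
> If (locate(';', apiResponse) > 0)
> Begin
> delimiter = ';';
> End
> Select * From TextTable (
> apiResponse
> Columns
> c1 integer,
> c2 integer
> Delimiter delimiter
> Header 1
> )x;
> End
> {code}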
> Please, could you be so kind as to make HEADER and SKIP values customizable (rather than hardcoded numbers)?
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (TEIID-4363) ParseHub - support for web documents
by Steven Hawkins (Jira)
[ https://issues.redhat.com/browse/TEIID-4363?page=com.atlassian.jira.plugi... ]
Steven Hawkins updated TEIID-4363:
----------------------------------
Fix Version/s: Backlog
Open To Community
> ParseHub - support for web documents
> ------------------------------------
>
> Key: TEIID-4363
> URL: https://issues.redhat.com/browse/TEIID-4363
> Project: Teiid
> Issue Type: Feature Request
> Components: Misc. Connectors
> Reporter: Kim Palko
> Priority: Major
> Fix For: Backlog, Open To Community
>
>
> ParseHub is a new web browser extension that you can use to turn any dynamic, poorly structured website into an API without writing code. ParseHub is a scraping tool designed to work on websites with JavaScript and Ajax; it is similar to web scraping tools such as Import.io and Kimono Labs.
> The ParseHub tool will identify relationships between elements, extract all of the data, and provide it in a spreadsheet or an easily accessible API. Both the scrapers and the data are cloud hosted. All you have to do is download the ParseHub browser extension and start extracting the data you want. Watch this 60-second video of ParseHub.
> https://www.parsehub.com/
> http://scraping.pro/turn-any-interactive-website-into-api-with-parsehub/
--
This message was sent by Atlassian Jira
(v7.13.8#713008)