[JBoss JIRA] (TEIID-4251) Built in support for Postgres DB as materialization target
by Steven Hawkins (Jira)
[ https://issues.jboss.org/browse/TEIID-4251?page=com.atlassian.jira.plugin... ]
Steven Hawkins commented on TEIID-4251:
---------------------------------------
The proposal is to have a simple internal (in-process) non-replicated materialization for things like sys/pg_catalog tables. Then for user facing internal materialization we'll use a pg instance per vdb.
We'll can design/allow for multi-tenet usage of the database, but initially have each database instance owned by a vdb. The biggest multi-tenet issue using the direct pg fdw integration is the imported schema name limitation - that is the local database name
HA could be obtained using crunchy data rather than the default pg - but we wouldn't say there was a need for DR given that the materialization can be fully rebuilt from sources.
Design choices:
1. Keep the notion of a status table / mvstatus function.
Pros: If we do expand to external materialization, we'll need that. It provides additional semantics of onError - wait or ignore. With pg the view is either populated or not. When not populated a query against the materialized view will fail. Alternatively we could have wait and ignore be implemented in a best effort fashion over top of the pg handling. Also it prevents any naming issues - see above pg requires the local and remote schema names to match (although that could probably be patched), and usage or existing pg schema names pg_toast, public, etc. could lead to matview table name conflicts (although that's pretty unlikely in practice).
Cons: It's additional complexity and typically best effort only. Without transactional consistency there isn't a hard guarantee on the actual mat view state. If we collocate the status on every target, we could get this down to a local rather than an xa transaction. Or we can require that a transaction is started to use a materialized view - similar to lob handling in pg. For the pg internal being discussed here, there is already the https://www.postgresql.org/docs/9.3/view-pg-matviews.html view.
Instead of the SQL shown above we'd still have to generate the appropriate:
create table xxx_mat (cols ...); -- the difference here is that we need to have the type logic from the jdbc translator, or the pg layer provide the mapping explicitly
create index ... on xxx_mat
2. Refresh strategies
PG by default only supports fully snapshot refresh, which can either be invalidating or in the background.
If we only keep the status table, but not use the mvstatus function, we can still offer a refresh based upon a change event threshold.
Row updates are not possible with postgresql materialized views, so predicate based refresh or eventually row/column level cdc is not possible.
2a. Snapshot refresh
PG uses the refresh materialized view statement, which we can map to from our sysadmin procedure. However we need to also implement time based refresh. We previously relied on each teiid instance to run a timer job for every materialization action and then use cluster communication to determine if the local node should be responsible for the given refresh. In the new design we can make use of kube cron jobs: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/ at the least that job image would need a pg client along with some script of "refresh materialized view x CONCURRENTLY" and of course use the system account secret for the pg instance.
3. Connectivity to pg, and pg to Teiid
Connectivity to pg is straight-forward. The instance creation should also create a secret for a system account. Connectivity from pg to Teiid implies that we will also need to have a service account for Teiid. That seems tricky given the sso integration - do we federate in our own authentication system for that? Alternatively we have to consider using a client ip or certificate approach over the pg transport to identify the pg instance.
[~rareddy] what are your thoughts?
> Built in support for Postgres DB as materialization target
> ----------------------------------------------------------
>
> Key: TEIID-4251
> URL: https://issues.jboss.org/browse/TEIID-4251
> Project: Teiid
> Issue Type: Sub-task
> Components: Server
> Reporter: Ramesh Reddy
> Assignee: Steven Hawkins
> Priority: Major
> Fix For: 12.2
>
>
> If Postgres database is available along with install or assumed that it is available, then some of the materialization task can be automated, like
> - Creation of a common STATUS table
> - Creation of the materilization targets (create views on dbms)
> - On load, on undeploy and load scripts for all the materialization views
> We need to device a way this to be pluggable, such that based on success of this, we can provide additional support for other sources.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 9 months
[JBoss JIRA] (TEIID-5707) Must pushdown correlated subquery without from fails
by Steven Hawkins (Jira)
[ https://issues.jboss.org/browse/TEIID-5707?page=com.atlassian.jira.plugin... ]
Steven Hawkins resolved TEIID-5707.
-----------------------------------
Fix Version/s: 11.2.3
12.1.1
Resolution: Done
The correlated reference visitor was not considering the case of having a subquery without a from clause correctly.
> Must pushdown correlated subquery without from fails
> ----------------------------------------------------
>
> Key: TEIID-5707
> URL: https://issues.jboss.org/browse/TEIID-5707
> Project: Teiid
> Issue Type: Bug
> Components: Query Engine
> Reporter: Steven Hawkins
> Assignee: Steven Hawkins
> Priority: Major
> Fix For: 12.2, 11.2.3, 12.1.1
>
>
> A query such as:
> select (select func(e1)) from g1
> where func is a must pushdown function will fail due to not finding the correlated value e1. This happens regardless of whether func is from the same source as g1 - because currently the plan will be the same either way (we don't yet see a way to associate the subplan with the g1 source).
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 9 months
[JBoss JIRA] (TEIID-5707) Must pushdown correlated subquery without from fails
by Steven Hawkins (Jira)
Steven Hawkins created TEIID-5707:
-------------------------------------
Summary: Must pushdown correlated subquery without from fails
Key: TEIID-5707
URL: https://issues.jboss.org/browse/TEIID-5707
Project: Teiid
Issue Type: Bug
Components: Query Engine
Reporter: Steven Hawkins
Assignee: Steven Hawkins
Fix For: 12.2
A query such as:
select (select func(e1)) from g1
where func is a must pushdown function will fail due to not finding the correlated value e1. This happens regardless of whether func is from the same source as g1 - because currently the plan will be the same either way (we don't yet see a way to associate the subplan with the g1 source).
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 9 months
[JBoss JIRA] (TEIIDSB-4) Provide rest access with SpringBoot
by Steven Hawkins (Jira)
[ https://issues.jboss.org/browse/TEIIDSB-4?page=com.atlassian.jira.plugin.... ]
Steven Hawkins commented on TEIIDSB-4:
--------------------------------------
Linking to the operator / metadata issues - ideally we can generate the rest interface from the full metadata in a build phase.
> Provide rest access with SpringBoot
> -----------------------------------
>
> Key: TEIIDSB-4
> URL: https://issues.jboss.org/browse/TEIIDSB-4
> Project: Teiid Spring Boot
> Issue Type: Feature Request
> Reporter: Steven Hawkins
> Assignee: Ramesh Reddy
> Priority: Major
>
> The current rest access layer is based upon the generation of a war with deployment hooks in wildfly. There should be a build-time or similar mechanism to expose rest with spring boot.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 9 months
[JBoss JIRA] (TEIID-4051) Swagger metadata includes internal java constructs
by Steven Hawkins (Jira)
[ https://issues.jboss.org/browse/TEIID-4051?page=com.atlassian.jira.plugin... ]
Steven Hawkins updated TEIID-4051:
----------------------------------
Fix Version/s: 12.x
(was: 12.2)
Pulling out of 12.2 - we need to see how this is resolved on the Teiid Spring Boot side before taking any steps on the full server side.
> Swagger metadata includes internal java constructs
> --------------------------------------------------
>
> Key: TEIID-4051
> URL: https://issues.jboss.org/browse/TEIID-4051
> Project: Teiid
> Issue Type: Quality Risk
> Components: Server
> Affects Versions: 8.12.x
> Reporter: Steven Hawkins
> Assignee: Steven Hawkins
> Priority: Major
> Fix For: 12.x
>
>
> From IntegrationTestRestWarGeneration the swagger metadata includes definitions for Statement, ResultSet, StreamingOutput, etc.:
> ...
> InputStream:
> type: "object"
> Statement:
> type: "object"
> properties:
> queryTimeout:
> type: "integer"
> format: "int32"
> closed:
> type: "boolean"
> default: false
> resultSet:
> $ref: "#/definitions/ResultSet"
> ...
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 9 months
[JBoss JIRA] (TEIID-5706) Use streaming/iterator OData result building
by Steven Hawkins (Jira)
Steven Hawkins created TEIID-5706:
-------------------------------------
Summary: Use streaming/iterator OData result building
Key: TEIID-5706
URL: https://issues.jboss.org/browse/TEIID-5706
Project: Teiid
Issue Type: Sub-task
Components: OData
Reporter: Steven Hawkins
Assignee: Steven Hawkins
Add support for building results off of a stream, rather than fully realizing the entity collection and then serializing.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 9 months