[
https://issues.jboss.org/browse/TEIID-4251?page=com.atlassian.jira.plugin...
]
Steven Hawkins commented on TEIID-4251:
---------------------------------------
The proposal is to have a simple internal (in-process) non-replicated materialization for
things like sys/pg_catalog tables. Then for user facing internal materialization
we'll use a pg instance per vdb.
We'll can design/allow for multi-tenet usage of the database, but initially have each
database instance owned by a vdb. The biggest multi-tenet issue using the direct pg fdw
integration is the imported schema name limitation - that is the local database name
HA could be obtained using crunchy data rather than the default pg - but we wouldn't
say there was a need for DR given that the materialization can be fully rebuilt from
sources.
Design choices:
1. Keep the notion of a status table / mvstatus function.
Pros: If we do expand to external materialization, we'll need that. It provides
additional semantics of onError - wait or ignore. With pg the view is either populated or
not. When not populated a query against the materialized view will fail. Alternatively
we could have wait and ignore be implemented in a best effort fashion over top of the pg
handling. Also it prevents any naming issues - see above pg requires the local and remote
schema names to match (although that could probably be patched), and usage or existing pg
schema names pg_toast, public, etc. could lead to matview table name conflicts (although
that's pretty unlikely in practice).
Cons: It's additional complexity and typically best effort only. Without
transactional consistency there isn't a hard guarantee on the actual mat view state.
If we collocate the status on every target, we could get this down to a local rather than
an xa transaction. Or we can require that a transaction is started to use a materialized
view - similar to lob handling in pg. For the pg internal being discussed here, there is
already the
https://www.postgresql.org/docs/9.3/view-pg-matviews.html view.
Instead of the SQL shown above we'd still have to generate the appropriate:
create table xxx_mat (cols ...); -- the difference here is that we need to have the type
logic from the jdbc translator, or the pg layer provide the mapping explicitly
create index ... on xxx_mat
2. Refresh strategies
PG by default only supports fully snapshot refresh, which can either be invalidating or in
the background.
If we only keep the status table, but not use the mvstatus function, we can still offer a
refresh based upon a change event threshold.
Row updates are not possible with postgresql materialized views, so predicate based
refresh or eventually row/column level cdc is not possible.
2a. Snapshot refresh
PG uses the refresh materialized view statement, which we can map to from our sysadmin
procedure. However we need to also implement time based refresh. We previously relied on
each teiid instance to run a timer job for every materialization action and then use
cluster communication to determine if the local node should be responsible for the given
refresh. In the new design we can make use of kube cron jobs:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/ at the least that
job image would need a pg client along with some script of "refresh materialized view
x CONCURRENTLY" and of course use the system account secret for the pg instance.
3. Connectivity to pg, and pg to Teiid
Connectivity to pg is straight-forward. The instance creation should also create a secret
for a system account. Connectivity from pg to Teiid implies that we will also need to
have a service account for Teiid. That seems tricky given the sso integration - do we
federate in our own authentication system for that? Alternatively we have to consider
using a client ip or certificate approach over the pg transport to identify the pg
instance.
[~rareddy] what are your thoughts?
Built in support for Postgres DB as materialization target
----------------------------------------------------------
Key: TEIID-4251
URL:
https://issues.jboss.org/browse/TEIID-4251
Project: Teiid
Issue Type: Sub-task
Components: Server
Reporter: Ramesh Reddy
Assignee: Steven Hawkins
Priority: Major
Fix For: 12.2
If Postgres database is available along with install or assumed that it is available,
then some of the materialization task can be automated, like
- Creation of a common STATUS table
- Creation of the materilization targets (create views on dbms)
- On load, on undeploy and load scripts for all the materialization views
We need to device a way this to be pluggable, such that based on success of this, we can
provide additional support for other sources.
--
This message was sent by Atlassian Jira
(v7.12.1#712002)