[teiid-issues] [JBoss JIRA] (TEIID-4251) Built in support for Postgres DB as materialization target

Tue Apr 2 11:45:08 EDT 2019

    [ https://issues.jboss.org/browse/TEIID-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716725#comment-13716725 ] 

Steven Hawkins commented on TEIID-4251:
---------------------------------------

The proposal is to have a simple internal (in-process) non-replicated materialization for things like sys/pg_catalog tables.  Then for user facing internal materialization we'll use a pg instance per vdb. 

We'll can design/allow for multi-tenet usage of the database, but initially have each database instance owned by a vdb.  The biggest multi-tenet issue using the direct pg fdw integration is the imported schema name limitation - that is the local database name

HA could be obtained using crunchy data rather than the default pg - but we wouldn't say there was a need for DR given that the materialization can be fully rebuilt from sources.

Design choices:

1. Keep the notion of a status table / mvstatus function.  

Pros: If we do expand to external materialization, we'll need that.  It provides additional semantics of onError - wait or ignore.  With pg the view is either populated or not.  When not populated a query against the materialized view will fail.  Alternatively we could have wait and ignore be implemented in a best effort fashion over top of the pg handling.  Also it prevents any naming issues - see above pg requires the local and remote schema names to match (although that could probably be patched), and usage or existing pg schema names pg_toast, public, etc. could lead to matview table name conflicts (although that's pretty unlikely in practice).

Cons: It's additional complexity and typically best effort only.  Without transactional consistency there isn't a hard guarantee on the actual mat view state.  If we collocate the status on every target, we could get this down to a local rather than an xa transaction.  Or we can require that a transaction is started to use a materialized view - similar to lob handling in pg.  For the pg internal being discussed here, there is already the https://www.postgresql.org/docs/9.3/view-pg-matviews.html view.

Instead of the SQL shown above we'd still have to generate the appropriate:

create table xxx_mat (cols ...); -- the difference here is that we need to have the type logic from the jdbc translator, or the pg layer provide the mapping explicitly
create index ... on xxx_mat

2. Refresh strategies

PG by default only supports fully snapshot refresh, which can either be invalidating or in the background.
If we only keep the status table, but not use the mvstatus function, we can still offer a refresh based upon a change event threshold.
Row updates are not possible with postgresql materialized views, so predicate based refresh or eventually row/column level cdc is not possible.

2a. Snapshot refresh

PG uses the refresh materialized view statement, which we can map to from our sysadmin procedure.  However we need to also implement time based refresh.  We previously relied on each teiid instance to run a timer job for every materialization action and then use cluster communication to determine if the local node should be responsible for the given refresh.  In the new design we can make use of kube cron jobs:  https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/  at the least that job image would need a pg client along with some script of "refresh materialized view x CONCURRENTLY" and of course use the system account secret for the pg instance.

3. Connectivity to pg, and pg to Teiid

Connectivity to pg is straight-forward.  The instance creation should also create a secret for a system account.  Connectivity from pg to Teiid implies that we will also need to have a service account for Teiid.  That seems tricky given the sso integration - do we federate in our own authentication system for that?  Alternatively we have to consider using a client ip or certificate approach over the pg transport to identify the pg instance.

[~rareddy] what are your thoughts?

> Built in support for Postgres DB as materialization target
> ----------------------------------------------------------
>
>                 Key: TEIID-4251
>                 URL: https://issues.jboss.org/browse/TEIID-4251
>             Project: Teiid
>          Issue Type: Sub-task
>          Components: Server
>            Reporter: Ramesh Reddy
>            Assignee: Steven Hawkins
>            Priority: Major
>             Fix For: 12.2
>
>
> If Postgres database is available along with install or assumed that it is available, then some of the materialization task can be automated, like
> - Creation of a common STATUS table
> - Creation of the materilization targets (create views on dbms)
> - On load, on undeploy and load scripts for all the materialization views
> We need to device a way this to be pluggable, such that based on success of this, we can provide additional support for other sources. 

--
This message was sent by Atlassian Jira
(v7.12.1#712002)