[teiid-issues] [JBoss JIRA] (TEIID-4251) Built in support for Postgres DB as materialization target

Tuesday, 2 April 2019

    [
https://issues.jboss.org/browse/TEIID-4251?page=com.atlassian.jira.plugin...
] 

Steven Hawkins commented on TEIID-4251:
---------------------------------------

The proposal is to have a simple internal (in-process) non-replicated materialization for
things like sys/pg_catalog tables.  Then for user facing internal materialization
we'll use a pg instance per vdb. 

We'll can design/allow for multi-tenet usage of the database, but initially have each
database instance owned by a vdb.  The biggest multi-tenet issue using the direct pg fdw
integration is the imported schema name limitation - that is the local database name

HA could be obtained using crunchy data rather than the default pg - but we wouldn't
say there was a need for DR given that the materialization can be fully rebuilt from
sources.

Design choices:

1. Keep the notion of a status table / mvstatus function.  

Pros: If we do expand to external materialization, we'll need that.  It provides
additional semantics of onError - wait or ignore.  With pg the view is either populated or
not.  When not populated a query against the materialized view will fail.  Alternatively
we could have wait and ignore be implemented in a best effort fashion over top of the pg
handling.  Also it prevents any naming issues - see above pg requires the local and remote
schema names to match (although that could probably be patched), and usage or existing pg
schema names pg_toast, public, etc. could lead to matview table name conflicts (although
that's pretty unlikely in practice).

Cons: It's additional complexity and typically best effort only.  Without
transactional consistency there isn't a hard guarantee on the actual mat view state. 
If we collocate the status on every target, we could get this down to a local rather than
an xa transaction.  Or we can require that a transaction is started to use a materialized
view - similar to lob handling in pg.  For the pg internal being discussed here, there is
already the https://www.postgresql.org/docs/9.3/view-pg-matviews.html view.

Instead of the SQL shown above we'd still have to generate the appropriate:

create table xxx_mat (cols ...); -- the difference here is that we need to have the type
logic from the jdbc translator, or the pg layer provide the mapping explicitly
create index ... on xxx_mat

2. Refresh strategies

PG by default only supports fully snapshot refresh, which can either be invalidating or in
the background.
If we only keep the status table, but not use the mvstatus function, we can still offer a
refresh based upon a change event threshold.
Row updates are not possible with postgresql materialized views, so predicate based
refresh or eventually row/column level cdc is not possible.

2a. Snapshot refresh

PG uses the refresh materialized view statement, which we can map to from our sysadmin
procedure.  However we need to also implement time based refresh.  We previously relied on
each teiid instance to run a timer job for every materialization action and then use
cluster communication to determine if the local node should be responsible for the given
refresh.  In the new design we can make use of kube cron jobs: 
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/  at the least that
job image would need a pg client along with some script of "refresh materialized view
x CONCURRENTLY" and of course use the system account secret for the pg instance.

3. Connectivity to pg, and pg to Teiid

Connectivity to pg is straight-forward.  The instance creation should also create a secret
for a system account.  Connectivity from pg to Teiid implies that we will also need to
have a service account for Teiid.  That seems tricky given the sso integration - do we
federate in our own authentication system for that?  Alternatively we have to consider
using a client ip or certificate approach over the pg transport to identify the pg
instance.

[~rareddy] what are your thoughts?

...
 Built in support for Postgres DB as materialization target
 ----------------------------------------------------------

                 Key: TEIID-4251
                 URL: https://issues.jboss.org/browse/TEIID-4251
             Project: Teiid
          Issue Type: Sub-task
          Components: Server
            Reporter: Ramesh Reddy
            Assignee: Steven Hawkins
            Priority: Major
             Fix For: 12.2

 If Postgres database is available along with install or assumed that it is available,
then some of the materialization task can be automated, like
 - Creation of a common STATUS table
 - Creation of the materilization targets (create views on dbms)
 - On load, on undeploy and load scripts for all the materialization views
 We need to device a way this to be pluggable, such that based on success of this, we can
provide additional support for other sources.  

--
This message was sent by Atlassian Jira
(v7.12.1#712002)

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

[teiid-issues] [JBoss JIRA] (TEIID-4251) Built in support for Postgres DB as materialization target