[teiid-issues] [JBoss JIRA] (TEIID-4758) Permanent materialization load failure is when target source goes down

Tue Feb 14 12:30:00 EST 2017

    [ https://issues.jboss.org/browse/TEIID-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13363389#comment-13363389 ] 

Ramesh Reddy commented on TEIID-4758:
-------------------------------------

I have tried to reproduce the error using three different databases. Here is the setup:

H2 -> Saves the {{Status}} table
PG -> Source database, loaded with TPCH-generated data.
MySQL -> Target database to save the materialized view, which takes several minutes to complete.

Now, when MySQL is taken down during the materialization run, the sequence of events is as follows:
{code}
scheduling job on table large_view
Name = large_view LoadState = LOADING Valid = FALSE
Name = large_view LoadState = LOADING Valid = FALSE
2017-02-14 11:05:01,074 WARN [Worker1_QueryProcessorQueue60] org.teiid.CONNECTOR - Connector worker process failed for atomic-request=As/Jp3qn4/Vd.0.102.31
Name = large_view LoadState = FAILED_LOAD Valid = FALSE
scheduling job on table large_view
Name = large_view LoadState = LOADING Valid = FALSE
Name = large_view LoadState = LOADING Valid = FALSE
{code}

So if the target database is not available, the load status flips to {{FAILED_LOAD}}, but because another scheduled job starts immediately, it flips straight back to {{LOADING}}. When that load fails as well, the cycle repeats.

Now, if I keep the target database alive and take down the source database (PG) instead, the sequence is:

{code}
scheduling job on table large_view
Name = large_view LoadState = LOADING Valid = FALSE
Name = large_view LoadState = LOADING Valid = FALSE
2017-02-14 11:15:47,740 WARN [Worker0_QueryProcessorQueue14] org.teiid.CONNECTOR - Connector worker process failed for atomic-request=ae3kcWA7YCfi.0.101.4
scheduling job on table large_view
Name = large_view LoadState = FAILED_LOAD Valid = FALSE
2017-02-14 11:15:50,787 WARN [Worker3_QueryProcessorQueue34] org.teiid.CONNECTOR - Connector worker process failed for atomic-request=TI1aTvRX4JcG.0.113.18
scheduling job on table large_view
Name = large_view LoadState = FAILED_LOAD Valid = FALSE
2017-02-14 11:15:53,821 WARN [Worker3_QueryProcessorQueue44] org.teiid.CONNECTOR - Connector worker process failed for atomic-request=z/modN36UD/y.0.125.23
scheduling job on table large_view
2017-02-14 11:15:56,850 WARN [Worker3_QueryProcessorQueue55] org.teiid.CONNECTOR - Connector worker process failed for atomic-request=TUKsriWs1ito.0.137.29
scheduling job on table large_view
Name = large_view LoadState = FAILED_LOAD Valid = FALSE
2017-02-14 11:15:59,906 WARN [Worker3_QueryProcessorQueue65] org.teiid.CONNECTOR - Connector worker process failed for atomic-request=+33ACkiLU6TS.0.101.34
scheduling job on table large_view
scheduling job on table large_view
Name = large_view LoadState = FAILED_LOAD Valid = FALSE
scheduling job on table large_view
{code}

Here, when the source is down, the materialization fails fast and is rescheduled. The cycle repeats quickly, roughly every three seconds in the log above.

The simplest solution may be to check the validity of the VDB before scheduling the next job. When a database source is down, the VDB is still active, but its validity is false. However, this will only work in server mode, because the Teiid server actively monitors the data source connections.
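
As a rough illustration of that idea, here is a minimal Java sketch of a scheduler tick that skips the load while the VDB reports itself invalid. Everything in it is hypothetical: {{ValidityGatedScheduler}}, the {{vdbValid}} and {{loadJob}} hooks, and the {{LOADED}} state are assumptions made for the example and are not Teiid APIs or the actual fix.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: these types are illustrative and are NOT Teiid APIs.
final class ValidityGatedScheduler {

    /** LOADING and FAILED_LOAD appear in the logs above; LOADED is assumed here. */
    enum LoadState { LOADING, FAILED_LOAD, LOADED }

    private final BooleanSupplier vdbValid; // assumed hook: true when the VDB reports itself valid
    private final BooleanSupplier loadJob;  // assumed hook: runs one load, returns true on success
    private LoadState state = LoadState.FAILED_LOAD; // start as if a previous load had failed
    private int attempts = 0;

    ValidityGatedScheduler(BooleanSupplier vdbValid, BooleanSupplier loadJob) {
        this.vdbValid = vdbValid;
        this.loadJob = loadJob;
    }

    /** One scheduling tick. Returns true only if a load was actually attempted. */
    boolean tick() {
        if (!vdbValid.getAsBoolean()) {
            return false; // a source is down: skip this run and leave the state untouched
        }
        attempts++;
        state = LoadState.LOADING;
        state = loadJob.getAsBoolean() ? LoadState.LOADED : LoadState.FAILED_LOAD;
        return true;
    }

    LoadState state() { return state; }

    int attempts() { return attempts; }

    public static void main(String[] args) {
        boolean[] sourceUp = {false}; // stand-in for the server's data source monitoring
        ValidityGatedScheduler s = new ValidityGatedScheduler(() -> sourceUp[0], () -> true);
        s.tick(); // skipped: VDB is invalid
        System.out.println(s.state() + " after " + s.attempts() + " attempts");
        sourceUp[0] = true;
        s.tick(); // runs: VDB is valid again
        System.out.println(s.state() + " after " + s.attempts() + " attempts");
    }
}
```

With a gate like this, a tick that fires while a source is down neither starts a load nor touches the status, so the rapid fail-and-reschedule loop seen in the second log would not occur. It still depends on something keeping the validity flag current, which is why the approach is limited to server mode.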

> Permanent materialization load failure is when target source goes down
> ----------------------------------------------------------------------
>
>                 Key: TEIID-4758
>                 URL: https://issues.jboss.org/browse/TEIID-4758
>             Project: Teiid
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 8.12
>            Reporter: Ramesh Reddy
>            Assignee: Ramesh Reddy
>
> During the external materialization load, if the target cache database goes offline, the materialization job stops, but the {{Status}} table is left in {{LOADING}} state, which will never recover when the target cache database comes back up again.
> This situation is observed when JDG is used in OpenShift along with JDV. However, the behavior can occur in standalone situations too. The system should be resilient and must recover in this situation.
