[jboss-jira] [JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments

Sat Jul 28 12:36:00 EDT 2018

     [ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tommaso borgato updated WFLY-10755:
-----------------------------------
    Description: 
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/20/console-dev212/].
The scenario consists of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:

{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
  <transport lock-timeout="60000"/>
  <distributed-cache owners="2" name="dist">
    <locking isolation="REPEATABLE_READ"/>
    <transaction mode="BATCH"/>
    <file-store/>
  </distributed-cache>
  <replicated-cache name="repl">
    <locking isolation="REPEATABLE_READ"/>
    <transaction mode="BATCH"/>
    <file-store/>
  </replicated-cache>
  <invalidation-cache name="offload">
    <locking isolation="REPEATABLE_READ"/>
    <transaction mode="BATCH"/>
    <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
      <table prefix="s">
        <id-column name="id" type="VARCHAR(255)"/>
        <data-column name="datum" type="BYTEA"/>
        <timestamp-column name="version" type="BIGINT"/>
      </table>
    </jdbc-store>
  </invalidation-cache>
</cache-container>
{noformat}

The error is observed on node dev212:

right after Node dev214 left the cluster:

{noformat}
[JBossINF] 09:08:34,196 INFO  [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] 09:08:34,197 INFO  [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 09:08:34,362 WARN  [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] 	at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] 	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] 	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] 	at java.lang.Thread.run(Thread.java:748)
[JBossINF] 
...
[JBossINF] 09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}

right after Node dev215 left the cluster:

{noformat}
[JBossINF] 09:11:32,029 INFO  [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 09:11:32,029 INFO  [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] 09:11:32,029 INFO  [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:12:29,829 INFO  [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
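
The three ISPN000208 segment lists above can be compared mechanically. The following is a minimal sketch that is not part of the original report (`parse_segments` is a hypothetical helper name): it expands the range notation used in the log messages and checks whether the larger list logged at 09:11:32 is simply the union of the two smaller lists.

```python
def parse_segments(text):
    """Expand a segment list such as '{4 7-9 12}' into a set of ints."""
    segments = set()
    for token in text.strip("{} ").split():
        if "-" in token:
            lo, hi = token.split("-")
            segments.update(range(int(lo), int(hi) + 1))
        else:
            segments.add(int(token))
    return segments


# Segment lists copied from the ISPN000208 messages logged on dev212.
at_0908 = parse_segments(
    "{4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251}"
)
at_0911_first = parse_segments(
    "{2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230}"
)
at_0911_second = parse_segments(
    "{2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 "
    "118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251}"
)

print(len(at_0908), len(at_0911_first), len(at_0911_second))
print("disjoint:", at_0908.isdisjoint(at_0911_first))
print("second 09:11:32 list is the union:", at_0911_second == at_0908 | at_0911_first)
```

For these logs the check holds: the 25 segments reported at 09:08:52 and the 24 segments from the first 09:11:32 message are disjoint, and together they make up the 49 segments in the second 09:11:32 message.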

Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying node dev213 left the cluster at 9:11:32 looks suspicious.
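
One possible reading of the 09:11:32 events, offered as an assumption rather than a diagnosis: the MERGED view logged at 09:12:29 lists a subgroup `[dev214|9] (2) [dev214, dev212]`, and the joined/left messages match what a plain membership diff between view `[dev212|8]` and that subgroup would produce. The sketch below only illustrates that diff; `diff_views` is a hypothetical helper, not an Infinispan or JGroups API.

```python
def diff_views(old_members, new_members):
    """Return (joined, left) between two cluster views, preserving order."""
    joined = [m for m in new_members if m not in old_members]
    left = [m for m in old_members if m not in new_members]
    return joined, left


# Member lists copied from the log lines on dev212.
view_8 = ["dev212", "dev213", "dev215"]  # [dev212|8], installed at 09:08:34
view_9 = ["dev214", "dev212"]            # [dev214|9], subgroup named in the MergeView

joined, left = diff_views(view_8, view_9)
print("joined:", joined)
print("left:", left)
```

The diff yields dev214 as joined and dev213 and dev215 as left, which is the same set of events logged at 09:11:32. If this reading is right, the "left" message for dev213 would reflect dev212 switching views rather than dev213 shutting down.
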
This run already used modified jgroups time-outs:
{noformat}
<protocol type="FD_ALL">
  <property name="timeout">10000</property>
  <property name="interval">2000</property>
  <property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
  <property name="timeout">1000</property>
</protocol>
{noformat}
but the error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/18/console-dev215/] where those values were unmodified.
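
For a rough sense of what the modified settings imply, the sketch below adds up the configured values. It assumes that FD_ALL suspects a member once nothing has been received from it for `timeout` ms, that this condition is evaluated every `timeout_check_interval` ms, and that VERIFY_SUSPECT then waits up to its own `timeout` before the suspicion is confirmed. It is an estimate under those assumptions, not a measurement from this run, and `detection_window_ms` is a hypothetical helper.

```python
# Values copied from the protocol configuration used in this run.
fd_all = {"timeout": 10000, "interval": 2000, "timeout_check_interval": 1000}
verify_suspect = {"timeout": 1000}


def detection_window_ms(fd, vs):
    """Estimate (earliest, latest) ms from the last received heartbeat
    until a silent member is confirmed as suspected."""
    earliest = fd["timeout"] + vs["timeout"]
    # The timeout check runs periodically, so it can lag by one check interval.
    latest = earliest + fd["timeout_check_interval"]
    return earliest, latest


earliest, latest = detection_window_ms(fd_all, verify_suspect)
missed_heartbeats = fd_all["timeout"] // fd_all["interval"]

print("confirmed suspect after roughly", earliest, "to", latest, "ms")
print("heartbeats missed before suspicion:", missed_heartbeats)
```

Under these assumptions a silent member would be excluded roughly 11 to 12 seconds after its last heartbeat, that is, after about five missed heartbeats.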

  was:
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/20/console-dev212/].
The scenario consists of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:

{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
  <transport lock-timeout="60000"/>
  <distributed-cache owners="2" name="dist">
    <locking isolation="REPEATABLE_READ"/>
    <transaction mode="BATCH"/>
    <file-store/>
  </distributed-cache>
  <replicated-cache name="repl">
    <locking isolation="REPEATABLE_READ"/>
    <transaction mode="BATCH"/>
    <file-store/>
  </replicated-cache>
  <invalidation-cache name="offload">
    <locking isolation="REPEATABLE_READ"/>
    <transaction mode="BATCH"/>
    <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
      <table prefix="s">
        <id-column name="id" type="VARCHAR(255)"/>
        <data-column name="datum" type="BYTEA"/>
        <timestamp-column name="version" type="BIGINT"/>
      </table>
    </jdbc-store>
  </invalidation-cache>
</cache-container>
{noformat}

The error is observed on node dev212:

right after Node dev214 left the cluster:

{noformat}
[JBossINF] 09:08:34,196 INFO  [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] 09:08:34,197 INFO  [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 09:08:34,362 WARN  [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] 	at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] 	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] 	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] 	at java.lang.Thread.run(Thread.java:748)
[JBossINF] 
...
[JBossINF] 09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}

right after Node dev215 left the cluster:

{noformat}
[JBossINF] 09:11:32,029 INFO  [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 09:11:32,029 INFO  [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] 09:11:32,029 INFO  [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:12:29,829 INFO  [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}

Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying node dev213 left the cluster at 9:11:32 looks suspicious.
This run already used modified jgroups time-outs:
{noformat}
<protocol type="FD_ALL">
  <property name="timeout">10000</property>
  <property name="interval">2000</property>
  <property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
  <property name="timeout">1000</property>
</protocol>
{noformat}
but the error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/18/console-dev215/] where values were unmodified.

> ISPN000208: No live owners found for segments
> ---------------------------------------------
>
>                 Key: WFLY-10755
>                 URL: https://issues.jboss.org/browse/WFLY-10755
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 14.0.0.CR1
>            Reporter: tommaso borgato
>            Assignee: Paul Ferraro
