[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments
by tommaso borgato (JIRA)
[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ]
tommaso borgato updated WFLY-10755:
-----------------------------------
Description:
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario consists of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
    <transport lock-timeout="60000"/>
    <distributed-cache owners="2" name="dist">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="BATCH"/>
        <file-store/>
    </distributed-cache>
    <replicated-cache name="repl">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="BATCH"/>
        <file-store/>
    </replicated-cache>
    <invalidation-cache name="offload">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="BATCH"/>
        <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
            <table prefix="s">
                <id-column name="id" type="VARCHAR(255)"/>
                <data-column name="datum" type="BYTEA"/>
                <timestamp-column name="version" type="BIGINT"/>
            </table>
        </jdbc-store>
    </invalidation-cache>
</cache-container>
{noformat}
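To make the cache layout easier to scan, here is a minimal sketch that lists each cache in the container together with its store element. It is only a reading aid: the `CONFIG` string is a trimmed copy of the configuration above, and the helper name `summarize_caches` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cache-container shown above; child elements that do
# not affect the summary (locking, transaction, table) are omitted.
CONFIG = """
<cache-container name="web" default-cache="repl">
    <distributed-cache owners="2" name="dist"><file-store/></distributed-cache>
    <replicated-cache name="repl"><file-store/></replicated-cache>
    <invalidation-cache name="offload">
        <jdbc-store data-source="testDS" shared="true" dialect="POSTGRES"/>
    </invalidation-cache>
</cache-container>
"""


def summarize_caches(xml_text):
    """Map each cache name to (cache element tag, store element tag or None)."""
    root = ET.fromstring(xml_text)
    summary = {}
    for cache in root:
        # Skip non-cache children such as <transport/>.
        if not cache.tag.endswith("-cache"):
            continue
        store = next(
            (child.tag for child in cache if child.tag.endswith("-store")), None
        )
        summary[cache.get("name")] = (cache.tag, store)
    return summary


if __name__ == "__main__":
    for name, (mode, store) in summarize_caches(CONFIG).items():
        print(f"{name}: {mode}, store={store}")
```

Only the `offload` invalidation cache uses the shared JDBC store; `dist` and `repl` use a local file store.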
h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
The error is observed on node dev212 right after node dev214 left the cluster:
{noformat}
[JBossINF] 09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] 09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] 09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
It is observed again on dev212 right after node dev215 left the cluster:
{noformat}
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
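The segment sets in the three ISPN000208 messages above can be compared mechanically. The sketch below expands the `{a b-c ...}` notation into integers and checks how the sets relate; the helper name `expand_segments` and the constant names are hypothetical, and the segment strings are copied from the log lines above.

```python
def expand_segments(spec):
    """Expand a segment set such as '{4 7-9 12}' into a set of ints."""
    segments = set()
    for token in spec.strip("{} ").split():
        first, _, last = token.partition("-")
        # A token without '-' is a single segment; otherwise an inclusive range.
        segments.update(range(int(first), int(last or first) + 1))
    return segments


# ISPN000208 at 09:08:52 (excluded owners: dev214)
AT_09_08_52 = expand_segments(
    "{4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251}"
)
# First ISPN000208 at 09:11:32 (excluded owners: dev215, dev214)
AT_09_11_32_FIRST = expand_segments(
    "{2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230}"
)
# Second ISPN000208 at 09:11:32 (excluded owners: dev215, dev214)
AT_09_11_32_SECOND = expand_segments(
    "{2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 "
    "118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251}"
)

if __name__ == "__main__":
    print(len(AT_09_08_52), len(AT_09_11_32_FIRST), len(AT_09_11_32_SECOND))
    print(AT_09_11_32_SECOND == AT_09_08_52 | AT_09_11_32_FIRST)
```

The second message at 09:11:32 lists exactly the union of the 09:08:52 set and the first 09:11:32 set, and those two sets do not overlap. This is only an observation about the log output; it does not by itself explain why the segments had no live owners.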
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry reporting that dev213 left the cluster at 9:11:32 looks suspicious.
This run already used modified JGroups timeouts:
{noformat}
<protocol type="FD_ALL">
    <property name="timeout">10000</property>
    <property name="interval">2000</property>
    <property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
    <property name="timeout">1000</property>
</protocol>
{noformat}
h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those timeout values were left unmodified.
h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states were the previous values for FD_ALL:
{noformat}
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
{noformat}
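As a rough reading aid for the two FD_ALL configurations, the sketch below estimates how long a silent member can go undetected. It assumes that FD_ALL suspects a member once no heartbeat has arrived for `timeout` ms, that it checks every `timeout_check_interval` ms, and that VERIFY_SUSPECT then waits up to its own `timeout` before confirming. The function name is hypothetical, and the VERIFY_SUSPECT value for the third run is not stated in this report, so it is left at 0 here.

```python
def approx_detection_window_ms(fd_timeout, check_interval, verify_timeout=0):
    """Approximate upper bound (ms) before a silent member is confirmed suspect.

    Assumption: heartbeat timeout, plus up to one check interval before the
    timeout is noticed, plus the VERIFY_SUSPECT wait.
    """
    return fd_timeout + check_interval + verify_timeout


# First run: FD_ALL timeout=10000, timeout_check_interval=1000,
# VERIFY_SUSPECT timeout=1000.
FIRST_RUN_MS = approx_detection_window_ms(10000, 1000, 1000)

# Third run: FD_ALL timeout=60000, timeout_check_interval=5000.
# VERIFY_SUSPECT is not shown for this run, so 0 is an assumption.
THIRD_RUN_MS = approx_detection_window_ms(60000, 5000)

if __name__ == "__main__":
    print(f"first run: ~{FIRST_RUN_MS} ms")
    print(f"third run: ~{THIRD_RUN_MS} ms (excluding VERIFY_SUSPECT)")
```

Under these assumptions the window is roughly 12 seconds in the first run and at least 65 seconds in the third, yet the error appears in both.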
In this run, the error is observed on node dev212:
{noformat}
[JBossINF] 03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
[JBossINF] 03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] 03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] 03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] 03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
{noformat}
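The dev212 log above reports dev214 as both joined and left within the same second. A minimal sketch for spotting such contradictory membership events in a log excerpt follows; the regular expression only targets the `org.infinispan.CLUSTER` line layout seen in this report, and the names `EVENT`, `contradictory_nodes` and `SAMPLE` are hypothetical.

```python
import re
from collections import defaultdict

# Matches ISPN100000 (joined) and ISPN100001 (left) lines as they appear above.
EVENT = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}),\d{3} INFO \[org\.infinispan\.CLUSTER\] .*?"
    r"ISPN10000[01]: Node (?P<node>\S+) (?P<action>joined|left) the cluster"
)


def contradictory_nodes(log_lines):
    """Return sorted (time, node) pairs reported as both joined and left
    within the same second."""
    actions = defaultdict(set)
    for line in log_lines:
        match = EVENT.search(line)
        if match:
            actions[(match["time"], match["node"])].add(match["action"])
    return sorted(key for key, seen in actions.items() if seen == {"joined", "left"})


# Three lines copied from the dev212 excerpt above.
SAMPLE = [
    "[JBossINF] 03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) "
    "ISPN100000: Node dev214 joined the cluster",
    "[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) "
    "ISPN100001: Node dev215 left the cluster",
    "[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) "
    "ISPN100001: Node dev214 left the cluster",
]

if __name__ == "__main__":
    for time, node in contradictory_nodes(SAMPLE):
        print(f"{time}: {node} reported as both joined and left")
```

On the sample lines this flags only dev214; dev215 is reported as having left, with no matching join.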
However, the logs on dev214 show that the node was not down; it had just been restarted and logged the following:
{noformat}
[JBossINF] 03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
[JBossINF] 03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
[JBossINF] 03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
[JBossINF] 03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
[JBossINF] 03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
[JBossINF] 03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
[JBossINF] 03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] 03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] 03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] 03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] 03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] 03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] 03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] 03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] 03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] 03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] 03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] 03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] 03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] 03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] 03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] 03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
...
{noformat}
> ISPN000208: No live owners found for segments
> ---------------------------------------------
>
> Key: WFLY-10755
> URL: https://issues.jboss.org/browse/WFLY-10755
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 14.0.0.CR1
> Reporter: tommaso borgato
> Assignee: Paul Ferraro
> This run already used modified jgroups time-outs:
> {noformat}
> <protocol type="FD_ALL">
> <property name="timeout">10000</property>
> <property name="interval">2000</property>
> <property name="timeout_check_interval">1000</property>
> </protocol>
> <protocol type="VERIFY_SUSPECT">
> <property name="timeout">1000</property>
> </protocol>
> {noformat}
> h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
> h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error is observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were:
> {noformat}
> <FD_ALL timeout="60000"
> interval="15000"
> timeout_check_interval="5000"
> />
> {noformat}
> In this run, the error is observed on node dev212:
> {noformat}
> [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
> {noformat}
> but the logs on dev214 show the node wasn't down; it was just restarted and logged the following:
> {noformat}
> [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
> 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
> [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
> [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
> [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> ...
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments
by tommaso borgato (JIRA)
[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ]
tommaso borgato updated WFLY-10755:
-----------------------------------
Description:
h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4-20|https://...]
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario consists of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache owners="2" name="dist">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</distributed-cache>
<replicated-cache name="repl">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</replicated-cache>
<invalidation-cache name="offload">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
<table prefix="s">
<id-column name="id" type="VARCHAR(255)"/>
<data-column name="datum" type="BYTEA"/>
<timestamp-column name="version" type="BIGINT"/>
</table>
</jdbc-store>
</invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212, right after node dev214 left the cluster:
{noformat}
[JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
It is observed again right after node dev215 left the cluster:
{noformat}
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying node dev213 left the cluster at 9:11:32 looks suspicious.
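The segment lists in the three ISPN000208 errors above can be compared mechanically. The following is a minimal sketch (the helper name {{parse_segments}} is hypothetical, not part of Infinispan) that expands the range notation from the log lines and checks how the sets relate. For the values quoted above, the second error at 09:11:32 reports exactly the union of the segments from the 09:08:52 error and the first 09:11:32 error:

```python
def parse_segments(spec):
    """Expand a segment list such as '{2-3 36 48}' into a set of ints."""
    segments = set()
    for token in spec.strip("{} ").split():
        if "-" in token:
            lo, hi = token.split("-")
            segments.update(range(int(lo), int(hi) + 1))
        else:
            segments.add(int(token))
    return segments


# Segment lists copied verbatim from the ISPN000208 lines logged on dev212.
first = parse_segments(
    "{4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251}"
)
second = parse_segments(
    "{2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230}"
)
third = parse_segments(
    "{2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 "
    "118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251}"
)

# The first two sets do not overlap, and the third is their union.
print(first.isdisjoint(second))
print(third == first | second)
```

This only describes the logged values; it does not by itself explain why the owners were excluded.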
This run already used modified JGroups timeouts:
{noformat}
<protocol type="FD_ALL">
<property name="timeout">10000</property>
<property name="interval">2000</property>
<property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
<property name="timeout">1000</property>
</protocol>
{noformat}
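As a rough aid for reading these values, the sketch below estimates an upper bound on how long it takes before an unresponsive member is reported as suspected. It assumes that FD_ALL suspects a member once no heartbeat has been seen for {{timeout}} ms, that this check runs every {{timeout_check_interval}} ms, and that VERIFY_SUSPECT then waits up to its own {{timeout}} before confirming. The function name is hypothetical and the formula is an approximation, not the JGroups implementation:

```python
def worst_case_detection_ms(fd_timeout, check_interval, verify_suspect_timeout=0):
    """Approximate upper bound (ms) before a silent member is confirmed suspect.

    Assumption: the heartbeat timeout may expire just after a periodic check,
    so one extra check interval is added, followed by the VERIFY_SUSPECT wait.
    """
    return fd_timeout + check_interval + verify_suspect_timeout


# Values from the FD_ALL / VERIFY_SUSPECT configuration shown above.
estimate = worst_case_detection_ms(
    fd_timeout=10000, check_interval=1000, verify_suspect_timeout=1000
)
print(estimate)  # 12000

# Number of heartbeat intervals (interval=2000) that fit in the FD_ALL timeout.
print(10000 // 2000)  # 5
```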
h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB-18|http...]
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those JGroups timeouts were left unmodified.
h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB-21|http...]
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set according to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states were the previous values for FD_ALL:
{noformat}
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
{noformat}
In this run, the error is observed on node dev212:
{noformat}
[JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
{noformat}
but the logs on dev214 show the node wasn't down; it was just restarted and logged the following:
{noformat}
[JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
[JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
[JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
[JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
...
{noformat}
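The MergeView entries quoted in the logs above can also be parsed to see which members each subgroup believed were present. The sketch below is a hypothetical helper that relies only on the textual format of the ISPN000093 lines; it lists members that appear in more than one subgroup. For the first run's merged view, dev212 is listed in both subgroups, while the third run's merged view has a single subgroup containing only dev214:

```python
import re
from collections import Counter

# Matches one view such as "[dev212|8] (3) [dev212, dev213, dev215]".
VIEW_RE = re.compile(r"\[(\w+)\|(\d+)\] \((\d+)\) \[([^\]]*)\]")


def parse_views(text):
    """Return the merged view followed by its subgroups, in log order."""
    views = []
    for creator, view_id, _size, members in VIEW_RE.findall(text):
        views.append({
            "creator": creator,
            "id": int(view_id),
            "members": [m.strip() for m in members.split(",")],
        })
    return views


def members_in_multiple_subgroups(text):
    """Members listed in more than one subgroup of a MergeView entry."""
    _merged, *subgroups = parse_views(text)
    counts = Counter(m for view in subgroups for m in view["members"])
    return sorted(m for m, n in counts.items() if n > 1)


# MergeView strings copied from the dev212 logs of the first and third run.
first_run = ("MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], "
             "2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], "
             "[dev214|9] (2) [dev214, dev212]")
third_run = ("MergeView::[dev212|10] (3) [dev212, dev213, dev214], "
             "1 subgroups: [dev214|0] (1) [dev214]")

print(members_in_multiple_subgroups(first_run))
print(members_in_multiple_subgroups(third_run))
```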
was:
h3. first run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario consists of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache owners="2" name="dist">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</distributed-cache>
<replicated-cache name="repl">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</replicated-cache>
<invalidation-cache name="offload">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
<table prefix="s">
<id-column name="id" type="VARCHAR(255)"/>
<data-column name="datum" type="BYTEA"/>
<timestamp-column name="version" type="BIGINT"/>
</table>
</jdbc-store>
</invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212, right after node dev214 left the cluster:
{noformat}
[JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
It is observed again right after node dev215 left the cluster:
{noformat}
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying node dev213 left the cluster at 9:11:32 looks suspicious.
This run already used modified JGroups timeouts:
{noformat}
<protocol type="FD_ALL">
<property name="timeout">10000</property>
<property name="interval">2000</property>
<property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
<property name="timeout">1000</property>
</protocol>
{noformat}
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those JGroups timeouts were left unmodified.
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set according to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states were the previous values for FD_ALL:
{noformat}
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
{noformat}
In this run, the error is observed on node dev212:
{noformat}
[JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
{noformat}
but the logs on dev214 show the node wasn't down; it was just restarted and logged the following:
{noformat}
[JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
[JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
[JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
[JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
...
{noformat}
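To compare the ISPN000208 errors between runs, it can help to expand the compact segment lists and pull out the excluded owners. A minimal Python sketch, assuming the message format is exactly the one shown in the logs above; the function names are hypothetical and not part of Infinispan:

```python
import re

# Pattern for the ISPN000208 message as it appears in the quoted logs.
ISPN000208 = re.compile(
    r"ISPN000208: No live owners found for segments \{(?P<segments>[^}]*)\} "
    r"of cache (?P<cache>\S+)\. Excluded owners: \[(?P<owners>[^\]]*)\]"
)


def expand_segments(spec):
    """Expand a compact list such as '2-3 21-26 30' into a sorted list of ints."""
    segments = []
    for token in spec.split():
        if "-" in token:
            first, last = token.split("-")
            segments.extend(range(int(first), int(last) + 1))
        else:
            segments.append(int(token))
    return sorted(segments)


def parse_no_live_owners(line):
    """Return cache name, segments and excluded owners, or None if no match."""
    match = ISPN000208.search(line)
    if match is None:
        return None
    owners = [o.strip() for o in match.group("owners").split(",") if o.strip()]
    return {
        "cache": match.group("cache"),
        "segments": expand_segments(match.group("segments")),
        "excluded_owners": owners,
    }


# Message text copied from the dev212 log of the third run.
sample = (
    "ISPN000208: No live owners found for segments "
    "{2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} "
    "of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. "
    "Excluded owners: [dev215]"
)
result = parse_no_live_owners(sample)
print(len(result["segments"]), result["excluded_owners"])  # 24 ['dev215']
```

Feeding each ERROR line through {{parse_no_live_owners}} makes it easier to see which segments lost all their owners and which nodes were excluded at that moment.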
> ISPN000208: No live owners found for segments
> ---------------------------------------------
>
> Key: WFLY-10755
> URL: https://issues.jboss.org/browse/WFLY-10755
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 14.0.0.CR1
> Reporter: tommaso borgato
> Assignee: Paul Ferraro
>
> h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4-20|https://...]
> This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
> The scenario is composed of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
> {noformat}
> <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
> <transport lock-timeout="60000"/>
> <distributed-cache owners="2" name="dist">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </distributed-cache>
> <replicated-cache name="repl">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </replicated-cache>
> <invalidation-cache name="offload">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
> <table prefix="s">
> <id-column name="id" type="VARCHAR(255)"/>
> <data-column name="datum" type="BYTEA"/>
> <timestamp-column name="version" type="BIGINT"/>
> </table>
> </jdbc-store>
> </invalidation-cache>
> </cache-container>
> {noformat}
> The error is observed on node dev212:
> right after Node dev214 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
> [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
> [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
> [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] at java.lang.Thread.run(Thread.java:748)
> [JBossINF]
> ...
> [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
> {noformat}
> right after Node dev215 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
> {noformat}
> Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying node dev213 left the cluster at 9:11:32 looks suspicious.
> This run already used modified JGroups time-outs:
> {noformat}
> <protocol type="FD_ALL">
> <property name="timeout">10000</property>
> <property name="interval">2000</property>
> <property name="timeout_check_interval">1000</property>
> </protocol>
> <protocol type="VERIFY_SUSPECT">
> <property name="timeout">1000</property>
> </protocol>
> {noformat}
> h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB-18|http...]
> The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
> h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB-21|http...]
> The error is also observed in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were:
> {noformat}
> <FD_ALL timeout="60000"
> interval="15000"
> timeout_check_interval="5000"
> />
> {noformat}
> In this run, the error is observed on node dev212:
> {noformat}
> [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
> {noformat}
> but the logs on dev214 show the node wasn't down; it was just restarted and logged the following:
> {noformat}
> [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
> 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
> [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
> [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
> [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> ...
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments
by tommaso borgato (JIRA)
[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ]
tommaso borgato updated WFLY-10755:
-----------------------------------
Description:
h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario is composed of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache owners="2" name="dist">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</distributed-cache>
<replicated-cache name="repl">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</replicated-cache>
<invalidation-cache name="offload">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
<table prefix="s">
<id-column name="id" type="VARCHAR(255)"/>
<data-column name="datum" type="BYTEA"/>
<timestamp-column name="version" type="BIGINT"/>
</table>
</jdbc-store>
</invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212:
right after Node dev214 left the cluster:
{noformat}
[JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
right after Node dev215 left the cluster:
{noformat}
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying node dev213 left the cluster at 9:11:32 looks suspicious.
This run already used modified JGroups time-outs:
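One way to cross-check the join/leave messages is to replay them against the last known view. The sketch below is only an illustration: it assumes the ISPN100000/ISPN100001 message format shown in the logs above, takes the [dev212|8] view as the starting membership, and uses a hypothetical function name.

```python
import re

# ISPN100000 = node joined, ISPN100001 = node left, as seen in the quoted logs.
EVENT = re.compile(
    r"ISPN10000(?P<kind>[01]): Node (?P<node>\S+) (?:joined|left) the cluster"
)


def replay_membership(initial_members, lines):
    """Apply join/leave log messages in order and return the resulting member set."""
    members = set(initial_members)
    for line in lines:
        match = EVENT.search(line)
        if match is None:
            continue  # not a membership message
        if match.group("kind") == "0":
            members.add(match.group("node"))
        else:
            members.discard(match.group("node"))
    return members


# View [dev212|8] received at 09:08:34, then the messages logged at 09:11:32.
view_8 = ["dev212", "dev213", "dev215"]
events = [
    "ISPN100000: Node dev214 joined the cluster",
    "ISPN100001: Node dev213 left the cluster",
    "ISPN100001: Node dev215 left the cluster",
]
print(sorted(replay_membership(view_8, events)))  # ['dev212', 'dev214']
```

The replayed membership, dev212 and dev214, matches the [dev214|9] subgroup listed in the merged view received at 09:12:29. That would be consistent with dev212 having moved to a view without dev213 and dev215 rather than with those nodes actually shutting down.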
{noformat}
<protocol type="FD_ALL">
<property name="timeout">10000</property>
<property name="interval">2000</property>
<property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
<property name="timeout">1000</property>
</protocol>
{noformat}
h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
The error is also observed in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were:
{noformat}
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
{noformat}
In this run, the error is observed on node dev212:
{noformat}
[JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
{noformat}
but the logs on dev214 show the node wasn't down; it was just restarted and logged the following:
{noformat}
[JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
[JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
[JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
[JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
...
{noformat}
was:
h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4-20|https://...]
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario consists of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache owners="2" name="dist">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</distributed-cache>
<replicated-cache name="repl">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</replicated-cache>
<invalidation-cache name="offload">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
<table prefix="s">
<id-column name="id" type="VARCHAR(255)"/>
<data-column name="datum" type="BYTEA"/>
<timestamp-column name="version" type="BIGINT"/>
</table>
</jdbc-store>
</invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212:
right after Node dev214 left the cluster:
{noformat}
[JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
right after Node dev215 left the cluster:
{noformat}
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
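The segment lists in the three ISPN000208 errors above can be compared mechanically. The following is a minimal sketch, assuming the `{a b-c ...}` notation denotes inclusive ranges of segment numbers; the function and variable names are illustrative only and are not part of Infinispan.

```python
import re


def parse_segments(message):
    """Expand the '{a b-c ...}' list of an ISPN000208 message into a set of ints."""
    match = re.search(r"segments \{([^}]*)\}", message)
    if match is None:
        raise ValueError("no segment list found in message")
    segments = set()
    for token in match.group(1).split():
        first, _, last = token.partition("-")
        # A bare number is a one-element range; "a-b" is treated as inclusive.
        segments.update(range(int(first), int(last or first) + 1))
    return segments


# Segment lists copied from the three errors logged on dev212 (first run).
error_09_08_52 = parse_segments(
    "segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 "
    "156-157 196 205 235 251}"
)
error_09_11_32_a = parse_segments(
    "segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 "
    "176-177 179-180 204 229-230}"
)
error_09_11_32_b = parse_segments(
    "segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 "
    "108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 "
    "229-230 235 251}"
)

print(len(error_09_08_52), len(error_09_11_32_a), len(error_09_11_32_b))
print(error_09_11_32_b == error_09_08_52 | error_09_11_32_a)
```

For the lines quoted above, the last list is exactly the union of the first two. One possible reading is that the segments reported at 09:08:52 were still without a live owner at 09:11:32; that is an interpretation, not something the logs state.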
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying node dev213 left the cluster at 9:11:32 looks suspicious.
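A claim like this can be cross-checked by extracting the ISPN100000/ISPN100001 membership events from each node's log and comparing them with the known start and restart times. The sketch below assumes the log line layout shown in the excerpts above (timestamp, `INFO`, logger, thread name ending in the observing node); the regular expression and helper name are hypothetical and may need adjusting for other log formats.

```python
import re
from collections import defaultdict

# Matches the ISPN100000 (joined) / ISPN100001 (left) lines quoted above.
MEMBERSHIP_EVENT = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+INFO\s+\[org\.infinispan\.CLUSTER\]\s+"
    r"\([^)]*,(?P<observer>[^,)]+)\)\s+"
    r"ISPN10000[01]: Node (?P<node>\S+) (?P<action>joined|left) the cluster"
)


def events_by_node(lines):
    """Group (time, observer, action) tuples by the node they refer to."""
    timeline = defaultdict(list)
    for line in lines:
        match = MEMBERSHIP_EVENT.search(line)
        if match:
            timeline[match["node"]].append(
                (match["time"], match["observer"], match["action"])
            )
    return dict(timeline)


# Three lines copied from the dev212 log excerpt above.
SAMPLE = [
    "[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster",
    "[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster",
    "[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster",
]

for node, events in sorted(events_by_node(SAMPLE).items()):
    print(node, events)
```

Running the same helper over the full logs of all four nodes would give a per-node timeline that can be placed next to the start and restart times mentioned above.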
This run already used modified jgroups time-outs:
{noformat}
<protocol type="FD_ALL">
<property name="timeout">10000</property>
<property name="interval">2000</property>
<property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
<property name="timeout">1000</property>
</protocol>
{noformat}
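To give a rough sense of what these values mean for failure detection, the sketch below adds up the configured delays. It assumes the usual reading of the JGroups properties: FD_ALL suspects a member once no heartbeat has been seen for `timeout` ms, checks for that every `timeout_check_interval` ms, and VERIFY_SUSPECT then waits up to its own `timeout` before confirming the suspicion. The function name is hypothetical, and the result is only an approximate upper bound that ignores view installation and network latency.

```python
def approx_detection_delay_ms(fd_timeout, fd_check_interval, verify_suspect_timeout):
    """Approximate upper bound (ms) between a member's last heartbeat and its
    suspicion being confirmed, under the assumptions stated above."""
    # The periodic check may run just before the timeout expires, so up to one
    # extra check interval can pass before the member is suspected.
    return fd_timeout + fd_check_interval + verify_suspect_timeout


# Values from the configuration above (first run).
first_run = approx_detection_delay_ms(
    fd_timeout=10000, fd_check_interval=1000, verify_suspect_timeout=1000
)
print(first_run)  # 12000
```

Under these assumptions, a node that stops sending heartbeats would be confirmed as suspected after roughly 12 seconds with the modified values.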
h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB-18|http...]
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB-21|http...]
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states were the previous FD_ALL values:
{noformat}
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
{noformat}
In this run, the error is observed on node dev212:
{noformat}
[JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
{noformat}
but the logs on dev214 show the node wasn't down; it was just restarted and logged the following:
{noformat}
[JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
[JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
[JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
[JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
...
{noformat}
> ISPN000208: No live owners found for segments
> ---------------------------------------------
>
> Key: WFLY-10755
> URL: https://issues.jboss.org/browse/WFLY-10755
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 14.0.0.CR1
> Reporter: tommaso borgato
> Assignee: Paul Ferraro
>
> h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
> The scenario consists of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
> {noformat}
> <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
> <transport lock-timeout="60000"/>
> <distributed-cache owners="2" name="dist">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </distributed-cache>
> <replicated-cache name="repl">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </replicated-cache>
> <invalidation-cache name="offload">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
> <table prefix="s">
> <id-column name="id" type="VARCHAR(255)"/>
> <data-column name="datum" type="BYTEA"/>
> <timestamp-column name="version" type="BIGINT"/>
> </table>
> </jdbc-store>
> </invalidation-cache>
> </cache-container>
> {noformat}
> The error is observed on node dev212:
> right after Node dev214 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
> [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
> [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
> [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] at java.lang.Thread.run(Thread.java:748)
> [JBossINF]
> ...
> [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
> {noformat}
> right after Node dev215 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
> {noformat}
> Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying node dev213 left the cluster at 9:11:32 looks suspicious.
> This run already used modified jgroups time-outs:
> {noformat}
> <protocol type="FD_ALL">
> <property name="timeout">10000</property>
> <property name="interval">2000</property>
> <property name="timeout_check_interval">1000</property>
> </protocol>
> <protocol type="VERIFY_SUSPECT">
> <property name="timeout">1000</property>
> </protocol>
> {noformat}
> h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
> h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states were the previous FD_ALL values:
> {noformat}
> <FD_ALL timeout="60000"
> interval="15000"
> timeout_check_interval="5000"
> />
> {noformat}
> In this run, the error is observed on node dev212:
> {noformat}
> [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
> {noformat}
> but the logs on dev214 show the node wasn't down; it was just restarted and logged the following:
> {noformat}
> [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
> 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
> [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
> [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
> [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> ...
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments
by tommaso borgato (JIRA)
[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ]
tommaso borgato updated WFLY-10755:
-----------------------------------
Description:
h3. first run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario consists of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache owners="2" name="dist">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</distributed-cache>
<replicated-cache name="repl">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</replicated-cache>
<invalidation-cache name="offload">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
<table prefix="s">
<id-column name="id" type="VARCHAR(255)"/>
<data-column name="datum" type="BYTEA"/>
<timestamp-column name="version" type="BIGINT"/>
</table>
</jdbc-store>
</invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212, right after node dev214 left the cluster:
{noformat}
[JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
and again right after node dev215 left the cluster:
{noformat}
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
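As a reading aid for the three ISPN000208 lines above, the following is a minimal sketch that expands the logged segment sets and compares them. The helper name `parse_segments` and the variable names are hypothetical; the segment lists are copied from the log lines. Under that reading, the two sets logged at 09:08:52 and first at 09:11:32 do not overlap, and the second set logged at 09:11:32 is exactly their union.

```python
def parse_segments(text):
    """Expand a segment set such as "2-4 7 9-10" into a set of ints."""
    segments = set()
    for token in text.split():
        if "-" in token:
            first, last = token.split("-")
            segments.update(range(int(first), int(last) + 1))
        else:
            segments.add(int(token))
    return segments


# Segment sets copied from the ISPN000208 lines logged on dev212.
at_090852 = parse_segments(
    "4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251"
)
at_091132_first = parse_segments(
    "2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230"
)
at_091132_second = parse_segments(
    "2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 "
    "118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251"
)

# True if the first two errors concern different segments.
print(at_090852.isdisjoint(at_091132_first))
# True if the last error covers exactly the segments of the first two.
print(at_091132_second == at_090852 | at_091132_first)
```

This only compares what was logged; it says nothing about why the owners of those segments were excluded.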
Note that node dev213 did not actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying that dev213 left the cluster at 9:11:32 looks suspicious.
This run already used modified JGroups timeouts:
{noformat}
<protocol type="FD_ALL">
<property name="timeout">10000</property>
<property name="interval">2000</property>
<property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
<property name="timeout">1000</property>
</protocol>
{noformat}
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states were the previous values for FD_ALL:
{noformat}
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
{noformat}
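To compare the two FD_ALL settings quoted above, here is a rough sketch of the failure-detection timing they imply. It assumes the usual reading of these properties: a member is suspected once no heartbeat has arrived for `timeout` ms, the check runs every `timeout_check_interval` ms, and VERIFY_SUSPECT then waits up to its own `timeout` before confirming. The function name `fd_all_estimate` is hypothetical, and the VERIFY_SUSPECT value for the second run is not stated in this report, so it is left at 0.

```python
def fd_all_estimate(timeout_ms, interval_ms, check_interval_ms, verify_suspect_ms=0):
    """Return (heartbeats missed before suspicion, rough worst-case detection time in ms).

    This is an estimate under the assumptions stated above, not a measurement.
    """
    missed_heartbeats = timeout_ms // interval_ms
    worst_case_ms = timeout_ms + check_interval_ms + verify_suspect_ms
    return missed_heartbeats, worst_case_ms


# Modified values used in the first run (FD_ALL plus VERIFY_SUSPECT timeout=1000).
modified = fd_all_estimate(10000, 2000, 1000, verify_suspect_ms=1000)
# Values that ISPN-9087 gives as the previous FD_ALL settings.
previous = fd_all_estimate(60000, 15000, 5000)

print("modified run:", modified)
print("ISPN-9087 values:", previous)
```

Under these assumptions the modified settings suspect a silent node after roughly 12 seconds, while the ISPN-9087 values take roughly 65 seconds.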
In this run, the error is observed on node dev212:
{noformat}
[JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
{noformat}
However, the logs on dev214 show that the node was not down; it had just been restarted and logged the following:
{noformat}
[JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
[JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
[JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
[JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
...
{noformat}
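The dev214 log above shows the server reporting startup at 03:56:14,095 and receiving the merged view at 03:58:02,337, with the singleton view [dev214|0] in between. A minimal sketch for measuring that gap from the log timestamps follows; `parse_log_time` is a hypothetical helper, and it assumes both timestamps fall on the same day and come from the same clock.

```python
from datetime import datetime


def parse_log_time(stamp):
    """Parse a JBoss log timestamp of the form HH:MM:SS,mmm."""
    return datetime.strptime(stamp, "%H:%M:%S,%f")


# Timestamps copied from the dev214 log lines above.
server_started = parse_log_time("03:56:14,095")  # WFLYSRV0025 "started in 8533ms"
merged_view = parse_log_time("03:58:02,337")     # first ISPN000093 MERGED cluster view

gap_seconds = (merged_view - server_started).total_seconds()
print(f"dev214 ran for about {gap_seconds:.3f} s before the merge")
```

The result is about 108 seconds, noticeably longer than the 60000 ms FD_ALL timeout configured for this run.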
> ISPN000208: No live owners found for segments
> ---------------------------------------------
>
> Key: WFLY-10755
> URL: https://issues.jboss.org/browse/WFLY-10755
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 14.0.0.CR1
> Reporter: tommaso borgato
> Assignee: Paul Ferraro
>
> h3. first run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]
> This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
> The scenario consists of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
> {noformat}
> <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
> <transport lock-timeout="60000"/>
> <distributed-cache owners="2" name="dist">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </distributed-cache>
> <replicated-cache name="repl">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </replicated-cache>
> <invalidation-cache name="offload">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
> <table prefix="s">
> <id-column name="id" type="VARCHAR(255)"/>
> <data-column name="datum" type="BYTEA"/>
> <timestamp-column name="version" type="BIGINT"/>
> </table>
> </jdbc-store>
> </invalidation-cache>
> </cache-container>
> {noformat}
> The error is observed on node dev212:
> right after Node dev214 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
> [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
> [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
> [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] at java.lang.Thread.run(Thread.java:748)
> [JBossINF]
> ...
> [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
> {noformat}
> It is observed again right after node dev215 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
> {noformat}
> Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying that node dev213 left the cluster at 9:11:32 looks suspicious.
> This run already used modified JGroups timeouts:
> {noformat}
> <protocol type="FD_ALL">
> <property name="timeout">10000</property>
> <property name="interval">2000</property>
> <property name="timeout_check_interval">1000</property>
> </protocol>
> <protocol type="VERIFY_SUSPECT">
> <property name="timeout">1000</property>
> </protocol>
> {noformat}
> The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
> The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states were the previous values for FD_ALL:
> {noformat}
> <FD_ALL timeout="60000"
> interval="15000"
> timeout_check_interval="5000"
> />
> {noformat}
> In this run, the error is observed on node dev212:
> {noformat}
> [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
> {noformat}
> However, the logs on dev214 show that the node was not down; it had just been restarted and logged the following:
> {noformat}
> [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
> 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
> [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
> [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
> [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> ...
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments
by tommaso borgato (JIRA)
[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ]
tommaso borgato updated WFLY-10755:
-----------------------------------
Description:
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario uses a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache owners="2" name="dist">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</distributed-cache>
<replicated-cache name="repl">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</replicated-cache>
<invalidation-cache name="offload">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
<table prefix="s">
<id-column name="id" type="VARCHAR(255)"/>
<data-column name="datum" type="BYTEA"/>
<timestamp-column name="version" type="BIGINT"/>
</table>
</jdbc-store>
</invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212, right after node dev214 left the cluster:
{noformat}
[JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
It is observed again right after node dev215 left the cluster:
{noformat}
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
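As a cross-check on the three ISPN000208 messages quoted so far, the segment sets can be expanded and compared. The following is only a minimal Python sketch (the helper name parse_segments is made up for illustration and is not part of Infinispan). Under that caveat, it suggests that the second error logged at 09:11:32 lists exactly the union of the segments from the 09:08:52 error and the first 09:11:32 error, and that those two sets do not overlap:

```python
def parse_segments(text):
    """Expand a segment set such as '{2-4 7 9-10}' into a set of ints."""
    segments = set()
    for token in text.strip("{} ").split():
        if "-" in token:
            # A token like '7-9' is an inclusive range of segment ids.
            low, high = token.split("-")
            segments.update(range(int(low), int(high) + 1))
        else:
            segments.add(int(token))
    return segments


# Segment sets copied from the ISPN000208 lines logged on dev212.
first = parse_segments(
    "{4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251}"
)
second = parse_segments(
    "{2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230}"
)
combined = parse_segments(
    "{2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 "
    "118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251}"
)

print(first | second == combined)   # True
print(first.isdisjoint(second))     # True
```

If this reading is right, the segments that lost their owner when dev214 left were still without a live owner when dev215 was dropped from the view.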
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying that node dev213 left the cluster at 9:11:32 looks suspicious.
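One possible reading, offered only as a hypothesis: the merged view logged at 09:12:29 lists a subgroup [dev214|9] (2) [dev214, dev212]. If dev212 had switched from view [dev212|8] to that view at 09:11:32, the join/leave messages would follow from a plain set difference of the member lists. A minimal Python sketch (view_diff is a made-up helper, not an Infinispan or JGroups API):

```python
def view_diff(old_members, new_members):
    """Return (joined, left) between two cluster views, each sorted by name."""
    old, new = set(old_members), set(new_members)
    return sorted(new - old), sorted(old - new)


# Member lists copied from the views printed in the dev212 log.
view_8 = ["dev212", "dev213", "dev215"]   # [dev212|8]
view_9 = ["dev214", "dev212"]             # [dev214|9], from the MergeView subgroups

joined, left = view_diff(view_8, view_9)
print(joined)  # ['dev214']
print(left)    # ['dev213', 'dev215']
```

That output matches the three ISPN100000/ISPN100001 lines at 09:11:32, which would mean dev213 was dropped from the view seen by dev212 without actually going down. The logs quoted here do not show the view change itself, so this remains an assumption.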
This run already used modified JGroups timeouts:
{noformat}
<protocol type="FD_ALL">
<property name="timeout">10000</property>
<property name="interval">2000</property>
<property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
<property name="timeout">1000</property>
</protocol>
{noformat}
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states were the previous values for FD_ALL:
{noformat}
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
{noformat}
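To compare the two FD_ALL settings quoted above, here is a rough Python calculation. It rests on a simplified model that is an assumption, not taken from this report: a member is suspected once no heartbeat has been seen for `timeout` ms, the check runs every `timeout_check_interval` ms, and VERIFY_SUSPECT then waits up to its own `timeout` before the suspicion is confirmed. The function name is made up for illustration.

```python
def suspicion_window_ms(timeout, timeout_check_interval, verify_suspect_timeout=0):
    """Return (earliest, latest) delay in ms between the last heartbeat
    received from a silent member and the confirmed suspicion, under the
    simplified model described above."""
    earliest = timeout + verify_suspect_timeout
    latest = timeout + timeout_check_interval + verify_suspect_timeout
    return earliest, latest


# Modified values used in run 20 (FD_ALL plus VERIFY_SUSPECT timeout=1000).
print(suspicion_window_ms(10000, 1000, 1000))   # (11000, 12000)

# Values from ISPN-9087; the VERIFY_SUSPECT timeout for that run is not
# stated in the report, so it is left at 0 here.
print(suspicion_window_ms(60000, 5000))         # (60000, 65000)

# Heartbeats that fit into one timeout period for each setting.
print(10000 // 2000, 60000 // 15000)            # 5 4
```

Under these assumptions, a member has to miss four or five consecutive heartbeats before it is suspected in either configuration, so an isolated dropped heartbeat would not be expected to trigger the view changes shown in the logs.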
In this run, the error is observed on node dev212:
{noformat}
[JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
[JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
{noformat}
However, the logs on dev214 show that the node was not down; it had just been restarted and logged the following:
{noformat}
[JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
[JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
[JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
[JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
[JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
[JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
[JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
[JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
[JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
...
{noformat}
was:
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario uses a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache owners="2" name="dist">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</distributed-cache>
<replicated-cache name="repl">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</replicated-cache>
<invalidation-cache name="offload">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
<table prefix="s">
<id-column name="id" type="VARCHAR(255)"/>
<data-column name="datum" type="BYTEA"/>
<timestamp-column name="version" type="BIGINT"/>
</table>
</jdbc-store>
</invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212, right after node dev214 left the cluster:
{noformat}
[JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
It is observed again right after node dev215 left the cluster:
{noformat}
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying that node dev213 left the cluster at 9:11:32 looks suspicious.
This run already used modified JGroups timeouts:
{noformat}
<protocol type="FD_ALL">
<property name="timeout">10000</property>
<property name="interval">2000</property>
<property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
<property name="timeout">1000</property>
</protocol>
{noformat}
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states were the previous values for FD_ALL:
{noformat}
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
{noformat}
> ISPN000208: No live owners found for segments
> ---------------------------------------------
>
> Key: WFLY-10755
> URL: https://issues.jboss.org/browse/WFLY-10755
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 14.0.0.CR1
> Reporter: tommaso borgato
> Assignee: Paul Ferraro
>
> This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
> The scenario uses a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
> {noformat}
> <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
> <transport lock-timeout="60000"/>
> <distributed-cache owners="2" name="dist">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </distributed-cache>
> <replicated-cache name="repl">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </replicated-cache>
> <invalidation-cache name="offload">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
> <table prefix="s">
> <id-column name="id" type="VARCHAR(255)"/>
> <data-column name="datum" type="BYTEA"/>
> <timestamp-column name="version" type="BIGINT"/>
> </table>
> </jdbc-store>
> </invalidation-cache>
> </cache-container>
> {noformat}
> The error is observed on node dev212:
> right after Node dev214 left the cluster:
> {noformat}
> [JBossINF] 09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
> [JBossINF] 09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] 09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
> [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
> [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] at java.lang.Thread.run(Thread.java:748)
> [JBossINF]
> ...
> [JBossINF] 09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
> {noformat}
> right after Node dev215 left the cluster:
> {noformat}
> [JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
> [JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] 09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
> {noformat}
> Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying that node dev213 left the cluster at 9:11:32 looks suspicious.
> This run already used modified JGroups timeouts:
> {noformat}
> <protocol type="FD_ALL">
> <property name="timeout">10000</property>
> <property name="interval">2000</property>
> <property name="timeout_check_interval">1000</property>
> </protocol>
> <protocol type="VERIFY_SUSPECT">
> <property name="timeout">1000</property>
> </protocol>
> {noformat}
> The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
> The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were:
> {noformat}
> <FD_ALL timeout="60000"
> interval="15000"
> timeout_check_interval="5000"
> />
> {noformat}
> In this run, the error is observed on node dev212:
> {noformat}
> [JBossINF] 03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
> [JBossINF] 03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] 03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] 03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] 03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] 03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
> {noformat}
> but the logs on dev214 show the node wasn't down; it was just restarted and logged the following:
> {noformat}
> [JBossINF] 03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
> [JBossINF] 03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
> [JBossINF] 03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
> [JBossINF] 03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
> 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
> [JBossINF] 03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
> [JBossINF] 03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
> [JBossINF] 03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] 03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] 03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] 03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] 03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] 03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] 03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] 03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] 03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] 03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] 03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] 03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] 03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] 03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] 03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] 03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] 03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] 03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> ...
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments
by tommaso borgato (JIRA)
[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ]
tommaso borgato updated WFLY-10755:
-----------------------------------
Description:
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario is composed of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache owners="2" name="dist">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</distributed-cache>
<replicated-cache name="repl">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</replicated-cache>
<invalidation-cache name="offload">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
<table prefix="s">
<id-column name="id" type="VARCHAR(255)"/>
<data-column name="datum" type="BYTEA"/>
<timestamp-column name="version" type="BIGINT"/>
</table>
</jdbc-store>
</invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212:
right after Node dev214 left the cluster:
{noformat}
[JBossINF] 09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] 09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] 09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
right after Node dev215 left the cluster:
{noformat}
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
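The segment lists in the ISPN000208 messages above are easier to compare once expanded. The following sketch (the helper `parse_segments` is hypothetical and only the `segments {...}` part of each log line is reproduced) expands the ranges and checks how the three reported sets relate to each other.

```python
import re


def parse_segments(log_line):
    """Expand the 'segments {a b-c ...}' part of an ISPN000208 line into a set of ints."""
    match = re.search(r"segments \{([^}]*)\}", log_line)
    if match is None:
        raise ValueError("no segment list found in line")
    segments = set()
    for token in match.group(1).split():
        start, _, end = token.partition("-")
        segments.update(range(int(start), int(end or start) + 1))
    return segments


# Segment lists copied from the three ISPN000208 lines logged on dev212.
at_0908 = parse_segments(
    "segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251}")
at_0911_first = parse_segments(
    "segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230}")
at_0911_second = parse_segments(
    "segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 "
    "126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251}")

print(len(at_0908), len(at_0911_first), len(at_0911_second))
print(at_0911_second == at_0908 | at_0911_first)
```

The second list reported at 09:11:32 turns out to be exactly the union of the 09:08:52 list and the first 09:11:32 list, so the segments that lost their owners when dev214 left still had no live owner three minutes later.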
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying that node dev213 left the cluster at 9:11:32 looks suspicious.
This run already used modified JGroups timeouts:
{noformat}
<protocol type="FD_ALL">
<property name="timeout">10000</property>
<property name="interval">2000</property>
<property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
<property name="timeout">1000</property>
</protocol>
{noformat}
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were:
{noformat}
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
{noformat}
was:
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario is composed of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache owners="2" name="dist">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</distributed-cache>
<replicated-cache name="repl">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<file-store/>
</replicated-cache>
<invalidation-cache name="offload">
<locking isolation="REPEATABLE_READ"/>
<transaction mode="BATCH"/>
<jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
<table prefix="s">
<id-column name="id" type="VARCHAR(255)"/>
<data-column name="datum" type="BYTEA"/>
<timestamp-column name="version" type="BIGINT"/>
</table>
</jdbc-store>
</invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212:
right after Node dev214 left the cluster:
{noformat}
[JBossINF] 09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] 09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] 09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
right after Node dev215 left the cluster:
{noformat}
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying that node dev213 left the cluster at 9:11:32 looks suspicious.
This run already used modified JGroups timeouts:
{noformat}
<protocol type="FD_ALL">
<property name="timeout">10000</property>
<property name="interval">2000</property>
<property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
<property name="timeout">1000</property>
</protocol>
{noformat}
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments
by tommaso borgato (JIRA)
[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ]
tommaso borgato updated WFLY-10755:
-----------------------------------
Description:
This error was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
The scenario is composed of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
{noformat}
<cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
    <transport lock-timeout="60000"/>
    <distributed-cache owners="2" name="dist">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="BATCH"/>
        <file-store/>
    </distributed-cache>
    <replicated-cache name="repl">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="BATCH"/>
        <file-store/>
    </replicated-cache>
    <invalidation-cache name="offload">
        <locking isolation="REPEATABLE_READ"/>
        <transaction mode="BATCH"/>
        <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
            <table prefix="s">
                <id-column name="id" type="VARCHAR(255)"/>
                <data-column name="datum" type="BYTEA"/>
                <timestamp-column name="version" type="BIGINT"/>
            </table>
        </jdbc-store>
    </invalidation-cache>
</cache-container>
{noformat}
The error is observed on node dev212:
right after Node dev214 left the cluster:
{noformat}
[JBossINF] 09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
[JBossINF] 09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
[JBossINF] 09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
[JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
[JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
[JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[JBossINF] at java.lang.Thread.run(Thread.java:748)
[JBossINF]
...
[JBossINF] 09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
{noformat}
right after Node dev215 left the cluster:
{noformat}
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
[JBossINF] 09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
{noformat}
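To compare the segment sets reported by the three ISPN000208 errors, a small sketch like the following can help. It assumes the notation inside the braces is a space-separated list of integers and inclusive "a-b" ranges; the class and method names are hypothetical and not part of Infinispan. Run against the sets logged above, it shows that the second set logged at 09:11:32,030 is exactly the union of the set logged at 09:08:52,772 and the first set logged at 09:11:32,030.

```java
import java.util.BitSet;

public class SegmentSets {
    // Segment sets copied from the three ISPN000208 log lines above.
    static final String FIRST =
        "{4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251}";
    static final String SECOND =
        "{2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230}";
    static final String THIRD =
        "{2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 "
        + "156-157 172 176-177 179-180 196 204-205 229-230 235 251}";

    // Parses e.g. "{2-4 7}" into a BitSet with bits 2, 3, 4 and 7 set.
    // Assumption: tokens are single integers or inclusive "a-b" ranges.
    static BitSet parse(String text) {
        String body = text.trim();
        if (body.startsWith("{") && body.endsWith("}")) {
            body = body.substring(1, body.length() - 1);
        }
        BitSet segments = new BitSet();
        for (String token : body.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int dash = token.indexOf('-');
            if (dash < 0) {
                segments.set(Integer.parseInt(token));
            } else {
                int from = Integer.parseInt(token.substring(0, dash));
                int to = Integer.parseInt(token.substring(dash + 1));
                segments.set(from, to + 1); // upper bound of BitSet.set is exclusive
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        BitSet union = parse(FIRST);
        union.or(parse(SECOND));
        System.out.println("third set equals union of first two: " + union.equals(parse(THIRD)));
    }
}
```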
Please note that node dev213 didn't actually leave the cluster: it was started at 8:59:53 and then restarted at 9:12:29, so the log entry saying that node dev213 left the cluster at 9:11:32 looks suspicious.
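To line up membership changes against the known start and restart times, the join and leave events can be pulled out of a server log with a sketch like this one. The class name and the output format are hypothetical; the regular expression only assumes the ISPN100000 / ISPN100001 message layout seen in the excerpt above, and the sample lines are copied from that excerpt with the colour-code residue omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MembershipEvents {
    // Captures the timestamp, the node name, and whether the node joined or left.
    private static final Pattern EVENT = Pattern.compile(
        "(\\d{2}:\\d{2}:\\d{2},\\d{3}).*ISPN10000[01]: Node (\\S+) (joined|left) the cluster");

    // Returns one "timestamp node action" entry per matching log line, in log order.
    static List<String> extract(List<String> logLines) {
        List<String> events = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = EVENT.matcher(line);
            if (m.find()) {
                events.add(m.group(1) + " " + m.group(2) + " " + m.group(3));
            }
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster",
            "[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster",
            "[JBossINF] 09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster");
        extract(lines).forEach(System.out::println);
    }
}
```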
This run already used modified JGroups timeouts:
{noformat}
<protocol type="FD_ALL">
    <property name="timeout">10000</property>
    <property name="interval">2000</property>
    <property name="timeout_check_interval">1000</property>
</protocol>
<protocol type="VERIFY_SUSPECT">
    <property name="timeout">1000</property>
</protocol>
{noformat}
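As a rough estimate of what these values mean for failure detection, the sketch below adds them up. It assumes the usual reading of the JGroups properties: FD_ALL suspects a member once nothing has been received from it for `timeout` ms, checks for this every `timeout_check_interval` ms, and VERIFY_SUSPECT then waits up to its own `timeout` for a reply before the suspicion is passed on. The class name, the method name, and the formula itself are assumptions made for illustration, not something taken from JGroups.

```java
public class FailureDetectionEstimate {
    // Approximate upper bound, in milliseconds, between the last message from a
    // crashed member and the moment its suspicion is confirmed (assumed model).
    static long worstCaseMillis(long fdAllTimeout, long fdAllCheckInterval, long verifySuspectTimeout) {
        return fdAllTimeout + fdAllCheckInterval + verifySuspectTimeout;
    }

    public static void main(String[] args) {
        // Values from the modified configuration above.
        long estimate = worstCaseMillis(10000, 1000, 1000);
        System.out.println("estimated worst-case detection time: " + estimate + " ms");
    }
}
```

Under this model the modified configuration gives 10000 + 1000 + 1000 = 12000 ms.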
The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values were unmodified.
The error is also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those values are set to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous FD_ALL values were.
> ISPN000208: No live owners found for segments
> ---------------------------------------------
>
> Key: WFLY-10755
> URL: https://issues.jboss.org/browse/WFLY-10755
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 14.0.0.CR1
> Reporter: tommaso borgato
> Assignee: Paul Ferraro
>
[JBoss JIRA] (WFLY-10754) NullPointerException using Stateless with configured interceptors
by Luca Stancapiano (JIRA)
[ https://issues.jboss.org/browse/WFLY-10754?page=com.atlassian.jira.plugin... ]
Luca Stancapiano updated WFLY-10754:
------------------------------------
Description:
I am reporting strange behavior on WildFly 13 when configuring interceptors on a stateless session bean. The scenario is described below.
Here is a simple interceptor:
{code:java}
package it.vige.injection.interceptors;

import javax.interceptor.AroundInvoke;
import javax.interceptor.Interceptor;
import javax.interceptor.InvocationContext;

// Pass-through interceptor: it only continues the invocation chain.
@Interceptor
public class OKInterceptor {

    @AroundInvoke
    public Object aroundInvoke(InvocationContext ic) throws Exception {
        return ic.proceed();
    }
}
{code}
Here is an annotation used as an interceptor binding:
{code:java}
package it.vige.injection.interceptors;

import static java.lang.annotation.ElementType.CONSTRUCTOR;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.ElementType.TYPE;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

import java.lang.annotation.Retention;
import java.lang.annotation.Target;

import javax.interceptor.InterceptorBinding;

@Retention(RUNTIME)
@Target({ METHOD, TYPE, CONSTRUCTOR })
@InterceptorBinding
public @interface NotOK {
}
{code}
Here is an interceptor annotated with the interceptor binding:
{code:java}
package it.vige.injection.interceptors;

import javax.interceptor.AroundInvoke;
import javax.interceptor.Interceptor;
import javax.interceptor.InvocationContext;

// Same pass-through logic, but attached through the @NotOK interceptor binding.
@Interceptor
@NotOK
public class NotOKInterceptor {

    @AroundInvoke
    public Object aroundInvoke(InvocationContext ic) throws Exception {
        return ic.proceed();
    }
}
{code}
Here is the stateless service configured with both interceptors:
{code:java}
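package it.vige.injection.interceptors;

import javax.ejb.Stateless;
import javax.interceptor.Interceptors;

@Stateless
public class SimpleService {

    // Intercepted through the class-based @Interceptors annotation.
    @Interceptors({ OKInterceptor.class })
    public void ok() {
    }

    // Intercepted through the @NotOK interceptor binding.
    @NotOK
    public void notOk() {
    }
}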
{code}
This service must have two methods, one attached to the simple interceptor and the other attached to the interceptor binding.
Here is the beans.xml configuration:
{code:xml}
<beans xmlns="http://xmlns.jcp.org/xml/ns/javaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                           http://xmlns.jcp.org/xml/ns/javaee/beans_2_0.xsd"
       version="2.0" bean-discovery-mode="all">
    <interceptors>
        <class>it.vige.injection.interceptors.OKInterceptor</class>
        <class>it.vige.injection.interceptors.NotOKInterceptor</class>
    </interceptors>
</beans>
{code}
Finally, here is the client that calls the service:
{code:java}
....
@Inject
private SimpleService simpleService;
...
// this call works:
simpleService.ok();
// this call throws a NullPointerException:
simpleService.notOk();
...
{code}
When I call the notOk method, I get this exception:
{code:java}
javax.ejb.EJBException: java.lang.NullPointerException
at deployment.test.war//it.vige.injection.test.InterceptorsTestCase.testNotOk(InterceptorsTestCase.java:52)
Caused by: java.lang.NullPointerException
at deployment.test.war//it.vige.injection.test.InterceptorsTestCase.testNotOk(InterceptorsTestCase.java:52)
{code}
The same test passes on WildFly 12.0.0.Final.
On WildFly 13.0.0.Final, it works if I remove the @Stateless annotation from the service.
> NullPointerException using Stateless with configured interceptors
> -----------------------------------------------------------------
>
> Key: WFLY-10754
> URL: https://issues.jboss.org/browse/WFLY-10754
> Project: WildFly
> Issue Type: Bug
> Components: CDI / Weld
> Affects Versions: 13.0.0.Final
> Environment: WildFly 13.0.0.Final and java 10.0.1
> Reporter: Luca Stancapiano
> Assignee: Matej Novotny
>