jboss-jira July 2018

jboss-jira@lists.jboss.org

1 participants
2474 discussions

[JBoss JIRA] (WFLY-10756) ISPN000136: Error executing command LockControlCommand

by tommaso borgato (JIRA)

tommaso borgato created WFLY-10756: -------------------------------------- Summary: ISPN000136: Error executing command LockControlCommand Key: WFLY-10756 URL: https://issues.jboss.org/browse/WFLY-10756 Project: WildFly Issue Type: Bug Components: Clustering Affects Versions: 14.0.0.CR1 Reporter: tommaso borgato Assignee: Paul Ferraro The error was observed in scenario [eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync|https://jenkins.hosts.mw...]: a 4 nodes cluster with a mod_jk load balancer where fail-over is introduced by killing the node jvm. The error was observed on node {{*[perf19|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-ejb-ejbservlet-tests/job/eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync_JJB/4/console-perf19/]*}} right after it was re-started after being killed: {noformat} [JBossINF] [0m[0m02:43:51,746 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 12738ms - Started 973 of 1160 services (477 services are lazy, passive or on-demand) [JBossINF] [0m[0m02:44:37,268 INFO [org.jboss.ejb.client] (default task-1) JBoss EJB Client version 4.0.11.Final [JBossINF] [0m[0m02:44:56,928 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,942 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:44:56,948 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,949 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:44:56,949 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,950 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:44:56,955 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,978 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:45:58,220 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,221 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:45:58,223 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,223 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:45:58,224 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,225 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:45:58,225 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,226 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:47:11,387 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,389 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[0m02:47:11,390 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,390 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[0m02:47:11,392 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,393 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[0m02:47:11,393 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,394 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[31m02:47:12,082 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (remote-thread--p8-t60) ISPN000136: Error executing command LockControlCommand, writing keys []: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionCreationMetaDataKey(2UakcseFRb1cI_zcOL5fwpJnkR8j-fo74TsVeNSo) and requestor GlobalTx:perf20:25537. Lock is held by GlobalTx:perf19:118742 [JBossINF] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288) [JBossINF] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:261) [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.localLockCommandWork(PessimisticLockingInterceptor.java:209) [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.lambda$new$0(PessimisticLockingInterceptor.java:46) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenApply(BaseAsyncInterceptor.java:81) [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:192) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:183) [JBossINF] at org.infinispan.interceptors.impl.TxInterceptor.visitLockControlCommand(TxInterceptor.java:222) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.lambda$new$0(BaseAsyncInterceptor.java:22) [JBossINF] at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:25) [JBossINF] at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.addCallback(SimpleAsyncInvocationStage.java:70) [JBossINF] at org.infinispan.interceptors.InvocationStage.thenApply(InvocationStage.java:45) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.asyncInvokeNext(BaseAsyncInterceptor.java:224) [JBossINF] at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitCommand(TransactionSynchronizerInterceptor.java:46) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:185) [JBossINF] at org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:90) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) [JBossINF] at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.visitLockControlCommand(DDAsyncInterceptor.java:160) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) [JBossINF] at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) [JBossINF] at org.infinispan.commands.control.LockControlCommand.invokeAsync(LockControlCommand.java:126) [JBossINF] at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) {noformat} -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

[JBoss JIRA] (WFLY-10756) ISPN000136: Error executing command LockControlCommand

by tommaso borgato (JIRA)

[ https://issues.jboss.org/browse/WFLY-10756?page=com.atlassian.jira.plugin... ] tommaso borgato updated WFLY-10756: ----------------------------------- Description: The error was observed in scenario {{*[eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-ejb-ejbservlet-tests/job/eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync_JJB/4/]*}}: a 4 nodes cluster with a mod_jk load balancer where fail-over is introduced by killing the node jvm. The error was observed on node {{*[perf19|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-ejb-ejbservlet-tests/job/eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync_JJB/4/console-perf19/]*}} right after it was re-started after being killed: {noformat} [JBossINF] [0m[0m02:43:51,746 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 12738ms - Started 973 of 1160 services (477 services are lazy, passive or on-demand) [JBossINF] [0m[0m02:44:37,268 INFO [org.jboss.ejb.client] (default task-1) JBoss EJB Client version 4.0.11.Final [JBossINF] [0m[0m02:44:56,928 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,942 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:44:56,948 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,949 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:44:56,949 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,950 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:44:56,955 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,978 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:45:58,220 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,221 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:45:58,223 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,223 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:45:58,224 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,225 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:45:58,225 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,226 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:47:11,387 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,389 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[0m02:47:11,390 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,390 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[0m02:47:11,392 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,393 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[0m02:47:11,393 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,394 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[31m02:47:12,082 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (remote-thread--p8-t60) ISPN000136: Error executing command LockControlCommand, writing keys []: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionCreationMetaDataKey(2UakcseFRb1cI_zcOL5fwpJnkR8j-fo74TsVeNSo) and requestor GlobalTx:perf20:25537. Lock is held by GlobalTx:perf19:118742 [JBossINF] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288) [JBossINF] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:261) [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.localLockCommandWork(PessimisticLockingInterceptor.java:209) [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.lambda$new$0(PessimisticLockingInterceptor.java:46) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenApply(BaseAsyncInterceptor.java:81) [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:192) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:183) [JBossINF] at org.infinispan.interceptors.impl.TxInterceptor.visitLockControlCommand(TxInterceptor.java:222) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.lambda$new$0(BaseAsyncInterceptor.java:22) [JBossINF] at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:25) [JBossINF] at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.addCallback(SimpleAsyncInvocationStage.java:70) [JBossINF] at org.infinispan.interceptors.InvocationStage.thenApply(InvocationStage.java:45) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.asyncInvokeNext(BaseAsyncInterceptor.java:224) [JBossINF] at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitCommand(TransactionSynchronizerInterceptor.java:46) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:185) [JBossINF] at org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:90) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) [JBossINF] at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.visitLockControlCommand(DDAsyncInterceptor.java:160) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) [JBossINF] at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) [JBossINF] at org.infinispan.commands.control.LockControlCommand.invokeAsync(LockControlCommand.java:126) [JBossINF] at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) {noformat} was: The error was observed in scenario [eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync|https://jenkins.hosts.mw...]: a 4 nodes cluster with a mod_jk load balancer where fail-over is introduced by killing the node jvm. The error was observed on node {{*[perf19|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-ejb-ejbservlet-tests/job/eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync_JJB/4/console-perf19/]*}} right after it was re-started after being killed: {noformat} [JBossINF] [0m[0m02:43:51,746 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 12738ms - Started 973 of 1160 services (477 services are lazy, passive or on-demand) [JBossINF] [0m[0m02:44:37,268 INFO [org.jboss.ejb.client] (default task-1) JBoss EJB Client version 4.0.11.Final [JBossINF] [0m[0m02:44:56,928 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,942 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:44:56,948 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,949 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:44:56,949 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,950 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:44:56,955 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] [JBossINF] [0m[0m02:44:56,978 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster [JBossINF] [0m[0m02:45:58,220 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,221 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:45:58,223 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,223 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:45:58,224 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,225 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:45:58,225 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] [JBossINF] [0m[0m02:45:58,226 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster [JBossINF] [0m[0m02:47:11,387 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,389 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[0m02:47:11,390 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,390 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[0m02:47:11,392 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,393 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[0m02:47:11,393 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] [JBossINF] [0m[0m02:47:11,394 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster [JBossINF] [0m[31m02:47:12,082 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (remote-thread--p8-t60) ISPN000136: Error executing command LockControlCommand, writing keys []: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionCreationMetaDataKey(2UakcseFRb1cI_zcOL5fwpJnkR8j-fo74TsVeNSo) and requestor GlobalTx:perf20:25537. Lock is held by GlobalTx:perf19:118742 [JBossINF] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288) [JBossINF] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:261) [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.localLockCommandWork(PessimisticLockingInterceptor.java:209) [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.lambda$new$0(PessimisticLockingInterceptor.java:46) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenApply(BaseAsyncInterceptor.java:81) [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:192) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:183) [JBossINF] at org.infinispan.interceptors.impl.TxInterceptor.visitLockControlCommand(TxInterceptor.java:222) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.lambda$new$0(BaseAsyncInterceptor.java:22) [JBossINF] at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:25) [JBossINF] at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.addCallback(SimpleAsyncInvocationStage.java:70) [JBossINF] at org.infinispan.interceptors.InvocationStage.thenApply(InvocationStage.java:45) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.asyncInvokeNext(BaseAsyncInterceptor.java:224) [JBossINF] at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitCommand(TransactionSynchronizerInterceptor.java:46) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:185) [JBossINF] at org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:90) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) [JBossINF] at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.visitLockControlCommand(DDAsyncInterceptor.java:160) [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) [JBossINF] at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) [JBossINF] at org.infinispan.commands.control.LockControlCommand.invokeAsync(LockControlCommand.java:126) [JBossINF] at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) {noformat} > ISPN000136: Error executing command LockControlCommand > ------------------------------------------------------- > > Key: WFLY-10756 > URL: https://issues.jboss.org/browse/WFLY-10756 > Project: WildFly > Issue Type: Bug > Components: Clustering > Affects Versions: 14.0.0.CR1 > Reporter: tommaso borgato > Assignee: Paul Ferraro > > The error was observed in scenario {{*[eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-ejb-ejbservlet-tests/job/eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync_JJB/4/]*}}: a 4 nodes cluster with a mod_jk load balancer where fail-over is introduced by killing the node jvm. > The error was observed on node {{*[perf19|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-ejb-ejbservlet-tests/job/eap-7x-failover-ejb-ejbservlet-jvmkill-dist-sync_JJB/4/console-perf19/]*}} right after it was re-started after being killed: > {noformat} > [JBossINF] [0m[0m02:43:51,746 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 12738ms - Started 973 of 1160 services (477 services are lazy, passive or on-demand) > [JBossINF] [0m[0m02:44:37,268 INFO [org.jboss.ejb.client] (default task-1) JBoss EJB Client version 4.0.11.Final > [JBossINF] [0m[0m02:44:56,928 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] > [JBossINF] [0m[0m02:44:56,942 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster > [JBossINF] [0m[0m02:44:56,948 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] > [JBossINF] [0m[0m02:44:56,949 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster > [JBossINF] [0m[0m02:44:56,949 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] > [JBossINF] [0m[0m02:44:56,950 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster > [JBossINF] [0m[0m02:44:56,955 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|8] (3) [perf21, perf18, perf19] > [JBossINF] [0m[0m02:44:56,978 INFO [org.infinispan.CLUSTER] (thread-65,ejb,perf19) ISPN100001: Node perf20 left the cluster > [JBossINF] [0m[0m02:45:58,220 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] > [JBossINF] [0m[0m02:45:58,221 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster > [JBossINF] [0m[0m02:45:58,223 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] > [JBossINF] [0m[0m02:45:58,223 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster > [JBossINF] [0m[0m02:45:58,224 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] > [JBossINF] [0m[0m02:45:58,225 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster > [JBossINF] [0m[0m02:45:58,225 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf21|9] (4) [perf21, perf18, perf19, perf20] > [JBossINF] [0m[0m02:45:58,226 INFO [org.infinispan.CLUSTER] (thread-71,ejb,perf19) ISPN100000: Node perf20 joined the cluster > [JBossINF] [0m[0m02:47:11,387 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] > [JBossINF] [0m[0m02:47:11,389 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster > [JBossINF] [0m[0m02:47:11,390 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] > [JBossINF] [0m[0m02:47:11,390 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster > [JBossINF] [0m[0m02:47:11,392 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] > [JBossINF] [0m[0m02:47:11,393 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster > [JBossINF] [0m[0m02:47:11,393 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN000094: Received new cluster view for channel ejb: [perf18|10] (3) [perf18, perf19, perf20] > [JBossINF] [0m[0m02:47:11,394 INFO [org.infinispan.CLUSTER] (thread-87,ejb,perf19) ISPN100001: Node perf21 left the cluster > [JBossINF] [0m[31m02:47:12,082 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (remote-thread--p8-t60) ISPN000136: Error executing command LockControlCommand, writing keys []: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionCreationMetaDataKey(2UakcseFRb1cI_zcOL5fwpJnkR8j-fo74TsVeNSo) and requestor GlobalTx:perf20:25537. Lock is held by GlobalTx:perf19:118742 > [JBossINF] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288) > [JBossINF] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:261) > [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.localLockCommandWork(PessimisticLockingInterceptor.java:209) > [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.lambda$new$0(PessimisticLockingInterceptor.java:46) > [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenApply(BaseAsyncInterceptor.java:81) > [JBossINF] at org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:192) > [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) > [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:183) > [JBossINF] at org.infinispan.interceptors.impl.TxInterceptor.visitLockControlCommand(TxInterceptor.java:222) > [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) > [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54) > [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.lambda$new$0(BaseAsyncInterceptor.java:22) > [JBossINF] at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:25) > [JBossINF] at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.addCallback(SimpleAsyncInvocationStage.java:70) > [JBossINF] at org.infinispan.interceptors.InvocationStage.thenApply(InvocationStage.java:45) > [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.asyncInvokeNext(BaseAsyncInterceptor.java:224) > [JBossINF] at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitCommand(TransactionSynchronizerInterceptor.java:46) > [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:185) > [JBossINF] at org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:90) > [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) > [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123) > [JBossINF] at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90) > [JBossINF] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56) > [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54) > [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.visitLockControlCommand(DDAsyncInterceptor.java:160) > [JBossINF] at org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:117) > [JBossINF] at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50) > [JBossINF] at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234) > [JBossINF] at org.infinispan.commands.control.LockControlCommand.invokeAsync(LockControlCommand.java:126) > [JBossINF] at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:94) > [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99) > [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71) > [JBossINF] at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) > [JBossINF] at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments

by tommaso borgato (JIRA)

[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ] tommaso borgato updated WFLY-10755: ----------------------------------- Description: The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: {noformat} <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> <transport lock-timeout="60000"/> <distributed-cache owners="2" name="dist"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </distributed-cache> <replicated-cache name="repl"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </replicated-cache> <invalidation-cache name="offload"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> <table prefix="s"> <id-column name="id" type="VARCHAR(255)"/> <data-column name="datum" type="BYTEA"/> <timestamp-column name="version" type="BIGINT"/> </table> </jdbc-store> </invalidation-cache> </cache-container> {noformat} h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed 2 times on node dev212; The first time, right after Node dev214 left the cluster: {noformat} [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] ... [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] {noformat} and, the second time, right after Node dev215 left the cluster: {noformat} [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] {noformat} bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange This run already used modified jgroups time-outs: {noformat} <protocol type="FD_ALL"> <property name="timeout">10000</property> <property name="interval">2000</property> <property name="timeout_check_interval">1000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">1000</property> </protocol> {noformat} h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: {noformat} <FD_ALL timeout="60000" interval="15000" timeout_check_interval="5000" /> {noformat} In this run, the error is observed on node dev212: {noformat} [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected {noformat} but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: {noformat} [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message ... {noformat} h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] We made an attempt with the following setting on the same segment (perf27-eap) (the idea being allowing some longer time-out to verify suspected nodes): {noformat} <protocol type="FD_ALL"> <property name="timeout">5000</property> <property name="interval">1000</property> <property name="timeout_check_interval">2000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">5000</property> </protocol> {noformat} we didn't observe the error, but we observed the following {{*ERROR*}} on node {{*[dev215|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}: {noformat} [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] ... 1 more [JBossINF] [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. {noformat} was: The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: {noformat} <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> <transport lock-timeout="60000"/> <distributed-cache owners="2" name="dist"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </distributed-cache> <replicated-cache name="repl"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </replicated-cache> <invalidation-cache name="offload"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> <table prefix="s"> <id-column name="id" type="VARCHAR(255)"/> <data-column name="datum" type="BYTEA"/> <timestamp-column name="version" type="BIGINT"/> </table> </jdbc-store> </invalidation-cache> </cache-container> {noformat} h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed 2 times on node dev212; The first time, right after Node dev214 left the cluster: {noformat} [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] ... [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] {noformat} and, the second time, right after Node dev215 left the cluster: {noformat} [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] {noformat} bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange This run already used modified jgroups time-outs: {noformat} <protocol type="FD_ALL"> <property name="timeout">10000</property> <property name="interval">2000</property> <property name="timeout_check_interval">1000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">1000</property> </protocol> {noformat} h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: {noformat} <FD_ALL timeout="60000" interval="15000" timeout_check_interval="5000" /> {noformat} In this run, the error is observed on node dev212: {noformat} [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected {noformat} but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: {noformat} [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message ... {noformat} h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] We made an attempt with the following setting on the same segment (perf27-eap): {noformat} <protocol type="FD_ALL"> <property name="timeout">5000</property> <property name="interval">1000</property> <property name="timeout_check_interval">2000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">5000</property> </protocol> {noformat} we didn't observe the error, but we observed the following {{*ERROR*}} on node {{*[dev215|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}: {noformat} [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] ... 1 more [JBossINF] [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. {noformat} > ISPN000208: No live owners found for segments > --------------------------------------------- > > Key: WFLY-10755 > URL: https://issues.jboss.org/browse/WFLY-10755 > Project: WildFly > Issue Type: Bug > Components: Clustering > Affects Versions: 14.0.0.CR1 > Reporter: tommaso borgato > Assignee: Paul Ferraro > > The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. > The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: > {noformat} > <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> > <transport lock-timeout="60000"/> > <distributed-cache owners="2" name="dist"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <file-store/> > </distributed-cache> > <replicated-cache name="repl"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <file-store/> > </replicated-cache> > <invalidation-cache name="offload"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> > <table prefix="s"> > <id-column name="id" type="VARCHAR(255)"/> > <data-column name="datum" type="BYTEA"/> > <timestamp-column name="version" type="BIGINT"/> > </table> > </jdbc-store> > </invalidation-cache> > </cache-container> > {noformat} > h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed 2 times on node dev212; > The first time, right after Node dev214 left the cluster: > {noformat} > [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] > [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 > [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) > [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] at java.lang.Thread.run(Thread.java:748) > [JBossINF] > ... > [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] > {noformat} > and, the second time, right after Node dev215 left the cluster: > {noformat} > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] > [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] > [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] > {noformat} > bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange > This run already used modified jgroups time-outs: > {noformat} > <protocol type="FD_ALL"> > <property name="timeout">10000</property> > <property name="interval">2000</property> > <property name="timeout_check_interval">1000</property> > </protocol> > <protocol type="VERIFY_SUSPECT"> > <property name="timeout">1000</property> > </protocol> > {noformat} > h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. > h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: > {noformat} > <FD_ALL timeout="60000" > interval="15000" > timeout_check_interval="5000" > /> > {noformat} > In this run, the error is observed on node dev212: > {noformat} > [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message > [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] > [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] > [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected > {noformat} > but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: > {noformat} > [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) > 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! > [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) > [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) > [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > ... > {noformat} > h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > We made an attempt with the following setting on the same segment (perf27-eap) (the idea being allowing some longer time-out to verify suspected nodes): > {noformat} > <protocol type="FD_ALL"> > <property name="timeout">5000</property> > <property name="interval">1000</property> > <property name="timeout_check_interval">2000</property> > </protocol> > <protocol type="VERIFY_SUSPECT"> > <property name="timeout">5000</property> > </protocol> > {noformat} > we didn't observe the error, but we observed the following {{*ERROR*}} on node {{*[dev215|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}: > {noformat} > [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 > [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) > [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) > [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) > [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) > [JBossINF] at java.lang.Thread.run(Thread.java:748) > [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 > [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) > [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] ... 1 more > [JBossINF] > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > {noformat} -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

[JBoss JIRA] (WFCORE-3909) jboss-cli script does not delegate -D properties to java process

by Marek Marusic (JIRA)

[ https://issues.jboss.org/browse/WFCORE-3909?page=com.atlassian.jira.plugi... ] Marek Marusic reassigned WFCORE-3909: ------------------------------------- Assignee: Marek Marusic (was: Jean-Francois Denise) > jboss-cli script does not delegate -D properties to java process > ---------------------------------------------------------------- > > Key: WFCORE-3909 > URL: https://issues.jboss.org/browse/WFCORE-3909 > Project: WildFly Core > Issue Type: Bug > Components: CLI, Scripts > Affects Versions: 5.0.0.Final > Reporter: Erich Duda > Assignee: Marek Marusic > > *Scenario:* I want to change logging configuration of CLI using system properties. My expectation is that I just put the properties on command line as arguments. > {code} > ./jboss-cli.sh -Dlogging.configuration=... -Djboss.cli.log.level=TRACE -Djboss.cli.log.level=TRACE > {code} > However what I actually have to do is to put the properties into the {{JAVA_OPTS}} env variable. > {code} > JAVA_OPTS="-Dlogging.configuration=... -Djboss.cli.log.level=TRACE -Djboss.cli.log.level=TRACE" ./jboss-cli.sh > {code} > This is different behavior than I am used to standalone.sh script. IMO it is more natural to put the properties as arguments of the script. -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

[JBoss JIRA] (DROOLS-2802) Create incremental compilation tests with DMN resources

by Tibor Zimányi (JIRA)

Tibor Zimányi created DROOLS-2802: ------------------------------------- Summary: Create incremental compilation tests with DMN resources Key: DROOLS-2802 URL: https://issues.jboss.org/browse/DROOLS-2802 Project: Drools Issue Type: Story Components: dmn engine Affects Versions: 7.9.0.Final Reporter: Tibor Zimányi Assignee: Tibor Zimányi We currently don't have incremental compilation tests that use DMN resources. -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

[JBoss JIRA] (WFLY-10588) System property -Djboss.server.log.dir ignored

by Bartosz Baranowski (JIRA)

[ https://issues.jboss.org/browse/WFLY-10588?page=com.atlassian.jira.plugin... ] Bartosz Baranowski resolved WFLY-10588. --------------------------------------- Resolution: Rejected > System property -Djboss.server.log.dir ignored > ---------------------------------------------- > > Key: WFLY-10588 > URL: https://issues.jboss.org/browse/WFLY-10588 > Project: WildFly > Issue Type: Bug > Components: Server > Affects Versions: 13.0.0.Final > Environment: java version "9.0.1" > Java(TM) SE Runtime Environment (build 9.0.1+11) > Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode) > Default locale: de_DE, platform encoding: UTF-8 > OS name: "linux", version: "3.13.0-93-generic", arch: "amd64", family: "unix" > Reporter: Gunther v. Wolffersdorff > Assignee: Bartosz Baranowski > > When starting a doman host node with jvm parameter -Djboss.server.log.dir this parameter will not be set for the subsequent server processes. > This was correct in WildFly 10.1.0.Final -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

[JBoss JIRA] (DROOLS-1807) Verification & Validation: Add three states for running the validation

by Jozef Marko (JIRA)

[ https://issues.jboss.org/browse/DROOLS-1807?page=com.atlassian.jira.plugi... ] Jozef Marko updated DROOLS-1807: -------------------------------- Tester: Jozef Marko Labels: drools-tools (was: ) > Verification & Validation: Add three states for running the validation > ----------------------------------------------------------------------- > > Key: DROOLS-1807 > URL: https://issues.jboss.org/browse/DROOLS-1807 > Project: Drools > Issue Type: Enhancement > Components: Guided Decision Table Editor > Reporter: Anton Giertli > Assignee: Toni Rikkola > Labels: drools-tools > > The statuses would be managed with system properties and extend the current on/off switch we have. > It will be possible to change the status for each validation check type. Check types are for example redundancy, range, missing columns. > > * Reactive > The "normal way" you make a change and the verification runs instantly > * Off > Verification is disabled > * On demand > Verification only runs when requested. Can be useful for large dtables where you for example change every value in a column and just really care about the verification status after all the changes. > Few ideas to think about: > # Might be worth it to also store the state per dtable. > # If V&V runs for over X amount of seconds. Ask if the user wants to change the setting from "Reactive" to "On Demand" for this table. -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments

by tommaso borgato (JIRA)

[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ] tommaso borgato updated WFLY-10755: ----------------------------------- Description: The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: {noformat} <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> <transport lock-timeout="60000"/> <distributed-cache owners="2" name="dist"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </distributed-cache> <replicated-cache name="repl"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </replicated-cache> <invalidation-cache name="offload"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> <table prefix="s"> <id-column name="id" type="VARCHAR(255)"/> <data-column name="datum" type="BYTEA"/> <timestamp-column name="version" type="BIGINT"/> </table> </jdbc-store> </invalidation-cache> </cache-container> {noformat} h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed 2 times on node dev212; The first time, right after Node dev214 left the cluster: {noformat} [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] ... [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] {noformat} and, the second time, right after Node dev215 left the cluster: {noformat} [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] {noformat} bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange This run already used modified jgroups time-outs: {noformat} <protocol type="FD_ALL"> <property name="timeout">10000</property> <property name="interval">2000</property> <property name="timeout_check_interval">1000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">1000</property> </protocol> {noformat} h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: {noformat} <FD_ALL timeout="60000" interval="15000" timeout_check_interval="5000" /> {noformat} In this run, the error is observed on node dev212: {noformat} [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected {noformat} but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: {noformat} [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message ... {noformat} h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] We made an attempt with the following setting on the same segment (perf27-eap): {noformat} <protocol type="FD_ALL"> <property name="timeout">5000</property> <property name="interval">1000</property> <property name="timeout_check_interval">2000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">5000</property> </protocol> {noformat} we didn't observe the error, but we observed the following {{*ERROR*}} on node {{*[dev215|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}: {noformat} [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] ... 1 more [JBossINF] [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. {noformat} was: The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: {noformat} <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> <transport lock-timeout="60000"/> <distributed-cache owners="2" name="dist"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </distributed-cache> <replicated-cache name="repl"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </replicated-cache> <invalidation-cache name="offload"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> <table prefix="s"> <id-column name="id" type="VARCHAR(255)"/> <data-column name="datum" type="BYTEA"/> <timestamp-column name="version" type="BIGINT"/> </table> </jdbc-store> </invalidation-cache> </cache-container> {noformat} h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed 2 times on node dev212; The first time, right after Node dev214 left the cluster: {noformat} [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] ... [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] {noformat} and, the second time, right after Node dev215 left the cluster: {noformat} [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] {noformat} bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange This run already used modified jgroups time-outs: {noformat} <protocol type="FD_ALL"> <property name="timeout">10000</property> <property name="interval">2000</property> <property name="timeout_check_interval">1000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">1000</property> </protocol> {noformat} h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: {noformat} <FD_ALL timeout="60000" interval="15000" timeout_check_interval="5000" /> {noformat} In this run, the error is observed on node dev212: {noformat} [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected {noformat} but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: {noformat} [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message ... {noformat} h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] We made an attempt with the following setting on the same segment (perf27-eap): {noformat} <protocol type="FD_ALL"> <property name="timeout">5000</property> <property name="interval">1000</property> <property name="timeout_check_interval">2000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">5000</property> </protocol> {noformat} we didn't observe the error, but we observed the following {{*[ERROR|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}} on node *dev215*: {noformat} [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] ... 1 more [JBossINF] [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. {noformat} > ISPN000208: No live owners found for segments > --------------------------------------------- > > Key: WFLY-10755 > URL: https://issues.jboss.org/browse/WFLY-10755 > Project: WildFly > Issue Type: Bug > Components: Clustering > Affects Versions: 14.0.0.CR1 > Reporter: tommaso borgato > Assignee: Paul Ferraro > > The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. > The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: > {noformat} > <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> > <transport lock-timeout="60000"/> > <distributed-cache owners="2" name="dist"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <file-store/> > </distributed-cache> > <replicated-cache name="repl"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <file-store/> > </replicated-cache> > <invalidation-cache name="offload"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> > <table prefix="s"> > <id-column name="id" type="VARCHAR(255)"/> > <data-column name="datum" type="BYTEA"/> > <timestamp-column name="version" type="BIGINT"/> > </table> > </jdbc-store> > </invalidation-cache> > </cache-container> > {noformat} > h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed 2 times on node dev212; > The first time, right after Node dev214 left the cluster: > {noformat} > [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] > [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 > [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) > [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] at java.lang.Thread.run(Thread.java:748) > [JBossINF] > ... > [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] > {noformat} > and, the second time, right after Node dev215 left the cluster: > {noformat} > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] > [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] > [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] > {noformat} > bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange > This run already used modified jgroups time-outs: > {noformat} > <protocol type="FD_ALL"> > <property name="timeout">10000</property> > <property name="interval">2000</property> > <property name="timeout_check_interval">1000</property> > </protocol> > <protocol type="VERIFY_SUSPECT"> > <property name="timeout">1000</property> > </protocol> > {noformat} > h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. > h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: > {noformat} > <FD_ALL timeout="60000" > interval="15000" > timeout_check_interval="5000" > /> > {noformat} > In this run, the error is observed on node dev212: > {noformat} > [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message > [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] > [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] > [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected > {noformat} > but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: > {noformat} > [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) > 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! > [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) > [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) > [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > ... > {noformat} > h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > We made an attempt with the following setting on the same segment (perf27-eap): > {noformat} > <protocol type="FD_ALL"> > <property name="timeout">5000</property> > <property name="interval">1000</property> > <property name="timeout_check_interval">2000</property> > </protocol> > <protocol type="VERIFY_SUSPECT"> > <property name="timeout">5000</property> > </protocol> > {noformat} > we didn't observe the error, but we observed the following {{*ERROR*}} on node {{*[dev215|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}: > {noformat} > [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 > [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) > [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) > [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) > [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) > [JBossINF] at java.lang.Thread.run(Thread.java:748) > [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 > [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) > [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] ... 1 more > [JBossINF] > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > {noformat} -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments

by tommaso borgato (JIRA)

[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ] tommaso borgato updated WFLY-10755: ----------------------------------- Description: The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: {noformat} <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> <transport lock-timeout="60000"/> <distributed-cache owners="2" name="dist"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </distributed-cache> <replicated-cache name="repl"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </replicated-cache> <invalidation-cache name="offload"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> <table prefix="s"> <id-column name="id" type="VARCHAR(255)"/> <data-column name="datum" type="BYTEA"/> <timestamp-column name="version" type="BIGINT"/> </table> </jdbc-store> </invalidation-cache> </cache-container> {noformat} h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed 2 times on node dev212; The first time, right after Node dev214 left the cluster: {noformat} [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] ... [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] {noformat} and, the second time, right after Node dev215 left the cluster: {noformat} [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] {noformat} bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange This run already used modified jgroups time-outs: {noformat} <protocol type="FD_ALL"> <property name="timeout">10000</property> <property name="interval">2000</property> <property name="timeout_check_interval">1000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">1000</property> </protocol> {noformat} h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: {noformat} <FD_ALL timeout="60000" interval="15000" timeout_check_interval="5000" /> {noformat} In this run, the error is observed on node dev212: {noformat} [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected {noformat} but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: {noformat} [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message ... {noformat} h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] We made an attempt with the following setting on the same segment (perf27-eap): {noformat} <protocol type="FD_ALL"> <property name="timeout">5000</property> <property name="interval">1000</property> <property name="timeout_check_interval">2000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">5000</property> </protocol> {noformat} we didn't observe the error, but we observed the following {{*[ERROR|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}} on node *dev215*: {noformat} [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] ... 1 more [JBossINF] [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. {noformat} was: The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: {noformat} <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> <transport lock-timeout="60000"/> <distributed-cache owners="2" name="dist"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </distributed-cache> <replicated-cache name="repl"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </replicated-cache> <invalidation-cache name="offload"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> <table prefix="s"> <id-column name="id" type="VARCHAR(255)"/> <data-column name="datum" type="BYTEA"/> <timestamp-column name="version" type="BIGINT"/> </table> </jdbc-store> </invalidation-cache> </cache-container> {noformat} h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed 2 times on node dev212; The first time, right after Node dev214 left the cluster: {noformat} [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] ... [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] {noformat} and, the second time, right after Node dev215 left the cluster: {noformat} [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] {noformat} bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange This run already used modified jgroups time-outs: {noformat} <protocol type="FD_ALL"> <property name="timeout">10000</property> <property name="interval">2000</property> <property name="timeout_check_interval">1000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">1000</property> </protocol> {noformat} h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: {noformat} <FD_ALL timeout="60000" interval="15000" timeout_check_interval="5000" /> {noformat} In this run, the error is observed on node dev212: {noformat} [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected {noformat} but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: {noformat} [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message ... {noformat} h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] We made an attempt with the following setting on the same segment (perf27-eap): {noformat} <protocol type="FD_ALL"> <property name="timeout">5000</property> <property name="interval">1000</property> <property name="timeout_check_interval">2000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">5000</property> </protocol> {noformat} we didn't observe the error, but we observed the following {{*[ERROR|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}: {noformat} [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] ... 1 more [JBossINF] [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. {noformat} > ISPN000208: No live owners found for segments > --------------------------------------------- > > Key: WFLY-10755 > URL: https://issues.jboss.org/browse/WFLY-10755 > Project: WildFly > Issue Type: Bug > Components: Clustering > Affects Versions: 14.0.0.CR1 > Reporter: tommaso borgato > Assignee: Paul Ferraro > > The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. > The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: > {noformat} > <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> > <transport lock-timeout="60000"/> > <distributed-cache owners="2" name="dist"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <file-store/> > </distributed-cache> > <replicated-cache name="repl"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <file-store/> > </replicated-cache> > <invalidation-cache name="offload"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> > <table prefix="s"> > <id-column name="id" type="VARCHAR(255)"/> > <data-column name="datum" type="BYTEA"/> > <timestamp-column name="version" type="BIGINT"/> > </table> > </jdbc-store> > </invalidation-cache> > </cache-container> > {noformat} > h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed 2 times on node dev212; > The first time, right after Node dev214 left the cluster: > {noformat} > [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] > [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 > [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) > [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] at java.lang.Thread.run(Thread.java:748) > [JBossINF] > ... > [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] > {noformat} > and, the second time, right after Node dev215 left the cluster: > {noformat} > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] > [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] > [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] > {noformat} > bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange > This run already used modified jgroups time-outs: > {noformat} > <protocol type="FD_ALL"> > <property name="timeout">10000</property> > <property name="interval">2000</property> > <property name="timeout_check_interval">1000</property> > </protocol> > <protocol type="VERIFY_SUSPECT"> > <property name="timeout">1000</property> > </protocol> > {noformat} > h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. > h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: > {noformat} > <FD_ALL timeout="60000" > interval="15000" > timeout_check_interval="5000" > /> > {noformat} > In this run, the error is observed on node dev212: > {noformat} > [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message > [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] > [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] > [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected > {noformat} > but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: > {noformat} > [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) > 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! > [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) > [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) > [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > ... > {noformat} > h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > We made an attempt with the following setting on the same segment (perf27-eap): > {noformat} > <protocol type="FD_ALL"> > <property name="timeout">5000</property> > <property name="interval">1000</property> > <property name="timeout_check_interval">2000</property> > </protocol> > <protocol type="VERIFY_SUSPECT"> > <property name="timeout">5000</property> > </protocol> > {noformat} > we didn't observe the error, but we observed the following {{*[ERROR|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}} on node *dev215*: > {noformat} > [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 > [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) > [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) > [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) > [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) > [JBossINF] at java.lang.Thread.run(Thread.java:748) > [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 > [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) > [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] ... 1 more > [JBossINF] > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > {noformat} -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments

by tommaso borgato (JIRA)

[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ] tommaso borgato updated WFLY-10755: ----------------------------------- Description: The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: {noformat} <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> <transport lock-timeout="60000"/> <distributed-cache owners="2" name="dist"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </distributed-cache> <replicated-cache name="repl"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </replicated-cache> <invalidation-cache name="offload"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> <table prefix="s"> <id-column name="id" type="VARCHAR(255)"/> <data-column name="datum" type="BYTEA"/> <timestamp-column name="version" type="BIGINT"/> </table> </jdbc-store> </invalidation-cache> </cache-container> {noformat} h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed 2 times on node dev212; The first time, right after Node dev214 left the cluster: {noformat} [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] ... [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] {noformat} and, the second time, right after Node dev215 left the cluster: {noformat} [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] {noformat} bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange This run already used modified jgroups time-outs: {noformat} <protocol type="FD_ALL"> <property name="timeout">10000</property> <property name="interval">2000</property> <property name="timeout_check_interval">1000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">1000</property> </protocol> {noformat} h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: {noformat} <FD_ALL timeout="60000" interval="15000" timeout_check_interval="5000" /> {noformat} In this run, the error is observed on node dev212: {noformat} [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected {noformat} but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: {noformat} [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message ... {noformat} h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] We made an attempt with the following setting on the same segment (perf27-eap): {noformat} <protocol type="FD_ALL"> <property name="timeout">5000</property> <property name="interval">1000</property> <property name="timeout_check_interval">2000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">5000</property> </protocol> {noformat} we didn't observe the error, but we observed the following {{*[ERROR|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}: {noformat} [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] ... 1 more [JBossINF] [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. {noformat} was: The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: {noformat} <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> <transport lock-timeout="60000"/> <distributed-cache owners="2" name="dist"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </distributed-cache> <replicated-cache name="repl"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <file-store/> </replicated-cache> <invalidation-cache name="offload"> <locking isolation="REPEATABLE_READ"/> <transaction mode="BATCH"/> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> <table prefix="s"> <id-column name="id" type="VARCHAR(255)"/> <data-column name="datum" type="BYTEA"/> <timestamp-column name="version" type="BIGINT"/> </table> </jdbc-store> </invalidation-cache> </cache-container> {noformat} h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed 2 times on node dev212; The first time, right after Node dev214 left the cluster: {noformat} [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] ... [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] {noformat} and, the second time, right after Node dev215 left the cluster: {noformat} [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] {noformat} bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange This run already used modified jgroups time-outs: {noformat} <protocol type="FD_ALL"> <property name="timeout">10000</property> <property name="interval">2000</property> <property name="timeout_check_interval">1000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">1000</property> </protocol> {noformat} h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: {noformat} <FD_ALL timeout="60000" interval="15000" timeout_check_interval="5000" /> {noformat} In this run, the error is observed on node dev212: {noformat} [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected {noformat} but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: {noformat} [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message ... {noformat} h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] In an attempt with the following setting on the same segment (perf27-eap): {noformat} <protocol type="FD_ALL"> <property name="timeout">5000</property> <property name="interval">1000</property> <property name="timeout_check_interval">2000</property> </protocol> <protocol type="VERIFY_SUSPECT"> <property name="timeout">5000</property> </protocol> {noformat} we didn't observe the error, but we observed the following {{*[ERROR|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}: {noformat} [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) [JBossINF] at java.lang.Thread.run(Thread.java:748) [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [JBossINF] ... 1 more [JBossINF] [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. {noformat} > ISPN000208: No live owners found for segments > --------------------------------------------- > > Key: WFLY-10755 > URL: https://issues.jboss.org/browse/WFLY-10755 > Project: WildFly > Issue Type: Bug > Components: Clustering > Affects Versions: 14.0.0.CR1 > Reporter: tommaso borgato > Assignee: Paul Ferraro > > The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...]. > The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database: > {noformat} > <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan"> > <transport lock-timeout="60000"/> > <distributed-cache owners="2" name="dist"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <file-store/> > </distributed-cache> > <replicated-cache name="repl"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <file-store/> > </replicated-cache> > <invalidation-cache name="offload"> > <locking isolation="REPEATABLE_READ"/> > <transaction mode="BATCH"/> > <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES"> > <table prefix="s"> > <id-column name="id" type="VARCHAR(255)"/> > <data-column name="datum" type="BYTEA"/> > <timestamp-column name="version" type="BIGINT"/> > </table> > </jdbc-store> > </invalidation-cache> > </cache-container> > {noformat} > h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed 2 times on node dev212; > The first time, right after Node dev214 left the cluster: > {noformat} > [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215] > [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215 > [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) > [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] at java.lang.Thread.run(Thread.java:748) > [JBossINF] > ... > [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214] > {noformat} > and, the second time, right after Node dev215 left the cluster: > {noformat} > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster > [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] > [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214] > [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212] > {noformat} > bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange > This run already used modified jgroups time-outs: > {noformat} > <protocol type="FD_ALL"> > <property name="timeout">10000</property> > <property name="interval">2000</property> > <property name="timeout_check_interval">1000</property> > </protocol> > <protocol type="VERIFY_SUSPECT"> > <property name="timeout">1000</property> > </protocol> > {noformat} > h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified. > h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were: > {noformat} > <FD_ALL timeout="60000" > interval="15000" > timeout_check_interval="5000" > /> > {noformat} > In this run, the error is observed on node dev212: > {noformat} > [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message > [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster > [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] > [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215] > [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected > {noformat} > but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following: > {noformat} > [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990 > [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand) > 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started! > [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms) > [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms) > [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214] > [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster > [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster > [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message > ... > {noformat} > h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...] > We made an attempt with the following setting on the same segment (perf27-eap): > {noformat} > <protocol type="FD_ALL"> > <property name="timeout">5000</property> > <property name="interval">1000</property> > <property name="timeout_check_interval">2000</property> > </protocol> > <protocol type="VERIFY_SUSPECT"> > <property name="timeout">5000</property> > </protocol> > {noformat} > we didn't observe the error, but we observed the following {{*[ERROR|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}: > {noformat} > [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 > [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87) > [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731) > [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) > [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37) > [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47) > [JBossINF] at java.lang.Thread.run(Thread.java:748) > [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214 > [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87) > [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22) > [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [JBossINF] ... 1 more > [JBossINF] > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214]. > {noformat} -- This message was sent by Atlassian JIRA (v7.5.0#75005)

7 years, 9 months

1
0
0 / 0

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

jboss-jira July 2018