[jboss-jira] [JBoss JIRA] (WFLY-6213) Failed to recover cluster state after the current node became the coordinator

Mon Feb 15 09:45:00 EST 2016

     [ https://issues.jboss.org/browse/WFLY-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michal Vinkler updated WFLY-6213:
---------------------------------
    Affects Version/s: 10.0.0.Final


> Failed to recover cluster state after the current node became the coordinator
> -----------------------------------------------------------------------------
>
>                 Key: WFLY-6213
>                 URL: https://issues.jboss.org/browse/WFLY-6213
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 10.0.0.Final
>            Reporter: Michal Vinkler
>            Assignee: Paul Ferraro
>            Priority: Critical
>
> Seen in failover tests - HA Singleton deployment scenarios - jvmkill failover type, random election policy
> A new election fails to complete cleanly after any of the nodes is killed (the killed node does not have to be the cluster coordinator or the singleton provider):
> Timeline:
>  - perf18 was killed around 03:40:07
>  - perf19 was elected, JBEAP-2254 occurred, perf19 left the cluster
>  - right after that perf20 was elected, JBEAP-2254 occurred, perf20 left the cluster
>  - right after that perf21 was elected and logged these errors (perf21 was the only node in the cluster at that time):
> {code}
> [JBossINF] 03:40:07,816 INFO  [org.wildfly.clustering.server] (notification-thread--p2-t1) WFLYCLSV0001: This node will now operate as the singleton provider of the jboss.deployment.unit."clusterbench-ee7-singleton-jbossall.ear".FIRST_MODULE_USE service
> [JBossINF] 03:40:07,821 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p15-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator: org.infinispan.commons.CacheException: Unsuccessful response received from node perf20: CacheNotFoundResponse
> [JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:480)
> [JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:348)
> [JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:286)
> [JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:617)
> [JBossINF] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [JBossINF] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [JBossINF] 	at java.lang.Thread.run(Thread.java:745)
> ...
> [JBossINF] 03:40:07,905 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-8,ee,perf21) ISPN000094: Received new cluster view for channel server: [perf21|6] (1) [perf21]
> [JBossINF] 03:40:07,904 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p14-t15) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.util.concurrent.ExecutionException: org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node perf20
> [JBossINF] 	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> [JBossINF] 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> [JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:471)
> [JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:348)
> [JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:286)
> [JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:617)
> [JBossINF] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [JBossINF] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [JBossINF] 	at java.lang.Thread.run(Thread.java:745)
> [JBossINF] Caused by: org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node perf20
> [JBossINF] 	at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
> [JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:763)
> [JBossINF] 	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$174(JGroupsTransport.java:599)
> [JBossINF] 	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> [JBossINF] 	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> [JBossINF] 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> [JBossINF] 	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> [JBossINF] 	at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.futureDone(SingleResponseFuture.java:30)
> [JBossINF] 	at org.jgroups.blocks.Request.checkCompletion(Request.java:169)
> [JBossINF] 	at org.jgroups.blocks.UnicastRequest.viewChange(UnicastRequest.java:164)
> [JBossINF] 	at org.jgroups.blocks.RequestCorrelator.receiveView(RequestCorrelator.java:331)
> [JBossINF] 	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:242)
> [JBossINF] 	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:684)
> [JBossINF] 	at org.jgroups.JChannel.up(JChannel.java:738)
> [JBossINF] 	at org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:123)
> [JBossINF] 	at org.jgroups.stack.Protocol.up(Protocol.java:374)
> [JBossINF] 	at org.jgroups.protocols.FORK.up(FORK.java:118)
> [JBossINF] 	at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
> [JBossINF] 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:394)
> [JBossINF] 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:394)
> [JBossINF] 	at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:735)
> [JBossINF] 	at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange(CoordGmsImpl.java:244)
> [JBossINF] 	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:925)
> [JBossINF] 	at org.jgroups.stack.Protocol.up(Protocol.java:412)
> [JBossINF] 	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
> [JBossINF] 	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:474)
> [JBossINF] 	at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:982)
> [JBossINF] 	at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:912)
> [JBossINF] 	at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:846)
> [JBossINF] 	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:618)
> [JBossINF] 	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
> [JBossINF] 	at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
> [JBossINF] 	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
> [JBossINF] 	at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
> [JBossINF] 	at org.jgroups.protocols.Discovery.up(Discovery.java:295)
> [JBossINF] 	at org.jgroups.protocols.TP.passMessageUp(TP.java:1577)
> [JBossINF] 	at org.jgroups.protocols.TP$3.run(TP.java:1511)
> [JBossINF] 	... 3 more
> {code}
> Link:
> http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-7x-failover-singleton-deployment-jvmkill-random-election-policy/9/console-perf21/


--
This message was sent by Atlassian JIRA
(v6.4.11#64026)