]
Miroslav Novak commented on WFLY-10736:
---------------------------------------
[~pferraro] There is no issue with test. Test was written for checking fault tolerance of
inbound connections for MDB. There are changes to messaging-activemq subsystem but the
rest of configuration is left in default as there is no reason to modify it for the test.
Still this issue was hit as "side effect". There is used default configuration
for infinispan/jgroups/ejb3.
Server in cluster hangs during start after previous kill
--------------------------------------------------------
Key: WFLY-10736
URL:
https://issues.jboss.org/browse/WFLY-10736
Project: WildFly
Issue Type: Bug
Components: Clustering
Reporter: Miroslav Novak
Assignee: Paul Ferraro
Priority: Blocker
Labels: blocker-WF14
Fix For: 15.0.0.Alpha1
Attachments: Lodh2TestCase.testRemoteJcaInboundOnly-traces.zip,
Lodh2TestCase.testRemoteJcaInboundOnly.zip, Lodh2TestCase.testRemoteJcaInboundOnly2.zip,
clusterKilTest.zip, logs-traces.zip, logs-traces2.zip, logs-with-workaround.zip,
node-1-thread-dump-before-kill-shutdown-sequence.txt, server-with-mdb.log,
standalone-full-ha-1.xml, standalone-full-ha-2.xml
There is regression in JGroups or Infinispan in one of our tests for fault tolerance of
JMS bridges. However work on JMS bridge appears to be unrelated. Issue was hit in WF
weekly run.
Test Scenario:
* There are two servers. InQueue is deployed on Node 1,
* OutQueue is deployed on Node 2. Both servers are started.
* Large byte messages are sent to InQueue deployed on Node 1. Bridge between
servers/queues transfers messages from node 1 to node 2.
* Node 1 is killed and started again.
* All messages are received from OutQueue deployed on Node 2.
Result:
Node 1 does not start after kill and hangs. There is following exception logged in node
2:
{code}
:26:17,894 INFO [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100000: Node node-1
joined the cluster
09:26:18,520 INFO [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN000094: Received
new cluster view for channel ejb: [node-2|7] (2) [node-2, node-1]
09:26:18,521 INFO [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100001: Node
node-1 left the cluster
09:26:18,521 INFO [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN000094: Received
new cluster view for channel ejb: [node-2|7] (2) [node-2, node-1]
09:26:18,522 INFO [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100001: Node
node-1 left the cluster
09:26:18,522 INFO [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN000094: Received
new cluster view for channel ejb: [node-2|7] (2) [node-2, node-1]
09:26:18,522 INFO [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100001: Node
node-1 left the cluster
09:26:18,522 INFO [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN000094: Received
new cluster view for channel ejb: [node-2|7] (2) [node-2, node-1]
09:26:18,523 INFO [org.infinispan.CLUSTER] (thread-12,ejb,node-2) ISPN100001: Node
node-1 left the cluster
09:26:18,868 INFO [org.infinispan.CLUSTER] (remote-thread--p5-t2) ISPN000310: Starting
cluster-wide rebalance for cache default, topology CacheTopology{id=17,
phase=READ_OLD_WRITE_ALL, rebalanceId=6, currentCH=ReplicatedConsistentHash{ns = 256,
owners = (2)[node-2: 122, node-1: 134]}, pendingCH=ReplicatedConsistentHash{ns = 256,
owners = (3)[node-2: 84, node-1: 90, node-1: 82]}, unionCH=null, actualMembers=[node-2,
node-1, node-1], persistentUUIDs=[12443bfb-e88a-46f3-919e-9213bf38ce19,
2873237f-d881-463f-8a5a-940bf1d764e5, a05ea8af-a83b-42a9-b937-dc2da1cae6d1]}
09:26:18,869 INFO [org.infinispan.CLUSTER] (remote-thread--p5-t2)
[Context=default][Scope=node-2]ISPN100002: Started rebalance with topology id 17
09:26:18,870 INFO [org.infinispan.CLUSTER] (transport-thread--p14-t5)
[Context=default][Scope=node-2]ISPN100003: Node node-2 finished rebalance phase with
topology id 17
09:26:18,981 INFO [org.infinispan.CLUSTER] (remote-thread--p5-t2)
[Context=default][Scope=node-1]ISPN100003: Node node-1 finished rebalance phase with
topology id 17
09:27:18,530 WARN [org.infinispan.topology.ClusterTopologyManagerImpl]
(transport-thread--p15-t4) ISPN000197: Error updating cluster member list:
org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for
responses for request 4 from node-1
at
org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[rt.jar:1.8.0_131]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[rt.jar:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
Suppressed: java.util.concurrent.ExecutionException:
org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for
responses for request 4 from node-1
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
[rt.jar:1.8.0_131]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
[rt.jar:1.8.0_131]
at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
at org.infinispan.remoting.transport.Transport.invokeRemotely(Transport.java:71)
at
org.infinispan.topology.ClusterTopologyManagerImpl.confirmMembersAvailable(ClusterTopologyManagerImpl.java:540)
at
org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:523)
at
org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:334)
at
org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:85)
at
org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:745)
at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[rt.jar:1.8.0_131]
at
org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
... 1 more
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out
waiting for responses for request 4 from node-1
at
org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[rt.jar:1.8.0_131]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[rt.jar:1.8.0_131]
... 1 more
[CIRCULAR REFERENCE:java.util.concurrent.ExecutionException:
org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for
responses for request 4 from node-1]
09:27:18,530 WARN [org.infinispan.topology.ClusterTopologyManagerImpl]
(transport-thread--p16-t4) ISPN000197: Error updating cluster member list:
org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for
responses for request 4 from node-1
at
org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[rt.jar:1.8.0_131]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[rt.jar:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
Suppressed: java.util.concurrent.ExecutionException:
org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for
responses for request 4 from node-1
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
[rt.jar:1.8.0_131]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
[rt.jar:1.8.0_131]
at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:82)
at org.infinispan.remoting.transport.Transport.invokeRemotely(Transport.java:71)
at
org.infinispan.topology.ClusterTopologyManagerImpl.confirmMembersAvailable(ClusterTopologyManagerImpl.java:540)
at
org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:523)
at
org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:334)
at
org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:85)
at
org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:745)
at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[rt.jar:1.8.0_131]
at
org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
... 1 more
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out
waiting for responses for request 4 from node-1
at
org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[rt.jar:1.8.0_131]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[rt.jar:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[rt.jar:1.8.0_131]
... 1 more
[CIRCULAR REFERENCE:java.util.concurrent.ExecutionException:
org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for
responses for request 4 from node-1]
{code}
There is default JGroups udp stack configured which is used by Infinispan. Both of the
servers (jgroups udp) are bound to 127.0.0.1. Node 2 has port offset 1000.
Attaching thread dump from node 1 when it hangs during start.