[ https://issues.jboss.org/browse/ISPN-6599?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes edited comment on ISPN-6599 at 5/12/16 12:00 PM:
-------------------------------------------------------------------
I tried running the reproducer using the linked PR, but I am having issues starting the
servers: very often they fail to start and hang with the following errors:
{noformat}
2016-05-12 14:33:14,598 ERROR [org.jgroups.protocols.UNICAST3] (OOB-20,server2) JGRP000039: server2: failed to deliver OOB message [dst: server2, src: server0 (4 headers), size=151 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER]: java.lang.NullPointerException
...
2016-05-12 14:33:14,598 ERROR [org.jgroups.protocols.UNICAST3] (OOB-18,server2) JGRP000039: server2: failed to deliver OOB message [dst: server2, src: server1 (4 headers), size=151 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER]: java.lang.NullPointerException
...
2016-05-12 14:33:20,598 ERROR [org.infinispan.CLUSTER] (transport-thread--p4-t1) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: Replication timeout for server0
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1907)
    at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:75)
    at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:578)
    at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:448)
    at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:365)
    at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$328(ClusterTopologyManagerImpl.java:717)
    at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$$Lambda$41/1877974992.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.infinispan.executors.SemaphoreCompletionService$QueueingTask.runInternal(SemaphoreCompletionService.java:172)
    at org.infinispan.executors.SemaphoreCompletionService$QueueingTask.run(SemaphoreCompletionService.java:151)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-05-12 14:33:20,599 FATAL [org.infinispan.CLUSTER] (transport-thread--p4-t1) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [server2, server1, server0].
{noformat}
Attached are the trace logs for all servers plus a thread dump: [^start.zip]
was (Author: gustavonalle): same comment and log output as above, without the [^start.zip] attachment link.
PutAll operation in the Hot Rod client only partially completed during topology changes
---------------------------------------------------------------------------------------
Key: ISPN-6599
URL: https://issues.jboss.org/browse/ISPN-6599
Project: Infinispan
Issue Type: Bug
Components: Server
Affects Versions: 9.0.0.Alpha1
Reporter: Gustavo Fernandes
Assignee: Dan Berindei
Attachments: reproducer.zip, start.zip, trace.zip
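
For readers without the attachments: the symptom in the issue title (a putAll that only partially completes) can be detected by comparing the cache contents against the batch that was written. The following is a minimal sketch, not the code from reproducer.zip; the class PutAllCheck and the method missingKeys are hypothetical names, and the check works against any java.util.Map.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; it is not part of Infinispan
// and not taken from the attached reproducer.
final class PutAllCheck {
    private PutAllCheck() {}

    /**
     * Returns the keys of 'expected' that are absent from 'cache' or are
     * mapped to a different value, i.e. the part of a putAll that did not land.
     */
    static <K extends Comparable<K>, V> Set<K> missingKeys(Map<K, V> cache, Map<K, V> expected) {
        Set<K> missing = new TreeSet<>();
        for (Map.Entry<K, V> entry : expected.entrySet()) {
            V actual = cache.get(entry.getKey());
            if (!Objects.equals(entry.getValue(), actual)) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

Assuming the Hot Rod RemoteCache can be passed where a Map is expected, calling missingKeys(remoteCache, batch) right after remoteCache.putAll(batch) while a server is being stopped or started should return an empty set when the operation completed fully.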
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)