[infinispan-issues] [JBoss JIRA] (ISPN-6599) PutAll operation in the Hot Rod client only partially completed during topology changes

Thu May 12 12:01:00 EDT 2016

    [ https://issues.jboss.org/browse/ISPN-6599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204462#comment-13204462 ] 

Gustavo Fernandes edited comment on ISPN-6599 at 5/12/16 12:00 PM:
-------------------------------------------------------------------

I tried running the reproducer using the linked PR and am having issues starting the servers: very often they fail to start and hang with errors such as:
{noformat}
2016-05-12 14:33:14,598 ERROR [org.jgroups.protocols.UNICAST3] (OOB-20,server2) JGRP000039: server2: failed to deliver OOB message [dst: server2, src: server0 (4 headers), size=151 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER]: java.lang.NullPointerException
...
2016-05-12 14:33:14,598 ERROR [org.jgroups.protocols.UNICAST3] (OOB-18,server2) JGRP000039: server2: failed to deliver OOB message [dst: server2, src: server1 (4 headers), size=151 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER]: java.lang.NullPointerException
...
2016-05-12 14:33:20,598 ERROR [org.infinispan.CLUSTER] (transport-thread--p4-t1) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: Replication timeout for server0
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1907)
        at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:75)
        at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:578)
        at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:448)
        at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:365)
        at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$328(ClusterTopologyManagerImpl.java:717)
        at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$$Lambda$41/1877974992.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.infinispan.executors.SemaphoreCompletionService$QueueingTask.runInternal(SemaphoreCompletionService.java:172)
        at org.infinispan.executors.SemaphoreCompletionService$QueueingTask.run(SemaphoreCompletionService.java:151)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2016-05-12 14:33:20,599 FATAL [org.infinispan.CLUSTER] (transport-thread--p4-t1) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [server2, server1, server0].
{noformat}

Attached are the trace logs for all servers, plus a thread dump:
[^start.zip]


> PutAll operation in the Hot Rod client only partially completed during topology changes
> ---------------------------------------------------------------------------------------
>
>                 Key: ISPN-6599
>                 URL: https://issues.jboss.org/browse/ISPN-6599
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 9.0.0.Alpha1
>            Reporter: Gustavo Fernandes
>            Assignee: Dan Berindei
>         Attachments: reproducer.zip, start.zip, trace.zip
>
>

