[infinispan-issues] [JBoss JIRA] (ISPN-6721) SuspectException when stopping and starting nodes in an embedded cluster using invalidation

Karl von Randow (JIRA) issues at jboss.org
Thu May 26 23:14:00 EDT 2016


     [ https://issues.jboss.org/browse/ISPN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl von Randow updated ISPN-6721:
----------------------------------
    Description: 
We run a cluster of four app servers on Tomcat with embedded Infinispan for Hibernate L2. When we bring down one of the app servers it shuts down that cache and then exits. On other nodes we frequently, although not consistently, receive SuspectExceptions from other nodes stating that the cache isn't running on the shutting down node. Similarly when starting a new app server we get the same.

This feels like a race condition during startup and shutdown of the caches. Does that sound likely?

Here is a shutdown exception:
{code:java}
May 27, 2016 3:05:01 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [default] in context with path [] threw exception
org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-12786
        at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:798)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
        at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
        at org.jgroups.blocks.GroupRequest.viewChange(GroupRequest.java:250)
        at org.jgroups.blocks.RequestCorrelator.receiveView(RequestCorrelator.java:316)
        at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:229)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
        at org.jgroups.JChannel.up(JChannel.java:738)
        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
        at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:733)
        at org.jgroups.protocols.pbcast.ParticipantGmsImpl.handleViewChange(ParticipantGmsImpl.java:140)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:923)
        at org.jgroups.stack.Protocol.up(Protocol.java:417)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:474)
        at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:982)
        at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:912)
        at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:846)
        at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:618)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
        at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
        at org.jgroups.protocols.Discovery.up(Discovery.java:296)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
        at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}

And a startup exception:
{code:java}
org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-46933
        at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:795)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
        at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
        at org.jgroups.blocks.GroupRequest.receiveResponse(GroupRequest.java:136)
        at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:373)
        at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:237)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
        at org.jgroups.JChannel.up(JChannel.java:738)
        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1043)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
        at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1064)
        at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:779)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:426)
        at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:652)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
        at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
        at org.jgroups.protocols.Discovery.up(Discovery.java:296)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
        at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}

  was:
We run a cluster of four app servers on Tomcat with embedded Infinispan for Hibernate L2. When we bring down one of the app servers it shuts down that cache and then exits. On other nodes we frequently, although not consistently, receive SuspectExceptions from other nodes stating that the cache isn't running on the shutting down node. Similarly when starting a new app server we get the same.

This feels like a race condition during startup and shutdown of the caches. Does that sound likely?

Here is a shutdown exception:
{code:java}
May 27, 2016 3:05:01 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [default] in context with path [] threw exception
org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-12786
        at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:798)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
        at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
        at org.jgroups.blocks.GroupRequest.viewChange(GroupRequest.java:250)
        at org.jgroups.blocks.RequestCorrelator.receiveView(RequestCorrelator.java:316)
        at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:229)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
        at org.jgroups.JChannel.up(JChannel.java:738)
        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
        at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:733)
        at org.jgroups.protocols.pbcast.ParticipantGmsImpl.handleViewChange(ParticipantGmsImpl.java:140)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:923)
        at org.jgroups.stack.Protocol.up(Protocol.java:417)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:474)
        at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:982)
        at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:912)
        at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:846)
        at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:618)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
        at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
        at org.jgroups.protocols.Discovery.up(Discovery.java:296)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
        at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code:java}

And a startup exception:
{code:java}
org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-46933
        at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:795)
        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
        at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
        at org.jgroups.blocks.GroupRequest.receiveResponse(GroupRequest.java:136)
        at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:373)
        at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:237)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
        at org.jgroups.JChannel.up(JChannel.java:738)
        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
        at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1043)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
        at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1064)
        at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:779)
        at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:426)
        at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:652)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
        at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
        at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
        at org.jgroups.protocols.Discovery.up(Discovery.java:296)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
        at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code:java}



> SuspectException when stopping and starting nodes in an embedded cluster using invalidation
> -------------------------------------------------------------------------------------------
>
>                 Key: ISPN-6721
>                 URL: https://issues.jboss.org/browse/ISPN-6721
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 8.2.2.Final
>            Reporter: Karl von Randow
>
> We run a cluster of four app servers on Tomcat with embedded Infinispan for Hibernate L2. When we bring down one of the app servers it shuts down that cache and then exits. On other nodes we frequently, although not consistently, receive SuspectExceptions from other nodes stating that the cache isn't running on the shutting down node. Similarly when starting a new app server we get the same.
> This feels like a race condition during startup and shutdown of the caches. Does that sound likely?
> Here is a shutdown exception:
> {code:java}
> May 27, 2016 3:05:01 PM org.apache.catalina.core.StandardWrapperValve invoke
> SEVERE: Servlet.service() for servlet [default] in context with path [] threw exception
> org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-12786
>         at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
>         at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:798)
>         at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
>         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
>         at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
>         at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
>         at org.jgroups.blocks.GroupRequest.viewChange(GroupRequest.java:250)
>         at org.jgroups.blocks.RequestCorrelator.receiveView(RequestCorrelator.java:316)
>         at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:229)
>         at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
>         at org.jgroups.JChannel.up(JChannel.java:738)
>         at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
>         at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
>         at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
>         at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:733)
>         at org.jgroups.protocols.pbcast.ParticipantGmsImpl.handleViewChange(ParticipantGmsImpl.java:140)
>         at org.jgroups.protocols.pbcast.GMS.up(GMS.java:923)
>         at org.jgroups.stack.Protocol.up(Protocol.java:417)
>         at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
>         at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:474)
>         at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:982)
>         at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:912)
>         at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:846)
>         at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:618)
>         at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
>         at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
>         at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
>         at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
>         at org.jgroups.protocols.Discovery.up(Discovery.java:296)
>         at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
>         at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> And a startup exception:
> {code:java}
> org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-46933
>         at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
>         at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:795)
>         at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
>         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
>         at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
>         at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
>         at org.jgroups.blocks.GroupRequest.receiveResponse(GroupRequest.java:136)
>         at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:373)
>         at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:237)
>         at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
>         at org.jgroups.JChannel.up(JChannel.java:738)
>         at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
>         at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
>         at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
>         at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1043)
>         at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
>         at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1064)
>         at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:779)
>         at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:426)
>         at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:652)
>         at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
>         at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
>         at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
>         at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
>         at org.jgroups.protocols.Discovery.up(Discovery.java:296)
>         at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
>         at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)


More information about the infinispan-issues mailing list