[JBoss JIRA] (HRJS-17) Failover not happening
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/HRJS-17?page=com.atlassian.jira.plugin.sy... ]
Galder Zamarreño updated HRJS-17:
---------------------------------
Affects Version/s: 0.2.0
> Failover not happening
> ----------------------
>
> Key: HRJS-17
> URL: https://issues.jboss.org/browse/HRJS-17
> Project: Infinispan Javascript client
> Issue Type: Bug
> Affects Versions: 0.2.0
> Reporter: Gustavo Fernandes
> Assignee: Galder Zamarreño
> Fix For: 0.3.0
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (HRJS-17) Failover not happening
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/HRJS-17?page=com.atlassian.jira.plugin.sy... ]
Galder Zamarreño updated HRJS-17:
---------------------------------
Fix Version/s: 0.3.0
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (HRJS-17) Failover not happening
by Gustavo Fernandes (JIRA)
Gustavo Fernandes created HRJS-17:
-------------------------------------
Summary: Failover not happening
Key: HRJS-17
URL: https://issues.jboss.org/browse/HRJS-17
Project: Infinispan Javascript client
Issue Type: Bug
Reporter: Gustavo Fernandes
I have a cluster of 3 nodes (127.0.0.1:11422, 127.0.0.1:11222, 127.0.0.1:11322) and connect to it using:
{code}
var connected = infinispan.client({port: 11222, host: '127.0.0.1'});
{code}
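For context, the client interaction that produces the logs below is roughly the following — a minimal sketch assuming the promise-based API of the Node.js {{infinispan}} module; the {{require}} call and the error handler are illustrative additions, not part of the original report:
{code}
var infinispan = require('infinispan');

// Connect to a single node; the client learns the rest of the topology
// from the first response (see "New topology(id=4) discovered" below).
var connected = infinispan.client({port: 11222, host: '127.0.0.1'});

connected.then(function (client) {
  // Each operation returns a promise.
  return client.get('Hodor')
    .then(function (value) {
      return client.put('Hodor', new Date().toString());
    });
}).catch(function (err) {
  console.error('Operation failed:', err);
});
{code}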
The client works normally until I start killing servers one at a time, aiming to leave just one in the cluster, but when I kill the server the client is currently connected to, it apparently does not fail over:
Initial Request:
{noformat}
[2016-05-27 10:13:06.947] [DEBUG] connection - Connecting to 127.0.0.1:11222
[2016-05-27 10:13:06.951] [DEBUG] connection - Connected to 127.0.0.1:11222
[2016-05-27 10:13:06.952] [DEBUG] client - Invoke ping(msgId=1)
[2016-05-27 10:13:06.952] [TRACE] encoder - Encode operation with topology id 0
[2016-05-27 10:13:06.954] [TRACE] transport - Write buffer(msgId=1) to 127.0.0.1:11222
[2016-05-27 10:13:07.009] [TRACE] decoder - Read header(msgId=1): opCode=24, status=0, hasNewTopology=1
[2016-05-27 10:13:07.012] [DEBUG] transport - New topology(id=4) discovered: [127.0.0.1:11422,127.0.0.1:11222,127.0.0.1:11322]
[2016-05-27 10:13:07.012] [DEBUG] transport - Removed server: none
[2016-05-27 10:13:07.012] [DEBUG] transport - Added servers: [127.0.0.1:11422,127.0.0.1:11322]
[2016-05-27 10:13:07.012] [DEBUG] transport - Connected servers: [127.0.0.1:11222]
[2016-05-27 10:13:07.013] [TRACE] decoder - Call decode for request(msgId=1)
[2016-05-27 10:13:07.013] [TRACE] connection - After decoding request(msgId=1), buffer size is 105, and offset 105
[2016-05-27 10:13:07.013] [DEBUG] connection - Connecting to 127.0.0.1:11422
[2016-05-27 10:13:07.013] [DEBUG] connection - Connecting to 127.0.0.1:11322
[2016-05-27 10:13:07.014] [DEBUG] connection - Connected to 127.0.0.1:11422
[2016-05-27 10:13:07.014] [DEBUG] connection - Connected to 127.0.0.1:11322
[2016-05-27 10:13:07.015] [TRACE] connection - Complete success for request(msgId=1) with undefined
[2016-05-27 10:13:14.918] [DEBUG] client - Invoke get(msgId=2,key=Hodor)
[2016-05-27 10:13:14.919] [TRACE] encoder - Encode operation with topology id 4
[2016-05-27 10:13:14.921] [TRACE] transport - Write buffer(msgId=2) to 127.0.0.1:11322
[2016-05-27 10:13:14.948] [TRACE] decoder - Read header(msgId=2): opCode=4, status=2, hasNewTopology=0
[2016-05-27 10:13:14.948] [TRACE] decoder - Call decode for request(msgId=2)
[2016-05-27 10:13:14.949] [TRACE] connection - After decoding request(msgId=2), buffer size is 5, and offset 5
[2016-05-27 10:13:14.949] [TRACE] connection - Complete success for request(msgId=2) with undefined
[2016-05-27 10:13:14.953] [DEBUG] client - Invoke put(msgId=4,key=Hodor,value=Fri May 27 2016 10:13:14 GMT+0100 (BST),opts=undefined)
[2016-05-27 10:13:14.953] [TRACE] encoder - Encode operation with topology id 4
[2016-05-27 10:13:14.953] [TRACE] transport - Write buffer(msgId=4) to 127.0.0.1:11322
[2016-05-27 10:13:14.960] [TRACE] decoder - Read header(msgId=4): opCode=2, status=0, hasNewTopology=0
[2016-05-27 10:13:14.960] [TRACE] decoder - Call decode for request(msgId=4)
[2016-05-27 10:13:14.960] [TRACE] connection - After decoding request(msgId=4), buffer size is 5, and offset 5
[2016-05-27 10:13:14.960] [TRACE] connection - Complete success for request(msgId=4) with undefined
{noformat}
Killing server 127.0.0.1:11222:
{noformat}
[2016-05-27 10:13:36.707] [DEBUG] connection - Disconnected from 127.0.0.1:11222
[2016-05-27 10:13:46.616] [DEBUG] client - Invoke get(msgId=6,key=Hodor)
[2016-05-27 10:13:46.616] [TRACE] encoder - Encode operation with topology id 4
[2016-05-27 10:13:46.617] [TRACE] transport - Write buffer(msgId=6) to 127.0.0.1:11322
[2016-05-27 10:13:46.631] [TRACE] decoder - Read header(msgId=6): opCode=4, status=0, hasNewTopology=1
[2016-05-27 10:13:46.632] [DEBUG] transport - New topology(id=8) discovered: [127.0.0.1:11422,127.0.0.1:11322]
[2016-05-27 10:13:46.632] [DEBUG] transport - Removed servers are: [127.0.0.1:11222]
[2016-05-27 10:13:46.633] [DEBUG] transport - Added servers: []
[2016-05-27 10:13:46.633] [DEBUG] transport - Connected servers: [127.0.0.1:11422,127.0.0.1:11322]
[2016-05-27 10:13:46.633] [TRACE] decoder - Call decode for request(msgId=6)
[2016-05-27 10:13:46.633] [TRACE] connection - After decoding request(msgId=6), buffer size is 133, and offset 133
[2016-05-27 10:13:46.633] [TRACE] connection - Complete success for request(msgId=6) with Fri May 27 2016 10:13:14 GMT+0100 (BST)
{noformat}
Killing server 127.0.0.1:11322, after which the client hangs:
{noformat}
[2016-05-27 10:14:59.623] [DEBUG] connection - Disconnected from 127.0.0.1:11322
[2016-05-27 10:15:04.813] [DEBUG] client - Invoke get(msgId=8,key=Hodor)
[2016-05-27 10:15:04.814] [TRACE] encoder - Encode operation with topology id 8
[2016-05-27 10:15:04.814] [TRACE] transport - Write buffer(msgId=8) to 127.0.0.1:11322
[2016-05-27 10:15:04.814] [ERROR] connection - Error from 127.0.0.1:11322: This socket has been ended by the other party
{noformat}
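Until failover works, the hang can at least be made visible by racing the operation against a timeout — a hypothetical application-level guard, not part of the client API:
{code}
// Hypothetical helper: reject if the operation does not settle within `ms` milliseconds,
// so a request written to a dead connection surfaces as an error instead of hanging.
function withTimeout(promise, ms) {
  return new Promise(function (resolve, reject) {
    var timer = setTimeout(function () {
      reject(new Error('Operation timed out after ' + ms + ' ms'));
    }, ms);
    promise.then(
      function (value) { clearTimeout(timer); resolve(value); },
      function (err) { clearTimeout(timer); reject(err); }
    );
  });
}

// Usage: withTimeout(client.get('Hodor'), 5000).catch(function (err) { /* report the stall */ });
{code}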
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (HRJS-16) Generate API docs for JS client
by Galder Zamarreño (JIRA)
Galder Zamarreño created HRJS-16:
------------------------------------
Summary: Generate API docs for JS client
Key: HRJS-16
URL: https://issues.jboss.org/browse/HRJS-16
Project: Infinispan Javascript client
Issue Type: Task
Reporter: Galder Zamarreño
We should generate API documentation for the JS client so that it is clear which API is available.
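One possible approach — an assumption for illustration, not a decision recorded in this issue — is to annotate the client entry points with JSDoc comments and generate HTML from them:
{code}
/**
 * Connects to an Infinispan server and resolves to a client object.
 * (Hypothetical annotation shown for illustration; the function name follows the existing API.)
 *
 * @param {Object} addr - address of a server to bootstrap from, e.g. {port: 11222, host: '127.0.0.1'}
 * @returns {Promise} promise that resolves to the connected client
 */
exports.client = function (addr) {
  // ... existing implementation ...
};
{code}
The HTML could then be produced with the standard {{jsdoc}} tool, e.g. {{jsdoc lib -r -d docs}} (paths assumed for illustration).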
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-6722) The getopt maven coordinates are broken for Maven Central
by Tristan Tarrant (JIRA)
Tristan Tarrant created ISPN-6722:
-------------------------------------
Summary: The getopt maven coordinates are broken for Maven Central
Key: ISPN-6722
URL: https://issues.jboss.org/browse/ISPN-6722
Project: Infinispan
Issue Type: Bug
Components: Build process
Affects Versions: 8.2.2.Final, 9.0.0.Alpha2
Reporter: Tristan Tarrant
Assignee: Tristan Tarrant
Fix For: 9.0.0.Alpha3, 8.2.3.Final
The build uses the coordinates {{gnu-getopt:getopt}}, but the correct coordinates on Maven Central are {{gnu.getopt:java-getopt}}.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-6721) SuspectException when stopping and starting nodes in an embedded cluster using invalidation
by Karl von Randow (JIRA)
[ https://issues.jboss.org/browse/ISPN-6721?page=com.atlassian.jira.plugin.... ]
Karl von Randow updated ISPN-6721:
----------------------------------
Description:
We run a cluster of four app servers on Tomcat with embedded Infinispan for the Hibernate L2 cache. When we bring down one of the app servers, it shuts down its cache and then exits. On the other nodes we frequently, although not consistently, receive SuspectExceptions stating that the cache isn't running on the node that is shutting down. We see the same when starting a new app server.
We are using synchronous invalidation.
This feels like a race condition during startup and shutdown of the caches. Does that sound likely?
I note that this wiki article https://github.com/infinispan/infinispan/wiki/Consistency-guarantees-in-I... contains the note "TODO Create an issue in JIRA to ignore suspect exceptions." I'm not sure whether this is related.
There are two stack traces below. The shutdown example fails at {{JGroupsTransport.java:798}}, which is the branch taken when the response was suspected, while the startup example fails at {{JGroupsTransport.java:795}}, which handles a {{CacheNotFoundResponse}}.
Here is a shutdown exception:
{code:java}
May 27, 2016 3:05:01 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [default] in context with path [] threw exception
org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-12786
at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:798)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
at org.jgroups.blocks.GroupRequest.viewChange(GroupRequest.java:250)
at org.jgroups.blocks.RequestCorrelator.receiveView(RequestCorrelator.java:316)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:229)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
at org.jgroups.JChannel.up(JChannel.java:738)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:733)
at org.jgroups.protocols.pbcast.ParticipantGmsImpl.handleViewChange(ParticipantGmsImpl.java:140)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:923)
at org.jgroups.stack.Protocol.up(Protocol.java:417)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:474)
at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:982)
at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:912)
at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:846)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:618)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
at org.jgroups.protocols.Discovery.up(Discovery.java:296)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
And a startup exception:
{code:java}
org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-46933
at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:795)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
at org.jgroups.blocks.GroupRequest.receiveResponse(GroupRequest.java:136)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:373)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:237)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
at org.jgroups.JChannel.up(JChannel.java:738)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1043)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1064)
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:779)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:426)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:652)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
at org.jgroups.protocols.Discovery.up(Discovery.java:296)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
was:
We run a cluster of four app servers on Tomcat with embedded Infinispan for Hibernate L2. When we bring down one of the app servers it shuts down that cache and then exits. On other nodes we frequently, although not consistently, receive SuspectExceptions from other nodes stating that the cache isn't running on the shutting down node. Similarly when starting a new app server we get the same.
This feels like a race condition during startup and shutdown of the caches. Does that sound likely?
There are two stacktraces below. I note that the shutdown example fails in {{JGroupsTransport.java:798}} which is when the response was suspected. While the startup example fails in {{JGroupsTransport.java:795}}, which is a {{CacheNotFoundResponse}} response.
Here is a shutdown exception:
{code:java}
May 27, 2016 3:05:01 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [default] in context with path [] threw exception
org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-12786
at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:798)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
at org.jgroups.blocks.GroupRequest.viewChange(GroupRequest.java:250)
at org.jgroups.blocks.RequestCorrelator.receiveView(RequestCorrelator.java:316)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:229)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
at org.jgroups.JChannel.up(JChannel.java:738)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:733)
at org.jgroups.protocols.pbcast.ParticipantGmsImpl.handleViewChange(ParticipantGmsImpl.java:140)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:923)
at org.jgroups.stack.Protocol.up(Protocol.java:417)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:474)
at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:982)
at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:912)
at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:846)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:618)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
at org.jgroups.protocols.Discovery.up(Discovery.java:296)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
And a startup exception:
{code:java}
org.infinispan.remoting.transport.jgroups.SuspectException: Cache not running on node app1-46933
at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:46)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:795)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.infinispan.remoting.transport.jgroups.RspListFuture.futureDone(RspListFuture.java:31)
at org.jgroups.blocks.Request.checkCompletion(Request.java:162)
at org.jgroups.blocks.GroupRequest.receiveResponse(GroupRequest.java:136)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:373)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:237)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:695)
at org.jgroups.JChannel.up(JChannel.java:738)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:392)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1043)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1064)
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:779)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:426)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:652)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:310)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:285)
at org.jgroups.protocols.Discovery.up(Discovery.java:296)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1590)
at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1802)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
> SuspectException when stopping and starting nodes in an embedded cluster using invalidation
> -------------------------------------------------------------------------------------------
>
> Key: ISPN-6721
> URL: https://issues.jboss.org/browse/ISPN-6721
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.2.2.Final
> Reporter: Karl von Randow
>
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)