[
https://issues.jboss.org/browse/ISPN-6183?page=com.atlassian.jira.plugin....
]
Vladimir Dzhuvinov updated ISPN-6183:
-------------------------------------
Description:
Hi guys,
I would like to report a somewhat odd issue with initial state transfer. It was observed
in two instances: an Infinispan 7.2.5 cluster with 2 nodes and an Infinispan 7.2.5
cluster with 6 nodes. The two clusters had been running for 2 weeks, the smaller for dev
purposes with very light load (about a dozen cached objects). Upon adding an extra node,
an initial state transfer exception was encountered in both clusters after about 4 minutes,
which is the default state transfer timeout. Several attempts were made to add a new node,
including one with an increased timeout (10 mins), but state transfer still did not
complete and the following exception was thrown:
{code:java}
"message": "Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.Exception on object of type StateTransferManagerImpl",
"name": "org.infinispan.commons.CacheException",
"cause": {
  "commonElementCount": 25,
  "localizedMessage": "Initial state transfer timed out for cache authzStore.codeMap on ip-10-180-242-223-40643",
  "message": "Initial state transfer timed out for cache authzStore.codeMap on ip-10-180-242-223-40643",
  "name": "org.infinispan.commons.CacheException",
  "extendedStackTrace": [
    {
      "class": "org.infinispan.statetransfer.StateTransferManagerImpl",
      "method": "waitForInitialStateTransferToComplete",
      "file": "StateTransferManagerImpl.java",
      "line": 222,
      "exact": false,
      "location": "StateTransferManagerImpl.class",
      "version": "?"
    },
{code}
The JMX console reported "stateTransferInProgress=true" and
"joinComplete=true".
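For anyone trying to reproduce this, the two JMX attributes quoted above can also be read programmatically instead of through a console. The following is only a minimal sketch using the standard javax.management API: the attribute names are the ones reported above, while the ObjectName pattern (key properties type=Cache and component=StateTransferManager) and the class name StateTransferProbe are assumptions that may need adjusting for a given setup.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class StateTransferProbe {

    // Assumption: Infinispan registers one StateTransferManager MBean per cache
    // under these key properties; the domain is left as a wildcard.
    static final String PATTERN = "*:type=Cache,component=StateTransferManager,*";

    // Returns one status line per matching MBean, keyed by its canonical ObjectName.
    static Map<String, String> probe(MBeanServerConnection server) throws Exception {
        Map<String, String> status = new LinkedHashMap<>();
        for (ObjectName name : server.queryNames(new ObjectName(PATTERN), null)) {
            Object inProgress = server.getAttribute(name, "stateTransferInProgress");
            Object joined = server.getAttribute(name, "joinComplete");
            status.put(name.getCanonicalName(),
                    "stateTransferInProgress=" + inProgress + ", joinComplete=" + joined);
        }
        return status;
    }

    public static void main(String[] args) throws Exception {
        // Queries the MBean server of the current JVM; checking a remote node
        // would require a JMXConnector connection instead.
        Map<String, String> status = probe(ManagementFactory.getPlatformMBeanServer());
        for (Map.Entry<String, String> entry : status.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Run inside a JVM that hosts no Infinispan caches, the probe simply finds nothing and prints nothing.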
The original clusters were then shut down and started again together with the new node,
after which the clusters were successfully formed.
Attached is the exception stack trace and the JGroups config (based on the stock S3
ping).
was:
Hi guys,
I would like to report a somewhat odd issue with initial state transfer. It was observed
in two instances: an Infinispan 7.2.5 cluster with 2 nodes and an Infinispan 7.2.5
cluster with 6 nodes. The two clusters have been running for about a month, the smaller
for dev purposes with very light load (about a dozen cached objects). Upon adding an extra
node, an initial state transfer exception was encountered in both clusters after about 4
minutes, which is the default state transfer timeout. Several attempts were made to add a
new node, including one with an increased timeout (10 mins), but state transfer still did
not complete and the following exception was thrown:
{code:java}
"message": "Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.Exception on object of type StateTransferManagerImpl",
"name": "org.infinispan.commons.CacheException",
"cause": {
  "commonElementCount": 25,
  "localizedMessage": "Initial state transfer timed out for cache authzStore.codeMap on ip-10-180-242-223-40643",
  "message": "Initial state transfer timed out for cache authzStore.codeMap on ip-10-180-242-223-40643",
  "name": "org.infinispan.commons.CacheException",
  "extendedStackTrace": [
    {
      "class": "org.infinispan.statetransfer.StateTransferManagerImpl",
      "method": "waitForInitialStateTransferToComplete",
      "file": "StateTransferManagerImpl.java",
      "line": 222,
      "exact": false,
      "location": "StateTransferManagerImpl.class",
      "version": "?"
    },
{code}
The JMX console reported "stateTransferInProgress=true" and
"joinComplete=true".
The original clusters were then shut down and started again together with the new node,
after which the clusters were successfully formed.
Attached is the exception stack trace and the JGroups config (based on the stock S3
ping).
Initial state transfer fails with unexpected timeout
----------------------------------------------------
Key: ISPN-6183
URL:
https://issues.jboss.org/browse/ISPN-6183
Project: Infinispan
Issue Type: Bug
Components: State Transfer
Affects Versions: 7.2.5.Final
Environment: Java 7 on AWS EC2
Reporter: Vladimir Dzhuvinov
Attachments: default-jgroups-s3ping.xml, state-transfer-timeout-stack-trace.txt
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)