[infinispan-issues] [JBoss JIRA] (ISPN-6183) Initial state transfer fails with unexpected timeout

Fri Feb 5 05:44:00 EST 2016

     [ https://issues.jboss.org/browse/ISPN-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Dzhuvinov updated ISPN-6183:
-------------------------------------
    Description: 
Hi guys,

I would like to report a somewhat odd issue with initial state transfer. It was observed in two instances: an Infinispan 7.2.5 cluster with 2 nodes and an Infinispan 7.2.5 cluster with 6 nodes. Both clusters had been running for about a month. The smaller one is used for development and carries a very light load (about a dozen cached objects). When an extra node was added, initial state transfer failed with an exception on both clusters after about 4 minutes, which is the default state transfer timeout. Several more attempts were made to add a new node, including one with the timeout increased to 10 minutes, but state transfer still did not complete and threw the following exception:

{code:java}
"message": "Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.Exception on object of type StateTransferManagerImpl",
      "name": "org.infinispan.commons.CacheException",
      "cause": {
        "commonElementCount": 25,
        "localizedMessage": "Initial state transfer timed out for cache authzStore.codeMap on ip-10-180-242-223-40643",
        "message": "Initial state transfer timed out for cache authzStore.codeMap on ip-10-180-242-223-40643",
        "name": "org.infinispan.commons.CacheException",
        "extendedStackTrace": [
          {
            "class": "org.infinispan.statetransfer.StateTransferManagerImpl",
            "method": "waitForInitialStateTransferToComplete",
            "file": "StateTransferManagerImpl.java",
            "line": 222,
            "exact": false,
            "location": "StateTransferManagerImpl.class",
            "version": "?"
          },
{code}

The JMX console reported "stateTransferInProgress=true" and "joinComplete=true".
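The two attributes above can also be read programmatically, for example to poll the joining node while it waits. The following is a minimal sketch that uses only the JDK's javax.management API. The ObjectName pattern `org.infinispan:type=Cache,component=StateTransferManager,*` is an assumption about how Infinispan 7.2 registers its StateTransferManager MBean, and the class `StateTransferProbe` is hypothetical. Only the attribute names `joinComplete` and `stateTransferInProgress` come from the JMX console output quoted above.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical diagnostic helper, not part of Infinispan.
class StateTransferProbe {

    // ASSUMPTION: Infinispan 7.2 registers one StateTransferManager MBean per
    // clustered cache under the "org.infinispan" domain with these key properties.
    static final String PATTERN =
            "org.infinispan:type=Cache,component=StateTransferManager,*";

    // Maps each matching ObjectName to its joinComplete / stateTransferInProgress values.
    static Map<String, String> queryStateTransferStatus(MBeanServer server) {
        Map<String, String> result = new TreeMap<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(PATTERN), null)) {
                Object join = server.getAttribute(name, "joinComplete");
                Object inProgress = server.getAttribute(name, "stateTransferInProgress");
                result.put(name.toString(),
                        "joinComplete=" + join + ", stateTransferInProgress=" + inProgress);
            }
        } catch (JMException e) {
            // Covers a malformed pattern, a missing attribute or an unregistered MBean.
            throw new IllegalStateException("JMX query failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Only MBeans registered in this JVM are visible, so this must run inside
        // the joining node (or be adapted to a remote JMX connection).
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, String> status = queryStateTransferStatus(server);
        if (status.isEmpty()) {
            System.out.println("No matching StateTransferManager MBeans found");
        }
        for (Map.Entry<String, String> e : status.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

On a node in the state reported here, both values would presumably show as true for the authzStore.codeMap cache.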

The original clusters were then shut down and restarted together with the new node, after which the clusters formed successfully.

Attached are the exception stack trace and the JGroups configuration (based on the stock S3_PING configuration).

  was:
Hi guys,

I would like to report a somewhat odd issue with initial state transfer. It was observed in two instances: an Infinispan 7.2.5 cluster with 2 nodes and an Infinispan 7.2.5 cluster with 6 nodes. Both clusters had been running for about a month. The smaller one is used for development and carries a very light load (about a dozen cached objects). When a third node was added, initial state transfer failed with an exception on both clusters after about 4 minutes, which is the default state transfer timeout. Several more attempts were made to add a new node, including one with the timeout increased to 10 minutes, but state transfer still did not complete and threw the following exception:

{code:java}
"message": "Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.Exception on object of type StateTransferManagerImpl",
      "name": "org.infinispan.commons.CacheException",
      "cause": {
        "commonElementCount": 25,
        "localizedMessage": "Initial state transfer timed out for cache authzStore.codeMap on ip-10-180-242-223-40643",
        "message": "Initial state transfer timed out for cache authzStore.codeMap on ip-10-180-242-223-40643",
        "name": "org.infinispan.commons.CacheException",
        "extendedStackTrace": [
          {
            "class": "org.infinispan.statetransfer.StateTransferManagerImpl",
            "method": "waitForInitialStateTransferToComplete",
            "file": "StateTransferManagerImpl.java",
            "line": 222,
            "exact": false,
            "location": "StateTransferManagerImpl.class",
            "version": "?"
          },
{code}

The JMX console reported "stateTransferInProgress=true" and "joinComplete=true".

The original clusters were then shut down and restarted together with the new node, after which the clusters formed successfully.

Attached are the exception stack trace and the JGroups configuration (based on the stock S3_PING configuration).

> Initial state transfer fails with unexpected timeout
> ----------------------------------------------------
>
>                 Key: ISPN-6183
>                 URL: https://issues.jboss.org/browse/ISPN-6183
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State Transfer
>    Affects Versions: 7.2.5.Final
>         Environment: Java 7 on AWS EC2
>            Reporter: Vladimir Dzhuvinov
>         Attachments: default-jgroups-s3ping.xml, state-transfer-timeout-stack-trace.txt
>

--
This message was sent by Atlassian JIRA
(v6.4.11#64026)