[infinispan-issues] [JBoss JIRA] (ISPN-5869) Timeout in initial state replication when EAP cluster scaled up/down

Dan Berindei (JIRA) issues at jboss.org
Thu Oct 22 17:07:00 EDT 2015


    [ https://issues.jboss.org/browse/ISPN-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121255#comment-13121255 ] 

Dan Berindei edited comment on ISPN-5869 at 10/22/15 5:06 PM:
--------------------------------------------------------------

Indeed, node2 finished its state transfer correctly. The problem is that we can't start a new state transfer while there is another state transfer in progress. So the state transfer for node3 and the others only starts after node2 finishes, but those 30 seconds of delay are still counted against their state transfer timeout.

I have run the test on my machine, but I haven't been able to reproduce the startup failure yet - state transfer takes 20s at most. It could be because the slow part is actually the file store, and I have an SSD. 

I did see a lot of errors in the {{ab}} output, but almost all of them were {{404}} errors, caused by HaProxy sending requests to a server before the war was fully started.


was (Author: dan.berindei):
Indeed, node2 finished its state transfer correctly. The problem is that we can't start a new state transfer while there is another state transfer in progress. So the state transfer for node3 and the others only starts after node2 finishes, but those 30 seconds of delay are still counted against their state transfer timeout. I'm not sure why it's 

I have run the test on my machine, but I haven't been able to reproduce the startup failure yet - state transfer takes 20s at most. It could be because the slow part is actually the file store, and I have an SSD. 

I did see a lot of errors in the {{ab}} output, but almost all of them were {{404}} errors, caused by HaProxy sending requests to a server before the war was fully started.

> Timeout in initial state replication when EAP cluster scaled up/down
> --------------------------------------------------------------------
>
>                 Key: ISPN-5869
>                 URL: https://issues.jboss.org/browse/ISPN-5869
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 5.2.11.Final
>            Reporter: William DeCoste
>            Assignee: Dan Berindei
>         Attachments: down.sh, haproxy.cfg.down, haproxy.cfg.up, node1.log.gz, node2.log.gz, node3.log.gz, node4.log.gz, node5.log.gz, server.log, SocketConnectionTimeoutTest.java, up.sh
>
>
> When scaling an EAP cluster between 1 and 5 active instances with active traffic the below exception occurs fairly frequently which results in a dead EAP instance. This issue also results in a dead Pod when running in OpenShift. Active traffic is created state replication for HTTPSession and SFSB.
> 13:27:42,553 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 66) MSC000001: Failed to start service jboss.infinispan.web.default-host/sfsbTest: org.jboss.msc.service.StartException in service jboss.infinispan.web.default-host/sfsbTest: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
>     at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:91) [jboss-as-clustering-common-7.5.0.Final-redhat-21.jar:7.5.0.Final-redhat-21]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_45]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_45]
>     at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_45]
>     at org.jboss.threads.JBossThread.run(JBossThread.java:122) [jboss-threads-2.1.2.Final-redhat-1.jar:2.1.2.Final-redhat-1]
> Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
>     at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
>     at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886)
>     at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657)
>     at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646)
>     at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
>     at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217)
>     at org.infinispan.CacheImpl.start(CacheImpl.java:582)
>     at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686)
>     at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649)
>     at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545)
>     at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:559)
>     at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:113)
>     at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:104)
>     at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:78)
>     at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:86) [jboss-as-clustering-common-7.5.0.Final-redhat-21.jar:7.5.0.Final-redhat-21]
>     ... 4 more
> Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache default-host/sfsbTest on node2/web
>     at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_45]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_45]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_45]
>     at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_45]
>     at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
>     ... 18 more
> 13:27:42,559 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) JBAS014612: Operation ("deploy") failed - address: ([("deployment" => "sfsbTest.war")]) - failure description: {"JBAS014671: Failed services" => {"jboss.infinispan.web.default-host/sfsbTest" => "org.jboss.msc.service.StartException in service jboss.infinispan.web.default-host/sfsbTest: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
>     Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
>     Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache default-host/sfsbTest on node2/web"}}
> 13:27:42,606 INFO  [org.jboss.as.server] (ServerService Thread Pool -- 35) JBAS015859: Deployed "sfsbTest.war" (runtime-name : "sfsbTest.war")
> 13:27:42,607 INFO  [org.jboss.as.controller] (Controller Boot Thread) JBAS014774: Service status report
> JBAS014777:   Services which failed to start:      service jboss.infinispan.web.default-host/sfsbTest: org.jboss.msc.service.StartException in service jboss.infinispan.web.default-host/sfsbTest: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
> 13:27:42,613 INFO  [org.jboss.as] (Controller Boot Thread) JBAS015961: Http management interface listening on http://127.0.0.1:10000/management
> 13:27:42,613 INFO  [org.jboss.as] (Controller Boot Thread) JBAS015951: Admin console listening on http://127.0.0.1:10000
> 13:27:42,613 ERROR [org.jboss.as] (Controller Boot Thread) JBAS015875: JBoss EAP 6.4.0.GA (AS 7.5.0.Final-redhat-21) started (with errors) in 72734ms - Started 340 of 411 services (4 services failed or missing dependencies, 126 services are lazy, passive or on-demand)
> 13:27:42,850 INFO  [org.jboss.as.jpa] (ServerService Thread Pool -- 23) JBAS011403: Stopping Persistence Unit Service 'sfsbTest.war#jpa-test'
> 13:27:42,858 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 23) JBAS010282: Stopped sfsbTest.war#jpa-test.org.jboss.jndiTest.TestEntity cache from hibernate container
> 13:27:42,872 INFO  [org.infinispan.jmx.CacheJmxRegistration] (ServerService Thread Pool -- 23) ISPN000031: MBeans were successfully registered to the platform MBean server.
> 13:27:42,873 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 23) JBAS010281: Started local-query cache from hibernate container
> 13:27:42,873 INFO  [org.hibernate.tool.hbm2ddl.SchemaExport] (ServerService Thread Pool -- 23) HHH000227: Running hbm2ddl schema export
> 13:27:42,878 INFO  [org.hibernate.tool.hbm2ddl.SchemaExport] (ServerService Thread Pool -- 23) HHH000230: Schema export complete
> 13:27:42,894 INFO  [org.jboss.as.clustering.infinispan] (MSC service thread 1-5) JBAS010282: Stopped pending-puts cache from hibernate container
> 13:27:42,898 INFO  [org.jboss.as.clustering.infinispan] (MSC service thread 1-5) JBAS010282: Stopped local-query cache from hibernate container
> 13:27:42,899 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) ISPN000080: Disconnecting and closing JGroups Channel
> 13:27:42,905 INFO  [org.infinispan.eviction.PassivationManagerImpl] (ServerService Thread Pool -- 23) ISPN000029: Passivating all entries to disk
> 13:27:42,906 INFO  [org.infinispan.eviction.PassivationManagerImpl] (ServerService Thread Pool -- 23) ISPN000030: Passivated 0 entries in 1 milliseconds
> 13:27:42,915 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 23) JBAS010282: Stopped repl cache from web container
> 13:27:42,918 INFO  [org.infinispan.eviction.PassivationManagerImpl] (MSC service thread 1-4) ISPN000029: Passivating all entries to disk
> 13:27:42,920 INFO  [org.jboss.as.server.deployment] (MSC service thread 1-6) JBAS015877: Stopped deployment sfsbTest.war (runtime-name: sfsbTest.war) in 85ms
> 13:27:43,263 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) ISPN000082: Stopping the RpcDispatcher
> 13:27:47,210 INFO  [org.infinispan.eviction.PassivationManagerImpl] (MSC service thread 1-4) ISPN000030: Passivated 3955 entries in 4.29 seconds
> 13:27:47,223 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000080: Disconnecting and closing JGroups Channel
> 13:27:47,587 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000082: Stopping the RpcDispatcher
> 13:27:47,662 INFO  [org.jboss.as.server] (DeploymentScanner-threads - 2) JBAS015858: Undeployed "sfsbTest.war" (runtime-name: "sfsbTest.war")
> 13:27:47,664 INFO  [org.jboss.as.controller] (DeploymentScanner-threads - 2) JBAS014774: Service status report
> JBAS014775:    New missing/unsatisfied dependencies:
>       service jboss.deployment.unit."sfsbTest.war".component.EntityTesterBean.START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component.StatefulBean1.START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component.StatelessBean1.START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component.StatelessBean2.START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component.TimerBean.START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component."com.sun.faces.config.ConfigureListener".START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component."javax.faces.webapp.FacetTag".START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component."javax.servlet.jsp.jstl.tlv.PermittedTaglibsTLV".START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component."javax.servlet.jsp.jstl.tlv.ScriptFreeTLV".START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component."org.apache.catalina.servlets.DefaultServlet".START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component."org.apache.jasper.servlet.JspServlet".START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".component."org.jboss.sfsbTest.SfsbServlet".START (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.deployment.unit."sfsbTest.war".jndiDependencyService (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest]
>       service jboss.infinispan.web.default-host/sfsbTest (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest.session]
>       service jboss.persistenceunit."sfsbTest.war#jpa-test" (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.web.deployment.default-host./sfsbTest (missing) dependents: [service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
>       service jboss.web.deployment.default-host./sfsbTest.realm (missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
> JBAS014777:   Services which failed to start:      service jboss.infinispan.web.default-host/sfsbTest 



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)


More information about the infinispan-issues mailing list