]
Dan Berindei commented on ISPN-5869:
------------------------------------
Thanks for the logs [~bdecoste]. The SocketConnectionTimeoutTest results look good.
I'm afraid the logs from node1 and node2 are too short however, so I can't say
whether there were problems during shutdown.
The main reason for the timeout in your latest run, seems to be that writing the state on
the joiners is too slow, and that's likely because of all the extra logging. E.g.
between {{08:06:55}} and {{08:07:19}}, node3 only manages to write 3980 cache entries for
the {{default-host/sfsbTest}} cache, out of a first batch of 10k.
I did find a problem with the default JGroups configuration in EAP that I have missed
before: the OOB thread pool has its queue enabled, and because of that Infinispan never
uses more than 20 OOB threads to process remote commands. I'm not sure if it was a
factor here, but I strongly suggest using
{{oob_thread_pool.queue_enabled="false"}} in your configuration, both for TCP
and for UDP.
I would also recommend using {{<state-transfer chunk-size="512"/>}} in
your Infinispan configuration. We've come to realize that the default of 10000 we had
before is too big, and it slows down state transfer (e.g. the batch of entries received by
node3 is 10MB, which has to be split in 158 fragments).
Could you run the test one more time on your machine, with
{{oob_thread_pool.queue_enabled="false"}}, {{chunk-size="512"}}, and
with DEBUG logging for everything except {{org.jgroups}}, {{org.infinispan.topology}},
{{org.infinispan.remoting}}, and {{org.infinispan.statetransfer}} (TRACE)?
Timeout in initial state replication when EAP cluster scaled up/down
--------------------------------------------------------------------
Key: ISPN-5869
URL:
https://issues.jboss.org/browse/ISPN-5869
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 5.2.11.Final
Reporter: William DeCoste
Assignee: Dan Berindei
Attachments: down.sh, haproxy.cfg.down, haproxy.cfg.up, node1.log.gz,
node2.log.gz, node3.log.gz, node4.log.gz, node5.log.gz, server.log,
SocketConnectionTimeoutTest.java, up.sh
When scaling an EAP cluster between 1 and 5 active instances with active traffic the
below exception occurs fairly frequently which results in a dead EAP instance. This issue
also results in a dead Pod when running in OpenShift. Active traffic is created state
replication for HTTPSession and SFSB.
13:27:42,553 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 66)
MSC000001: Failed to start service jboss.infinispan.web.default-host/sfsbTest:
org.jboss.msc.service.StartException in service
jboss.infinispan.web.default-host/sfsbTest: org.infinispan.CacheException: Unable to
invoke method public void
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete()
throws java.lang.InterruptedException on object of type StateTransferManagerImpl
at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:91)
[jboss-as-clustering-common-7.5.0.Final-redhat-21.jar:7.5.0.Final-redhat-21]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[rt.jar:1.7.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[rt.jar:1.7.0_45]
at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_45]
at org.jboss.threads.JBossThread.run(JBossThread.java:122)
[jboss-threads-2.1.2.Final-redhat-1.jar:2.1.2.Final-redhat-1]
Caused by: org.infinispan.CacheException: Unable to invoke method public void
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete()
throws java.lang.InterruptedException on object of type StateTransferManagerImpl
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
at
org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886)
at
org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657)
at
org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646)
at
org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217)
at org.infinispan.CacheImpl.start(CacheImpl.java:582)
at
org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686)
at
org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:559)
at
org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:113)
at
org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:104)
at
org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:78)
at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:86)
[jboss-as-clustering-common-7.5.0.Final-redhat-21.jar:7.5.0.Final-redhat-21]
... 4 more
Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache
default-host/sfsbTest on node2/web
at
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[rt.jar:1.7.0_45]
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[rt.jar:1.7.0_45]
at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_45]
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
... 18 more
13:27:42,559 ERROR [org.jboss.as.controller.management-operation] (Controller Boot
Thread) JBAS014612: Operation ("deploy") failed - address:
([("deployment" => "sfsbTest.war")]) - failure description:
{"JBAS014671: Failed services" =>
{"jboss.infinispan.web.default-host/sfsbTest" =>
"org.jboss.msc.service.StartException in service
jboss.infinispan.web.default-host/sfsbTest: org.infinispan.CacheException: Unable to
invoke method public void
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete()
throws java.lang.InterruptedException on object of type StateTransferManagerImpl
Caused by: org.infinispan.CacheException: Unable to invoke method public void
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete()
throws java.lang.InterruptedException on object of type StateTransferManagerImpl
Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache
default-host/sfsbTest on node2/web"}}
13:27:42,606 INFO [org.jboss.as.server] (ServerService Thread Pool -- 35) JBAS015859:
Deployed "sfsbTest.war" (runtime-name : "sfsbTest.war")
13:27:42,607 INFO [org.jboss.as.controller] (Controller Boot Thread) JBAS014774: Service
status report
JBAS014777: Services which failed to start: service
jboss.infinispan.web.default-host/sfsbTest: org.jboss.msc.service.StartException in
service jboss.infinispan.web.default-host/sfsbTest: org.infinispan.CacheException: Unable
to invoke method public void
org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete()
throws java.lang.InterruptedException on object of type StateTransferManagerImpl
13:27:42,613 INFO [org.jboss.as] (Controller Boot Thread) JBAS015961: Http management
interface listening on
http://127.0.0.1:10000/management
13:27:42,613 INFO [org.jboss.as] (Controller Boot Thread) JBAS015951: Admin console
listening on
http://127.0.0.1:10000
13:27:42,613 ERROR [org.jboss.as] (Controller Boot Thread) JBAS015875: JBoss EAP 6.4.0.GA
(AS 7.5.0.Final-redhat-21) started (with errors) in 72734ms - Started 340 of 411 services
(4 services failed or missing dependencies, 126 services are lazy, passive or on-demand)
13:27:42,850 INFO [org.jboss.as.jpa] (ServerService Thread Pool -- 23) JBAS011403:
Stopping Persistence Unit Service 'sfsbTest.war#jpa-test'
13:27:42,858 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 23)
JBAS010282: Stopped sfsbTest.war#jpa-test.org.jboss.jndiTest.TestEntity cache from
hibernate container
13:27:42,872 INFO [org.infinispan.jmx.CacheJmxRegistration] (ServerService Thread Pool
-- 23) ISPN000031: MBeans were successfully registered to the platform MBean server.
13:27:42,873 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 23)
JBAS010281: Started local-query cache from hibernate container
13:27:42,873 INFO [org.hibernate.tool.hbm2ddl.SchemaExport] (ServerService Thread Pool
-- 23) HHH000227: Running hbm2ddl schema export
13:27:42,878 INFO [org.hibernate.tool.hbm2ddl.SchemaExport] (ServerService Thread Pool
-- 23) HHH000230: Schema export complete
13:27:42,894 INFO [org.jboss.as.clustering.infinispan] (MSC service thread 1-5)
JBAS010282: Stopped pending-puts cache from hibernate container
13:27:42,898 INFO [org.jboss.as.clustering.infinispan] (MSC service thread 1-5)
JBAS010282: Stopped local-query cache from hibernate container
13:27:42,899 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC
service thread 1-5) ISPN000080: Disconnecting and closing JGroups Channel
13:27:42,905 INFO [org.infinispan.eviction.PassivationManagerImpl] (ServerService Thread
Pool -- 23) ISPN000029: Passivating all entries to disk
13:27:42,906 INFO [org.infinispan.eviction.PassivationManagerImpl] (ServerService Thread
Pool -- 23) ISPN000030: Passivated 0 entries in 1 milliseconds
13:27:42,915 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 23)
JBAS010282: Stopped repl cache from web container
13:27:42,918 INFO [org.infinispan.eviction.PassivationManagerImpl] (MSC service thread
1-4) ISPN000029: Passivating all entries to disk
13:27:42,920 INFO [org.jboss.as.server.deployment] (MSC service thread 1-6) JBAS015877:
Stopped deployment sfsbTest.war (runtime-name: sfsbTest.war) in 85ms
13:27:43,263 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC
service thread 1-5) ISPN000082: Stopping the RpcDispatcher
13:27:47,210 INFO [org.infinispan.eviction.PassivationManagerImpl] (MSC service thread
1-4) ISPN000030: Passivated 3955 entries in 4.29 seconds
13:27:47,223 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC
service thread 1-4) ISPN000080: Disconnecting and closing JGroups Channel
13:27:47,587 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC
service thread 1-4) ISPN000082: Stopping the RpcDispatcher
13:27:47,662 INFO [org.jboss.as.server] (DeploymentScanner-threads - 2) JBAS015858:
Undeployed "sfsbTest.war" (runtime-name: "sfsbTest.war")
13:27:47,664 INFO [org.jboss.as.controller] (DeploymentScanner-threads - 2) JBAS014774:
Service status report
JBAS014775: New missing/unsatisfied dependencies:
service
jboss.deployment.unit."sfsbTest.war".component.EntityTesterBean.START (missing)
dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component.StatefulBean1.START (missing)
dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component.StatelessBean1.START (missing)
dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component.StatelessBean2.START (missing)
dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service jboss.deployment.unit."sfsbTest.war".component.TimerBean.START
(missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component."com.sun.faces.config.ConfigureListener".START
(missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component."javax.faces.webapp.FacetTag".START
(missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component."javax.servlet.jsp.jstl.tlv.PermittedTaglibsTLV".START
(missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component."javax.servlet.jsp.jstl.tlv.ScriptFreeTLV".START
(missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component."org.apache.catalina.servlets.DefaultServlet".START
(missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component."org.apache.jasper.servlet.JspServlet".START
(missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service
jboss.deployment.unit."sfsbTest.war".component."org.jboss.sfsbTest.SfsbServlet".START
(missing) dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service jboss.deployment.unit."sfsbTest.war".jndiDependencyService
(missing) dependents: [service jboss.web.deployment.default-host./sfsbTest]
service jboss.infinispan.web.default-host/sfsbTest (missing) dependents: [service
jboss.web.deployment.default-host./sfsbTest.session]
service jboss.persistenceunit."sfsbTest.war#jpa-test" (missing)
dependents: [service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service jboss.web.deployment.default-host./sfsbTest (missing) dependents: [service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
service jboss.web.deployment.default-host./sfsbTest.realm (missing) dependents:
[service jboss.web.deployment.default-host./sfsbTest, service
jboss.deployment.unit."sfsbTest.war".deploymentCompleteService]
JBAS014777: Services which failed to start: service
jboss.infinispan.web.default-host/sfsbTest