[JBoss JIRA] (ISPN-2802) Cache recovery fails due to missing responses
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-2802?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-2802:
-----------------------------------
I have the stable interval set to 5 seconds, so retransmission should handle it. I don't see why it does not here.
Previously I had the GET_STATUS timeout set to 10 minutes, and the joiner failed because of the shorter timeout (3 minutes) in waitForInitialStateTransfer. But since there is no other stress on the system (I have switched the stressors off to isolate the issue), even lost messages should be recovered quickly (within one or two stable periods).
I'll try to set up some tracing on UNICAST2, then, and switch off the connection reaper (JGRP-1586 was not known when I started looking into this case), as soon as I am able to work on hyperion again.
> Cache recovery fails due to missing responses
> ---------------------------------------------
>
> Key: ISPN-2802
> URL: https://issues.jboss.org/browse/ISPN-2802
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.CR3
> Reporter: Radim Vansa
> Assignee: Mircea Markus
>
> When the cache recovery is started, the new coordinator sends CacheTopologyControlCommand.GET_STATUS to all nodes and waits for responses. However, I have a reproducible test-case where it always times out waiting for the responses.
> Here are the logs (TRACE is not doable here, but I added some byteman traces - see topology.btm in the archive): http://dl.dropbox.com/u/103079234/recovery.zip
> The problematic spot is on node3 at 05:37:57 receiving cluster view 34.
> All nodes (except the one which is killed, in this case node1) respond quickly to the GET_STATUS command (see BYTEMAN Receiving - Received pairs, these are bound to command execution in CommandAwareRpcDispatcher), but some responses are not received on node3 (look for Receiving rsp bound to GroupRequest).
> JGroups tracing could be useful here but it is not available (intensive logging often blocks on internal log4j locks and the node becomes unresponsive).
> As mentioned above, the case is reproducible, therefore if you can suggest any particular BYTEMAN hook, I can try it.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 2 months
[JBoss JIRA] (ISPN-2807) REST server war is missing configuration file config-samples/sample.xml
by Adrian Nistor (JIRA)
Adrian Nistor created ISPN-2807:
-----------------------------------
Summary: REST server war is missing configuration file config-samples/sample.xml
Key: ISPN-2807
URL: https://issues.jboss.org/browse/ISPN-2807
Project: Infinispan
Issue Type: Bug
Affects Versions: 5.2.0.Final
Reporter: Adrian Nistor
Assignee: Mircea Markus
Fix For: 5.2.1, 5.3.0.Alpha1
config-samples/sample.xml is referenced in web.xml but is not present on the classpath.
This file used to be located in older versions of infinispan-core.jar, but the entire config-samples dir was removed from it by ISPN-2560.
13 years, 2 months
[JBoss JIRA] (ISPN-2730) Not possible to instantiate org.infinispan.cdi.InfinispanExtension during startup of appserver
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-2730?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-2730:
----------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/1655
> Not possible to instantiate org.infinispan.cdi.InfinispanExtension during startup of appserver
> ----------------------------------------------------------------------------------------------
>
> Key: ISPN-2730
> URL: https://issues.jboss.org/browse/ISPN-2730
> Project: Infinispan
> Issue Type: Bug
> Components: CDI integration
> Affects Versions: 5.2.0.CR1
> Environment: tested in openEJB
> java version "1.6.0_33"
> Java(TM) SE Runtime Environment (build 1.6.0_33-b03)
> Oracle JRockit(R) (build R28.2.4-14-151097-1.6.0_33-20120618-1634-windows-x86_64, compiled mode)
> Reporter: Michael Mellenthin
> Assignee: Tristan Tarrant
> Priority: Critical
> Fix For: 5.2.1, 5.3.0.Final
>
>
> I have the infinispan-cdi library in my appserver to be able to use JSR-107.
> During startup, the application server loads InfinispanExtension because the service provider is declared in META-INF/services.
> During startup I get the following exception:
> {code}
> Caused by: java.util.ServiceConfigurationError: javax.enterprise.inject.spi.Extension: Provider org.infinispan.cdi.InfinispanExtension could not be instantiated: java.lang.IllegalAccessException: Class java.util.ServiceLoader$LazyIterator can not access a member of class org.infinispan.cdi.InfinispanExtension with modifiers ""
> at java.util.ServiceLoader.fail(ServiceLoader.java:207)
> at java.util.ServiceLoader.access$100(ServiceLoader.java:164)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:353)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:421)
> {code}
> The reason is that InfinispanExtension only provides a package-scoped constructor.
> Admittedly, the documentation of java.util.ServiceLoader is not very clear on this point. It just says:
> *"The only requirement enforced by this facility is that provider classes must have a zero-argument constructor so that they can be instantiated during loading"*
> But the implementation fails if the constructor is not public. Hence, we need a public constructor for InfinispanExtension.
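The constructor requirement described in the report can be checked with plain reflection. The following is only a minimal sketch: `ProviderConstructorCheck`, `PackageScopedProvider`, and `PublicProvider` are hypothetical names invented for illustration, not Infinispan classes, and the check looks only at the constructor's modifier (the provider class itself must also be accessible to the loader).

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

final class ProviderConstructorCheck {

    // Hypothetical provider that mirrors the reported problem: package-scoped constructor.
    static class PackageScopedProvider {
        PackageScopedProvider() {
        }
    }

    // Hypothetical provider with the kind of constructor the report asks for.
    public static class PublicProvider {
        public PublicProvider() {
        }
    }

    // Returns true only if the class declares a zero-argument constructor that is public.
    static boolean hasPublicNoArgConstructor(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            return Modifier.isPublic(ctor.getModifiers());
        } catch (NoSuchMethodException e) {
            // No zero-argument constructor at all.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("package-scoped: " + hasPublicNoArgConstructor(PackageScopedProvider.class));
        System.out.println("public: " + hasPublicNoArgConstructor(PublicProvider.class));
    }
}
```

A check like this could run in a unit test over every class listed in META-INF/services, so that a non-public constructor is caught before deployment rather than at appserver startup.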
13 years, 2 months
[JBoss JIRA] (ISPN-2794) undeploy of infinispan.war doesn't stop DefaultCacheManager
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2794?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-2794:
--------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
Integrated. Thanks!
> undeploy of infinispan.war doesn't stop DefaultCacheManager
> -----------------------------------------------------------
>
> Key: ISPN-2794
> URL: https://issues.jboss.org/browse/ISPN-2794
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 5.2.0.Final
> Reporter: Michal Linhard
> Assignee: Tristan Tarrant
> Fix For: 5.2.1, 5.3.0.Alpha1
>
>
> 1. deploy infinispan.war
> 2. undeploy
> 3. deploy, and you'll get:
> {code}
> 09:41:43,436 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/infinispan]] (MSC service thread 1-1) Servlet /infinispan threw load() exception: org.infinispan.jmx.JmxDomainConflictException: Domain already registered org.infinispan when trying to register: type=CacheManager,name="DefaultCacheManager"
> at org.infinispan.jmx.JmxUtil.buildJmxDomain(JmxUtil.java:75) [infinispan-core-5.2.0-SNAPSHOT.jar:5.2.0-SNAPSHOT]
> at org.infinispan.jmx.CacheManagerJmxRegistration.updateDomain(CacheManagerJmxRegistration.java:101) [infinispan-core-5.2.0-SNAPSHOT.jar:5.2.0-SNAPSHOT]
> at org.infinispan.jmx.CacheManagerJmxRegistration.buildRegistrar(CacheManagerJmxRegistration.java:95) [infinispan-core-5.2.0-SNAPSHOT.jar:5.2.0-SNAPSHOT]
> at org.infinispan.jmx.AbstractJmxRegistration.registerMBeans(AbstractJmxRegistration.java:59) [infinispan-core-5.2.0-SNAPSHOT.jar:5.2.0-SNAPSHOT]
> at org.infinispan.jmx.CacheManagerJmxRegistration.start(CacheManagerJmxRegistration.java:63) [infinispan-core-5.2.0-SNAPSHOT.jar:5.2.0-SNAPSHOT]
> at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:705) [infinispan-core-5.2.0-SNAPSHOT.jar:5.2.0-SNAPSHOT]
> at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:300) [infinispan-core-5.2.0-SNAPSHOT.jar:5.2.0-SNAPSHOT]
> at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:139) [infinispan-core-5.2.0-SNAPSHOT.jar:5.2.0-SNAPSHOT]
> at org.infinispan.rest.StartupListener.init(StartupListener.scala:57) [classes:]
> at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1202) [jbossweb-7.0.17.Final-redhat-1.jar:]
> at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1102) [jbossweb-7.0.17.Final-redhat-1.jar:]
> at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:3655) [jbossweb-7.0.17.Final-redhat-1.jar:]
> at org.apache.catalina.core.StandardContext.start(StandardContext.java:3873) [jbossweb-7.0.17.Final-redhat-1.jar:]
> at org.jboss.as.web.deployment.WebDeploymentService.start(WebDeploymentService.java:89) [jboss-as-web-7.1.3.Final-redhat-4.jar:7.1.3.Final-redhat-4]
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1811)
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1746)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_37]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_37]
> at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_37]
> {code}
> The problem is that DefaultCacheManager created in ManagerInstance.instance still lives after undeploy.
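The failure mode in the report (something registered at deploy time that outlives undeploy) can be reproduced in miniature with plain JMX. This is a rough analogy under stated assumptions, not Infinispan code: `RedeployJmxSketch`, the object name, and the use of the JDK's `javax.management.timer.Timer` as a stand-in MBean are all invented for illustration, and plain JMX reports the clash as `InstanceAlreadyExistsException` rather than Infinispan's `JmxDomainConflictException`.

```java
import javax.management.InstanceAlreadyExistsException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;
import javax.management.timer.Timer;

final class RedeployJmxSketch {

    // Hypothetical object name; it stands in for the cache manager's JMX registration.
    private static final String NAME = "org.example.demo:type=CacheManager,name=Demo";

    private final MBeanServer server;
    private final ObjectName name;

    RedeployJmxSketch(MBeanServer server) throws JMException {
        this.server = server;
        this.name = new ObjectName(NAME);
    }

    // "Deploy": register an MBean under the fixed name (Timer is just a ready-made JDK MBean).
    void start() throws JMException {
        server.registerMBean(new Timer(), name);
    }

    // "Undeploy": release the registration so that a later start() can succeed.
    void stop() throws JMException {
        if (server.isRegistered(name)) {
            server.unregisterMBean(name);
        }
    }

    // Simulates deploy, undeploy, deploy on a private MBeanServer and reports
    // whether the second deploy succeeds.
    static boolean redeploySucceeds(boolean stopOnUndeploy) {
        try {
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            RedeployJmxSketch first = new RedeployJmxSketch(server);
            first.start();
            if (stopOnUndeploy) {
                first.stop();
            }
            try {
                new RedeployJmxSketch(server).start();
                return true;
            } catch (InstanceAlreadyExistsException e) {
                // The first registration is still alive, so the name clashes.
                return false;
            }
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("redeploy without stop: " + redeploySucceeds(false));
        System.out.println("redeploy with stop:    " + redeploySucceeds(true));
    }
}
```

In a web application the `stop()` step would normally be tied to the container's undeploy callback; which hook the actual fix uses is not stated in this thread.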
13 years, 2 months
[JBoss JIRA] (ISPN-2802) Cache recovery fails due to missing responses
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-2802?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-2802:
------------------------------------
Radim, what's your JGroups configuration?
Note that the CacheTopologyControlCommand responses are not themselves marked with the RSVP flag, so if they are dropped at any point it's up to UNICAST2 to re-transmit them. So if UNICAST2.stable_interval is the JGroups default (60 seconds), that would explain the timeout.
Most of the messages are received in < 40ms, but there are a couple of response messages that arrive only after 30 seconds. That suggests there is some re-transmission involved, so you could try increasing the GET_STATUS timeout (configured with transport().distributedSyncTimeout()).
{code}
05:37:57,709 BYTEMAN (transport-thread-20): Sending hyperion988-43314 -> null: CacheTopologyControlCommand{cache=null, type=GET_STATUS, sender=hyperion988-43314, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=33}
... lots of other responses ...
05:37:57,748 BYTEMAN (OOB-1110,hyperion988-43314): Received rsp from hyperion1006-37219
05:38:05,231 BYTEMAN (OOB-1114,hyperion988-43314): Received rsp from hyperion1005-58850
05:38:30,750 BYTEMAN (OOB-1114,hyperion988-43314): Received rsp from hyperion1016-23122
05:38:33,252 BYTEMAN (OOB-1114,hyperion988-43314): Received rsp from hyperion1015-10195
{code}
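The delays in the excerpt above can be computed directly from the timestamps. The sketch below is only an illustration: `ResponseLatency` is a made-up helper, and it assumes all timestamps fall on the same day, with the send time and the four response times copied from the log lines.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

final class ResponseLatency {

    // Time of day with a comma before the milliseconds, as in the log excerpt.
    private static final DateTimeFormatter LOG_TIME = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Milliseconds between two log timestamps; assumes both are on the same day.
    static long latencyMillis(String sentAt, String receivedAt) {
        LocalTime sent = LocalTime.parse(sentAt, LOG_TIME);
        LocalTime received = LocalTime.parse(receivedAt, LOG_TIME);
        return Duration.between(sent, received).toMillis();
    }

    public static void main(String[] args) {
        String sentAt = "05:37:57,709"; // GET_STATUS sent by hyperion988-43314
        String[][] responses = {
            {"hyperion1006-37219", "05:37:57,748"},
            {"hyperion1005-58850", "05:38:05,231"},
            {"hyperion1016-23122", "05:38:30,750"},
            {"hyperion1015-10195", "05:38:33,252"},
        };
        for (String[] rsp : responses) {
            System.out.printf("%s: %d ms%n", rsp[0], latencyMillis(sentAt, rsp[1]));
        }
    }
}
```

For the four responses shown, this gives 39 ms, roughly 7.5 s, 33 s, and 35.5 s after the send, so any GET_STATUS timeout shorter than about 36 seconds would miss the last two.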
It looks like the cluster doesn't have much activity except for the rebalancing, because there are a lot of log messages about UNICAST2 connections being reaped after 60 seconds of inactivity. There are a couple of JGroups bugs that might be relevant: JGRP-1563 (which should actually be fixed in JGroups 3.2.6.Final, the version we use in 5.2.0.CR3), and JGRP-1586 (which won't be fixed until JGroups 3.3). So you should definitely run your test with {{UNICAST2.conn_expiry_timeout=0}}.
We still need to get some logging from UNICAST2, though - we need to see when the messages are actually sent, when they are re-transmitted, when the connections are created etc. If you think enabling TRACE for UNICAST2 isn't feasible with log4j, then we should definitely look into log4j2.
13 years, 2 months