[JBoss JIRA] (ISPN-11466) NettyTransportConnectionStats should not rely on the JMX support
by Nistor Adrian (Jira)
Nistor Adrian created ISPN-11466:
------------------------------------
Summary: NettyTransportConnectionStats should not rely on the JMX support
Key: ISPN-11466
URL: https://issues.redhat.com/browse/ISPN-11466
Project: Infinispan
Issue Type: Bug
Components: Server
Affects Versions: 10.1.0.Final
Reporter: Nistor Adrian
Fix For: 11.0.0.Final
NettyTransportConnectionStats looks up the transports via JMX, which only works when JMX is enabled. The lookup should work regardless of that.
Keep in mind that these stats are exported as MP metrics, and JMX is not always available.
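
One possible shape for a JMX-independent lookup is sketched below. It is only an illustration of the idea, not the Infinispan implementation: TransportHandle, TransportRegistry and ConnectionStatsSketch are hypothetical names, and the assumption is that each transport can register itself in-process when it starts, so the stats code never has to query an MBeanServer.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a server transport; not the Infinispan NettyTransport API.
final class TransportHandle {
    private final String name;
    private final LongSupplier connectionCount;

    TransportHandle(String name, LongSupplier connectionCount) {
        this.name = name;
        this.connectionCount = connectionCount;
    }

    String name() {
        return name;
    }

    long connections() {
        return connectionCount.getAsLong();
    }
}

// Hypothetical in-process registry. Transports add themselves on start and
// remove themselves on stop, so no MBeanServer is involved in the lookup.
final class TransportRegistry {
    private final List<TransportHandle> transports = new CopyOnWriteArrayList<>();

    void register(TransportHandle transport) {
        transports.add(transport);
    }

    void unregister(TransportHandle transport) {
        transports.remove(transport);
    }

    List<TransportHandle> all() {
        return transports;
    }
}

// Hypothetical stats component that reads the registry directly.
final class ConnectionStatsSketch {
    private final TransportRegistry registry;

    ConnectionStatsSketch(TransportRegistry registry) {
        this.registry = registry;
    }

    // Sums the connection counts of all registered transports.
    long totalConnections() {
        long total = 0;
        for (TransportHandle transport : registry.all()) {
            total += transport.connections();
        }
        return total;
    }
}
```

With this arrangement the same value could be handed to JMX and to a metrics exporter, and neither would be a prerequisite for the other.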
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-11465) Upgrade to JGroups 4.2.1.Final
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-11465?page=com.atlassian.jira.plugi... ]
Dan Berindei commented on ISPN-11465:
-------------------------------------
JGRP-2448 was also fixed in 4.2.0, and the uncaught exception appears as a failure in Jenkins, e.g.
{noformat}
org.infinispan.xsite.ImplicitBackupCacheStoppedTest[null, tx=false].Uncaught (from (empty))
java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
at java.base/java.util.Objects.checkIndex(Objects.java:372)
at java.base/java.util.ArrayList.get(ArrayList.java:458)
at java.base/java.util.Collections$SynchronizedList.get(Collections.java:2426)
at org.jgroups.util.Util.pickNext(Util.java:2688)
at org.jgroups.protocols.FD_SOCK.determinePingDest(FD_SOCK.java:762)
at org.jgroups.protocols.FD_SOCK.run(FD_SOCK.java:408)
at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
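
The trace shows an index lookup on an empty synchronized list. As a rough illustration of how such a lookup can be guarded, here is a minimal sketch. PickNextSketch is a hypothetical helper written for this note; it is not the JGroups Util.pickNext code and not the actual JGRP-2448 fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, written only to illustrate the guard.
final class PickNextSketch {
    private PickNextSketch() {
    }

    // Returns the element that follows `current`, wrapping around at the end.
    // Returns null when the list is empty or does not contain `current`.
    static <T> T pickNext(List<T> members, T current) {
        if (members == null || current == null) {
            return null;
        }
        List<T> snapshot;
        // Copy while holding the list's monitor, so the size check and the
        // index lookup below work on the same contents even if another
        // thread clears a synchronized list in between.
        synchronized (members) {
            snapshot = new ArrayList<>(members);
        }
        if (snapshot.isEmpty()) {
            return null;
        }
        int index = snapshot.indexOf(current);
        if (index < 0) {
            return null;
        }
        return snapshot.get((index + 1) % snapshot.size());
    }
}
```

A caller would then have to treat null as "no ping destination available" instead of relying on an exception.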
> Upgrade to JGroups 4.2.1.Final
> ------------------------------
>
> Key: ISPN-11465
> URL: https://issues.redhat.com/browse/ISPN-11465
> Project: Infinispan
> Issue Type: Component Upgrade
> Components: Dependency
> Affects Versions: 11.0.0.Alpha2
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 11.0.0.Dev03
>
>
> JGroups 4.2.1 includes a fix for JGRP-2435, which should fix the random failure in {{InitialClusterSizeTest}}.
> JGRP-2451 also adds a new failure detection protocol, {{FD_ALL3}}, which doesn't need extra heartbeats as long as the nodes are sending other broadcast messages.
[JBoss JIRA] (ISPN-7439) InitialClusterSizeTest can hang during teardown
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-7439?page=com.atlassian.jira.plugin... ]
Dan Berindei closed ISPN-7439.
------------------------------
Resolution: Cannot Reproduce
{{InitialClusterSizeTest}} has been failing again lately because of JGRP-2435, but it never hung, so JGRP-2262 likely fixed the problem.
> InitialClusterSizeTest can hang during teardown
> -----------------------------------------------
>
> Key: ISPN-7439
> URL: https://issues.redhat.com/browse/ISPN-7439
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite
> Affects Versions: 9.0.0.Beta2
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Labels: testsuite_stability
>
> Test {{testInitialClusterSizeFail}} expects the nodes to time out in {{JGroupsTransport.waitForInitialNodes()}}, but in at least one case the timeout didn't happen. The test then tried to shut down the cache managers, but it hung because another thread was holding the {{GlobalComponentRegistry}} lock:
> {noformat}
> "testng-InitialClusterSizeTest" #13 prio=5 os_prio=0 tid=0x00007f1874d1f000 nid=0x1778 waiting for monitor entry [0x00007f181bafc000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.infinispan.factories.GlobalComponentRegistry.stop(GlobalComponentRegistry.java:280)
> - waiting to lock <0x0000000093c7afe0> (a org.infinispan.factories.GlobalComponentRegistry)
> at org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:701)
> - locked <0x000000008a005b80> (a org.infinispan.manager.DefaultCacheManager)
> at org.infinispan.test.TestingUtil.killCacheManagers(TestingUtil.java:656)
> at org.infinispan.test.MultipleCacheManagersTest.clearContent(MultipleCacheManagersTest.java:138)
> at sun.reflect.GeneratedMethodAccessor175.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
> at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:564)
> at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:213)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:786)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:348)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:38)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:382)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "ForkThread-4,InitialClusterSizeTest" #167842 prio=5 os_prio=0 tid=0x00007f1824163800 nid=0x3316 waiting on condition [0x00007f17e62b9000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.jgroups.util.Util.sleep(Util.java:1818)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.firstOfAllClients(ClientGmsImpl.java:181)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:97)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:41)
> at org.jgroups.protocols.pbcast.GMS.down(GMS.java:1066)
> at org.jgroups.protocols.tom.TOA.down(TOA.java:73)
> at org.jgroups.protocols.FlowControl.down(FlowControl.java:302)
> at org.jgroups.protocols.RSVP.down(RSVP.java:102)
> at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:900)
> at org.jgroups.JChannel.down(JChannel.java:644)
> at org.jgroups.JChannel._connect(JChannel.java:873)
> at org.jgroups.JChannel.connect(JChannel.java:369)
> - locked <0x0000000093c7aea0> (a org.jgroups.JChannel)
> at org.jgroups.JChannel.connect(JChannel.java:360)
> - locked <0x0000000093c7aea0> (a org.jgroups.JChannel)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:221)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:211)
> at sun.reflect.GeneratedMethodAccessor109.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:168)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:867)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:633)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:622)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:547)
> - locked <0x0000000093c7afe0> (a org.infinispan.factories.GlobalComponentRegistry)
> at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:244)
> - locked <0x0000000093c7afe0> (a org.infinispan.factories.GlobalComponentRegistry)
> at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:666)
> at org.infinispan.remoting.transport.InitialClusterSizeTest.lambda$testInitialClusterSize$799(InitialClusterSizeTest.java:47)
> at org.infinispan.remoting.transport.InitialClusterSizeTest$$Lambda$2092/593962598.run(Unknown Source)
> at org.infinispan.test.AbstractInfinispanTest$RunnableWrapper.run(AbstractInfinispanTest.java:510)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "ForkThread-2,InitialClusterSizeTest" #167840 prio=5 os_prio=0 tid=0x00007f1824164800 nid=0x3314 waiting on condition [0x00007f17eec40000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.jgroups.util.Util.sleep(Util.java:1818)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.firstOfAllClients(ClientGmsImpl.java:181)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:97)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:41)
> at org.jgroups.protocols.pbcast.GMS.down(GMS.java:1066)
> at org.jgroups.protocols.tom.TOA.down(TOA.java:73)
> at org.jgroups.protocols.FlowControl.down(FlowControl.java:302)
> at org.jgroups.protocols.RSVP.down(RSVP.java:102)
> at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:900)
> at org.jgroups.JChannel.down(JChannel.java:644)
> at org.jgroups.JChannel._connect(JChannel.java:873)
> at org.jgroups.JChannel.connect(JChannel.java:369)
> - locked <0x0000000093c7b7f0> (a org.jgroups.JChannel)
> at org.jgroups.JChannel.connect(JChannel.java:360)
> - locked <0x0000000093c7b7f0> (a org.jgroups.JChannel)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:221)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:211)
> at sun.reflect.GeneratedMethodAccessor109.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:168)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:867)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:633)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:622)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:547)
> - locked <0x0000000093c7b930> (a org.infinispan.factories.GlobalComponentRegistry)
> at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:244)
> - locked <0x0000000093c7b930> (a org.infinispan.factories.GlobalComponentRegistry)
> at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:666)
> at org.infinispan.remoting.transport.InitialClusterSizeTest.lambda$testInitialClusterSize$799(InitialClusterSizeTest.java:47)
> at org.infinispan.remoting.transport.InitialClusterSizeTest$$Lambda$2092/593962598.run(Unknown Source)
> at org.infinispan.test.AbstractInfinispanTest$RunnableWrapper.run(AbstractInfinispanTest.java:510)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> See the full thread dump here: http://ci.infinispan.org/viewLog.html?buildId=49393&buildTypeId=bt9
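
The dump shows stop() waiting for a monitor that a start() call still holds while the channel is connecting. The sketch below illustrates that contention and one way a teardown could give up instead of hanging. It is an assumption-laden illustration, not Infinispan code: LifecycleSketch is a hypothetical class, a ReentrantLock with a timed tryLock is swapped in for the synchronized methods seen in the trace, and the latch stands in for the cluster join.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical component whose start and stop share one lifecycle lock.
final class LifecycleSketch {
    private final ReentrantLock lifecycleLock = new ReentrantLock();
    private volatile boolean running;

    // Holds the lock for the whole start, as a synchronized start() would.
    // `joined` stands in for waiting until the cluster join completes.
    void start(CountDownLatch joined) throws InterruptedException {
        lifecycleLock.lock();
        try {
            joined.await();
            running = true;
        } finally {
            lifecycleLock.unlock();
        }
    }

    // Returns false instead of blocking forever when the lock is still
    // held by a start() that has not finished.
    boolean stop(long timeout, TimeUnit unit) throws InterruptedException {
        if (!lifecycleLock.tryLock(timeout, unit)) {
            return false;
        }
        try {
            running = false;
            return true;
        } finally {
            lifecycleLock.unlock();
        }
    }

    boolean isRunning() {
        return running;
    }

    // True while some thread holds the lifecycle lock.
    boolean isLockHeld() {
        return lifecycleLock.isLocked();
    }
}
```

A test teardown could log the failed stop and fail the test with a clear message, rather than blocking the whole suite.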
[JBoss JIRA] (ISPN-4846) State transfer keeps trying to fetch transaction data after the cache was stopped
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-4846?page=com.atlassian.jira.plugin... ]
Dan Berindei closed ISPN-4846.
------------------------------
> State transfer keeps trying to fetch transaction data after the cache was stopped
> ---------------------------------------------------------------------------------
>
> Key: ISPN-4846
> URL: https://issues.redhat.com/browse/ISPN-4846
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.CR1
> Reporter: Dan Berindei
> Priority: Major
> Fix For: 9.0.0.Final
>
>
> StateConsumerImpl doesn't check whether the cache is stopped while fetching transaction data; it only stops when it's no longer able to find providers for transactions.
> However, JGroupsTransport throws a generic CacheException when the channel is stopped. The state transfer thread can then enter a busy-wait loop, retrying the transaction data request and immediately getting the CacheException again, filling the log with messages like this:
> {noformat}
> 19:32:28,237 WARN (remote-thread-NodeN-p42592-t1:) [StateConsumerImpl] ISPN000209: Failed to retrieve transactions for segments [10, 11, 12, 13, 14, 15, 17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 42, 43, 40, 41, 46, 47, 44, 45, 51, 50, 49, 48, 55, 54, 53, 52, 59, 58, 57, 56] of cache testCache from node NodeM-53416
> org.infinispan.commons.CacheException: java.lang.IllegalStateException: channel is not connected
> at org.infinispan.commons.util.Util.rewrapAsCacheException(Util.java:655)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:176)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:536)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:290)
> at org.infinispan.statetransfer.StateConsumerImpl.getTransactions(StateConsumerImpl.java:766)
> at org.infinispan.statetransfer.StateConsumerImpl.requestTransactions(StateConsumerImpl.java:685)
> at org.infinispan.statetransfer.StateConsumerImpl.addTransfers(StateConsumerImpl.java:629)
> at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:331)
> at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:195)
> at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:43)
> at org.infinispan.statetransfer.StateTransferManagerImpl$1.rebalance(StateTransferManagerImpl.java:116)
> {noformat}
> We should check if the cache is stopped before retrying in StateConsumerImpl.requestTransactions. I also think we should change the stop order - it would make sense to stop the remote executor threads and the RpcDispatcher before we stop the channel.
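
The proposed check can be pictured with a small retry helper. This is a minimal sketch under stated assumptions, not the StateConsumerImpl code: StopAwareRetry, the stopped flag, the attempt limit and the backoff are all hypothetical names and parameters chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;

// Hypothetical helper, written only to illustrate the stop check.
final class StopAwareRetry {
    private StopAwareRetry() {
    }

    // Retries `task` until it succeeds. Gives up as soon as `stopped`
    // reports true, or after `maxAttempts` failures, and sleeps between
    // attempts so that a failing task cannot turn into a busy-wait loop.
    static <T> T call(Callable<T> task, BooleanSupplier stopped,
                      int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Check the stop flag before every attempt, including retries.
            if (stopped.getAsBoolean()) {
                throw new IllegalStateException("component stopped, not retrying", last);
            }
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(backoffMillis);
            }
        }
        throw last;
    }
}
```

In this sketch the last failure is kept as the cause, so the log would show one final message with the original exception instead of a stream of identical warnings.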
[JBoss JIRA] (ISPN-11465) Upgrade to JGroups 4.2.1.Final
by Dan Berindei (Jira)
Dan Berindei created ISPN-11465:
-----------------------------------
Summary: Upgrade to JGroups 4.2.1.Final
Key: ISPN-11465
URL: https://issues.redhat.com/browse/ISPN-11465
Project: Infinispan
Issue Type: Component Upgrade
Components: Dependency
Affects Versions: 11.0.0.Alpha2
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 11.0.0.Dev03
JGroups 4.2.1 includes a fix for JGRP-2435, which should fix the random failure in {{InitialClusterSizeTest}}.
JGRP-2451 also adds a new failure detection protocol, {{FD_ALL3}}, which doesn't need extra heartbeats as long as the nodes are sending other broadcast messages.