[JBoss JIRA] (ISPN-4846) State transfer keeps trying to fetch transaction data after the cache was stopped
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-4846?page=com.atlassian.jira.plugin... ]
Dan Berindei resolved ISPN-4846.
--------------------------------
Fix Version/s: 9.0.0.Final
Resolution: Done
Fixed in commit 68902e62
> State transfer keeps trying to fetch transaction data after the cache was stopped
> ---------------------------------------------------------------------------------
>
> Key: ISPN-4846
> URL: https://issues.redhat.com/browse/ISPN-4846
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.CR1
> Reporter: Dan Berindei
> Priority: Major
> Fix For: 9.0.0.Final
>
>
> StateConsumerImpl doesn't check whether the cache is stopped while fetching transaction data; it only stops when it can no longer find providers for the transactions.
> However, JGroupsTransport throws a generic CacheException when the channel is stopped. The state transfer thread can therefore enter a busy-wait loop, retrying to get the transaction data and immediately getting the CacheException, which fills the log with messages like this:
> {noformat}
> 19:32:28,237 WARN (remote-thread-NodeN-p42592-t1:) [StateConsumerImpl] ISPN000209: Failed to retrieve transactions for segments [10, 11, 12, 13, 14, 15, 17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 42, 43, 40, 41, 46, 47, 44, 45, 51, 50, 49, 48, 55, 54, 53, 52, 59, 58, 57, 56] of cache testCache from node NodeM-53416
> org.infinispan.commons.CacheException: java.lang.IllegalStateException: channel is not connected
> at org.infinispan.commons.util.Util.rewrapAsCacheException(Util.java:655)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:176)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:536)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:290)
> at org.infinispan.statetransfer.StateConsumerImpl.getTransactions(StateConsumerImpl.java:766)
> at org.infinispan.statetransfer.StateConsumerImpl.requestTransactions(StateConsumerImpl.java:685)
> at org.infinispan.statetransfer.StateConsumerImpl.addTransfers(StateConsumerImpl.java:629)
> at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:331)
> at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:195)
> at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:43)
> at org.infinispan.statetransfer.StateTransferManagerImpl$1.rebalance(StateTransferManagerImpl.java:116)
> {noformat}
> We should check whether the cache is stopped before retrying in StateConsumerImpl.requestTransactions. I also think we should change the stop order: it would make sense to stop the remote executor threads and the RpcDispatcher before we stop the channel.
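
The check proposed above can be illustrated with a minimal sketch. It is not the code from StateConsumerImpl or from the fix commit; the class, method, and parameter names are hypothetical, and the provider list and stop flag are stand-ins for whatever the real component tracks. The loop tries each provider in turn but consults the stop flag before every attempt, so a stopped component gives up instead of retrying against a closed channel.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in Infinispan.
final class RetryUntilStopped {

    /**
     * Asks each provider in turn for data and returns the first successful answer.
     * Before every attempt the stop flag is checked, so the loop ends as soon as
     * the owning component has been stopped.
     */
    static <T> Optional<T> fetchFromProviders(List<String> providers,
                                              Function<String, T> fetch,
                                              BooleanSupplier stopped) {
        for (String provider : providers) {
            if (stopped.getAsBoolean()) {
                return Optional.empty(); // stopped: do not retry
            }
            try {
                return Optional.ofNullable(fetch.apply(provider));
            } catch (RuntimeException e) {
                // A real component would log the failure here, then try the next provider.
            }
        }
        return Optional.empty(); // no provider could answer
    }

    public static void main(String[] args) {
        AtomicBoolean stopped = new AtomicBoolean(false);
        // Simulates a channel that is closed while the first request is in flight.
        Optional<String> result = fetchFromProviders(
                List.of("NodeM", "NodeN"),
                provider -> {
                    stopped.set(true);
                    throw new IllegalStateException("channel is not connected");
                },
                stopped::get);
        System.out.println("got data: " + result.isPresent());
    }
}
```

In this sketch the second provider is never contacted, because the flag is already set when the loop comes around again.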
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
[JBoss JIRA] (ISPN-10891) JGroupsTransport registers the channel in JMX ignoring the cacheManagerName
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-10891?page=com.atlassian.jira.plugi... ]
Dan Berindei resolved ISPN-10891.
---------------------------------
Fix Version/s: 11.0.0.Alpha2, 10.1.3.Final
Resolution: Done
Fixed with ISPN-11174
> JGroupsTransport registers the channel in JMX ignoring the cacheManagerName
> ---------------------------------------------------------------------------
>
> Key: ISPN-10891
> URL: https://issues.redhat.com/browse/ISPN-10891
> Project: Infinispan
> Issue Type: Bug
> Components: Core, JMX, reporting and management
> Affects Versions: 9.4.16.Final, 10.0.1.Final
> Reporter: Dan Berindei
> Priority: Major
> Fix For: 11.0.0.Alpha2, 10.1.3.Final
>
>
> {{JGroupsTransport}} registers the JGroups channel in JMX with a name like {{<jmx-domain>:type=channel,cluster=<cluster-name>}}.
> If two managers have different {{cacheManagerName}} values, all the cache manager and cache components can be registered alongside each other in the same JMX domain. The channel object name, however, doesn't include the manager name, so the second cache manager fails to register its channel, and because of JGRP-2393 the cause of the error is hidden.
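
The name clash described above can be reproduced with the standard javax.management.ObjectName class alone. This is a sketch, not the change made in ISPN-11174: the helper class is hypothetical, and the `manager` key property in the second method is an assumed way of making the names distinct, not necessarily the one Infinispan chose.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper; only ObjectName itself is a real JMX API.
final class ChannelObjectNames {

    /** Name shape from the issue: <jmx-domain>:type=channel,cluster=<cluster-name>. */
    static ObjectName withoutManager(String domain, String cluster) {
        return parse(domain + ":type=channel,cluster=" + ObjectName.quote(cluster));
    }

    /** Assumed alternative: add a "manager" key so that each cache manager gets its own name. */
    static ObjectName withManager(String domain, String manager, String cluster) {
        return parse(domain + ":type=channel,manager=" + ObjectName.quote(manager)
                + ",cluster=" + ObjectName.quote(cluster));
    }

    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid object name: " + name, e);
        }
    }

    public static void main(String[] args) {
        // Two managers in the same domain and cluster: without the manager key the names are equal.
        ObjectName first = withoutManager("org.infinispan", "demo-cluster");
        ObjectName second = withoutManager("org.infinispan", "demo-cluster");
        System.out.println("clash without manager key: " + first.equals(second));

        ObjectName a = withManager("org.infinispan", "managerA", "demo-cluster");
        ObjectName b = withManager("org.infinispan", "managerB", "demo-cluster");
        System.out.println("clash with manager key: " + a.equals(b));
    }
}
```

Because an MBeanServer rejects a second registration under an equal ObjectName, the first pair is the situation in which the second cache manager cannot register its channel.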
--
[JBoss JIRA] (IPROTO-134) Support for arrays of primitive
by Gustavo Fernandes (Jira)
Gustavo Fernandes created IPROTO-134:
----------------------------------------
Summary: Support for arrays of primitive
Key: IPROTO-134
URL: https://issues.redhat.com/browse/IPROTO-134
Project: Infinispan ProtoStream
Issue Type: Enhancement
Affects Versions: 4.3.2.Final
Reporter: Gustavo Fernandes
Add support for marshalling and unmarshalling arrays of primitives, such as Short[]/short[], Byte[]/byte[], etc.
--
[JBoss JIRA] (ISPN-11127) Protocol servers don't pre-start caches when transport is disabled
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11127?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11127:
-----------------------------------
Fix Version/s: 10.1.5.Final (was: 10.1.4.Final)
> Protocol servers don't pre-start caches when transport is disabled
> ------------------------------------------------------------------
>
> Key: ISPN-11127
> URL: https://issues.redhat.com/browse/ISPN-11127
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 10.1.0.Final
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Priority: Major
> Fix For: 10.1.5.Final
>
>
> The various protocol servers use different initialization steps to ensure that caches are pre-started, and these steps don't work in all cases. In particular, the REST server doesn't pre-start caches, and the Hot Rod server pre-starts them only if the transport is enabled.
> This should be handled uniformly.
--
[JBoss JIRA] (ISPN-10270) DroppedConnectionsTest.testClosedConnection random failures
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-10270?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-10270:
-----------------------------------
Fix Version/s: 10.1.5.Final (was: 10.1.4.Final)
> DroppedConnectionsTest.testClosedConnection random failures
> -----------------------------------------------------------
>
> Key: ISPN-10270
> URL: https://issues.redhat.com/browse/ISPN-10270
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite
> Affects Versions: 10.0.0.Beta3
> Reporter: Dan Berindei
> Assignee: Tristan Tarrant
> Priority: Major
> Labels: testsuite_stability
> Fix For: 11.0.0.Dev03, 10.1.5.Final
>
> Attachments: ISPN-10137_Injection_without_reflection_20190605-1157_DroppedConnectionsTest-infinispan-client-hotrod.log.gz
>
>
> {noformat}
> 12:03:26,084 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.client.hotrod.DroppedConnectionsTest.testClosedConnection
> java.lang.AssertionError: expected:<1> but was:<0>
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.14.3.jar:?]
> at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364) ~[testng-6.14.3.jar:?]
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80) ~[testng-6.14.3.jar:?]
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:245) ~[testng-6.14.3.jar:?]
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:252) ~[testng-6.14.3.jar:?]
> at org.infinispan.client.hotrod.DroppedConnectionsTest.testClosedConnection(DroppedConnectionsTest.java:78) ~[test-classes/:?]
> {noformat}
> Full trace log attached.
--
[JBoss JIRA] (ISPN-11113) ScatteredDelayedAvailabilityUpdateTest takes too long
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11113?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11113:
-----------------------------------
Fix Version/s: 10.1.5.Final (was: 10.1.4.Final)
> ScatteredDelayedAvailabilityUpdateTest takes too long
> -----------------------------------------------------
>
> Key: ISPN-11113
> URL: https://issues.redhat.com/browse/ISPN-11113
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite
> Affects Versions: 10.1.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Fix For: 11.0.0.Final, 10.1.5.Final
>
>
> Partition handling tests use {{LOCAL_PING.setClusterName()}} with a unique name to disable discovery; otherwise partitions would try to merge while they are supposed to stay separate.
> But {{LOCAL_PING}} uses the cluster name on stop to remove the node from the static discovery map. If the test doesn't change the cluster name back, {{LOCAL_PING}} doesn't remove the node, so the next test method sees an existing coordinator and tries to connect to it. When a test has lots of test methods, like {{ScatteredDelayedAvailabilityUpdateTest}}, each test method leaves one more coordinator in the discovery map, and each subsequent test method takes longer to start.
> {noformat}
> 09:08:52,758 DEBUG (testng:[]) [GMS] address=NodeA-30899, cluster=org.infinispan.partitionhandling.ScatteredDelayedAvailabilityUpdateTest[SCATTERED_SYNC, bias=NEVER, DENY_READ_WRITES], physical address=127.0.0.1:51941
> 09:08:52,774 TRACE (testng:[]) [GMS] NodeA-30899: discovery took 0 ms, members: 21 rsps (5 coords) [done]
> 09:08:52,774 DEBUG (testng:[]) [GMS] NodeA-30899: found multiple coords: [NodeA-2608, NodeA-5606, NodeA-17288, NodeA-64297, NodeA-48475]
> 09:08:52,774 DEBUG (testng:[]) [GMS] NodeA-30899: sending JOIN(NodeA-30899) to NodeA-5606
> 09:08:54,774 WARN (testng:[]) [GMS] NodeA-30899: JOIN(NodeA-30899) sent to NodeA-5606 timed out (after 2000 ms), on try 0
> 09:08:54,774 DEBUG (testng:[]) [GMS] NodeA-30899: sending JOIN(NodeA-30899) to NodeA-64297
> 09:08:56,775 WARN (testng:[]) [GMS] NodeA-30899: JOIN(NodeA-30899) sent to NodeA-64297 timed out (after 2000 ms), on try 0
> 09:08:56,775 DEBUG (testng:[]) [GMS] NodeA-30899: sending JOIN(NodeA-30899) to NodeA-48475
> 09:08:58,775 WARN (testng:[]) [GMS] NodeA-30899: JOIN(NodeA-30899) sent to NodeA-48475 timed out (after 2000 ms), on try 0
> 09:08:58,775 DEBUG (testng:[]) [GMS] NodeA-30899: sending JOIN(NodeA-30899) to NodeA-17288
> 09:09:00,776 WARN (testng:[]) [GMS] NodeA-30899: JOIN(NodeA-30899) sent to NodeA-17288 timed out (after 2000 ms), on try 0
> 09:09:00,776 DEBUG (testng:[]) [GMS] NodeA-30899: sending JOIN(NodeA-30899) to NodeA-2608
> 09:09:02,776 WARN (testng:[]) [GMS] NodeA-30899: JOIN(NodeA-30899) sent to NodeA-2608 timed out (after 2000 ms), on try 0
> 09:09:02,776 TRACE (testng:[]) [GMS] NodeA-30899: discovery took 0 ms, members: 21 rsps (5 coords) [done]
> 09:09:02,776 DEBUG (testng:[]) [GMS] NodeA-30899: found multiple coords: [NodeA-2608, NodeA-5606, NodeA-17288, NodeA-64297, NodeA-48475]
> 09:09:02,776 DEBUG (testng:[]) [GMS] NodeA-30899: sending JOIN(NodeA-30899) to NodeA-5606
> 09:09:04,776 WARN (testng:[]) [GMS] NodeA-30899: JOIN(NodeA-30899) sent to NodeA-5606 timed out (after 2000 ms), on try 1
> ...
> 09:09:12,777 TRACE (testng:[]) [GMS] NodeA-30899: discovery took 0 ms, members: 21 rsps (5 coords) [done]
> 09:09:12,778 DEBUG (testng:[]) [GMS] NodeA-30899: found multiple coords: [NodeA-2608, NodeA-5606, NodeA-17288, NodeA-64297, NodeA-48475]
> 09:09:12,778 DEBUG (testng:[]) [GMS] NodeA-30899: sending JOIN(NodeA-30899) to NodeA-2608
> 09:09:14,778 WARN (testng:[]) [GMS] NodeA-30899: JOIN(NodeA-30899) sent to NodeA-2608 timed out (after 2000 ms), on try 2
> ...
> 09:09:22,780 WARN (testng:[]) [GMS] NodeA-30899: too many JOIN attempts (3): becoming singleton
> {noformat}
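
The leak described above can be modelled in a few lines. The sketch below is not the JGroups LOCAL_PING implementation; the class, field, and method names are hypothetical, and the only behavior it borrows from the description is that a node is added to a static map under its cluster name on start and removed under its current cluster name on stop.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical model of a static discovery map keyed by cluster name.
final class StaticDiscoveryDemo {

    // Shared by every node in the JVM, like the static discovery map in the description.
    static final Map<String, List<String>> DISCOVERY = new ConcurrentHashMap<>();

    static final class Node {
        final String address;
        String clusterName; // mutable: a test may change it after start

        Node(String address, String clusterName) {
            this.address = address;
            this.clusterName = clusterName;
        }

        /** Registers this node under its current cluster name. */
        void start() {
            DISCOVERY.computeIfAbsent(clusterName, k -> new CopyOnWriteArrayList<>()).add(address);
        }

        /** Looks the node up under its current cluster name, which may no longer match. */
        void stop() {
            List<String> members = DISCOVERY.get(clusterName);
            if (members != null) {
                members.remove(address);
            }
        }
    }

    static int memberCount(String clusterName) {
        return DISCOVERY.getOrDefault(clusterName, List.of()).size();
    }

    public static void main(String[] args) {
        Node node = new Node("NodeA-1", "test-cluster");
        node.start();
        node.clusterName = "isolated-partition"; // disables discovery for the test
        node.stop();                             // removes from the wrong entry
        System.out.println("stale members: " + memberCount("test-cluster"));
    }
}
```

Restoring the original cluster name before calling stop() makes the lookup hit the right entry, so nothing is left behind for the next test method to discover.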
--
[JBoss JIRA] (ISPN-11126) Health check fails when authz is enabled and cache is not yet started
by Tristan Tarrant (Jira)
[ https://issues.redhat.com/browse/ISPN-11126?page=com.atlassian.jira.plugi... ]
Tristan Tarrant updated ISPN-11126:
-----------------------------------
Fix Version/s: 10.1.5.Final (was: 10.1.4.Final)
> Health check fails when authz is enabled and cache is not yet started
> ---------------------------------------------------------------------
>
> Key: ISPN-11126
> URL: https://issues.redhat.com/browse/ISPN-11126
> Project: Infinispan
> Issue Type: Bug
> Components: JMX, reporting and management, REST
> Affects Versions: 10.1.0.Final
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Priority: Major
> Fix For: 11.0.0.Dev03, 10.1.5.Final
>
>
> When using predefined caches on a container with authorization enabled, and the caches have not been started, the health handler for the cachemanager resource fails with an authorization error.
> The call should be wrapped in a SecurityAction.
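
The general idea of wrapping an internal call so that it bypasses the caller's permission check can be sketched as follows. This is not Infinispan's SecurityActions code: every name here is hypothetical, and the thread-local flag is an assumed mechanism chosen only to keep the example self-contained.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a privileged wrapper around an authorization-checked call.
final class PrivilegedHealthCheck {

    // Marks the current thread as running trusted, internal code.
    private static final ThreadLocal<Boolean> PRIVILEGED = ThreadLocal.withInitial(() -> false);

    /** Runs the action with the privileged flag set, restoring the previous value afterwards. */
    static <T> T doPrivileged(Supplier<T> action) {
        boolean previous = PRIVILEGED.get();
        PRIVILEGED.set(true);
        try {
            return action.get();
        } finally {
            PRIVILEGED.set(previous);
        }
    }

    /** Stand-in for a cache lookup that enforces authorization on the caller. */
    static String cacheStatus(String cacheName, boolean callerAuthorized) {
        if (!callerAuthorized && !PRIVILEGED.get()) {
            throw new SecurityException("Unauthorized access to cache " + cacheName);
        }
        return "status-of-" + cacheName;
    }

    /** Health handler: the status lookup is internal, so it is wrapped in doPrivileged. */
    static String health(String cacheName, boolean callerAuthorized) {
        return doPrivileged(() -> cacheStatus(cacheName, callerAuthorized));
    }

    public static void main(String[] args) {
        System.out.println(health("predefined-cache", false));
    }
}
```

Calling cacheStatus directly with an unauthorized caller throws SecurityException, which corresponds to the failure in the issue, while health succeeds because the lookup runs inside the wrapper.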
--