[JBoss JIRA] (ISPN-6402) Default GMS.join_timeout is too long
by Dan Berindei (JIRA)
Dan Berindei created ISPN-6402:
----------------------------------
Summary: Default GMS.join_timeout is too long
Key: ISPN-6402
URL: https://issues.jboss.org/browse/ISPN-6402
Project: Infinispan
Issue Type: Task
Components: Core, Server, Test Suite - Server
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Minor
{{GMS.join_timeout}} is used by JGroups for three purposes:
# Wait for {{FIND_INITIAL_MBRS}} responses. If other nodes are running, but they don't answer within {{join_timeout}} ms, the node will start a new partition by itself.
# If no other nodes are running when the request is sent, but another node starts and sends its own discovery request within {{join_timeout}}, the initial cluster view will contain both nodes, but this isn't really useful in Infinispan (we have {{gcb.transport().initialClusterSize()}} instead).
# Once a coordinator is located, the node sends a join request and waits for a response for {{join_timeout}} ms. After a timeout, the node re-sends the join request (up to a maximum of {{max_join_attempts}}, which defaults to 10).
The default {{GMS.join_timeout}} in Infinispan is 15000 ms, vs. 2000 ms in JGroups (actually 3000 ms in {{GMS}} itself, but 2000 ms in the example configurations).
The higher timeout only helps when a node is running but inaccessible (e.g. because of a long GC) at the exact time another node is joining. I'd argue that applications that can tolerate multi-second pauses would be better served by {{gcb.transport().initialClusterSize(2)}} and/or an external discovery mechanism (e.g. {{FILE_PING}}, or something based on the WildFly domain controller). For most applications, the current default just means a 15s delay every time the cluster is (re)started.
In particular, because our integration tests use the default configuration, every test that starts a cluster incurs a 15s delay.
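For reference, a minimal sketch of what a shorter timeout would look like in the JGroups protocol stack XML (the values are illustrative; 2000 ms is the figure used in JGroups' example configurations):
{code:xml}
<!-- GMS with the JGroups example-config join timeout instead of Infinispan's 15000 ms -->
<pbcast.GMS join_timeout="2000"
            max_join_attempts="10"/>
{code}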
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-6383) Add forEach method to CacheStream that takes BiConsumer
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-6383?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-6383:
----------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 9.0.0.Final
Resolution: Done
> Add forEach method to CacheStream that takes BiConsumer
> -------------------------------------------------------
>
> Key: ISPN-6383
> URL: https://issues.jboss.org/browse/ISPN-6383
> Project: Infinispan
> Issue Type: Enhancement
> Components: Distributed Execution and Map/Reduce
> Reporter: William Burns
> Assignee: William Burns
> Fix For: 9.0.0.Final, 9.0.0.Alpha1
>
>
> Now that we have ISPN-6349 & ISPN-6272 we can start adding some more non-standard methods. One example is a forEach overload that takes a BiConsumer instead, so the consumer receives the Cache directly. This will be easier than using the CacheAware interface.
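As a stdlib-only sketch of the design point (the {{ContextStream}} and {{Demo}} names are hypothetical, not Infinispan API): a forEach that accepts a BiConsumer can hand each callback its owning context directly, instead of requiring the lambda to implement a separate aware-interface:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a stream bound to its owning cache/context.
class ContextStream<T> {
    private final String context;   // plays the role of the Cache
    private final List<T> elements;

    ContextStream(String context, List<T> elements) {
        this.context = context;
        this.elements = elements;
    }

    // forEach taking a BiConsumer: the context arrives as the first
    // argument, so the lambda needs no extra "context-aware" interface.
    void forEach(BiConsumer<String, ? super T> action) {
        for (T e : elements) {
            action.accept(context, e);
        }
    }
}

public class Demo {
    static List<String> collect() {
        List<String> out = new ArrayList<>();
        new ContextStream<>("myCache", List.of(1, 2, 3))
                .forEach((ctx, v) -> out.add(ctx + ":" + v));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect());
    }
}
{code}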
[JBoss JIRA] (ISPN-6399) Timeout updating the JGroups view after killing one node
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6399?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6399:
-------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/4146
> Timeout updating the JGroups view after killing one node
> --------------------------------------------------------
>
> Key: ISPN-6399
> URL: https://issues.jboss.org/browse/ISPN-6399
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 8.2.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
>
> {{GMS}} can sometimes delay the processing of a join/leave request because of JGRP-2028.
> Joiners retry automatically after {{GMS.join_timeout}}, so it's not that bad. Leavers, however, don't resend their leave requests, so the delay can be worse.
> Normally, the FD/FD_ALL/FD_SOCK protocols would wake up the ViewHandler thread. But we remove the FD* protocols from the stack in most of our tests, unless the test uses {{DISCARD}}. That means the leave request can be delayed until another node leaves:
> {noformat}
> 16:35:56,247 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeB-8309: sending LEAVE request to NodeA-45395
> 16:35:56,268 TRACE (OOB-1,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst: NodeA-45395, src: NodeB-8309 (3 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[LEAVE_REQ]: mbr=NodeB-8309, UNICAST3: DATA, seqno=22, TP: [cluster_name=ISPN]
> 16:35:56,268 TRACE (OOB-1,NodeA-45395:) [UNICAST3] NodeA-45395: delivering NodeB-8309#22
> 16:36:07,263 ERROR (testng-ClusterListenerDistAddListenerTest:) [UnitTestTestNGListener] Test testMemberJoinsAndRetrievesClusterListenersButMainListenerNodeDiesBeforeInstalled(org.infinispan.notifications.cachelistener.cluster.ClusterListenerDistAddListenerTest) failed.
> org.infinispan.util.concurrent.TimeoutException: Timed out before caches had complete views. Expected 3 members in each view. Views are as follows: [[NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165]]
> 16:37:07,341 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeD-55165: sending LEAVE request to NodeA-45395
> 16:37:07,361 TRACE (OOB-4,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst: NodeA-45395, src: NodeD-55165 (3 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[LEAVE_REQ]: mbr=NodeD-55165, UNICAST3: DATA, seqno=21, TP: [cluster_name=ISPN]
> 16:37:07,361 TRACE (OOB-4,NodeA-45395:) [UNICAST3] NodeA-45395: delivering NodeD-55165#21
> 16:37:07,361 TRACE (ViewHandler,NodeA-45395:) [GMS] NodeA-45395: joiners=[], suspected=[], leaving=[NodeB-8309], new view: [NodeA-45395|4] (3) [NodeA-45395, NodeC-53222, NodeD-55165]
> {noformat}
> {{FD_ALL}} is pretty cheap: it just sends a message every second, without opening any new sockets. So I think we should enable it by default, and only enable {{FD_SOCK}} with {{TransportFlags.withFD(true)}}.
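A minimal sketch of what enabling that might look like in the test stack's JGroups XML (values illustrative, matching the one-second heartbeat mentioned above):
{code:xml}
<!-- heartbeat every second; suspect a member after 10s of silence -->
<FD_ALL interval="1000" timeout="10000"/>
{code}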
[JBoss JIRA] (ISPN-5123) MultiNodeDistributedTest deadlock
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5123?page=com.atlassian.jira.plugin.... ]
Dan Berindei edited comment on ISPN-5123 at 3/18/16 7:45 AM:
-------------------------------------------------------------
The failure in the first comment could also be related to ISPN-6399/JGRP-2028, because {{TestableCluster.killNode(Cache)}} waits only for the rehash and doesn't check the JGroups view.
Update: Normally the JGroups view shouldn't matter, because the {{REBALANCE_START}}/{{CH_UPDATE}} commands are sent asynchronously, and the coordinator doesn't wait for responses from nodes that don't have the cache. But here it is the coordinator that's leaving, and it is responsible for installing the new view for the new coordinator. So it's possible that the remaining nodes would send their {{REBALANCE_CONFIRM}} commands to the old coordinator, which wouldn't be able to process them.
was (Author: dan.berindei):
The failure in the first comment could also be related to ISPN-6399/JGRP-2028, because {{TestableCluster.killNode(Cache)}} waits only for the rehash and doesn't check the JGroups view.
> MultiNodeDistributedTest deadlock
> ---------------------------------
>
> Key: ISPN-5123
> URL: https://issues.jboss.org/browse/ISPN-5123
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Query
> Affects Versions: 7.1.0.Alpha1
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Attachments: infinispan-infinispan-query.log, stack.zip, trace.tar.gz
>
>
> I've been seeing this intermittent problem in my environment. Sometimes the query suite hangs for 30min (and then proceeds). See attached stack trace.
[JBoss JIRA] (ISPN-6398) Duplicate infinispan-tasks dependency in infinispan-server-infinispan POM
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-6398?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-6398:
----------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 9.0.0.Final
8.2.1.Final
Resolution: Done
> Duplicate infinispan-tasks dependency in infinispan-server-infinispan POM
> -------------------------------------------------------------------------
>
> Key: ISPN-6398
> URL: https://issues.jboss.org/browse/ISPN-6398
> Project: Infinispan
> Issue Type: Bug
> Components: Build process, Server
> Affects Versions: 8.2.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Trivial
> Fix For: 9.0.0.Final, 9.0.0.Alpha1, 8.2.1.Final
>
>
> {noformat}
> [WARNING] Some problems were encountered while building the effective model for org.infinispan.server:infinispan-server-infinispan:jar:9.0.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.infinispan:infinispan-tasks:jar -> version (?) vs ${version.org.infinispan} @ org.infinispan.server:infinispan-server-infinispan:[unknown-version], /tmp/privatebuild/infinispan/server/integration/infinispan/pom.xml, line 137, column 21
> {noformat}
> {noformat}
> <dependency>
> <groupId>org.infinispan</groupId>
> <artifactId>infinispan-tasks</artifactId>
> </dependency>
> ...
> <dependency>
> <groupId>org.infinispan</groupId>
> <artifactId>infinispan-tasks</artifactId>
> <version>${version.org.infinispan}</version>
> </dependency>
> {noformat}
> The version should be specified in the infinispan-server-versions POM.
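A sketch of the intended fix, assuming the usual Maven convention: the version lives once in the {{infinispan-server-versions}} POM's {{dependencyManagement}}, and the module POM keeps a single version-less entry:
{code:xml}
<!-- infinispan-server-versions POM -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.infinispan</groupId>
      <artifactId>infinispan-tasks</artifactId>
      <version>${version.org.infinispan}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- infinispan-server-infinispan POM -->
<dependency>
  <groupId>org.infinispan</groupId>
  <artifactId>infinispan-tasks</artifactId>
</dependency>
{code}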
[JBoss JIRA] (ISPN-6022) Unable to query cache when data is preloaded via AdvancedCacheLoader#process
by Dan Siviter (JIRA)
[ https://issues.jboss.org/browse/ISPN-6022?page=com.atlassian.jira.plugin.... ]
Dan Siviter edited comment on ISPN-6022 at 3/18/16 6:06 AM:
------------------------------------------------------------
Here you go:
{code:java}
GlobalConfigurationBuilder global = new GlobalConfigurationBuilder();
global.globalJmxStatistics().allowDuplicateDomains(true);
Configuration config = new ConfigurationBuilder().persistence().addStore(CustomStoreConfigurationBuilder.class)
.customStoreClass(loaderType)
.purgeOnStartup(true)
.preload(true)
.indexing()
.index(Index.LOCAL)
.addIndexedEntity(type).build();
{code}
Nothing is distributed at the moment; everything is loaded and held locally. As per the description, {{loaderType}} is an implementation of {{org.infinispan.persistence.spi.AdvancedCacheLoader}}.
was (Author: dansiviter):
Here you go:
{code:java}
GlobalConfigurationBuilder global = new GlobalConfigurationBuilder();
global.globalJmxStatistics().allowDuplicateDomains(true);
Configuration config = new ConfigurationBuilder().persistence().addStore(CustomStoreConfigurationBuilder.class)
.customStoreClass(loaderType)
.purgeOnStartup(true)
.preload(true)
.indexing()
.index(Index.LOCAL)
.addIndexedEntity(type).build();
{code}
Nothing is distributed at the moment, everything is loaded and held locally.
> Unable to query cache when data is preloaded via AdvancedCacheLoader#process
> ----------------------------------------------------------------------------
>
> Key: ISPN-6022
> URL: https://issues.jboss.org/browse/ISPN-6022
> Project: Infinispan
> Issue Type: Bug
> Components: Embedded Querying, Loaders and Stores
> Affects Versions: 8.1.0.Final, 8.1.1.Final
> Reporter: Dan Siviter
>
> When preloading from an {{AdvancedCacheLoader}}, the index doesn't get updated. Therefore it is only possible to query items that have been {{#put(...)}} into the cache. I am able to get preloaded items from the cache using their key, which leads me to think the index is never built on preload.
> I've seen no implicit index rebuilding in any of the existing {{AdvancedCacheLoader#process(...)}} implementations, which leads me to think this will not work with any of them.
> I've verified this by reindexing with {{searchManager.getMassIndexer().start()}}, after which the query returns results.