[JBoss JIRA] (ISPN-5123) MultiNodeDistributedTest deadlock
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5123?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-5123:
------------------------------------
The initial failure could also be related to ISPN-6399/JGRP-2028, because {{TestableCluster.killNode(Cache)}} waits only for the rehash and doesn't check the JGroups view.
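As a rough illustration of the missing check, the sketch below polls a view-size source until it reports the expected member count or a timeout expires. It is not the actual {{TestableCluster}} code: {{ViewWait}}, {{awaitViewSize}} and the {{IntSupplier}} standing in for the JGroups view are hypothetical names chosen for this example.

```java
import java.util.function.IntSupplier;

public class ViewWait {
    /**
     * Polls viewSize until it equals expected or timeoutMillis elapses.
     * Returns true if the expected size was observed, false on timeout
     * or if the waiting thread was interrupted.
     */
    public static boolean awaitViewSize(IntSupplier viewSize, int expected, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (viewSize.getAsInt() == expected) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10); // short pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a real view: reports 4 members twice, then 3.
        int[] calls = {0};
        IntSupplier fakeView = () -> calls[0]++ < 2 ? 4 : 3;
        System.out.println(awaitViewSize(fakeView, 3, 1000));
    }
}
```

A kill helper could call something like this after waiting for the rehash, so that the test only proceeds once the view has actually shrunk.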
> MultiNodeDistributedTest deadlock
> ---------------------------------
>
> Key: ISPN-5123
> URL: https://issues.jboss.org/browse/ISPN-5123
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Query
> Affects Versions: 7.1.0.Alpha1
> Reporter: Gustavo Fernandes
> Assignee: Gustavo Fernandes
> Attachments: infinispan-infinispan-query.log, stack.zip, trace.tar.gz
>
>
> I've been seeing this intermittent problem in my environment. Sometimes the query suite hangs for 30 minutes (and then proceeds). See the attached stack trace.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-5123) MultiNodeDistributedTest deadlock
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5123?page=com.atlassian.jira.plugin.... ]
Dan Berindei edited comment on ISPN-5123 at 3/18/16 4:31 AM:
-------------------------------------------------------------
The failure in the first comment could also be related to ISPN-6399/JGRP-2028, because {{TestableCluster.killNode(Cache)}} waits only for the rehash and doesn't check the JGroups view.
was (Author: dan.berindei):
The initial failure could also be related to ISPN-6399/JGRP-2028, because {{TestableCluster.killNode(Cache)}} waits only for the rehash and doesn't check the JGroups view.
[JBoss JIRA] (ISPN-6399) Timeout updating the JGroups view after killing one node
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6399?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6399:
-------------------------------
Summary: Timeout updating the JGroups view after killing one node (was: Enable FD_ALL by default in the testsuite)
> Timeout updating the JGroups view after killing one node
> --------------------------------------------------------
>
> Key: ISPN-6399
> URL: https://issues.jboss.org/browse/ISPN-6399
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 8.2.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
>
> {{GMS}} can sometimes delay the processing of a join/leave request because of JGRP-2028.
> Joiners retry automatically after {{GMS.join_timeout}}, so their delay is bounded. Leavers, however, don't resend their leave requests, so their delay can be much longer.
> Normally, the FD/FD_ALL/FD_SOCK protocols would wake up the ViewHandler thread. But we remove the FD* protocols from the stack in most of our tests, unless the test uses {{DISCARD}}. That means the leave request can be delayed until another node leaves:
> {noformat}
> 16:35:56,247 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeB-8309: sending LEAVE request to NodeA-45395
> 16:35:56,268 TRACE (OOB-1,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst: NodeA-45395, src: NodeB-8309 (3 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[LEAVE_REQ]: mbr=NodeB-8309, UNICAST3: DATA, seqno=22, TP: [cluster_name=ISPN]
> 16:35:56,268 TRACE (OOB-1,NodeA-45395:) [UNICAST3] NodeA-45395: delivering NodeB-8309#22
> 16:36:07,263 ERROR (testng-ClusterListenerDistAddListenerTest:) [UnitTestTestNGListener] Test testMemberJoinsAndRetrievesClusterListenersButMainListenerNodeDiesBeforeInstalled(org.infinispan.notifications.cachelistener.cluster.ClusterListenerDistAddListenerTest) failed.
> org.infinispan.util.concurrent.TimeoutException: Timed out before caches had complete views. Expected 3 members in each view. Views are as follows: [[NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165]]
> 16:37:07,341 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeD-55165: sending LEAVE request to NodeA-45395
> 16:37:07,361 TRACE (OOB-4,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst: NodeA-45395, src: NodeD-55165 (3 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[LEAVE_REQ]: mbr=NodeD-55165, UNICAST3: DATA, seqno=21, TP: [cluster_name=ISPN]
> 16:37:07,361 TRACE (OOB-4,NodeA-45395:) [UNICAST3] NodeA-45395: delivering NodeD-55165#21
> 16:37:07,361 TRACE (ViewHandler,NodeA-45395:) [GMS] NodeA-45395: joiners=[], suspected=[], leaving=[NodeB-8309], new view: [NodeA-45395|4] (3) [NodeA-45395, NodeC-53222, NodeD-55165]
> {noformat}
> {{FD_ALL}} is pretty cheap: it just sends a message every second, without opening any new sockets. So I think we should enable it by default, and only enable {{FD_SOCK}} with {{TransportFlags.withFD(true)}}.
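To put numbers on the delay in the log excerpt, the following sketch computes the gaps between three of the quoted timestamps. It only does arithmetic on values copied from the log; the class and method names ({{LeaveDelay}}, {{delayMillis}}) are illustrative and not part of Infinispan or JGroups.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class LeaveDelay {
    // Timestamp layout used in the log excerpt, e.g. 16:35:56,268
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps on the same day. */
    public static long delayMillis(String from, String to) {
        return Duration.between(LocalTime.parse(from, LOG_TIME),
                                LocalTime.parse(to, LOG_TIME)).toMillis();
    }

    public static void main(String[] args) {
        String leaveDelivered = "16:35:56,268"; // NodeA delivers NodeB's LEAVE_REQ
        String testFailed     = "16:36:07,263"; // test times out waiting for 3 members
        String viewInstalled  = "16:37:07,361"; // ViewHandler finally removes NodeB

        System.out.println("test failed after  " + delayMillis(leaveDelivered, testFailed) + " ms");
        System.out.println("view updated after " + delayMillis(leaveDelivered, viewInstalled) + " ms");
    }
}
```

The test gave up about 11 seconds after the leave request was delivered, while the new view was installed only about 71 seconds after delivery, at the moment NodeD's own leave request arrived.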
[JBoss JIRA] (ISPN-6399) Timeout updating the JGroups view after killing one node
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6399?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-6399:
------------------------------------
Actually, it might be better to disable {{GMS.view_bundling}} in the tests.
[JBoss JIRA] (ISPN-6399) Enable FD_ALL by default in the testsuite
by Dan Berindei (JIRA)
Dan Berindei created ISPN-6399:
----------------------------------
Summary: Enable FD_ALL by default in the testsuite
Key: ISPN-6399
URL: https://issues.jboss.org/browse/ISPN-6399
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 8.2.0.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
{{GMS}} can sometimes delay the processing of a join/leave request because of JGRP-2028.
Joiners retry automatically after {{GMS.join_timeout}}, so their delay is bounded. Leavers, however, don't resend their leave requests, so their delay can be much longer.
Normally, the FD/FD_ALL/FD_SOCK protocols would wake up the ViewHandler thread. But we remove the FD* protocols from the stack in most of our tests, unless the test uses {{DISCARD}}. That means the leave request can be delayed until another node leaves:
{noformat}
16:35:56,247 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeB-8309: sending LEAVE request to NodeA-45395
16:35:56,268 TRACE (OOB-1,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst: NodeA-45395, src: NodeB-8309 (3 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[LEAVE_REQ]: mbr=NodeB-8309, UNICAST3: DATA, seqno=22, TP: [cluster_name=ISPN]
16:35:56,268 TRACE (OOB-1,NodeA-45395:) [UNICAST3] NodeA-45395: delivering NodeB-8309#22
16:36:07,263 ERROR (testng-ClusterListenerDistAddListenerTest:) [UnitTestTestNGListener] Test testMemberJoinsAndRetrievesClusterListenersButMainListenerNodeDiesBeforeInstalled(org.infinispan.notifications.cachelistener.cluster.ClusterListenerDistAddListenerTest) failed.
org.infinispan.util.concurrent.TimeoutException: Timed out before caches had complete views. Expected 3 members in each view. Views are as follows: [[NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165], [NodeA-45395|3] (4) [NodeA-45395, NodeB-8309, NodeC-53222, NodeD-55165]]
16:37:07,341 TRACE (testng-ClusterListenerDistAddListenerTest:) [GMS] NodeD-55165: sending LEAVE request to NodeA-45395
16:37:07,361 TRACE (OOB-4,NodeA-45395:) [TCP_NIO2] NodeA-45395: received [dst: NodeA-45395, src: NodeD-55165 (3 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[LEAVE_REQ]: mbr=NodeD-55165, UNICAST3: DATA, seqno=21, TP: [cluster_name=ISPN]
16:37:07,361 TRACE (OOB-4,NodeA-45395:) [UNICAST3] NodeA-45395: delivering NodeD-55165#21
16:37:07,361 TRACE (ViewHandler,NodeA-45395:) [GMS] NodeA-45395: joiners=[], suspected=[], leaving=[NodeB-8309], new view: [NodeA-45395|4] (3) [NodeA-45395, NodeC-53222, NodeD-55165]
{noformat}
{{FD_ALL}} is pretty cheap: it just sends a message every second, without opening any new sockets. So I think we should enable it by default, and only enable {{FD_SOCK}} with {{TransportFlags.withFD(true)}}.
[JBoss JIRA] (ISPN-6399) Enable FD_ALL by default in the testsuite
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6399?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6399:
-------------------------------
Status: Open (was: New)
[JBoss JIRA] (ISPN-6398) Duplicate infinispan-tasks dependency in infinispan-server-infinispan POM
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6398?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6398:
-------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/4143
> Duplicate infinispan-tasks dependency in infinispan-server-infinispan POM
> -------------------------------------------------------------------------
>
> Key: ISPN-6398
> URL: https://issues.jboss.org/browse/ISPN-6398
> Project: Infinispan
> Issue Type: Bug
> Components: Build process, Server
> Affects Versions: 8.2.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Trivial
> Fix For: 9.0.0.Alpha1
>
>
> {noformat}
> [WARNING] Some problems were encountered while building the effective model for org.infinispan.server:infinispan-server-infinispan:jar:9.0.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.infinispan:infinispan-tasks:jar -> version (?) vs ${version.org.infinispan} @ org.infinispan.server:infinispan-server-infinispan:[unknown-version], /tmp/privatebuild/infinispan/server/integration/infinispan/pom.xml, line 137, column 21
> {noformat}
> {noformat}
> <dependency>
> <groupId>org.infinispan</groupId>
> <artifactId>infinispan-tasks</artifactId>
> </dependency>
> ...
> <dependency>
> <groupId>org.infinispan</groupId>
> <artifactId>infinispan-tasks</artifactId>
> <version>${version.org.infinispan}</version>
> </dependency>
> {noformat}
> The version should be specified in the infinispan-server-versions POM.
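The Maven warning above can be approximated with a small standalone check. The sketch below is a simplification and not Maven's own model validation: it looks only at the project's direct dependencies and keys them by groupId:artifactId, ignoring type and classifier. {{DuplicateDeps}} and {{findDuplicates}} are hypothetical names, and the sample POM is reduced to the two entries quoted in the issue.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DuplicateDeps {
    /** Returns groupId:artifactId keys declared more than once under project/dependencies. */
    public static Set<String> findDuplicates(String pomXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
            NodeList deps = doc.getElementsByTagName("dependency");
            Set<String> seen = new LinkedHashSet<>();
            Set<String> duplicates = new LinkedHashSet<>();
            for (int i = 0; i < deps.getLength(); i++) {
                Element dep = (Element) deps.item(i);
                // Skip dependencyManagement, profile and plugin dependencies.
                Node grandParent = dep.getParentNode().getParentNode();
                if (grandParent == null || !"project".equals(grandParent.getNodeName())) {
                    continue;
                }
                String key = childText(dep, "groupId") + ":" + childText(dep, "artifactId");
                if (!seen.add(key)) {
                    duplicates.add(key);
                }
            }
            return duplicates;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse POM", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String pom = "<project><dependencies>"
                + "<dependency><groupId>org.infinispan</groupId>"
                + "<artifactId>infinispan-tasks</artifactId></dependency>"
                + "<dependency><groupId>org.infinispan</groupId>"
                + "<artifactId>infinispan-tasks</artifactId>"
                + "<version>${version.org.infinispan}</version></dependency>"
                + "</dependencies></project>";
        System.out.println(findDuplicates(pom)); // prints [org.infinispan:infinispan-tasks]
    }
}
```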