[jboss-jira] [JBoss JIRA] (JGRP-2372) LeaveTest fails frequently

Bela Ban (Jira) issues at jboss.org
Wed Sep 4 07:43:00 EDT 2019


    [ https://issues.jboss.org/browse/JGRP-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779112#comment-13779112 ] 

Bela Ban commented on JGRP-2372:
--------------------------------

One attempt to fix this issue was to remove GMS.leave_timeout and have a member that leaves gracefully simply wait until it receives a LEAVE response from the coordinator, or until stop() is called.
However, this introduces the following problem:
* In ASYM_ENCRYPT, if we have {A,B,C} and C is excluded by A and B, then A and B have view {A,B}, while C still has view {A,B,C}
* C keeps sending the LEAVE request to A, but A discards it, as C is no longer a member of A's view
* MERGE3 cannot help: installing view {A,B} created a new shared group key, so A and C cannot decrypt each other's messages
* C will therefore block forever!

Perhaps we should add GMS.leave_timeout back!
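
To illustrate the difference between the two behaviors, here is a minimal sketch of a bounded wait for the LEAVE response. It is not the GMS code: the class LeaveWaiter and its methods are hypothetical names invented for this example, and the only assumption is that a timeout <= 0 stands for "wait indefinitely", which is the case that blocks forever in the scenario above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of JGroups): bounds the wait for a LEAVE response. */
public class LeaveWaiter {
    // Released once, when the coordinator's LEAVE response arrives
    private final CountDownLatch leaveRsp = new CountDownLatch(1);

    /** Would be called by the receiver thread when the LEAVE response is received. */
    public void leaveResponseReceived() {
        leaveRsp.countDown();
    }

    /**
     * Waits for the LEAVE response.
     * @param timeoutMs max time to wait in ms; <= 0 waits indefinitely (can block forever)
     * @return true if the response arrived, false if the timeout elapsed first
     */
    public boolean awaitLeave(long timeoutMs) throws InterruptedException {
        if (timeoutMs <= 0) {
            leaveRsp.await();
            return true;
        }
        return leaveRsp.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulates member C: the LEAVE request is discarded, so no response ever arrives
        LeaveWaiter c = new LeaveWaiter();
        boolean acked = c.awaitLeave(200);
        System.out.println("LEAVE acked: " + acked); // prints "LEAVE acked: false"
    }
}
```

With a positive timeout, C gives up after the timeout and can proceed with the disconnect even though A never answers; with no timeout, the same call would never return.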

> LeaveTest fails frequently
> --------------------------
>
>                 Key: JGRP-2372
>                 URL: https://issues.jboss.org/browse/JGRP-2372
>             Project: JGroups
>          Issue Type: Task
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>            Priority: Major
>             Fix For: 4.1.5
>
>
> The same applies to ASYM_ENCRYPT_LeaveTest and ASYM_ENCRYPT_LeaveTestKeyExchange. When multiple members leave, some members seem to be left behind, and the view never becomes correct.
> This happens only when running the entire test suite; running a test individually, or running all encryption tests ({{ant encrypt}}) almost never reproduces the errors.
> This is possibly caused by the high load of running a lot of tests concurrently, and the subsequent delays resulting from it. Nevertheless, these tests should not fail.
> Error message:
> {noformat}
> 	Timeout 30000 kicked in, views are: 9: [7|15] (4) [7, 8, 9, 10] 10: [7|15] (4) [7, 8, 9, 10]
> java.util.concurrent.TimeoutException
> at org.jgroups.util.Util.waitUntilAllChannelsHaveSameView(Util.java:293)
> at org.jgroups.tests.BaseLeaveTest.testConcurrentLeaves(BaseLeaveTest.java:189)
> at org.jgroups.tests.BaseLeaveTest.testLeaveOfFirstNMembers(BaseLeaveTest.java:214)
> at org.jgroups.tests.BaseLeaveTest.testLeaveOfCoordAndNext8(BaseLeaveTest.java:146)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:583)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:719)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:989)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834) 
> {noformat}


