[jboss-jira] [JBoss JIRA] (JGRP-2293) Graceful concurrent leaving of coordinator(s) leaves the cluster with stale views

Dan Berindei (Jira) issues at jboss.org
Wed Feb 6 07:07:01 EST 2019


    [ https://issues.jboss.org/browse/JGRP-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691718#comment-13691718 ] 

Dan Berindei edited comment on JGRP-2293 at 2/6/19 7:06 AM:
------------------------------------------------------------

[~belaban] I ran our test suite a few times without reproducing the failure. Then I got the idea to repeat the offending test 100 times, and I got it to fail with both JGroups 4.0.15.Final and 4.0.17-SNAPSHOT. Finally, I analyzed the logs, and I think the failure is caused by the test itself, so the fix is good for me.
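
To reproduce an intermittent failure like this one, the offending test can be run in a loop while recording which iterations fail. If the suite runs under TestNG, {{@Test(invocationCount = 100)}} does the same declaratively. The sketch below is a framework-free variant that needs only the JDK; {{RepeatHarness}} and {{failingRuns}} are hypothetical names, not part of JGroups or TestNG.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, framework-free harness (not part of JGroups or TestNG): repeats a test
// body and records which iterations failed, to surface intermittent failures.
final class RepeatHarness {
    private RepeatHarness() {}

    /** Runs {@code body} {@code times} times and returns the 1-based indices of the failing runs. */
    static List<Integer> failingRuns(int times, Runnable body) {
        List<Integer> failures = new ArrayList<>();
        for (int i = 1; i <= times; i++) {
            try {
                body.run();
            } catch (Throwable t) {
                // Test assertions usually throw AssertionError, which is an Error, not an Exception
                failures.add(i);
            }
        }
        return failures;
    }
}
```

Passing the body of the offending test and looking at the size of the returned list gives a rough failure rate; the indices help match failing runs to the corresponding log sections.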

I still haven't managed to run {{LeaveTest}} successfully from the command line, though: the nodes never form a cluster because they don't see each other's MPING requests. Without {{<jvmarg value="-Djava.net.preferIPv4Stack=true"/>}} I get a sendto error (see below); with it I get no error message, but the nodes still don't see each other. I'd say it's a problem with my environment, but the same test, using the same mcast address (230.5.6.7), passes when run from the IDE.

{noformat}
12:55:16,177 ERROR (main:[]) [MPING] JGRP000200: failed sending discovery request
java.io.IOException: Invalid argument (sendto failed)
	at java.net.PlainDatagramSocketImpl.send(Native Method) ~[?:1.8.0_171]
	at java.net.DatagramSocket.send(DatagramSocket.java:693) ~[?:1.8.0_171]
	at org.jgroups.protocols.MPING.sendMcastDiscoveryRequest(MPING.java:306) [classes/:?]
	at org.jgroups.protocols.PING.sendDiscoveryRequest(PING.java:64) [classes/:?]
	at org.jgroups.protocols.PING.findMembers(PING.java:32) [classes/:?]
{noformat}
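
When nodes don't see each other's MPING requests, a quick sanity check is whether the configured mcast address really is an IPv4 multicast address (224.0.0.0/4) and whether the JVM was actually started with {{preferIPv4Stack}}. The helper below is hypothetical (not part of JGroups) and uses only {{java.net.InetAddress}}; it does not diagnose the sendto error itself, which may also depend on the OS and network interface configuration.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper, not part of JGroups.
final class McastCheck {
    private McastCheck() {}

    /** True if the IP literal is an IPv4 multicast address (224.0.0.0/4). */
    static boolean isIPv4Multicast(String ipLiteral) {
        try {
            // For an IP literal, getByName only validates the format; no DNS lookup is made
            InetAddress addr = InetAddress.getByName(ipLiteral);
            return addr instanceof Inet4Address && addr.isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + ipLiteral, e);
        }
    }

    /** True if the JVM runs with -Djava.net.preferIPv4Stack=true. */
    static boolean prefersIPv4Stack() {
        return Boolean.getBoolean("java.net.preferIPv4Stack");
    }
}
```

Logging both values at startup, once from the command line and once from the IDE, would show whether the two runs really use the same address and IP stack settings.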

> Graceful concurrent leaving of coordinator(s) leaves the cluster with stale views
> ---------------------------------------------------------------------------------
>
>                 Key: JGRP-2293
>                 URL: https://issues.jboss.org/browse/JGRP-2293
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0.14
>            Reporter: Radoslav Husar
>            Assignee: Bela Ban
>            Priority: Critical
>             Fix For: 4.0.17
>
>         Attachments: IMG_20190123_124154.jpg
>
>
> JGroups does not handle concurrent leaving of nodes correctly. This is a typical use case in cloud environments when scaling down, either with an autoscaler or manually, and we need to handle it.
> A simple test can be devised which fails the first n (where n > 1) nodes of a cluster; reproducer PR: https://github.com/belaban/JGroups/pull/397
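
The scenario from the description can be phrased as a check on the final views. JGroups treats the first member of a view as the coordinator, so if the first n members leave gracefully, each survivor should end up with a view containing exactly the remaining members, headed by the oldest survivor; a survivor still holding a view with a departed member is stale. The sketch below models only that expectation with plain lists; {{ExpectedView}} and its methods are hypothetical and are not the reproducer from the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the expected outcome (not JGroups code): after the first n
// members of a view leave, survivors should agree on the remaining members, and the
// first of them (the oldest survivor) becomes the new coordinator.
final class ExpectedView {
    private ExpectedView() {}

    /** Members expected in the final view after the first n members of {@code view} leave. */
    static List<String> afterFirstNLeave(List<String> view, int n) {
        if (n < 0 || n > view.size())
            throw new IllegalArgumentException("n must be between 0 and " + view.size());
        return new ArrayList<>(view.subList(n, view.size()));
    }

    /** True if every survivor's view equals the expected one, i.e. no survivor has a stale view. */
    static boolean allConverged(List<List<String>> survivorViews, List<String> expected) {
        for (List<String> v : survivorViews)
            if (!v.equals(expected))
                return false;
        return true;
    }
}
```

In a real reproducer, the lists would come from each surviving channel's current view (for example the members of {{JChannel.getView()}}) after giving the cluster time to settle.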



--
This message was sent by Atlassian Jira
(v7.12.1#712002)

