[jboss-jira] [JBoss JIRA] (JGRP-1944) jgroups does not recover properly when using UDP after ifdown / ifup

Tue Aug 4 04:57:02 EDT 2015

    [ https://issues.jboss.org/browse/JGRP-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095280#comment-13095280 ] 

Bram Klein Gunnewiek commented on JGRP-1944:
--------------------------------------------

I will try to figure out which part of the ifup / ifdown scripts causes this behavior on bridge devices. We could fix the problem in the scripts, but other parties running our software on regular Linux installations might run into the same problem. The only thing we can currently do is warn them prominently that our software needs to be restarted after an interface has been brought down and back up.

I understand it's a difficult one to fix in JGroups, especially if you have to rely on a pretty generic IOException. Maybe I can find a way to detect these problems in a custom protocol and pass them up the stack, so that our application can restart the JGroups cluster.
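
As a rough illustration of that detection idea, here is a minimal sketch in plain Java that does not touch the JGroups API at all. It counts consecutive IOExceptions from a periodic probe send and fires a callback once a threshold is reached. The names SendFailureMonitor, Probe, check and the threshold parameter are all hypothetical and exist only for this example. It can only help where the failure actually surfaces as an IOException (as in the MPING stack trace), not where the error is swallowed silently.

```java
import java.io.IOException;

/**
 * Hypothetical helper, not part of JGroups: tracks consecutive send
 * failures and notifies the application when a threshold is reached.
 */
class SendFailureMonitor {

    /** A probe send that may fail with an IOException (assumed shape). */
    interface Probe {
        void send() throws IOException;
    }

    private final int threshold;
    private final Runnable onSuspectedOutage;
    private int consecutiveFailures;

    SendFailureMonitor(int threshold, Runnable onSuspectedOutage) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.onSuspectedOutage = onSuspectedOutage;
    }

    /**
     * Runs one probe. A success resets the failure counter. The callback
     * fires exactly once, when the counter first reaches the threshold,
     * and can fire again only after a successful probe has reset it.
     *
     * @return true if the probe succeeded, false if it threw IOException
     */
    synchronized boolean check(Probe probe) {
        try {
            probe.send();
            consecutiveFailures = 0;
            return true;
        } catch (IOException e) {
            consecutiveFailures++;
            if (consecutiveFailures == threshold) {
                onSuspectedOutage.run();
            }
            return false;
        }
    }

    synchronized int consecutiveFailures() {
        return consecutiveFailures;
    }
}
```

In an application the probe might wrap a DatagramSocket.send(...) to the cluster's multicast address, and the callback might tear down and recreate the channel. Both are assumptions I have not verified against JGroups 3.6.4. Requiring several consecutive failures rather than one is a deliberate choice, so that a single transient error does not trigger a cluster restart.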

In https://issues.jboss.org/browse/JGRP-1804, changes were made to the UDP protocol that now mask the problems that occur after ifdown / ifup (it used to throw the same kind of exceptions as the MPING protocol). I think the changes made for that issue should be reverted, since complete silence is even worse.

> jgroups does not recover properly when using UDP after ifdown / ifup
> --------------------------------------------------------------------
>
>                 Key: JGRP-1944
>                 URL: https://issues.jboss.org/browse/JGRP-1944
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.6.4
>         Environment: Linux Ubuntu 14.04 where the network cards are configured as bridges:
> auto bridge0
>     iface bridge0 inet dhcp
>     bridge_ports eth1
>     bridge_stp off
>     bridge_fd 0
>            Reporter: Bram Klein Gunnewiek
>            Assignee: Bela Ban
>             Fix For: 3.6.5
>
>         Attachments: AutoRecoverMulticast.java
>
>
>  When we bring the interface down and back up in a complete (udp.xml) configuration, everything *seems* to be fine; however, multicast traffic from the node whose interface was brought down is not received by the other nodes. That node also doesn't receive any data from the other nodes. No exceptions are logged. I don't think I ran the previous test correctly, sorry.
> When we use TCP + MPING we see the stack traces we previously had with UDP:
> 12:13:51.624 50644 [Timer-3,debug,shockvm-tn3-42192] ERROR unknown.jul.logger - failed sending discovery request
> java.io.IOException: Invalid argument
>         at java.net.PlainDatagramSocketImpl.send(Native Method) ~[na:1.7.0_79]
>         at java.net.DatagramSocket.send(DatagramSocket.java:697) ~[na:1.7.0_79]
>         at org.jgroups.protocols.MPING.sendMcastDiscoveryRequest(MPING.java:295) ~[jar:rsrc:jgroups-3.6.4.Final.jar!/:na]
>         at org.jgroups.protocols.PING.sendDiscoveryRequest(PING.java:61) [jar:rsrc:jgroups-3.6.4.Final.jar!/:na]
>         at org.jgroups.protocols.PING.findMembers(PING.java:31) [jar:rsrc:jgroups-3.6.4.Final.jar!/:na]
>         at org.jgroups.protocols.Discovery.findMembers(Discovery.java:244) [jar:rsrc:jgroups-3.6.4.Final.jar!/:na]
>         at org.jgroups.protocols.Discovery.down(Discovery.java:387) [jar:rsrc:jgroups-3.6.4.Final.jar!/:na]
>         at org.jgroups.protocols.MERGE3$InfoSender.run(MERGE3.java:382) [jar:rsrc:jgroups-3.6.4.Final.jar!/:na]
>         at org.jgroups.util.TimeScheduler3$Task.run(TimeScheduler3.java:287) [jar:rsrc:jgroups-3.6.4.Final.jar!/:na]
>         at org.jgroups.util.TimeScheduler3$RecurringTask.run(TimeScheduler3.java:321) [jar:rsrc:jgroups-3.6.4.Final.jar!/:na]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> (The exact message differs depending on whether the -Djava.net.preferIPv4Stack=true argument is set.)
> A configuration that uses MPING also doesn't recover from ifdown/ifup.
