Bogdan Sikora commented on MODCLUSTER-487:
------------------------------------------
There are warnings in the EAP log when using a link-local (ffx2) multicast address:
{noformat}
[2016-05-24 08:55:34,518 WARN [org.jboss.modcluster] (ServerService Thread Pool -- 64)
MODCLUSTER000031: Could not bind multicast socket to /ff02:0:0:0:0:0:0:a (IPv6 address):
Invalid argument; make sure your multicast address is of the same type as the IP stack
(IPv4 or IPv6). Multicast socket will not be bound to an address, but this may lead to
cross talking (see
for details).
2016-05-24 08:55:41,790 WARN [org.jgroups.protocols.UDP] (MSC service thread 1-8) could
not bind to /ff02:0:0:0:0:0:0:15 (IPv6 address); make sure your mcast_addr is of the same
type as the preferred IP stack (IPv4 or IPv6) by checking the value of the system
properties java.net.preferIPv4Stack and java.net.preferIPv6Addresses.]
{noformat}
After changing the mod_cluster multicast address to site-local (ffx5), the
mod_cluster warning disappeared. The JGroups warning is still logged, because its
mcast_addr is still link-local (ff02):
{noformat}
04:36:51,745 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 64)
MODCLUSTER000001: Initializing mod_cluster version 1.3.2.Final-redhat-1
04:36:51,815 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 64)
MODCLUSTER000032: Listening to proxy advertisements on /ff05:0:0:0:0:0:0:a:23364
04:36:52,328 INFO [org.jboss.as.connector.subsystems.datasources] (MSC service thread
1-2) WFLYJCA0001: Bound data source [java:jboss/datasources/ExampleDS]
04:36:52,767 INFO [org.jboss.as.server.deployment.scanner] (MSC service thread 1-1)
WFLYDS0013: Started FileSystemDeploymentService for directory
/mnt/hudson_workspace/mod_cluster-eap7/jboss-eap-7.0/standalone/deployments
04:36:52,806 INFO [org.jboss.as.server.deployment] (MSC service thread 1-8) WFLYSRV0027:
Starting deployment of "clusterbench.war" (runtime-name:
"clusterbench.war")
04:36:53,565 INFO [org.jboss.ws.common.management] (MSC service thread 1-5) JBWS022052:
Starting JBossWS 5.1.3.SP1-redhat-1 (Apache CXF 3.1.4.redhat-1)
04:36:53,654 WARN [org.jboss.metadata.parser.jbossweb.JBossWebMetaDataParser] (MSC
service thread 1-6) <replication-trigger/> is no longer supported and will be
ignored
04:36:55,584 WARN [org.jgroups.protocols.UDP] (MSC service thread 1-8) could not bind to
/ff02:0:0:0:0:0:0:14 (IPv6 address); make sure your mcast_addr is of the same type as the
preferred IP stack (IPv4 or IPv6) by checking the value of the system properties
java.net.preferIPv4Stack and java.net.preferIPv6Addresses.
{noformat}
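The "ffx2" and "ffx5" shorthand above refers to the scope field of an IPv6 multicast address: the first byte is {{ff}}, and the low four bits of the second byte carry the scope (2 = link-local, 5 = site-local). A minimal sketch for checking which scope a configured address falls into is below. The function name {{ipv6_mcast_scope}} is made up for this illustration and is not part of mod_cluster or JGroups; the sketch only classifies the address and does not explain why the bind fails.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Returns the 4-bit scope field of an IPv6 multicast address
 * (2 = link-local, 5 = site-local), or -1 if text is not an
 * IPv6 multicast literal. Hypothetical helper for illustration. */
int ipv6_mcast_scope(const char *text)
{
    struct in6_addr addr;

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;                  /* not a valid IPv6 literal */
    if (addr.s6_addr[0] != 0xff)
        return -1;                  /* outside ff00::/8, so not multicast */
    return addr.s6_addr[1] & 0x0f;  /* low nibble of byte 1 is the scope */
}
```

Called with the two addresses from the logs, {{ff02:0:0:0:0:0:0:a}} yields scope 2 and {{ff05:0:0:0:0:0:0:a}} yields scope 5.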
Default AdvertiseBindAddress value should not be NULL (UDP Multicast
on Linux systems with more NICs)
-----------------------------------------------------------------------------------------------------
Key: MODCLUSTER-487
URL: https://issues.jboss.org/browse/MODCLUSTER-487
Project: mod_cluster
Issue Type: Bug
Components: Native (httpd modules)
Affects Versions: 1.2.11.Final, 1.3.1.Final
Environment: Linux, multiple NICs environment
Reporter: Michal Karm Babacek
Assignee: Michal Karm Babacek
Priority: Critical
Attachments: advertise-linux3_x86_64.zip, advertise-windows_x86.zip,
Advertize.class
Credit where it's due: the issue was first spotted by [~rhatlapa].
h3. Problem
It appears that an attempt to send to all interfaces by passing {{NULL}} or
{{"0.0.0.0"}} (the default {{bindaddr}} when no {{AdvertiseBindAddress}} is set) to the
following statement actually picks the first non-loopback interface and sends only to it.
{code}
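/* bindaddr is NULL or "0.0.0.0" unless AdvertiseBindAddress is configured */
if ((rv = apr_sockaddr_info_get(&ma_listen_sa, bindaddr,
                                ma_mgroup_sa->family, bindport,
                                APR_UNSPEC, pool)) != APR_SUCCESS) {
    ap_log_error(APLOG_MARK, APLOG_ERR, rv, s,
                 "mod_advertise: ma_group_join apr_sockaddr_info_get(%s:%d) failed",
                 bindaddr, bindport);
    /* the rest of the error path is not part of this excerpt */
}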
{code}
As a result, no datagrams appear on the other interfaces. Surprisingly, this is not
deterministic: after dozens or hundreds of messages, a datagram occasionally
reaches another interface.
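For context: when a sending socket is bound to the wildcard address, the kernel chooses the outgoing interface for multicast, whereas the {{IP_MULTICAST_IF}} socket option pins it to one NIC. The sketch below uses plain POSIX sockets rather than APR and is not the mod_advertise code; {{open_mcast_sender}} is a name made up for this illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a UDP socket for sending multicast, or -1 on error.
 * With nic_ip == NULL the kernel chooses the outgoing interface;
 * otherwise outgoing multicast is pinned to the NIC owning nic_ip.
 * Hypothetical helper for illustration. */
int open_mcast_sender(const char *nic_ip)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    if (nic_ip != NULL) {
        struct in_addr nic;

        /* Reject text that is not an IPv4 literal, then pin the interface. */
        if (inet_pton(AF_INET, nic_ip, &nic) != 1 ||
            setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF,
                       &nic, sizeof(nic)) != 0) {
            close(fd);
            return -1;
        }
    }
    return fd;
}
```

In the terms of this report, {{open_mcast_sender(NULL)}} corresponds to running without {{AdvertiseBindAddress}}, and {{open_mcast_sender("172.17.72.254")}} to naming the NIC explicitly.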
h3. Impact
Picture this simple scenario: There are two interfaces, e.g.
{noformat}
enp1s0 10.16.88.187
enp2s0 172.18.0.1
{noformat}
listed in this exact order with {{ip addr show}}.
There is an EAP 7 (WildFly 10) instance with mod_cluster bound to the {{172.18.0.1}} IP
address, i.e. the {{enp2s0}} interface.
There is also an Apache HTTP Server instance with mod_cluster bound to the
{{172.18.0.1}} IP address, i.e. the MCMP VirtualHost and the main VirtualHost all Listen
on this IP address.
Result: Without advertising, using an explicit {{proxy-list}}, all is well. MCMP works,
requests work, balancing works.
When relying on advertisement, on the other hand, it can take EAP 7 (WildFly 10)
*minutes* to register with the balancer.
The reason is that the vast majority of UDP multicast datagrams arrive at {{enp1s0}},
where EAP 7 (WildFly 10) doesn't see them.
h3. Reproducer
Let me demonstrate with a recently refactored
[advertise.c|https://github.com/Karm/mod_cluster/tree/advertise-native-tes...]
utility for sending datagrams and the well-known
[Advertize.java|https://raw.githubusercontent.com/modcluster/mod_cluster/m...]
utility for receiving them.
For your convenience, here are binaries built from the aforementioned sources:
* Advertize java utility: [^Advertize.class]
* advertise native utility (Linux3 x86_64): [^advertise-linux3_x86_64.zip]
* advertise native utility (Windows x86): [^advertise-windows_x86.zip]
h3. Demonstration on Linux
h4. System
{noformat}
[mbabacek@perf09 ~]$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 00:18:8b:7a:46:04 brd ff:ff:ff:ff:ff:ff
inet 10.16.88.187/21 brd 10.16.95.255 scope global enp1s0
valid_lft forever preferred_lft forever
inet 10.16.93.253/21 brd 10.16.95.255 scope global secondary enp1s0
valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 00:18:8b:7a:46:05 brd ff:ff:ff:ff:ff:ff
inet 172.17.72.254/19 brd 172.17.95.255 scope global enp2s0
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:07:ab:74:f9 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
{noformat}
h4. Java
{noformat}
[mbabacek@perf09 ~]$ java -version
openjdk version "1.8.0_71"
OpenJDK Runtime Environment (build 1.8.0_71-b15)
OpenJDK 64-Bit Server VM (build 25.71-b15, mixed mode)
{noformat}
h4. Advertise SENT
{noformat}
[mbabacek@perf09 ~]$ date;./advertise -a 224.0.1.102 -p 33364
Mon Mar 21 12:39:51 EDT 2016
UDP Multicast address to send datagrams to. Value: 224.0.1.102
UDP Multicast port. Value: 33364
IP address of the NIC to bound to. Value: NULL
apr_socket_bind on 0.0.0.0:0
apr_mcast_join on 0.0.0.0:0
apr_socket_sendto to 224.0.1.102:33364
{noformat}
h4. Advertize RECEIVED
YES (/)
{noformat}
[mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364
Linux like OS
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 16:39:51 GMT
received from /10.16.88.187:38907
{noformat}
YES (/)
{noformat}
[mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 10.16.88.187
Linux like OS
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 16:39:51 GMT
received from /10.16.88.187:38907
{noformat}
NO (x)
{noformat}
[mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 172.17.72.254
Linux like OS
ready waiting...
{noformat}
YES (/)
{noformat}
[mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 0.0.0.0
Linux like OS
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 16:39:51 GMT
received from /10.16.88.187:38907
{noformat}
And now let's send from {{172.17.72.254}}, i.e. {{enp2s0}}:
h4. Advertise SENT
{noformat}
[mbabacek@perf09 ~]$ date;./advertise -a 224.0.1.102 -p 33364 -n 172.17.72.254
Mon Mar 21 12:42:57 EDT 2016
UDP Multicast address to send datagrams to. Value: 224.0.1.102
UDP Multicast port. Value: 33364
IP address of the NIC to bound to. Value: 172.17.72.254
apr_socket_bind on 172.17.72.254:0
apr_mcast_join on 172.17.72.254:0
apr_socket_sendto to 224.0.1.102:33364
{noformat}
h4. Advertize RECEIVED
NO (x)
{noformat}
[mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364
Linux like OS
ready waiting...
{noformat}
NO (x)
{noformat}
[mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 10.16.88.187
Linux like OS
ready waiting...
{noformat}
YES (/)
{noformat}
[mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 172.17.72.254
Linux like OS
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 16:42:57 GMT
received from /172.17.72.254:35452
{noformat}
NO (x)
{noformat}
[mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 0.0.0.0
Linux like OS
ready waiting...
{noformat}
h3. Demonstration on Windows
Note that the problem doesn't occur on Windows: all interfaces receive the
advertisements.
h4. Advertise SENT
{noformat}
C:\Users\karm\advertise-build
λ advertise.exe -a 224.0.1.102 -p 33364
UDP Multicast address to send datagrams to. Value: 224.0.1.102
UDP Multicast port. Value: 33364
IP address of the NIC to bound to. Value: NULL
apr_socket_bind on 0.0.0.0:0
apr_mcast_join on 0.0.0.0:0
apr_socket_sendto to 224.0.1.102:33364
{noformat}
h4. Advertize RECEIVED
YES (/)
{noformat}
C:\Users\karm\WORKSPACE
λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 18:07:50 GMT
received from /192.168.122.52:61805
{noformat}
YES (/)
{noformat}
C:\Users\karm\WORKSPACE
λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364 192.168.122.52
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 18:07:50 GMT
received from /192.168.122.52:61805
{noformat}
YES (/)
{noformat}
C:\Users\karm\WORKSPACE
λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364 192.168.122.199
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 18:07:50 GMT
received from /192.168.122.52:61805
{noformat}
h4. Advertise SENT
{noformat}
C:\Users\karm\advertise-build
λ advertise.exe -a 224.0.1.102 -p 33364 -n 192.168.122.199
UDP Multicast address to send datagrams to. Value: 224.0.1.102
UDP Multicast port. Value: 33364
IP address of the NIC to bound to. Value: 192.168.122.199
apr_socket_bind on 192.168.122.199:0
apr_mcast_join on 192.168.122.199:0
apr_socket_sendto to 224.0.1.102:33364
{noformat}
h4. Advertize RECEIVED
YES (/)
{noformat}
C:\Users\karm\WORKSPACE
λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 18:09:55 GMT
received from /192.168.122.199:52781
{noformat}
YES (/)
{noformat}
C:\Users\karm\WORKSPACE
λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364 192.168.122.52
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 18:09:55 GMT
received from /192.168.122.199:52781
{noformat}
YES (/)
{noformat}
C:\Users\karm\WORKSPACE
λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364 192.168.122.199
ready waiting...
received: Advertize !!! Mon, 21 Mar 2016 18:09:55 GMT
received from /192.168.122.199:52781
{noformat}
h3. Suggestion
Ideas? :) [~jfclere], [~rhusar]
I suggest defaulting {{bindaddr}} ({{AdvertiseBindAddress}}) to the main_server's
address or to that of the MCMP-enabled vhost instead of {{NULL}}. I'll post a PR for
evaluation.
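To make the suggestion concrete, here is a rough sketch of the fallback I have in mind. It is not the eventual PR: {{effective_bindaddr}} and both of its parameters are hypothetical, and how the server or vhost address would actually be obtained inside httpd is left open.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of mod_cluster: pick the address the
 * advertise socket should bind to.
 *   configured  - value of AdvertiseBindAddress, or NULL if unset
 *   server_addr - address of main_server or of the MCMP-enabled vhost
 * An explicitly configured, non-wildcard address wins; otherwise fall
 * back to server_addr (which may itself be NULL). */
const char *effective_bindaddr(const char *configured, const char *server_addr)
{
    if (configured != NULL && strcmp(configured, "0.0.0.0") != 0)
        return configured;
    return server_addr;
}
```

With this rule, an unset or wildcard {{AdvertiseBindAddress}} on the Linux box above would resolve to the address the vhost listens on rather than to {{NULL}}, while an explicit setting keeps its current meaning.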