Bela Ban resolved JGRP-1422.
----------------------------
Resolution: Done
TP: reduce "no physical address for X; dropping message" warnings
-----------------------------------------------------------------
Key: JGRP-1422
URL: https://issues.jboss.org/browse/JGRP-1422
Project: JGroups
Issue Type: Enhancement
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.0.5, 3.1
This warning is caused by A attempting to send a unicast message to B, but the physical
(IP) address of B is not in the cache (TP.logical_addr_cache).
The logical_addr_cache is populated at startup, during the discovery phase. However, when
we have 20 members and (PING.)num_initial_members is 3, we'll return after 3
responses.
If we for example have {A,B,C,D,E,F,G,H} with A being the coordinator, a new joiner X
returns after reception of (say) discovery responses of F, B and G. If X next tries to
send a JOIN request to A, it will fail and drop the (unicast) request as the IP address
for A is not yet present.
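
The failure mode described above can be illustrated with a small, self-contained sketch.
This is not JGroups code: the class EarlyReturnDemo and its methods are hypothetical, the
map merely stands in for TP.logical_addr_cache, and the addresses are made up. It shows
that accepting only the first num_initial_members responses can leave the coordinator's
address out of the cache, so a unicast to it gets dropped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and addresses are assumptions, not JGroups APIs.
final class EarlyReturnDemo {

    // Accepts responses in arrival order and stops once numInitialMembers have been cached.
    // Each response is a pair {logicalName, physicalAddress}.
    static Map<String, String> discover(List<String[]> responsesInArrivalOrder,
                                        int numInitialMembers) {
        Map<String, String> cache = new LinkedHashMap<>();
        for (String[] rsp : responsesInArrivalOrder) {
            if (cache.size() >= numInitialMembers)
                break; // return early; later responses are not waited for
            cache.put(rsp[0], rsp[1]);
        }
        return cache;
    }

    // Mimics the behaviour before the fix: without a physical address the message is dropped.
    // Returns true if the message could be sent, false if it was dropped.
    static boolean sendUnicast(Map<String, String> cache, String dest) {
        return cache.containsKey(dest);
    }
}
```

With responses arriving in the order F, B, G, A and a limit of 3, the cache ends up with
F, B and G only, and sendUnicast(cache, "A") reports a drop.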
Even worse: if X attempts to invoke a unicast RPC on a member P for which it
doesn't have the IP address, and does *not* send any more messages to P (e.g. in a
separate thread), then the RPC will time out, unless we use UNICAST, which - using a
positive ack scheme - keeps retransmitting the request until P sends an ack.
The problem is that - when we don't have an IP address for P - we send a discovery
request to fetch the IP address, but *drop* the current request.
There are 2 levels at which we can fix this problem:
#1 Make sure we receive at least the IP address of the coordinator at startup
This is done by making sure (in the above example) that A's IP address is part of the
response set before we return from the discovery phase. Note that when we send a discovery
request, everybody will reply, but if we return after reception of the replies from B, F
and G, we won't have the coordinator's address to send a JOIN request to. So #1
makes sure that we only return after having the coordinator's IP address.
Note that this can still lead to problems when trying to send a unicast message to a
different member whose IP address we don't yet have! This is solved in #2 below.
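
A minimal sketch of the idea behind #1, under stated assumptions: the class
DiscoveryResponses and all of its methods are hypothetical and are not the JGroups
implementation. The sketch also assumes that each discovery response carries a flag
telling whether the sender is the coordinator. The discovery phase is only considered
done when enough responses have arrived *and* the coordinator's address is among them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; class and method names are assumptions, not JGroups APIs.
final class DiscoveryResponses {
    private final int numInitialMembers;
    private final Map<String, String> physicalAddrs = new HashMap<>(); // logical name -> IP:port
    private String coordinator; // logical name of the coordinator, once a response reveals it

    DiscoveryResponses(int numInitialMembers) {
        this.numInitialMembers = numInitialMembers;
    }

    // Records one discovery response and wakes up any waiting thread.
    synchronized void add(String sender, String physicalAddr, boolean isCoordinator) {
        physicalAddrs.put(sender, physicalAddr);
        if (isCoordinator)
            coordinator = sender;
        notifyAll();
    }

    // Done only if we have enough responses AND the coordinator's address.
    synchronized boolean isDone() {
        return physicalAddrs.size() >= numInitialMembers && coordinator != null;
    }

    // Waits until isDone() or the timeout elapses; returns the final value of isDone().
    synchronized boolean waitUntilDone(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!isDone()) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0)
                break;
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
        }
        return isDone();
    }

    // Physical address of the coordinator, or null if it is not known yet.
    synchronized String coordinatorAddress() {
        return coordinator == null ? null : physicalAddrs.get(coordinator);
    }
}
```

The timeout matters in this sketch: the first member of a cluster never receives a
coordinator response, so the wait has to be bounded.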
#2 When asking for the IP address of P, don't drop the current message to P, but loop
for a short time until the address has been fetched
We don't block here, but simply loop for a limited time, in order to wait for the IP
address. In most cases, this is not even necessary because #1 reduces the chances of an IP
address not being available; when it is necessary, fetching an IP address usually takes
only a few milliseconds.
Looping just reduces the chances that we have to run into a timeout with a blocking
unicast RPC, or wait until stability flushes the pending unicast, causing it to be
retransmitted.
Note that we increase the wait time on every loop iteration, to prevent discovery storms.
Plus, we also stagger the discovery requests: if 2 threads T1 and T2 trigger a discovery at
times 30 and 120 respectively, then T2 will *not* send a discovery request, as it will also
get the IP address by means of the discovery request triggered by T1.
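
The bounded loop and the staggering in #2 could look roughly like the sketch below. It is
not the actual TP code: PhysicalAddressLookup, its methods, the doubling of the wait time
and the minimum interval between discovery requests are all assumptions made for
illustration. The issue only states that the wait time increases on every iteration and
that requests are staggered. The map stands in for TP.logical_addr_cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch; names and parameters are assumptions, not JGroups APIs.
final class PhysicalAddressLookup {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // logical -> physical
    private final Consumer<String> discoverySender; // hook that sends a discovery request
    private final long minIntervalMs;               // minimum gap between discovery requests
    private boolean sentOnce;                       // guarded by this
    private long lastRequestMs;                     // guarded by this

    PhysicalAddressLookup(Consumer<String> discoverySender, long minIntervalMs) {
        this.discoverySender = discoverySender;
        this.minIntervalMs = minIntervalMs;
    }

    // Called when a discovery response arrives.
    void onDiscoveryResponse(String member, String physicalAddr) {
        cache.put(member, physicalAddr);
    }

    // Staggering: sends a request only if none was sent within the last minIntervalMs.
    // Returns true if a request was sent, false if it was suppressed.
    synchronized boolean maybeSendDiscovery(String target, long nowMs) {
        if (sentOnce && nowMs - lastRequestMs < minIntervalMs)
            return false; // an earlier request is still outstanding and will answer us too
        sentOnce = true;
        lastRequestMs = nowMs;
        discoverySender.accept(target);
        return true;
    }

    // Loops a limited number of times, doubling the wait after each miss.
    // Returns the physical address, or null if it could not be fetched in time.
    String fetch(String target, int maxAttempts, long initialWaitMs) {
        long waitMs = initialWaitMs;
        for (int i = 0; i < maxAttempts; i++) {
            String addr = cache.get(target);
            if (addr != null)
                return addr;
            maybeSendDiscovery(target, System.currentTimeMillis());
            try {
                Thread.sleep(waitMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                break;
            }
            waitMs *= 2; // growing wait, to avoid discovery storms
        }
        return cache.get(target);
    }
}
```

With a minimum interval of, say, 500 time units, a request at time 30 is sent and a second
one at time 120 is suppressed, which matches the T1/T2 example above. A caller would only
drop the message if fetch() still returns null after the last iteration.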