Bela Ban commented on JGRP-130:
-------------------------------
Using MPING (for multicast discovery) and setting TCP.start_port to 0 (so that the
operating system picks the port) might help avoid this problem until logical addresses
are available.
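
As a rough illustration of what "port 0" means at the socket level, the sketch below binds
a plain java.net.ServerSocket to port 0 and reads back the port the operating system chose.
It does not use JGroups at all; the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical demo class, not part of JGroups.
final class EphemeralPortDemo {

    /** Binds to port 0 and returns the port the OS actually assigned. */
    static int bindToOsChosenPort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("OS assigned port " + bindToOsChosenPort());
    }
}
```

Whether two consecutive runs print different ports depends entirely on the operating
system's allocation policy, which is why the comment above says "might help".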
Problems with reincarnation
---------------------------
Key: JGRP-130
URL: http://jira.jboss.com/jira/browse/JGRP-130
Project: JGroups
Issue Type: Feature Request
Affects Versions: 2.2.9
Reporter: Bela Ban
Assigned To: Bela Ban
Fix For: 2.6
Problems with reincarnation
===========================
Author: Bela Ban
Version: $Id$
The identity of a JGroups member is always the IP address and a port. The port is usually
chosen by the OS, unless bind_port is set (it is not set by default).
Let's say a member's address is hostA:5000. When that member dies and is restarted, the OS
will likely assign a higher port, say 5002. This depends on how many other processes
requested a port between the start and the restart of the member.
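
To make the host:port identity concrete, here is a minimal sketch that uses
java.net.InetSocketAddress as a stand-in for a member address (JGroups has its own address
class; the helper names below are invented for this example). Two addresses are the same
member exactly when host and port match.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class; InetSocketAddress stands in for a JGroups member address.
final class MemberIdentityDemo {

    static InetSocketAddress member(String host, int port) {
        // Unresolved, so the example needs no DNS lookup for "hostA".
        return InetSocketAddress.createUnresolved(host, port);
    }

    static boolean sameIdentity(InetSocketAddress a, InetSocketAddress b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        InetSocketAddress original = member("hostA", 5000);
        InetSocketAddress restartedOnNewPort = member("hostA", 5002);
        InetSocketAddress restartedOnOldPort = member("hostA", 5000);

        System.out.println(sameIdentity(original, restartedOnNewPort)); // false
        System.out.println(sameIdentity(original, restartedOnOldPort)); // true
    }
}
```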
JGroups relies on the assumption that the assignment of ports by the OS is always (not
necessarily monotonically) *increasing* across a single machine. If this is not the case,
then the following problems can occur:
1. Restart:
When a member P crashes and is then restarted, if FD is used and P is restarted *before*
it is excluded, then we have a new member *under the same old address*! Since it lost all
of its state (e.g. the retransmission table), retransmission requests sent to the new P
will fail.
2. Shunning:
A member keeps the last N (default is 100) ports it used and makes sure it doesn't reuse
one of them when it is shunned. However, this list is process-wide and *not* machine-wide.
For example, when we have processes P1 on A:5000 and P2 on A:5002 (on machine A) and both
of them are shunned at the same time, then when they rejoin, P1 does not use port 5000 but
might use port 5002, and P2 doesn't use 5002 but might use 5000, so they could assume each
other's identity!
Neither problem can be solved by remembering the last 100 ports: in case #1, the list is
lost because we start a new process, and in case #2, the list is process-wide, not
machine-wide.
Again, these problems occur *only* when the OS reuses previously assigned ports.
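
The shunning case can be replayed with a small simulation. This is only a sketch: the
PortHistory class is a hypothetical stand-in for the per-process "last N ports" list, and
the order in which the OS offers freed ports is simply assumed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation, not JGroups code.
final class ShunningCollisionDemo {

    /** Per-process memory of the last N ports this process used. */
    static final class PortHistory {
        private final int capacity;
        private final Deque<Integer> recent = new ArrayDeque<>();

        PortHistory(int capacity) {
            this.capacity = capacity;
        }

        void remember(int port) {
            if (recent.size() == capacity) {
                recent.removeFirst(); // forget the oldest port
            }
            recent.addLast(port);
        }

        boolean wasUsed(int port) {
            return recent.contains(port);
        }
    }

    /** Takes the first offered port that this process has not used itself. */
    static int rejoin(PortHistory own, int[] portsOfferedByOs) {
        for (int port : portsOfferedByOs) {
            if (!own.wasUsed(port)) {
                own.remember(port);
                return port;
            }
        }
        throw new IllegalStateException("no usable port offered");
    }

    public static void main(String[] args) {
        PortHistory p1 = new PortHistory(100);
        PortHistory p2 = new PortHistory(100);
        p1.remember(5000); // P1 was A:5000
        p2.remember(5002); // P2 was A:5002

        // Assumption: the OS reuses the two freed ports and offers them in this order.
        int newP1 = rejoin(p1, new int[] {5000, 5002});
        int newP2 = rejoin(p2, new int[] {5002, 5000});

        // Prints: P1 rejoins on 5002, P2 rejoins on 5000
        System.out.println("P1 rejoins on " + newP1 + ", P2 rejoins on " + newP2);
    }
}
```

Each process correctly avoids its own old port, yet the two end up with swapped
identities because neither can see the other's history.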
SOLUTION:
A: Use temporary storage (per host) to store the last N addresses assigned on a given
host. This makes sure we don't reuse previous addresses.
B: Use logical addresses, such as java.rmi.VMID or java.rmi.server.UID, which are unique
over time for a given host. Then, it doesn't matter what ports we use because the ports
are not used to determine a member's identity.
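
Option A could look roughly like the sketch below: a small file shared by all processes on
the host, guarded by a file lock, that remembers the last N ports handed out. The class
name, file format, and API are assumptions made for this example and do not describe an
existing JGroups feature.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-host registry of recently used ports, one port per line.
final class HostPortRegistry {
    private final Path file;
    private final int capacity;

    HostPortRegistry(Path file, int capacity) {
        this.file = file;
        this.capacity = capacity;
    }

    /** Records the port and returns true unless one of the last N claims used it. */
    boolean tryClaim(int port) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                     StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // exclusive between processes on this host
            List<String> ports = readLines(channel);
            String candidate = Integer.toString(port);
            if (ports.contains(candidate)) {
                return false;
            }
            ports.add(candidate);
            while (ports.size() > capacity) {
                ports.remove(0); // drop the oldest entry
            }
            overwrite(channel, ports);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static List<String> readLines(FileChannel channel) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
        channel.position(0);
        while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
            // keep reading until the buffer is full or end of file is reached
        }
        String content = new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (String line : content.split("\n")) {
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    private static void overwrite(FileChannel channel, List<String> ports) throws IOException {
        byte[] bytes = (String.join("\n", ports) + "\n").getBytes(StandardCharsets.UTF_8);
        ByteBuffer data = ByteBuffer.wrap(bytes);
        channel.truncate(0);
        channel.position(0);
        while (data.hasRemaining()) {
            channel.write(data);
        }
    }
}
```

A member would call tryClaim with the port it was given and ask for another port if the
call returns false. Note that Java file locks coordinate separate processes, not threads
inside one JVM, so a real implementation would need additional in-process synchronization.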
The JIRA task for logical addresses is http://jira.jboss.com/jira/browse/JGRP-129.
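
A minimal sketch of the logical-address idea, assuming identity is carried by a
java.rmi.server.UID while host and port are kept only as routing information. The
LogicalAddress class is hypothetical and is not the design tracked in JGRP-129.

```java
import java.rmi.server.UID;

// Hypothetical demo class, not JGroups code.
final class LogicalAddressDemo {

    /** Identity is the UID; host and port only say where to send packets. */
    static final class LogicalAddress {
        private final UID id;
        private final String host;
        private final int port;

        private LogicalAddress(UID id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }

        /** Creates a new incarnation with a fresh UID. */
        static LogicalAddress create(String host, int port) {
            return new LogicalAddress(new UID(), host, port);
        }

        boolean samePhysicalEndpoint(LogicalAddress other) {
            return host.equals(other.host) && port == other.port;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof LogicalAddress && id.equals(((LogicalAddress) o).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }

        @Override
        public String toString() {
            return id + "@" + host + ":" + port;
        }
    }

    public static void main(String[] args) {
        LogicalAddress before = LogicalAddress.create("hostA", 5000);
        LogicalAddress after = LogicalAddress.create("hostA", 5000); // restarted, same port

        System.out.println(before.samePhysicalEndpoint(after)); // true
        System.out.println(before.equals(after));               // false
    }
}
```

Even if the restarted member lands on the same port, the rest of the group can tell the
two incarnations apart.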