[jboss-jira] [JBoss JIRA] Commented: (JGRP-130) Problems with reincarnation

Fri Aug 11 04:12:13 EDT 2006

    [ http://jira.jboss.com/jira/browse/JGRP-130?page=comments#action_12341056 ] 

Ortwin Glück commented on JGRP-130:
-----------------------------------

This bug is annoying our team very much. We have 10 developers, each running a local JBoss instance for development. Each instance must listen on its LAN address, because it needs to be reachable via the Internet for testing. We also have an integration test machine in the same network, which runs 24/7. The developers have to kill their JBoss instances every now and then because of the EJB3 redeployment memory leak. So we see this bug a lot!

Every time it happens, we have to find the JGroups coordinator node (usually the integration test machine, as it is the one that has been running the longest) and restart it. That's really, really annoying. Please fix this bug earlier than planned.

> Problems with reincarnation
> ---------------------------
>
>                 Key: JGRP-130
>                 URL: http://jira.jboss.com/jira/browse/JGRP-130
>             Project: JGroups
>          Issue Type: Feature Request
>    Affects Versions: 2.2.9
>            Reporter: Bela Ban
>         Assigned To: Bela Ban
>             Fix For: 2.5
>
>
> Problems with reincarnation
> ===========================
> Author: Bela Ban
> Version: $Id$
> The identity of a JGroups member is always the IP address and a port. The port is usually chosen by the OS, unless
> bind_port is set (not set by default).
> Let's say a member's address is hostA:5000. When that member dies and is restarted, the OS will likely assign a
> higher port, say 5002. This depends on how many other processes requested a port in between the start and restart
> of the member.
> JGroups relies on the fact that the assignment of ports by the OS is always (not necessarily monotonically)
> *increasing* across a single machine. If this is not the case, then the following problems can occur:
> 1. Restart:
> When a member P crashes and then is restarted, if FD is used and P is restarted *before* it is excluded,
> then we have a new member *under the same old address* ! Since it lost all of its state (e.g. retransmission table),
> retransmission requests sent to the new P will fail.
> 2. Shunning:
> Regarding shunning: a member keeps a list of the last N ports it used (N defaults to 100), and makes sure it doesn't
> reuse one of those already-used ports when it is shunned. However, this list is process-wide and *not* machine-wide,
> e.g. when we have processes P1 on A:5000 and P2 on A:5002 (on machine A), and both of them are shunned at the same
> time, then when they rejoin, P1 does not use port 5000, but might use port 5002, and P2 doesn't use 5002, but might
> use 5000, so they could assume each other's identity!
> Neither problem can be solved by remembering the last 100 ports: in case #1, this list is lost because we start a
> new process, and in case #2, the list is process-wide, but not machine-wide.
> Again, these problems occur *only* when the OS reuses previously assigned ports.
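
The shunning case (#2) above can be illustrated with a minimal Java sketch. PortHistory is a hypothetical class written for this example, not JGroups code; it only models a per-process list of the last N ports used, under the assumption that each process consults nothing but its own list.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PortHistoryDemo {

    /** Hypothetical per-process record of the last N ports this process used. */
    static final class PortHistory {
        private final int capacity;
        private final Deque<Integer> used = new ArrayDeque<>();

        PortHistory(int capacity) {
            this.capacity = capacity;
        }

        /** Remember a port, dropping the oldest entry once the list is full. */
        void record(int port) {
            if (used.size() == capacity) {
                used.removeFirst();
            }
            used.addLast(port);
        }

        /** True if this process has no record of having used the port. */
        boolean mayUse(int port) {
            return !used.contains(port);
        }
    }

    public static void main(String[] args) {
        // Two processes on machine A, each with its own private history.
        PortHistory p1 = new PortHistory(100);
        PortHistory p2 = new PortHistory(100);
        p1.record(5000); // P1 was A:5000
        p2.record(5002); // P2 was A:5002

        // Each process avoids only its own old port ...
        System.out.println("P1 may reuse 5000: " + p1.mayUse(5000)); // false
        System.out.println("P2 may reuse 5002: " + p2.mayUse(5002)); // false

        // ... but nothing stops them from taking each other's old port.
        System.out.println("P1 may take 5002: " + p1.mayUse(5002)); // true
        System.out.println("P2 may take 5000: " + p2.mayUse(5000)); // true
    }
}
```

Because neither history knows about the other process, the swap A:5000 <-> A:5002 passes both checks. A machine-wide list would be needed to reject it.
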
> SOLUTION:
> A: Use temporary storage (per host) to store the last N addresses assigned on a given host. This makes sure we
> don't reuse previous addresses
> B: Use logical addresses, such as java.rmi.dgc.VMID or java.rmi.server.UID, which are unique over time for a given host.
> Then, it doesn't matter what ports we use because the ports are not used to determine a member's identity.
> The JIRA task for logical addresses is http://jira.jboss.com/jira/browse/JGRP-129. 
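
Solution B can be sketched roughly as follows. LogicalAddress is a hypothetical class written for this example and is not the JGRP-129 design; the only real API used is java.rmi.server.UID, which is documented as unique over time on the host that generated it. Since a UID alone is not unique across hosts, a real implementation would presumably combine it with host information or use java.rmi.dgc.VMID instead.

```java
import java.net.InetSocketAddress;
import java.rmi.server.UID;

public class LogicalAddressDemo {

    /** Hypothetical member identity: equality depends on the UID only, never on the port. */
    static final class LogicalAddress {
        private final UID id = new UID();          // generated once per incarnation
        private InetSocketAddress physical;        // where the member is currently reachable

        LogicalAddress(InetSocketAddress physical) {
            this.physical = physical;
        }

        /** Changing the physical endpoint does not change the identity. */
        void moveTo(InetSocketAddress newPhysical) {
            this.physical = newPhysical;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof LogicalAddress && id.equals(((LogicalAddress) o).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }

        @Override
        public String toString() {
            return id + "@" + physical;
        }
    }

    public static void main(String[] args) {
        InetSocketAddress hostA5000 = InetSocketAddress.createUnresolved("hostA", 5000);

        // Old incarnation and restarted incarnation end up on the same host:port.
        LogicalAddress before = new LogicalAddress(hostA5000);
        LogicalAddress after = new LogicalAddress(hostA5000);

        // Same physical endpoint, but still two distinct members.
        System.out.println("same identity: " + before.equals(after)); // false
    }
}
```

With identities like this, a reused port no longer makes a restarted or rejoined member look like its predecessor.
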

-- 
This message is automatically generated by JIRA.