[
https://issues.jboss.org/browse/JGRP-1915?page=com.atlassian.jira.plugin....
]
Bela Ban commented on JGRP-1915:
--------------------------------
I don't see where you found that the discovery responses are sorted and the first one
picked. As a matter of fact, in [1] we add all coordinators from the discovery responses to
a list (not a sorted set) and then iterate through that list. So whichever coordinator sent
its response first is asked to join the new member.
Note that I added shuffling of that list on line 104, so that coords which crashed and
were not removed are skipped randomly.
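
The list-then-shuffle behaviour described above could be sketched roughly as follows. This
is only an illustration, not the code in [1]: the class CoordSelection, the nested Response
type and the method candidateCoords are hypothetical names, and plain strings stand in for
member addresses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; these are not JGroups classes.
class CoordSelection {
    /** Stand-in for a discovery response: who sent it and whether the sender is a coordinator. */
    static final class Response {
        final String sender;
        final boolean coord;

        Response(String sender, boolean coord) {
            this.sender = sender;
            this.coord = coord;
        }
    }

    /** Collects coordinators into a plain list (arrival order, no sorting), then shuffles it. */
    static List<String> candidateCoords(List<Response> responses, Random rnd) {
        List<String> coords = new ArrayList<>();
        for (Response r : responses) {
            if (r.coord && !coords.contains(r.sender))
                coords.add(r.sender);
        }
        // Shuffling means a crashed coordinator whose entry was never removed
        // is not always the first one tried.
        Collections.shuffle(coords, rnd);
        return coords;
    }

    public static void main(String[] args) {
        List<Response> responses = List.of(
            new Response("A", true), new Response("B", false), new Response("C", true));
        System.out.println(candidateCoords(responses, new Random()));
    }
}
```

The joiner would then walk this list and send its join request to each candidate in turn,
so the order of attempts differs from run to run.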
I'll see if I can remove crashed (via kill -9) coordinators automatically, but see [2]
for an explanation of why this is difficult. In a nutshell, if we have [A,B,C,D] and this
splits into [A,B] and [C,D], and new coords remove old coords' files, then A would
remove C.list and C would remove A.list.
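
To make the split scenario concrete, here is a small simulation. It assumes a deliberately
naive cleanup rule (a coordinator deletes every .list file whose owner is not in its own
view); that rule, the class SplitCleanup and its methods are hypothetical and only
illustrate the problem explained in [2].

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simulation of a naive "remove old coordinators' files" rule.
class SplitCleanup {
    /** "A.list" -> "A"; names without the suffix are returned unchanged. */
    static String owner(String file) {
        return file.endsWith(".list") ? file.substring(0, file.length() - ".list".length()) : file;
    }

    /** Files a coordinator with the given view would delete under the naive rule. */
    static Set<String> filesToRemove(List<String> view, Set<String> files) {
        Set<String> doomed = new TreeSet<>();
        for (String f : files) {
            if (!view.contains(owner(f)))
                doomed.add(f);
        }
        return doomed;
    }

    public static void main(String[] args) {
        Set<String> store = new TreeSet<>(List.of("A.list", "C.list"));
        Set<String> byA = filesToRemove(List.of("A", "B"), store); // A is coord of [A,B]
        Set<String> byC = filesToRemove(List.of("C", "D"), store); // C is coord of [C,D]
        store.removeAll(byA);
        store.removeAll(byC);
        System.out.println("A removes " + byA + ", C removes " + byC + ", left: " + store);
    }
}
```

Under this rule each partition deletes the other's file even though both coordinators are
alive, which is why a file cannot be treated as stale merely because its owner is missing
from the local view.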
[1]
https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...
[2]
https://github.com/belaban/JGroups/blob/master/doc/design/CloudBasedDisco...
JDBC_PING discovery fails when stale entries are found in the database
----------------------------------------------------------------------
Key: JGRP-1915
URL: https://issues.jboss.org/browse/JGRP-1915
Project: JGroups
Issue Type: Bug
Affects Versions: 3.6.1
Reporter: Patrick Haas
Assignee: Bela Ban
Node: "CHQ-PATRICKH-55008"
The database contains two rows. The other node is dead but was unable to remove its JDBC entry.
1) JChannel.connect(...)
2) JChannel.down(Event[CONNECT_WITH_STATE_TRANSFER_USE_FLUSH])
3) STATE_TRANSFER -> FRAG2 -> MFC -> UFC -> GMS
4) GMS.down(...) calls out to joinWithStateTransfer -> joinInternal(...)
JDBC pulls the node list from the database table.
Ping Data:
- CHQ-PATRICKH-3895, name=CHQ-PATRICKH-3895, addr=10.1.130.228:55503, server
- CHQ-PATRICKH-55008, name=CHQ-PATRICKH-55008, addr=10.1.130.228:57489
joinInternal is a never-terminating while loop:
- down: Event.FIND_INITIAL_MBRS_EVT
- inspect responses -- no valid join responses
- responses are NOT empty -> does not become singletonMember
- gets all coordinators (none)
- Sorts all nodes by GUID in a TreeSet
- Is first of all joiners?
- No, another joiner is listed first
... repeat forever
When the process is restarted and generates a node ID lower than the one in the existing
DB entry, it successfully takes over as owner.
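
The loop described above can be modelled in a few lines. This is a simplified sketch of the
reported behaviour, not the actual GMS code: the class JoinLoopSketch and the method
attemptsUntilCoordinator are hypothetical, plain strings stand in for node IDs, and the
maxAttempts bound is an addition so that the example terminates.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of the join loop from the report; not JGroups code.
class JoinLoopSketch {
    /**
     * Returns the 1-based attempt on which 'self' would become coordinator,
     * or -1 if it never does within maxAttempts (the reported loop has no such bound).
     */
    static int attemptsUntilCoordinator(String self, List<String> discovered, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (discovered.isEmpty())
                return attempt;                 // no responses: become singleton member
            // No coordinator answered, so sort all joiners and see who is first.
            TreeSet<String> joiners = new TreeSet<>(discovered);
            joiners.add(self);
            if (joiners.first().equals(self))
                return attempt;                 // first of all joiners: take over
            // Otherwise another joiner is listed first; retry discovery.
            // A stale entry never changes, so the outcome is the same every time.
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> table = List.of("0001-stale", "0002-self");
        System.out.println(attemptsUntilCoordinator("0002-self", table, 5));
        System.out.println(attemptsUntilCoordinator("0000-self", List.of("0001-stale"), 5));
    }
}
```

With a stale entry that sorts ahead of the joiner, every iteration reaches the same
decision, which matches the observation that a restart only helps when the newly generated
ID happens to sort lower than the stale one.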