[
https://issues.jboss.org/browse/JGRP-1915?page=com.atlassian.jira.plugin....
]
Patrick Haas commented on JGRP-1915:
------------------------------------
I understand that this situation looks identical to a network partition. What bothers me
is that the behavior is random. New groups are formed solely based on the random GUID
that's generated.
I'd expect one of two outcomes:
- A new node always fails to connect when a stale entry remains in the database
- A new node always creates a new cluster, after trying to contact the existing nodes a
sufficient number of times (i.e. when max_retries is reached, form a group with a sole
member and advertise it).
I found the implemented behavior very surprising. Is there some larger deterministic
algorithm that I'm not seeing?
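To make the second expected outcome concrete, here is a minimal Java sketch of a bounded join. It is not JGroups code: `BoundedJoinSketch`, `connect` and the `tryJoin` callback are hypothetical names, and `maxRetries` only mirrors the max_retries idea mentioned above.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration of the "give up after max_retries" outcome.
// None of these names exist in JGroups.
class BoundedJoinSketch {
    enum Result { JOINED_EXISTING, FORMED_SINGLETON }

    // tryJoin stands in for one discovery-plus-join attempt and returns
    // true only when a valid join response was received.
    static Result connect(int maxRetries, BooleanSupplier tryJoin) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (tryJoin.getAsBoolean()) {
                return Result.JOINED_EXISTING;
            }
        }
        // Every attempt failed: form a group with a sole member,
        // regardless of how the node IDs happen to sort.
        return Result.FORMED_SINGLETON;
    }
}
```

With a stale entry that never answers, `connect(3, () -> false)` makes three attempts and then forms a singleton group, so the outcome no longer depends on the generated ID.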
JDBC_PING discovery fails when stale entries are found in the database
----------------------------------------------------------------------
Key: JGRP-1915
URL: https://issues.jboss.org/browse/JGRP-1915
Project: JGroups
Issue Type: Bug
Affects Versions: 3.6.1
Reporter: Patrick Haas
Assignee: Bela Ban
Node: "CHQ-PATRICKH-55008"
The database contains two rows. The other node is dead but was unable to remove its JDBC entry.
1) JChannel.connect(...)
2) JChannel.down(Event[CONNECT_WITH_STATE_TRANSFER_USE_FLUSH])
3) STATE_TRANSFER -> FRAG2 -> MFC -> UFC -> GMS
4) GMS.down(...) calls out to joinWithStateTransfer -> joinInternal(...)
JDBC_PING pulls the node list from the database table.
Ping Data:
- CHQ-PATRICKH-3895, name=CHQ-PATRICKH-3895, addr=10.1.130.228:55503, server
- CHQ-PATRICKH-55008, name=CHQ-PATRICKH-55008, addr=10.1.130.228:57489
joinInternal is a never-terminating while loop:
- down: Event.FIND_INITIAL_MBRS_EVT
- inspect responses -- no valid join responses
- responses are NOT empty -> does not become singletonMember
- gets all coordinators (none)
- Sorts all nodes by GUID in a TreeSet
- Is first of all joiners?
- No, another joiner is listed first
... repeat forever
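
The decision taken on each pass of the loop, as traced above, can be modelled with the simplified Java sketch below. It is an illustration of the steps in this report, not the actual GMS code: `JoinLoopSketch`, `decide` and `Outcome` are hypothetical names, and `java.util.UUID` stands in for the JGroups node ID.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.UUID;

// Simplified, hypothetical model of one pass through the join loop.
class JoinLoopSketch {
    enum Outcome { BECOME_SINGLETON, JOIN_VIA_COORDINATOR, BECOME_COORDINATOR, RETRY }

    // discovered:   IDs read from the discovery table, including stale rows
    // coordinators: the subset of discovered IDs that report as coordinator
    static Outcome decide(UUID self, List<UUID> discovered, List<UUID> coordinators) {
        if (discovered.isEmpty()) {
            return Outcome.BECOME_SINGLETON;      // nobody else known
        }
        if (!coordinators.isEmpty()) {
            return Outcome.JOIN_VIA_COORDINATOR;  // normal join path
        }
        // No coordinator: sort all joiners by ID; only the first may take over.
        TreeSet<UUID> joiners = new TreeSet<>(discovered);
        joiners.add(self);
        return joiners.first().equals(self) ? Outcome.BECOME_COORDINATOR : Outcome.RETRY;
    }
}
```

In this model a stale row whose ID sorts before the new node's ID yields RETRY on every pass, because the dead node never takes over as coordinator. A stale row that sorts after it yields BECOME_COORDINATOR on the first pass.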
When the process is restarted and a node ID lower than the one in the existing database
entry is generated, the new node successfully takes over as owner.