Radim Vansa created JGRP-1529:
---------------------------------
Summary: RELAY2: Intra-site view not being accepted upon inter-site
installation failure
Key: JGRP-1529
URL: https://issues.jboss.org/browse/JGRP-1529
Project: JGroups
Issue Type: Bug
Reporter: Radim Vansa
Assignee: Bela Ban
When a node becomes coordinator, it sends the VIEW_CHANGE event up the stack, which should
result in a call to Receiver.viewAccepted(...). However, when RELAY2 is in the stack and the
global (inter-site) coordinator cannot be reached, RELAY2 blocks the thread (sending
discovery pings), and the viewAccepted callback is therefore postponed.
In my opinion, the inter-site stack should be created and handled in a different thread.
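
To illustrate the suggestion, here is a minimal, self-contained sketch in plain Java. It is
not RELAY2 code and uses no JGroups classes: AsyncBridgeSketch, ViewListener, handleView and
blockingJoin are hypothetical names, and the latch only simulates a join that hangs on a dead
global coordinator. The point is only that the view callback runs before, and independently
of, the blocking inter-site join.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names are JGroups API. */
public class AsyncBridgeSketch {

    /** Stand-in for the application's Receiver.viewAccepted(...) callback. */
    interface ViewListener {
        void viewAccepted(String view);
    }

    // Dedicated thread for the (possibly blocking) inter-site join.
    private final ExecutorService bridgeStarter = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "bridge-starter");
        t.setDaemon(true); // a stuck join should not keep the JVM alive
        return t;
    });
    private final ViewListener listener;
    private final Runnable blockingJoin;

    AsyncBridgeSketch(ViewListener listener, Runnable blockingJoin) {
        this.listener = listener;
        this.blockingJoin = blockingJoin;
    }

    /** Called on the thread that installs the local (intra-site) view. */
    void handleView(String view, boolean becameCoordinator) {
        if (becameCoordinator) {
            // Hand the join off instead of running it inline on this thread.
            bridgeStarter.execute(blockingJoin);
        }
        // The local view is passed up regardless of the join's outcome.
        listener.viewAccepted(view);
    }

    void shutdown() {
        bridgeStarter.shutdownNow(); // interrupts a join that is still blocked
    }

    /** Returns true if the view reached the listener while the join was still blocked. */
    static boolean demo() {
        CountDownLatch releaseJoin = new CountDownLatch(1);
        AtomicBoolean joinFinished = new AtomicBoolean(false);
        AtomicBoolean deliveredWhileBlocked = new AtomicBoolean(false);

        Runnable join = () -> {
            try {
                releaseJoin.await(); // simulates pinging an unreachable coordinator
                joinFinished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        ViewListener listener = view -> deliveredWhileBlocked.set(!joinFinished.get());

        AsyncBridgeSketch relay = new AsyncBridgeSketch(listener, join);
        relay.handleView("[B|2] (B, C)", true); // join cannot finish yet
        releaseJoin.countDown();
        relay.shutdown();
        return deliveredWhileBlocked.get();
    }

    public static void main(String[] args) {
        System.out.println("view delivered while join blocked: " + demo());
    }
}
```

A single-threaded executor is used here so that repeated coordinator changes would queue
their join attempts rather than run them concurrently; whether that is the right policy for
RELAY2 is an open design question.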
Context:
In my case, the coordinator of both the local cluster and the global (inter-site) cluster
was killed. FD_SOCK on the inter-site stack somehow failed to notice that the coordinator
had crashed (more investigation required), and the nodes in the global cluster still
reported the crashed node as the global coordinator.
Therefore, the new coordinator of the local cluster failed to join the global cluster
(it obviously got no response from the dead global coordinator).
The restarted node joined the local cluster and then tried to join the local Infinispan
cache with a new local view ID. However, the coordinator had not noticed that it had
already installed a new JGroups view, because the Infinispan viewAccepted handler was never
called. It did not respond to the cache join request because it was still waiting for the
new JGroups view (which had been installed in JGroups, but viewAccepted had not notified
Infinispan about it).