[jboss-jira] [JBoss JIRA] (JGRP-2245) JGroup JDBC_PING is not clearing the crashed members
Sibin Karnavar (JIRA)
issues at jboss.org
Fri Feb 2 08:38:00 EST 2018
[ https://issues.jboss.org/browse/JGRP-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527820#comment-13527820 ]
Sibin Karnavar commented on JGRP-2245:
--------------------------------------
I was trying to add this dependency through Maven but was not able to download 4.0.10.Final. Do I need to wait some more time to make it available to maven central?
> JGroup JDBC_PING is not clearing the crashed members
> ----------------------------------------------------
>
> Key: JGRP-2245
> URL: https://issues.jboss.org/browse/JGRP-2245
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.8
> Reporter: Sibin Karnavar
> Assignee: Bela Ban
> Fix For: 4.0.10
>
>
> 1) In AWS cloud environments, IP address will be different when a node crashes and when a new cluster node gets recreated.
> 2) In this situation, JGroup is not clearing logical_addr_cache and it gets confused, when we restart the cluster nodes.
> 3)logical_addr_cache_max_size and the eviction did not work because, the cache is again getting updated from the ping and it never getting marked as removable.
> I think the issue is
> handleView method is always re writing the entire cache on view change to the db. So even if we clear the table with the help of above mentioned flags (remove_all_data_on_view_change && remove_old_coords_on_view_change) , its getting re written to the table.
> {code:java}
> // remove all files which are not from the current members
> protected void handleView(View new_view, View old_view, boolean coord_changed) {
> if(is_coord) {
> if(coord_changed) {
> if(remove_all_data_on_view_change)
> removeAll(cluster_name);
> else if(remove_old_coords_on_view_change) {
> Address old_coord=old_view != null? old_view.getCreator() : null;
> if(old_coord != null)
> remove(cluster_name, old_coord);
> }
> }
> if(coord_changed || View.diff(old_view, new_view)[1].length > 0) {
> writeAll();
> if(remove_all_data_on_view_change || remove_old_coords_on_view_change)
> startInfoWriter();
> }
> }
> else if(coord_changed) // I'm no longer the coordinator
> remove(cluster_name, local_addr);
> }
> {code}
> 4) Because of the crashed members (non existing ip address), we are getting lot of socket timeouts
> sendToMembers of TP is trying to send messages to old crashed members and writing error logs while startup.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
More information about the jboss-jira
mailing list