[jboss-jira] [JBoss JIRA] (JGRP-2245) JGroup JDBC_PING is not clearing the crashed members

Fri Feb 2 08:38:00 EST 2018

    [ https://issues.jboss.org/browse/JGRP-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527820#comment-13527820 ] 

Sibin Karnavar commented on JGRP-2245:
--------------------------------------

I tried to add this dependency through Maven but could not download 4.0.10.Final. Do I need to wait longer for it to become available on Maven Central?

> JGroup JDBC_PING is not clearing the crashed members
> ----------------------------------------------------
>
>                 Key: JGRP-2245
>                 URL: https://issues.jboss.org/browse/JGRP-2245
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0.8
>            Reporter: Sibin Karnavar
>            Assignee: Bela Ban
>             Fix For: 4.0.10
>
>
> 1) In AWS cloud environments, the IP address changes when a node crashes and a new cluster node is created in its place.
> 2) In this situation, JGroups does not clear logical_addr_cache, and it gets confused when we restart the cluster nodes.
> 3) logical_addr_cache_max_size and the eviction did not work, because the cache is updated again from the ping and its entries are never marked as removable.
> I think the issue is that the handleView method always rewrites the entire cache to the DB on a view change. So even if we clear the table with the help of the flags remove_all_data_on_view_change and remove_old_coords_on_view_change, the data gets written to the table again.
> {code:java}
>  // remove all files which are not from the current members
>     protected void handleView(View new_view, View old_view, boolean coord_changed) {
>         if(is_coord) {
>             if(coord_changed) {
>                 if(remove_all_data_on_view_change)
>                     removeAll(cluster_name);
>                 else if(remove_old_coords_on_view_change) {
>                     Address old_coord=old_view != null? old_view.getCreator() : null;
>                     if(old_coord != null)
>                         remove(cluster_name, old_coord);
>                 }
>             }
>             if(coord_changed || View.diff(old_view, new_view)[1].length > 0) {
>                 writeAll();
>                 if(remove_all_data_on_view_change || remove_old_coords_on_view_change)
>                     startInfoWriter();
>             }
>         }
>         else if(coord_changed) // I'm no longer the coordinator
>             remove(cluster_name, local_addr);
>     }
> {code}
> 4) Because of the crashed members (non-existent IP addresses), we are getting a lot of socket timeouts:
> sendToMembers of TP tries to send messages to the old crashed members and writes error logs during startup.
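
To illustrate the behaviour described in points 2) and 3), here is a minimal sketch that is not JGroups code. It assumes the discovery table and the logical address cache can be modelled as plain maps from a member name to an IP address. The class StaleEntrySketch and the methods writeAll and writeMembersOnly are hypothetical names chosen for this example. It only shows why writing the whole cache brings a crashed member back into the table, while filtering by the current view would not. It makes no claim about how the issue was actually fixed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not JGroups code: the discovery table and the
// address cache are modelled as maps from a member name to its IP address.
public class StaleEntrySketch {

    // Mimics the reported behaviour: every cached entry is written to the table,
    // including entries for members that are no longer in the view.
    static Map<String, String> writeAll(Map<String, String> cache) {
        return new HashMap<>(cache);
    }

    // Alternative for comparison: only entries of current view members are written.
    static Map<String, String> writeMembersOnly(Map<String, String> cache, Set<String> members) {
        Map<String, String> table = new HashMap<>();
        for (Map.Entry<String, String> e : cache.entrySet()) {
            if (members.contains(e.getKey()))
                table.put(e.getKey(), e.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("node-a", "10.0.0.1");
        cache.put("node-b", "10.0.0.2"); // crashed member whose IP no longer exists

        // The current view contains only the surviving member.
        Set<String> view = new HashSet<>(Arrays.asList("node-a"));

        // Is the crashed member still in the table after each kind of write?
        System.out.println(writeAll(cache).containsKey("node-b"));
        System.out.println(writeMembersOnly(cache, view).containsKey("node-b"));
    }
}
```

In this toy model the first write still contains node-b and the second does not, which matches the symptom that clearing the table has no lasting effect while the cache keeps the stale entry.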
