]
Sibin Karnavar commented on JGRP-2245:
--------------------------------------
I ran a quick test on my local box, and the initial results look good.
I can see that it removes the members that have left when the coordinator changes, and
that it writes only the current members to the DB on a view change. I can also see that,
when a node crashes, the crashed member stays in the database until the coordinator
changes (that looks fine to me).
I will deploy it in AWS and run some more test cases.
Thank you
JGroup JDBC_PING is not clearing the crashed members
----------------------------------------------------
Key: JGRP-2245
URL: https://issues.jboss.org/browse/JGRP-2245
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.8
Reporter: Sibin Karnavar
Assignee: Bela Ban
Fix For: 4.0.10
1) In AWS cloud environments, a node that crashes and is recreated as a new cluster node
comes back with a different IP address.
2) In this situation, JGroups does not clear logical_addr_cache and gets confused
when we restart the cluster nodes.
3) logical_addr_cache_max_size and the eviction did not work because the cache keeps
getting updated from the ping data, so its entries never get marked as removable.
I think the issue is that the
handleView method always rewrites the entire cache to the DB on a view change. So even
if we clear the table with the help of the above-mentioned flags
(remove_all_data_on_view_change and remove_old_coords_on_view_change), the stale entries
get rewritten to the table.
{code:java}
// remove all files which are not from the current members
protected void handleView(View new_view, View old_view, boolean coord_changed) {
    if(is_coord) {
        if(coord_changed) {
            if(remove_all_data_on_view_change)
                removeAll(cluster_name);
            else if(remove_old_coords_on_view_change) {
                Address old_coord=old_view != null? old_view.getCreator() : null;
                if(old_coord != null)
                    remove(cluster_name, old_coord);
            }
        }
        if(coord_changed || View.diff(old_view, new_view)[1].length > 0) {
            writeAll();
            if(remove_all_data_on_view_change || remove_old_coords_on_view_change)
                startInfoWriter();
        }
    }
    else if(coord_changed) // I'm no longer the coordinator
        remove(cluster_name, local_addr);
}
{code}
4) Because of the crashed members (non-existent IP addresses), we are getting a lot of
socket timeouts.
sendToMembers of TP tries to send messages to the old crashed members and writes error
logs during startup.
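
To illustrate the behavior I would expect on a view change, here is a minimal, self-contained sketch. It is not JGroups code: the class ViewScopedPingTable, its methods, and the addresses are hypothetical, and an in-memory map stands in for the JDBC_PING table. The idea is only that rows for members outside the new view are deleted, and that only cache entries for current members are written back.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for the JDBC_PING table; this is not JGroups code.
class ViewScopedPingTable {
    // logical address -> physical address, standing in for the database rows
    private final Map<String, String> rows = new TreeMap<>();

    void put(String logicalAddr, String physicalAddr) {
        rows.put(logicalAddr, physicalAddr);
    }

    // On a view change: delete rows of members that are not in the new view,
    // then write only those cache entries that belong to current members.
    void onViewChange(Set<String> currentMembers, Map<String, String> addressCache) {
        rows.keySet().retainAll(currentMembers);
        for (Map.Entry<String, String> e : addressCache.entrySet()) {
            if (currentMembers.contains(e.getKey()))
                rows.put(e.getKey(), e.getValue());
        }
    }

    // Returns a copy so callers cannot modify the table directly
    Map<String, String> snapshot() {
        return new TreeMap<>(rows);
    }

    public static void main(String[] args) {
        ViewScopedPingTable table = new ViewScopedPingTable();
        table.put("A", "10.0.0.1:7800");
        table.put("B", "10.0.0.2:7800"); // B crashed; its IP no longer exists

        // The address cache still remembers B, and C has just joined
        Map<String, String> cache = new TreeMap<>();
        cache.put("A", "10.0.0.1:7800");
        cache.put("B", "10.0.0.2:7800");
        cache.put("C", "10.0.0.3:7800");

        Set<String> view = new HashSet<>(Arrays.asList("A", "C"));
        table.onViewChange(view, cache);
        System.out.println(table.snapshot().keySet());
    }
}
```

In this sketch the stale row for B is dropped even though the cache still holds it, which is the part that does not happen today because the whole cache is written back.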