[jboss-jira] [JBoss JIRA] (JGRP-1956) S3_PING / FILE_PING: remove failed members

Wed Sep 2 02:48:05 EDT 2015

    [ https://issues.jboss.org/browse/JGRP-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104412#comment-13104412 ] 

Bela Ban edited comment on JGRP-1956 at 9/2/15 2:47 AM:
--------------------------------------------------------

Can you try with the following attributes enabled?
* {{remove_old_coords_on_view_change}}
* {{remove_all_files_on_view_change}}

Old members are not removed immediately because they may have been split off by a network partition rather than having crashed. For a subsequent merge to succeed in that case, it is better to leave their information in the store.
Note that {{TP.logical_addr_cache_max_size}} and {{TP.logical_addr_cache_expiration}} govern when stale entries will be removed. By default, you won't have more than 2000 stale elements in the cache.
Take a look at https://issues.jboss.org/browse/JGRP-1917 for details.
Doc: http://www.jgroups.org/manual/index.html#FILE_PING (removal of zombie files)
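
A minimal Java sketch of the trade-off described above: entries of members that leave the view are kept (they may come back through a merge), but their number is bounded by a maximum size and an expiration time. {{StaleEntryCache}}, {{onViewChange}} and {{evict}} are hypothetical names; {{maxSize}} and {{expirationMs}} only mimic the roles of {{TP.logical_addr_cache_max_size}} and {{TP.logical_addr_cache_expiration}}. This is an illustration, not how JGroups implements its cache.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not JGroups code): keep entries of departed members for a
// possible merge, but bound how many stale entries survive and for how long.
class StaleEntryCache<K, V> {
    private static final class Slot<T> {
        final T value;
        long staleSince = -1; // -1 means "member is in the current view"
        Slot(T value) { this.value = value; }
    }

    private final int maxSize;       // plays the role of logical_addr_cache_max_size
    private final long expirationMs; // plays the role of logical_addr_cache_expiration
    // LinkedHashMap iterates in insertion order
    private final Map<K, Slot<V>> slots = new LinkedHashMap<>();

    StaleEntryCache(int maxSize, long expirationMs) {
        this.maxSize = maxSize;
        this.expirationMs = expirationMs;
    }

    void put(K key, V value) { slots.put(key, new Slot<>(value)); }

    V get(K key) {
        Slot<V> s = slots.get(key);
        return s == null ? null : s.value;
    }

    int size() { return slots.size(); }

    // On a view change, members that left are only marked stale, not deleted.
    void onViewChange(Set<K> view, long now) {
        for (Map.Entry<K, Slot<V>> e : slots.entrySet()) {
            Slot<V> s = e.getValue();
            if (view.contains(e.getKey())) s.staleSince = -1;
            else if (s.staleSince < 0) s.staleSince = now;
        }
    }

    // Drop stale entries older than expirationMs; if the cache is still above maxSize,
    // drop further stale entries in insertion order. Live entries are never evicted.
    void evict(long now) {
        slots.values().removeIf(s -> s.staleSince >= 0 && now - s.staleSince >= expirationMs);
        Iterator<Slot<V>> it = slots.values().iterator();
        while (slots.size() > maxSize && it.hasNext()) {
            if (it.next().staleSince >= 0) it.remove();
        }
    }

    public static void main(String[] args) {
        StaleEntryCache<String, String> cache = new StaleEntryCache<>(2, 60_000);
        cache.put("A", "10.0.0.1:7800");
        cache.put("B", "10.0.0.2:7800");
        cache.put("C", "10.0.0.3:7800");
        cache.onViewChange(Set.of("A"), 0); // B and C left: crashed or partitioned?
        cache.evict(1_000);                 // over maxSize: B (oldest stale entry) is dropped
        System.out.println(cache.size() + " " + cache.get("C")); // 2 10.0.0.3:7800
        cache.evict(61_000);                // C has now expired
        System.out.println(cache.size() + " " + cache.get("C")); // 1 null
    }
}
```

Passing the current time explicitly keeps the sketch deterministic; a production cache would more likely read a clock and evict from a background task.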

> S3_PING / FILE_PING: remove failed members
> ------------------------------------------
>
>                 Key: JGRP-1956
>                 URL: https://issues.jboss.org/browse/JGRP-1956
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.6.4
>            Reporter: Karsten Ohme
>            Assignee: Bela Ban
>             Fix For: 3.6.5
>
>
> When we terminate a member (EC2's "terminate" function) or kill -9 it, its file (or its bucket data in S3) is not removed, which leaves stale data behind. On EC2, I expect virtualized instances to be simply terminated quite often, which compounds the problem.
> SOLUTION:
> - Periodically write own data to the file system (FILE_PING) or S3 (S3_PING)
> - On a view change: remove all data that's not in the current view
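
The two SOLUTION steps can be sketched in Java as follows, assuming a {{ConcurrentHashMap}} stands in for the FILE_PING directory or the S3 bucket. {{DiscoveryStore}}, {{writeOwnData}} and {{onViewChange}} are hypothetical names, not JGroups API, and the 1-second refresh interval is arbitrary.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed fix; the map stands in for the shared store
// (FILE_PING directory or S3 bucket): member name -> physical address.
class DiscoveryStore {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Step 1: each member (re)writes its own entry.
    void writeOwnData(String member, String physicalAddr) { store.put(member, physicalAddr); }

    // Step 2: on a view change, delete every entry whose member is not in the view,
    // e.g. a terminated EC2 instance or a process killed with kill -9.
    void onViewChange(Set<String> view) { store.keySet().retainAll(view); }

    Set<String> members() { return new TreeSet<>(store.keySet()); }

    public static void main(String[] args) {
        DiscoveryStore store = new DiscoveryStore();
        store.writeOwnData("A", "10.0.0.1:7800");
        store.writeOwnData("B", "10.0.0.2:7800"); // B is later killed without removing its entry
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // A rewrites its own entry periodically (interval chosen arbitrarily)
        timer.scheduleAtFixedRate(() -> store.writeOwnData("A", "10.0.0.1:7800"), 1, 1, TimeUnit.SECONDS);
        store.onViewChange(Set.of("A")); // B is no longer in the view
        System.out.println(store.members()); // [A]
        timer.shutdownNow();
    }
}
```

In this sketch, because every live member rewrites its own entry periodically, an entry deleted by the coordinator of another partition reappears on that member's next write, which softens the merge concern raised in the comment above.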

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)