[jboss-jira] [JBoss JIRA] (JGRP-1956) S3_PING / FILE_PING: remove failed members

Fri Aug 28 20:54:05 EDT 2015

    [ https://issues.jboss.org/browse/JGRP-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103401#comment-13103401 ] 

Karsten Ohme edited comment on JGRP-1956 at 8/28/15 8:53 PM:
-------------------------------------------------------------

This seems to be open again. My development system runs on localhost and works in single mode.
When the server starts, a new file named after the server name plus a random number is created in the S3 bucket. When the server is restarted, the old address is read from the bucket and a new one is generated. So, for example, after 7 restarts there are 7 server addresses stored in the bucket, and the server tries to reach all of them at startup to find other members. I have set the timeout to one second to limit the effect, but the server still tries to connect 10 times before it switches to single mode.

The stale files should be removed somehow, even if the server crashes, or the method for calculating the unique server name should be deterministic. This worked in versions lower than 3.6.4.
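
To illustrate the second option, here is a minimal sketch of a deterministic name calculation. It is not the JGroups implementation: the class StableMemberName, the method fileNameFor, and the choice of server name, host and port as inputs are all assumptions made for illustration. The idea is only that identical inputs give an identical name, so a restarted server would overwrite its old entry instead of adding a new one.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, not part of JGroups.
final class StableMemberName {
    private StableMemberName() {}

    /**
     * Derives an entry name from values that survive a restart.
     * Assumption: server name, host and port stay the same across restarts.
     */
    static String fileNameFor(String serverName, String host, int port) {
        String identity = serverName + "@" + host + ":" + port;
        // nameUUIDFromBytes builds a name-based UUID: same bytes in, same UUID out.
        UUID id = UUID.nameUUIDFromBytes(identity.getBytes(StandardCharsets.UTF_8));
        return serverName + "-" + id;
    }
}
```

This only helps if the bind port is fixed; with an ephemeral port the name would still change on every restart.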

was (Author: k_o_):
This seems to be open again. When the server starts, a new file named after the single DNS name plus a random number is created in the S3 bucket. When the server is restarted, the old address is read from the bucket and a new one is generated. After 7 restarts there are 7 server addresses stored in the bucket, and the server tries to reach all of them. I have set the timeout to one second to limit the effect, but the server still tries to connect 10 times before it switches to single mode.

The stale files should be removed somehow, even if the server crashes, or the method for calculating the unique server name should be deterministic. This worked in versions lower than 3.6.4.

> S3_PING / FILE_PING: remove failed members
> ------------------------------------------
>
>                 Key: JGRP-1956
>                 URL: https://issues.jboss.org/browse/JGRP-1956
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.6.4
>            Reporter: Karsten Ohme
>            Assignee: Bela Ban
>
> When we terminate a member (EC2's "terminate" function) or kill -9 it, then the file (or bucket data in S3) won't get removed. This leads to stale data. On EC2, I expect that virtualized instances are often simply terminated, so this problem is compounded there.
> SOLUTION:
> - Periodically write own data to the file system (FILE_PING) or S3 (S3_PING)
> - On a view change: remove all data that's not in the current view
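
The two steps of the proposed solution can be sketched as follows. This is an in-memory stand-in for the FILE_PING directory or the S3 bucket, not JGroups code: the class DiscoveryStoreSketch and its methods are hypothetical, and member names are plain strings rather than JGroups addresses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the shared store (directory or bucket), not the JGroups API.
final class DiscoveryStoreSketch {
    // key: member entry name, value: serialized address data
    private final Map<String, String> entries = new LinkedHashMap<>();

    /** Step 1: (re)write our own entry; meant to be called periodically. */
    void writeOwn(String member, String data) {
        entries.put(member, data);
    }

    /**
     * Step 2: on a view change, remove every entry whose member is not
     * in the current view. Returns the names that were removed.
     */
    Set<String> retainView(Set<String> viewMembers) {
        Set<String> removed = new TreeSet<>(entries.keySet());
        removed.removeAll(viewMembers);       // what is stored but not in the view
        entries.keySet().removeAll(removed);  // drop those stale entries
        return removed;
    }

    /** Names currently stored, sorted. */
    Set<String> members() {
        return new TreeSet<>(entries.keySet());
    }
}
```

The periodic rewrite in step 1 is what makes step 2 safe in this sketch: if a live member's entry is removed by mistake, for instance by a member holding a different view, the entry reappears on that member's next write.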
