[jboss-jira] [JBoss JIRA] (JGRP-2253) FD_SOCK is not working in AWS environment

Sibin Karnavar (JIRA) issues at jboss.org
Fri Mar 23 09:48:02 EDT 2018


     [ https://issues.jboss.org/browse/JGRP-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sibin Karnavar reopened JGRP-2253:
----------------------------------


I would like to reopen this case. In the AWS environment, FD_SOCK sometimes does not work when we kill the process with kill -9 or terminate the EC2 instance from the console.

Most of the time, it detects the broken connection immediately, within 3 seconds of the node crashing (e.g. kill -9, so the node does not leave the cluster gracefully).

Sometimes it does not detect the failure immediately; it takes 12 to 13 seconds. So I am assuming that in these cases the FD configuration, rather than FD_SOCK, is what detects the node failure.

There are no warnings related to FD_SOCK in the log files.

TCP configuration:

jgroup-ext-addr is the IP address of each node.

<TCP 
             external_addr="${jgroup-ext-addr}"
             bind_addr="match-interface:eth0"
             bind_port="7803"
             port_range="0"
             diagnostics_port="7806"
             recv_buf_size="20M"
             send_buf_size="10M"
             max_bundle_size="64k"
             enable_diagnostics="true"
             thread_naming_pattern="cl"
 
             thread_pool.enabled="true"
             thread_pool.min_threads="2"
             thread_pool.max_threads="16"
             thread_pool.keep_alive_time="5000" />

FD protocols:

<FD_SOCK start_port="7804" external_addr="${jgroup-ext-addr}" bind_addr="match-interface:eth0" client_bind_port="7805"/>
<FD timeout="3000" max_tries="3" />
<VERIFY_SUSPECT timeout="3000" />
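
As a rough check of the assumption above: if FD suspects a member only after max_tries missed heartbeats of timeout ms each, and VERIFY_SUSPECT then waits for its own timeout before the member is excluded, the values configured here add up to about 12 seconds, which is close to the slow case. A minimal Java sketch of that arithmetic follows. The class and method names are made up for illustration and are not part of the JGroups API, and the timing model is an assumption.

```java
// Rough estimate only. Assumes FD suspects a member after max_tries missed
// heartbeats of `timeout` ms each, and that VERIFY_SUSPECT then waits its own
// timeout before the suspected member is excluded from the view.
class FailureDetectionEstimate {

    // Time until FD alone would suspect a silent member.
    static long fdSuspectMillis(long fdTimeoutMs, int maxTries) {
        return fdTimeoutMs * maxTries;
    }

    // FD suspicion time plus the VERIFY_SUSPECT wait.
    static long worstCaseExclusionMillis(long fdTimeoutMs, int maxTries,
                                         long verifySuspectTimeoutMs) {
        return fdSuspectMillis(fdTimeoutMs, maxTries) + verifySuspectTimeoutMs;
    }

    public static void main(String[] args) {
        long fdTimeoutMs = 3000;            // <FD timeout="3000" ... />
        int maxTries = 3;                   // <FD ... max_tries="3" />
        long verifySuspectTimeoutMs = 3000; // <VERIFY_SUSPECT timeout="3000" />

        System.out.println("FD suspects after ~"
                + fdSuspectMillis(fdTimeoutMs, maxTries) + " ms");
        System.out.println("Member excluded after ~"
                + worstCaseExclusionMillis(fdTimeoutMs, maxTries, verifySuspectTimeoutMs)
                + " ms");
    }
}
```

With the values above this gives roughly 9000 ms for FD and 12000 ms in total, so a 12 to 13 second delay would be consistent with FD, not FD_SOCK, having detected the failure.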

Thanks




> FD_SOCK is not working in AWS environment
> -----------------------------------------
>
>                 Key: JGRP-2253
>                 URL: https://issues.jboss.org/browse/JGRP-2253
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0.10
>         Environment: AWS - EC2
>            Reporter: Sibin Karnavar
>            Assignee: Bela Ban
>
> We have our failure detection defined as below.
>  <FD_SOCK  external_port="7804" />
>  <FD timeout="3000" max_tries="3" />
> <VERIFY_SUSPECT timeout="3000" />
> Please note that we have used FD instead of FD_ALL in AWS. We will change it to FD_ALL later, after detailed testing.
> In my local environment, this works perfectly. As soon as I killed my node, I could see the view change happen immediately with FD_SOCK.
> Initially we did not set external_port in FD_SOCK, but later I thought the port might be the issue, so I defined it as 7804 and added that port to the security group so that all the nodes can reach each other on it. So there is no issue with the port.
> Can you please let us know if we need any additional configuration to make FD_SOCK work well in AWS?
> Thanks,
> Sibin



--
This message was sent by Atlassian JIRA
(v7.5.0#75005)

