[jboss-jira] [JBoss JIRA] (JGRP-2253) FD_SOCK is not working in AWS environment
Sibin Karnavar (JIRA)
issues at jboss.org
Wed May 2 18:06:00 EDT 2018
[ https://issues.jboss.org/browse/JGRP-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570045#comment-13570045 ]
Sibin Karnavar commented on JGRP-2253:
--------------------------------------
It looks like I am unable to attach my complete logs: I got a 'token missing' error while uploading the log file, so the relevant parts are pasted below.
During Startup:
monitoringInterval: 30000 and queueSizeAlertThreshold: 100
2018-05-02 21:52:16.017 TRACE 1108 --- [jgroups-13,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: who-has-sock ip-10-91-133-143-60500
2018-05-02 21:52:16.021 TRACE 1108 --- [jgroups-13,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: received SUSPECT message from ip-10-91-135-48-43121: suspects=[ip-10-91-133-143-60500]
2018-05-02 21:52:16.022 TRACE 1108 --- [jgroups-13,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: received SUSPECT message from ip-10-91-135-48-43121: suspects=[ip-10-91-137-163-65450]
2018-05-02 21:52:16.024 TRACE 1108 --- [jgroups-3,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: received UNSUSPECT message from ip-10-91-135-48-43121: mbrs=[ip-10-91-133-143-60500]
2018-05-02 21:52:16.025 TRACE 1108 --- [jgroups-4,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: received UNSUSPECT message from ip-10-91-135-48-43121: mbrs=[ip-10-91-137-163-65450]
2018-05-02 21:52:16.030 TRACE 1108 --- [jgroups-17,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: received UNSUSPECT message from ip-10-91-137-163-65450: mbrs=[ip-10-91-133-143-60500]
2018-05-02 21:52:16.030 INFO 1108 --- [localhost-startStop-1] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-91-137-163-65450|2] (3) [ip-10-91-137-163-65450, ip-10-91-135-48-43121, ip-10-91-133-143-60500]
2018-05-02 21:52:16.030 INFO 1108 --- [localhost-startStop-1] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-05-02 21:52:16.031 INFO 1108 --- [localhost-startStop-1] c.w.s.c.ServiceClusterCoordinator : - - ip-10-91-137-163-65450 is the master for service SOM-SKS
2018-05-02 21:52:16.032 INFO 1108 --- [localhost-startStop-1] c.w.s.c.ServiceClusterCoordinator : - - Successfully connected to clusterName: 'SOM-SKS-stage_SJX_290420181509_XSJ'
2018-05-02 21:52:16.032 TRACE 1108 --- [FD_SOCK pinger-14,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: got cache from ip-10-91-137-163-65450: cache is ip-10-91-133-143-60500: 10.91.133.143:7804 (0 ms old)
ip-10-91-135-48-43121: 10.91.135.48:7804 (0 ms old)
ip-10-91-137-163-65450: 10.91.137.163:7804 (0 ms old)
2018-05-02 21:52:16.032 TRACE 1108 --- [FD_SOCK pinger-14,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: pinger_thread started
2018-05-02 21:52:16.033 DEBUG 1108 --- [FD_SOCK pinger-14,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: pingable_mbrs=[ip-10-91-137-163-65450, ip-10-91-135-48-43121, ip-10-91-133-143-60500], ping_dest=ip-10-91-137-163-65450
2018-05-02 21:52:16.035 TRACE 1108 --- [FD_SOCK pinger-14,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: ping_dest=ip-10-91-137-163-65450, ping_sock=Socket[addr=/10.91.137.163,port=7804,localport=7805], cache=ip-10-91-133-143-60500: 10.91.133.143:7804 (2 ms old)
ip-10-91-135-48-43121: 10.91.135.48:7804 (2 ms old)
ip-10-91-137-163-65450: 10.91.137.163:7804 (2 ms old)
{color:red}The logs below are from after I terminated the master node (the EC2 instance) from the AWS console. These logs were printed only 13 seconds after the other node was killed.{color}
2018-05-02 21:54:40.040 TRACE 1108 --- [jgroups-17,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.VERIFY_SUSPECT : - - verifying that [ip-10-91-137-163-65450] is dead
2018-05-02 21:54:43.041 TRACE 1108 --- [VERIFY_SUSPECT.TimerThread-19,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.VERIFY_SUSPECT : - - [ip-10-91-137-163-65450] is dead (passing up SUSPECT event)
2018-05-02 21:54:43.202 DEBUG 1108 --- [FD_SOCK pinger-14,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: socket to ip-10-91-137-163-65450 was closed gracefully
2018-05-02 21:54:43.203 DEBUG 1108 --- [FD_SOCK pinger-14,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: pingable_mbrs=[ip-10-91-135-48-43121, ip-10-91-133-143-60500], ping_dest=ip-10-91-135-48-43121
2018-05-02 21:54:43.203 DEBUG 1108 --- [FD_SOCK pinger-14,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: failed connecting to ip-10-91-135-48-43121: Address already in use (Bind failed)
2018-05-02 21:54:43.203 DEBUG 1108 --- [FD_SOCK pinger-14,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: broadcasting suspect(ip-10-91-135-48-43121)
2018-05-02 21:54:43.203 TRACE 1108 --- [jgroups-20,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: received SUSPECT message from ip-10-91-133-143-60500: suspects=[ip-10-91-135-48-43121]
2018-05-02 21:54:43.204 DEBUG 1108 --- [jgroups-20,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: suspecting [ip-10-91-135-48-43121]
2018-05-02 21:54:43.204 TRACE 1108 --- [FD_SOCK pinger-14,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: pinger thread terminated
2018-05-02 21:54:43.206 TRACE 1108 --- [jgroups-20,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.VERIFY_SUSPECT : - - verifying that [ip-10-91-135-48-43121] is dead
2018-05-02 21:54:43.208 TRACE 1108 --- [jgroups-20,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.VERIFY_SUSPECT : - - member ip-10-91-135-48-43121 was unsuspected
2018-05-02 21:54:43.208 DEBUG 1108 --- [jgroups-20,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: broadcasting unsuspect(ip-10-91-135-48-43121)
2018-05-02 21:54:43.208 TRACE 1108 --- [jgroups-18,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] org.jgroups.protocols.FD_SOCK : - - ip-10-91-133-143-60500: received UNSUSPECT message from ip-10-91-133-143-60500: mbrs=[ip-10-91-135-48-43121]
2018-05-02 21:54:43.208 INFO 1108 --- [jgroups-17,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-91-135-48-43121|3] (2) [ip-10-91-135-48-43121, ip-10-91-133-143-60500]
2018-05-02 21:54:43.209 INFO 1108 --- [jgroups-17,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-05-02 21:54:43.209 INFO 1108 --- [jgroups-17,SOM-SKS-stage_SJX_290420181509_XSJ,ip-10-91-133-143-60500] c.w.s.c.ServiceClusterCoordinator : - - ip-10-91-135-48-43121 is the master for service SOM-SKS
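The "Address already in use (Bind failed)" line in the logs above is the kind of error the JVM reports when a socket is bound to a local port that is still taken. Whether that is what happened inside FD_SOCK here is an assumption; the logs alone do not confirm it. The following is a minimal sketch in plain Java (no JGroups involved; the class and method names are hypothetical) that reproduces the same class of error on the loopback interface:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of JGroups.
public class BindConflictDemo {

    /**
     * Binds a listener to a free loopback port, then tries to bind a second
     * (client) socket to the same address and port.
     * Returns true if the second bind is rejected with a BindException.
     */
    static boolean secondBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket holder = new ServerSocket()) {
            holder.setReuseAddress(false);
            holder.bind(new InetSocketAddress(loopback, 0)); // OS picks a free port
            int port = holder.getLocalPort();

            try (Socket client = new Socket()) {
                client.setReuseAddress(false);
                // Same local address and port as the listener: expected to fail.
                client.bind(new InetSocketAddress(loopback, port));
                return false;
            } catch (BindException e) {
                System.out.println("bind failed: " + e.getMessage());
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind fails: " + secondBindFails());
    }
}
```

If something similar applies here, the SUSPECT raised for ip-10-91-135-48-43121 would come from a local bind problem rather than from that node being unreachable, which would fit with VERIFY_SUSPECT unsuspecting it a few milliseconds later.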
> FD_SOCK is not working in AWS environment
> -----------------------------------------
>
> Key: JGRP-2253
> URL: https://issues.jboss.org/browse/JGRP-2253
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.10
> Environment: AWS - EC2
> Reporter: Sibin Karnavar
> Assignee: Bela Ban
> Fix For: 4.0.12
>
>
> Our failure detection is configured as follows:
> <FD_SOCK external_port="7804" />
> <FD timeout="3000" max_tries="3" />
> <VERIFY_SUSPECT timeout="3000" />
> Please note that we have used FD instead of FD_ALL in AWS. We will be changing it to FD_ALL later after detailed testing.
> Locally, this works perfectly: as soon as I kill a node, the view change happens immediately with FD_SOCK.
> Initially we did not set external_port in FD_SOCK. Suspecting a port issue, I later set it to 7804 and added that port to the security group so that all nodes can reach each other on it. So the port should not be the issue.
> Can you please let us know whether we need any additional configuration to make FD_SOCK work well in AWS?
> Thanks,
> Sibin
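
The description states that port 7804 is open between all nodes in the security group. One way to double-check that from a node is a plain TCP connect with a timeout. The sketch below is only an illustration under that assumption: it is plain Java, not JGroups code, the class and method names are hypothetical, and the host and port are passed in as arguments.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper, not part of JGroups.
public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable: all count as "not reachable".
            return false;
        }
    }

    /** Sanity check: opens a listener on a free loopback port and connects to it. */
    static boolean loopbackSelfCheck() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            return canConnect(loopback.getHostAddress(), server.getLocalPort(), 1000);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java PortCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

Running it from each node against the other nodes' addresses on port 7804 would show whether the FD_SOCK server port is reachable. A successful connect only shows that the port is open; it says nothing about how FD_SOCK behaves once a peer instance is terminated.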
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)