[JBoss JIRA] (WFCORE-3590) Hang in ServerStartFailureTestCase
by Richard Opalka (JIRA)
[ https://issues.jboss.org/browse/WFCORE-3590?page=com.atlassian.jira.plugi... ]
Richard Opalka commented on WFCORE-3590:
----------------------------------------
"MSC service thread 1-8" was blocked because the MSC queue executor had been terminated prematurely,
so every attempt by its core thread to schedule new tasks (while executing the current task) was rejected.
The MSC ServiceControllerImpl source code makes the current thread handle RejectedExecutionException
by simply executing the rejected tasks itself.
In this scenario "MSC service thread 1-8" acquired the "Lockable read lock" in one rejected task
and then tried to acquire the "Lockable write lock" in another rejected task.
Because the rejected tasks are chained (each one runs nested inside the previous one),
the thread does not release the read lock before moving on to the next task.
So the thread waits forever for a read lock that it holds itself.
The proper fix is to avoid shutting down the queue executor until all scheduled tasks have completed.
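To make the mechanism above concrete, here is a minimal, self-contained sketch. It is not MSC code: java.util.concurrent.locks.ReentrantReadWriteLock stands in for MSC's Lockable, a plain JDK executor stands in for the MSC queue executor, and the names RejectedTaskLockDemo, executeOrRunInline and runScenario are made up for illustration. A timed tryLock is used so that the demo returns instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class RejectedTaskLockDemo {

    // Stand-in for MSC's Lockable (assumption: any non-upgradable read/write lock behaves alike here).
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    /** Mimics the fallback described above: if the executor rejects the task, run it inline. */
    static void executeOrRunInline(ExecutorService executor, Runnable task) {
        try {
            executor.execute(task);
        } catch (RejectedExecutionException e) {
            task.run(); // runs on the calling thread, nested inside the caller's own task
        }
    }

    /** Returns true if the nested task obtained the write lock within the timeout. */
    static boolean runScenario(long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown(); // "terminated prematurely": every later execute() is rejected

        boolean[] acquired = new boolean[1];
        LOCK.readLock().lock(); // the first task takes the read lock
        try {
            executeOrRunInline(executor, () -> {
                try {
                    // The second task wants the write lock while this same thread still holds the read lock.
                    acquired[0] = LOCK.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
                    if (acquired[0]) {
                        LOCK.writeLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        } finally {
            LOCK.readLock().unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("write lock acquired: " + runScenario(200));
    }
}
```

Because the executor is already shut down, the second task runs nested inside the first on the same thread, so the write lock cannot be granted while the read lock is held. In the real hang there is no timeout, so the thread waits indefinitely.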
> Hang in ServerStartFailureTestCase
> ----------------------------------
>
> Key: WFCORE-3590
> URL: https://issues.jboss.org/browse/WFCORE-3590
> Project: WildFly Core
> Issue Type: Bug
> Components: Domain Management, Server
> Affects Versions: 4.0.0.Alpha9
> Reporter: Brian Stansberry
> Assignee: Richard Opalka
> Priority: Critical
> Attachments: WFCORE-3590-threads.txt
>
>
> Hang observed in https://ci.wildfly.org/viewLog.html?buildId=88611&buildTypeId=WildFlyCore...
> I'll attach the thread dump.
> [~dmlloyd] I assigned this to you mostly as a form of ping, as I want to talk to you about it and you are away today.
> Interesting parts of the thread dump:
> {code}
> "Thread-2" #11 prio=5 os_prio=0 tid=0xe13f0400 nid=0x4c49 waiting on condition [0xde4ed000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0xe5ea9de8> (a java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at org.jboss.as.server.BootstrapImpl$ShutdownHook.shutdown(BootstrapImpl.java:276)
> at org.jboss.as.server.BootstrapImpl$ShutdownHook.run(BootstrapImpl.java:240)
> "Controller Boot Thread" #25 prio=5 os_prio=0 tid=0xe0ca4c00 nid=0x4c35 waiting for monitor entry [0xdf3fe000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at java.lang.Shutdown.exit(Shutdown.java:212)
> - waiting to lock <0xe31d5e18> (a java.lang.Class for java.lang.Shutdown)
> at java.lang.Runtime.exit(Runtime.java:109)
> at java.lang.System.exit(System.java:971)
> at org.jboss.as.server.SystemExiter$DefaultExiter.exit(SystemExiter.java:117)
> at org.jboss.as.server.SystemExiter.logAndExit(SystemExiter.java:98)
> at org.jboss.as.server.ServerService.boot(ServerService.java:405)
> at org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:370)
> at java.lang.Thread.run(Thread.java:748)
> "MSC service thread 1-8" #20 prio=5 os_prio=0 tid=0x087b8c00 nid=0x4c2f in Object.wait() [0xe03ba000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xe32a3f28> (a org.jboss.msc.service.ServiceRegistrationImpl)
> at java.lang.Object.wait(Object.java:502)
> at org.jboss.msc.service.Lockable.acquireWrite(Lockable.java:97)
> at org.jboss.msc.service.ServiceControllerImpl$RemoveTask.execute(ServiceControllerImpl.java:1865)
> - locked <0xe32a3f28> (a org.jboss.msc.service.ServiceRegistrationImpl)
> at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1527)
> at org.jboss.msc.service.ServiceControllerImpl.doExecute(ServiceControllerImpl.java:788)
> at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1537)
> at org.jboss.msc.service.ServiceControllerImpl.doExecute(ServiceControllerImpl.java:788)
> at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1537)
> at org.jboss.msc.service.ServiceControllerImpl.doExecute(ServiceControllerImpl.java:788)
> at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1537)
> at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1979)
> at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1481)
> at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1374)
> at java.lang.Thread.run(Thread.java:748)
> "main" #1 prio=5 os_prio=0 tid=0xf6509000 nid=0x4c02 in Object.wait() [0xf6685000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xe32b58e8> (a org.jboss.as.server.BootstrapImpl$ShutdownHook)
> at java.lang.Thread.join(Thread.java:1252)
> - locked <0xe32b58e8> (a org.jboss.as.server.BootstrapImpl$ShutdownHook)
> at java.lang.Thread.join(Thread.java:1326)
> at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
> at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
> at java.lang.Shutdown.runHooks(Shutdown.java:123)
> at java.lang.Shutdown.sequence(Shutdown.java:167)
> at java.lang.Shutdown.exit(Shutdown.java:212)
> - locked <0xe31d5e18> (a java.lang.Class for java.lang.Shutdown)
> at java.lang.Runtime.exit(Runtime.java:109)
> at java.lang.System.exit(System.java:971)
> at org.jboss.as.server.SystemExiter$DefaultExiter.exit(SystemExiter.java:117)
> at org.jboss.as.server.SystemExiter.logAndExit(SystemExiter.java:98)
> at org.jboss.as.server.DomainServerMain.main(DomainServerMain.java:183)
> at java.lang.invoke.LambdaForm$DMH/7468253.invokeStatic_L_V(LambdaForm$DMH)
> at java.lang.invoke.LambdaForm$MH/7742980.invokeExact_MT(LambdaForm$MH)
> at org.jboss.modules.Module.runMainMethod(Module.java:348)
> at org.jboss.modules.Module.run(Module.java:328)
> at org.jboss.modules.Main.main(Main.java:557)
> {code}
> This is a domain server. The "main" thread has recognized that its ProcessController has closed its stdin, so it is shutting down via System.exit.
> "Thread-2" is running BootstrapImpl.ShutdownHook, waiting on a latch for the MSC ServiceContainer to complete termination. So the SC not completing termination is the basic issue.
> "Controller Boot Thread" is there because this termination occurred during boot. That caused some problem during boot (not surprising), so it is responding to that problem by trying to terminate the process via System.exit. It's blocked waiting for "main", which has done the same. This thread should not be preventing MSC from terminating, though; it's not, for example, called as part of a StartContext.asynchronous thing. IOW I don't think this thread is relevant to the problem.
> "MSC service thread 1-8" is the most interesting one to me. An MSC thread is blocked but it's not clear to me why. An interesting frame in the stack is org.jboss.msc.service.ServiceControllerImpl.doExecute(ServiceControllerImpl.java:788). That shows that ServiceControllerImpl$RemoveTask was passed to the executor but a RejectedExecutionException was thrown, so the task is being run from the thread that attempted to pass it to the executor. Should the MSC executor be rejecting tasks before all service controllers are removed?
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (WFLY-9375) META-INF/services are always accessible from dependencies
by Jan Stourac (JIRA)
[ https://issues.jboss.org/browse/WFLY-9375?page=com.atlassian.jira.plugin.... ]
Jan Stourac commented on WFLY-9375:
-----------------------------------
I have added 'affects release notes' because we should probably mention this fix: some users may be missing the {{services="import"}} attribute in their jboss-deployment-structure.xml, since everything worked fine without it until this fix. After the fix their applications might stop working, so I think it would be good to add a note about it to the release notes.
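For readers wondering why the attribute matters: META-INF/services entries are what java.util.ServiceLoader reads, so a provider is only discovered if its services file is visible to the class loader doing the lookup. The sketch below shows that with plain JDK class loaders. It does not use JBoss Modules; Greeter, EnglishGreeter and the temporary directory are hypothetical stand-ins.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class ServiceVisibilityDemo {

    /** Hypothetical service interface, standing in for something like ServletContainerInitializer. */
    public interface Greeter {
        String greet();
    }

    /** Hypothetical provider; ServiceLoader needs a public class with a public no-arg constructor. */
    public static final class EnglishGreeter implements Greeter {
        public EnglishGreeter() { }

        @Override
        public String greet() {
            return "hello";
        }
    }

    /** Collects the greetings of every provider that ServiceLoader can see through the given loader. */
    static List<String> discover(ClassLoader loader) {
        List<String> found = new ArrayList<>();
        for (Greeter greeter : ServiceLoader.load(Greeter.class, loader)) {
            found.add(greeter.greet());
        }
        return found;
    }

    /** Builds a loader that additionally exposes a META-INF/services file naming the provider. */
    static ClassLoader loaderWithServicesFile() {
        try {
            Path root = Files.createTempDirectory("services-demo");
            Path dir = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            Files.write(dir.resolve(Greeter.class.getName()),
                    EnglishGreeter.class.getName().getBytes(StandardCharsets.UTF_8));
            return new URLClassLoader(new URL[] { root.toUri().toURL() },
                    ServiceVisibilityDemo.class.getClassLoader());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Services file not visible: the provider class is loadable but is not discovered.
        System.out.println(discover(ServiceVisibilityDemo.class.getClassLoader()));
        // Services file visible: the same provider is now discovered.
        System.out.println(discover(loaderWithServicesFile()));
    }
}
```

The provider class is loadable in both lookups; only the visibility of the services file changes what is discovered, which is the kind of visibility the {{services="import"}} attribute controls for a module dependency.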
> META-INF/services are always accessible from dependencies
> ---------------------------------------------------------
>
> Key: WFLY-9375
> URL: https://issues.jboss.org/browse/WFLY-9375
> Project: WildFly
> Issue Type: Bug
> Components: Web (Undertow)
> Affects Versions: 10.0.0.Final, 10.1.0.Final, 11.0.0.CR1
> Reporter: Alexander Kudrevatykh
> Assignee: Stuart Douglas
> Fix For: 12.0.0.Beta1
>
>
> I found a regression in WildFly 10.1 (10.0 and 11 CR1 are also affected): META-INF/services entries are accessible from war dependencies, even if not marked as "import"
[JBoss JIRA] (WFLY-9880) ServletContainerInitializer@onStartup is not invoked for deployment
by Jan Stourac (JIRA)
[ https://issues.jboss.org/browse/WFLY-9880?page=com.atlassian.jira.plugin.... ]
Jan Stourac commented on WFLY-9880:
-----------------------------------
Thank you, Stuart, you are right. After adding the {{services="import"}} attribute in jboss-deployment-structure.xml as you said, it starts to work.
I was able to find that this change in behaviour is caused by the fix for WFLY-9375. I think we should add a note about this change to the release notes, as some customers might have similarly broken deployments. I'll put a comment on that original JIRA.
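For reference, a minimal sketch of a jboss-deployment-structure.xml with the attribute set. The module name com.example.sci is a placeholder (the real one is in the attached module.xml), and the 1.2 schema version is an assumption; use whichever version your server supports.

```xml
<jboss-deployment-structure xmlns="urn:jboss:deployment-structure:1.2">
    <deployment>
        <dependencies>
            <!-- services="import" makes the module's META-INF/services entries visible to the deployment -->
            <module name="com.example.sci" services="import"/>
        </dependencies>
    </deployment>
</jboss-deployment-structure>
```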
> ServletContainerInitializer@onStartup is not invoked for deployment
> -------------------------------------------------------------------
>
> Key: WFLY-9880
> URL: https://issues.jboss.org/browse/WFLY-9880
> Project: WildFly
> Issue Type: Bug
> Components: Web (Undertow)
> Affects Versions: 12.0.0.Beta1
> Reporter: Jan Stourac
> Assignee: Stuart Douglas
> Attachments: deployment-with-dep.war, module.xml, servlet-container-init.jar
>
>
> The {{ServletContainerInitializer@onStartup}} is not invoked for deployment. See {{Steps to Reproduce}}.
> Problematic part seems to be on [this line|https://github.com/undertow-io/undertow/blob/1706a5f41adb8f1a719617c...] in Undertow
> {code}
> //then run the SCI's
> for (final ServletContainerInitializerInfo sci : deploymentInfo.getServletContainerInitializers()) { // <--- this line
>     final InstanceHandle<? extends ServletContainerInitializer> instance = sci.getInstanceFactory().createInstance();
>     try {
>         instance.getInstance().onStartup(sci.getHandlesTypes(), servletContext);
>     } finally {
>         instance.release();
>     }
> }
> {code}
> It looks like calling {{deploymentInfo.getServletContainerInitializers()}} does not include our deployment and simply ignores it.
[JBoss JIRA] (JGRP-2253) FD_SOCK is not working in AWS environment
by Sibin Karnavar (JIRA)
[ https://issues.jboss.org/browse/JGRP-2253?page=com.atlassian.jira.plugin.... ]
Sibin Karnavar edited comment on JGRP-2253 at 2/22/18 12:16 PM:
----------------------------------------------------------------
All timestamps are in UTC
Members in the cluster: ip-10-93-136-91, ip-10-93-133-149 and ip-10-93-135-215
+Node 0: (Master Node / Leader)+
ip-10-93-136-91
This was the leader node, and I killed it at 2018-02-22 16:19:58.186 UTC. As the FD_SOCK trace timestamps show, it's not detecting the TCP socket connection break immediately.
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC_test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 --> 10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
+_Node 1:_+
ip-10-93-133-149
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|17] (2) [ip-10-93-135-215-41546, ip-10-93-133-149-13458]
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - ip-10-93-135-215-41546 is the master for service ABC
2018-02-22 16:21:19.408 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.409 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 --> 10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
2018-02-22 16:21:19.434 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320: suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.434 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:19.435 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:19.435 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.436 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320: suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.437 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320: suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: who-has-sock ip-10-93-133-149-13458
2018-02-22 16:21:49.429 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:49.430 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:49.430 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.431 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320: suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.432 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.437 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-136-91-22320: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.527 INFO 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: MergeView::[ip-10-93-135-215-41546|20] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320], 2 subgroups: [ip-10-93-136-91-22320|19] (1) [ip-10-93-136-91-22320], [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
+Node-2+
ip-10-93-135-215
2018-02-22 16:21:19.403 INFO 19074 --- [jgroups-24,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.426 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: i-have-sock: ip-10-93-136-91-22320 --> 10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (1599 secs old)
)
2018-02-22 16:21:19.430 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320: suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.430 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting []
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320: suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.432 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting [ip-10-93-133-149-13458]
2018-02-22 16:21:19.433 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: broadcasting unsuspect(ip-10-93-133-149-13458)
2018-02-22 16:21:19.433 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
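To put a number on "not detecting immediately": the small sketch below computes the gap between the stated kill time and the first view change logged on Node 1 (the 16:20:26.917 entry). Only the two timestamps come from the logs above; the class and method names are made up for illustration, and the calculation says nothing about which protocol triggered the view change.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DetectionDelay {

    // Timestamp layout used by the log lines quoted above.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Milliseconds elapsed between two log timestamps in the same time zone. */
    static long millisBetween(String from, String to) {
        return Duration.between(
                LocalDateTime.parse(from, LOG_TS),
                LocalDateTime.parse(to, LOG_TS)).toMillis();
    }

    public static void main(String[] args) {
        long delay = millisBetween("2018-02-22 16:19:58.186", "2018-02-22 16:20:26.917");
        System.out.println("kill -> first view change: " + delay + " ms");
    }
}
```

That gap is 28731 ms, i.e. roughly 28.7 seconds between the kill and the first view that excludes the killed node.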
> FD_SOCK is not working in AWS environment
> -----------------------------------------
>
> Key: JGRP-2253
> URL: https://issues.jboss.org/browse/JGRP-2253
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.10
> Environment: AWS - EC2
> Reporter: Sibin Karnavar
> Assignee: Bela Ban
>
> We have our failure detection defined like below.
> <FD_SOCK external_port="7804" />
> <FD timeout="3000" max_tries="3" />
> <VERIFY_SUSPECT timeout="3000" />
> Please note that we have used FD instead of FD_ALL in AWS. We will be changing it to FD_ALL later after detailed testing.
> In my local environment this works perfectly: as soon as I kill a node, I can see the view change happen immediately with FD_SOCK.
> Initially we did not set external_port on FD_SOCK, but I later thought the port might be the problem, so I set it to 7804 and added that port to the security group so that all nodes can reach each other on it. So the port is not the issue.
> Can you please let us know if we need any additional configuration to make FD_SOCK work well in AWS?
> Thanks,
> Sibin
[JBoss JIRA] (JGRP-2253) FD_SOCK is not working in AWS environment
by Sibin Karnavar (JIRA)
[ https://issues.jboss.org/browse/JGRP-2253?page=com.atlassian.jira.plugin.... ]
Sibin Karnavar edited comment on JGRP-2253 at 2/22/18 12:15 PM:
----------------------------------------------------------------
All timestamps are in UTC
Members in the cluster: ip-10-93-136-91, ip-10-93-133-149 and ip-10-93-135-215
+Node 0: (Master Node / Leader)+
ip-10-93-136-91
This was the leader node, and I killed it at 2018-02-22 16:19:58.186 UTC. As the FD_SOCK trace timestamps show, it's not detecting the TCP socket connection break immediately.
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC_test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 --> 10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
_Node 1:_
ip-10-93-133-149
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|17] (2) [ip-10-93
-135-215-41546, ip-10-93-133-149-13458]
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - ip-10-93-135-215-41546 is the master for service ABC
2018-02-22 16:21:19.408 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93
-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.409 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
2018-02-22 16:21:19.434 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.434 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:19.435 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:19.435 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.436 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.437 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: who-has-sock ip-10-93-133-149-13458
2018-02-22 16:21:49.429 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:49.430 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:49.430 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.431 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.432 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.437 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-136-91-22320:
mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.527 INFO 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: MergeView::[ip-10-93-135-215-41546|20] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320], 2 subgroups: [ip-10-93-136-91-22320|19] (1) [ip-10-93-136-91-22320], [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
Node 2:
ip-10-93-135-215
2018-02-22 16:21:19.403 INFO 19074 --- [jgroups-24,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.426 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (1599 secs old)
)
2018-02-22 16:21:19.430 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.430 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting []
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.432 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting [ip-10-93-133-149-13458]
2018-02-22 16:21:19.433 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: broadcasting unsuspect(ip-10-93-133-149-13458)
2018-02-22 16:21:19.433 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
> FD_SOCK is not working in AWS environment
> -----------------------------------------
>
> Key: JGRP-2253
> URL: https://issues.jboss.org/browse/JGRP-2253
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.10
> Environment: AWS - EC2
> Reporter: Sibin Karnavar
> Assignee: Bela Ban
>
> We have our failure detection defined as follows:
> <FD_SOCK external_port="7804" />
> <FD timeout="3000" max_tries="3" />
> <VERIFY_SUSPECT timeout="3000" />
> Please note that we have used FD instead of FD_ALL in AWS. We will change it to FD_ALL later, after detailed testing.
> In my local environment this works perfectly: as soon as I kill a node, the view change happens immediately with FD_SOCK.
> Initially we did not set external_port in FD_SOCK. Later I suspected a port issue, so I set it to 7804 and added that port to the security group so that all nodes can reach each other on it. So the port should not be the problem.
> Can you please let us know whether we need any additional configuration to make FD_SOCK work well in AWS?
> Thanks,
> Sibin
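As a rough aid for reading the configuration quoted above, the following Python sketch estimates how long heartbeat-based detection alone could take with these values. It assumes, and this is an assumption rather than something stated in the issue, that FD suspects a member after max_tries unanswered probes of timeout ms each and that VERIFY_SUSPECT then waits its own timeout before the suspicion is passed on. The <config> wrapper element and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Failure-detection fragment as quoted in the issue, wrapped in a
# hypothetical <config> root so it parses as one XML document.
FRAGMENT = """
<config>
  <FD_SOCK external_port="7804" />
  <FD timeout="3000" max_tries="3" />
  <VERIFY_SUSPECT timeout="3000" />
</config>
"""


def heartbeat_budget_ms(xml_text: str) -> int:
    """Rough delay before a dead member would be excluded via FD alone.

    Assumption: FD needs max_tries * timeout ms to suspect a peer, and
    VERIFY_SUSPECT adds its own timeout on top of that.
    """
    root = ET.fromstring(xml_text)
    fd = root.find("FD")
    verify = root.find("VERIFY_SUSPECT")
    fd_ms = int(fd.get("timeout")) * int(fd.get("max_tries"))
    return fd_ms + int(verify.get("timeout"))


if __name__ == "__main__":
    print(heartbeat_budget_ms(FRAGMENT), "ms")
```

Under those assumptions the budget comes to 12000 ms, which gives a baseline to compare against the delays visible in the logs.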
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (JGRP-2253) FD_SOCK is not working in AWS environment
by Sibin Karnavar (JIRA)
[ https://issues.jboss.org/browse/JGRP-2253?page=com.atlassian.jira.plugin.... ]
Sibin Karnavar edited comment on JGRP-2253 at 2/22/18 12:15 PM:
----------------------------------------------------------------
All timestamps are in UTC
Members in the cluster: ip-10-93-136-91, ip-10-93-133-149 and ip-10-93-135-215
Node 0: (Master Node / Leader)
ip-10-93-136-91
This node was the leader, and I killed it at 2018-02-22 16:19:58.186 UTC. As the FD_SOCK trace timestamps show, the broken TCP socket connection is not detected immediately.
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC_test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
Node 1:
ip-10-93-133-149
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|17] (2) [ip-10-93-135-215-41546, ip-10-93-133-149-13458]
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - ip-10-93-135-215-41546 is the master for service ABC
2018-02-22 16:21:19.408 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.409 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
2018-02-22 16:21:19.434 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.434 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:19.435 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:19.435 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.436 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.437 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: who-has-sock ip-10-93-133-149-13458
2018-02-22 16:21:49.429 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:49.430 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:49.430 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.431 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.432 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.437 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-136-91-22320:
mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.527 INFO 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: MergeView::[ip-10-93-135-215-41546|20] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320], 2 subgroups: [ip-10-93-136-91-22320|19] (1) [ip-10-93-136-91-22320], [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
Node 2:
ip-10-93-135-215
2018-02-22 16:21:19.403 INFO 19074 --- [jgroups-24,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.426 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (1599 secs old)
)
2018-02-22 16:21:19.430 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.430 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting []
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.432 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting [ip-10-93-133-149-13458]
2018-02-22 16:21:19.433 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: broadcasting unsuspect(ip-10-93-133-149-13458)
2018-02-22 16:21:19.433 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
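The delays described in this comment can be quantified from the timestamps quoted above. A minimal Python sketch follows; the event labels and the helper name are illustrative, and only timestamps that appear in the comment are used.

```python
from datetime import datetime

# Timestamp layout used by the log lines quoted in the comment.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps given in the FMT layout."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Kill time of Node 0 as stated in the comment.
KILLED_AT = "2018-02-22 16:19:58.186"

# Selected events from the Node 1 log (labels are illustrative).
EVENTS = {
    "view |17| with 2 members": "2018-02-22 16:20:26.917",
    "earliest quoted FD_SOCK trace": "2018-02-22 16:21:19.430",
    "MergeView |20|": "2018-02-22 16:21:49.527",
}

if __name__ == "__main__":
    for label, stamp in EVENTS.items():
        print(f"{label}: {elapsed_seconds(KILLED_AT, stamp):.3f} s after kill")
```

With these timestamps, the first view change on Node 1 comes about 28.7 seconds after the kill, and the earliest quoted FD_SOCK trace about 81.2 seconds after it.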
was (Author: sibin.karnavar):
All timestamps are in UTC
Members in the cluster: ip-10-93-136-91, ip-10-93-133-149 and ip-10-93-135-215
+Node 0: (Master Node / Leader)+
ip-10-93-136-91
This node was the leader node and I have killed it at 2018-02-22 16:19:58.186 UTC time. If you see the FD_SOCK Trace timestamp, its not detecting the TCP socket connection break immediately.
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC_test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
_Node 1:_
ip-10-93-133-149
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|17] (2) [ip-10-93
-135-215-41546, ip-10-93-133-149-13458]
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - ip-10-93-135-215-41546 is the master for service ABC
2018-02-22 16:21:19.408 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93
-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.409 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
2018-02-22 16:21:19.434 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.434 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:19.435 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:19.435 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-
13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.436 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.437 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-
41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: who-has-sock ip-10-93-133-149-13458
2018-02-22 16:21:49.429 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:49.430 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:49.430 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-
13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.431 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.432 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-
41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.437 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-136-91-22320:
mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.527 INFO 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: MergeView::[ip-10-93-135-215-41546|20] (3)
[ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320], 2 subgroups: [ip-10-93-136-91-22320|19] (1) [ip-10-93-136-91-22320], [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-
91-22320]
+
Node-2+
ip-10-93-135-215
2018-02-22 16:21:19.403 INFO 19074 --- [jgroups-24,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93
-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.426 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (1599 secs old)
)
2018-02-22 16:21:19.430 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.430 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting []
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-133-149-
13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.432 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting [ip-10-93-133-149-13458]
2018-02-22 16:21:19.433 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: broadcasting unsuspect(ip-10-93-133-149-13458)
2018-02-22 16:21:19.433 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
> FD_SOCK is not working in AWS environment
> -----------------------------------------
>
> Key: JGRP-2253
> URL: https://issues.jboss.org/browse/JGRP-2253
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.10
> Environment: AWS - EC2
> Reporter: Sibin Karnavar
> Assignee: Bela Ban
>
> Our failure detection is configured as follows:
> <FD_SOCK external_port="7804" />
> <FD timeout="3000" max_tries="3" />
> <VERIFY_SUSPECT timeout="3000" />
> Please note that we use FD rather than FD_ALL in AWS. We plan to switch to FD_ALL after detailed testing.
> In my local environment this works perfectly: as soon as I kill a node, the view change happens immediately with FD_SOCK.
> Initially we did not set external_port on FD_SOCK. Suspecting a port issue, I later set it to 7804 and added that port to the security group so that all the nodes can reach each other on it, so the port should not be the problem.
> Could you please let us know whether any additional configuration is needed to make FD_SOCK work well in AWS?
> Thanks,
> Sibin
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 7 months
[JBoss JIRA] (JGRP-2253) FD_SOCK is not working in AWS environment
by Sibin Karnavar (JIRA)
[ https://issues.jboss.org/browse/JGRP-2253?page=com.atlassian.jira.plugin.... ]
Sibin Karnavar edited comment on JGRP-2253 at 2/22/18 12:14 PM:
----------------------------------------------------------------
All timestamps are in UTC
Members in the cluster: ip-10-93-136-91, ip-10-93-133-149 and ip-10-93-135-215
Node 0: (Master Node / Leader)
ip-10-93-136-91
This node was the leader, and I killed it at 2018-02-22 16:19:58.186 UTC. The FD_SOCK trace timestamps below show that the broken TCP socket connection was not detected immediately.
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC_test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
Node 1:
ip-10-93-133-149
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|17] (2) [ip-10-93-135-215-41546, ip-10-93-133-149-13458]
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - ip-10-93-135-215-41546 is the master for service ABC
2018-02-22 16:21:19.408 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.409 INFO 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
2018-02-22 16:21:19.434 TRACE 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.434 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:19.435 DEBUG 23603 --- [jgroups-13,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:19.435 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.436 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.437 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: who-has-sock ip-10-93-133-149-13458
2018-02-22 16:21:49.429 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:49.430 DEBUG 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:49.430 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.431 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.432 TRACE 23603 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.437 TRACE 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-136-91-22320:
mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.527 INFO 23603 --- [jgroups-16,ABC-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: MergeView::[ip-10-93-135-215-41546|20] (3)
[ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320], 2 subgroups: [ip-10-93-136-91-22320|19] (1) [ip-10-93-136-91-22320], [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
Node-2
ip-10-93-135-215
2018-02-22 16:21:19.403 INFO 19074 --- [jgroups-24,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.426 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (1599 secs old)
)
2018-02-22 16:21:19.430 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.430 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting []
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.432 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting [ip-10-93-133-149-13458]
2018-02-22 16:21:19.433 DEBUG 19074 --- [jgroups-22,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: broadcasting unsuspect(ip-10-93-133-149-13458)
2018-02-22 16:21:19.433 TRACE 19074 --- [jgroups-27,ABC-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
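To put a number on the delay, here is a minimal Java sketch that compares the kill time stated above with the first view change logged on Node 1 (16:20:26.917). The class and method names are hypothetical. The "configured budget" is only a rough assumption: it treats FD as suspecting a member after about timeout x max_tries and adds the VERIFY_SUSPECT timeout on top, which may not match the exact JGroups behaviour.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of JGroups or of the application in this report.
public class DetectionDelay {

    // Timestamp layout used in the log excerpts above, e.g. "2018-02-22 16:19:58.186".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Milliseconds elapsed between two log timestamps in the same time zone.
    public static long millisBetween(String from, String to) {
        LocalDateTime start = LocalDateTime.parse(from, LOG_TS);
        LocalDateTime end = LocalDateTime.parse(to, LOG_TS);
        return Duration.between(start, end).toMillis();
    }

    // Assumption: FD needs about timeout * maxTries before suspecting a member,
    // and VERIFY_SUSPECT then waits its own timeout before confirming.
    public static long configuredBudgetMillis(long fdTimeout, int fdMaxTries, long verifyTimeout) {
        return fdTimeout * fdMaxTries + verifyTimeout;
    }

    public static void main(String[] args) {
        // Kill time of the leader and first view change seen on ip-10-93-133-149.
        long observed = millisBetween("2018-02-22 16:19:58.186", "2018-02-22 16:20:26.917");
        // Values from <FD timeout="3000" max_tries="3"/> and <VERIFY_SUSPECT timeout="3000"/>.
        long budget = configuredBudgetMillis(3000, 3, 3000);
        System.out.println("observed delay (ms): " + observed);
        System.out.println("configured budget (ms): " + budget);
    }
}
```

Under these assumptions the view change arrived about 28.7 seconds after the kill, well beyond both an immediate FD_SOCK detection and the roughly 12 second FD/VERIFY_SUSPECT budget.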
[JBoss JIRA] (JGRP-2253) FD_SOCK is not working in AWS environment
by Sibin Karnavar (JIRA)
[ https://issues.jboss.org/browse/JGRP-2253?page=com.atlassian.jira.plugin.... ]
Sibin Karnavar commented on JGRP-2253:
--------------------------------------
All timestamps are in UTC
Members in the cluster: ip-10-93-136-91, ip-10-93-133-149 and ip-10-93-135-215
Node 0: (Master Node / Leader)
ip-10-93-136-91
This node was the leader, and I killed it at 2018-02-22 16:19:58.186 UTC. The FD_SOCK trace timestamps below show that the broken TCP socket connection was not detected immediately.
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
Node 1:
ip-10-93-133-149
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|17] (2) [ip-10-93-135-215-41546, ip-10-93-133-149-13458]
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:20:26.917 INFO 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - ip-10-93-135-215-41546 is the master for service SOM-SKS
2018-02-22 16:21:19.408 INFO 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.409 INFO 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - I am not the master!
2018-02-22 16:21:19.430 TRACE 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (559 secs old)
)
2018-02-22 16:21:19.434 TRACE 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.434 DEBUG 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:19.435 DEBUG 23603 --- [jgroups-13,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:19.435 TRACE 23603 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.436 TRACE 23603 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.437 TRACE 23603 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.429 TRACE 23603 --- [jgroups-16,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: who-has-sock ip-10-93-133-149-13458
2018-02-22 16:21:49.429 DEBUG 23603 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: suspecting [ip-10-93-135-215-41546]
2018-02-22 16:21:49.430 DEBUG 23603 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: broadcasting unsuspect(ip-10-93-135-215-41546)
2018-02-22 16:21:49.430 TRACE 23603 --- [jgroups-16,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:49.431 TRACE 23603 --- [jgroups-16,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.432 TRACE 23603 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.437 TRACE 23603 --- [jgroups-16,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] org.jgroups.protocols.FD_SOCK : - - ip-10-93-133-149-13458: received UNSUSPECT message from ip-10-93-136-91-22320:
mbrs=[ip-10-93-133-149-13458]
2018-02-22 16:21:49.527 INFO 23603 --- [jgroups-16,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-133-149-13458] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: MergeView::[ip-10-93-135-215-41546|20] (3)
[ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320], 2 subgroups: [ip-10-93-136-91-22320|19] (1) [ip-10-93-136-91-22320], [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
Node-2
ip-10-93-135-215
2018-02-22 16:21:19.403 INFO 19074 --- [jgroups-24,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] c.w.s.c.ServiceClusterCoordinator : - - Detected change in view membership: [ip-10-93-135-215-41546|18] (3) [ip-10-93-135-215-41546, ip-10-93-133-149-13458, ip-10-93-136-91-22320]
2018-02-22 16:21:19.426 TRACE 19074 --- [jgroups-27,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: i-have-sock: ip-10-93-136-91-22320 -->
10.93.136.91:7804 (cache is ip-10-93-133-149-13458: 10.93.133.149:7804 (559 secs old)
ip-10-93-136-91-22320: 10.93.136.91:7804 (0 ms old)
ip-10-93-135-215-41546: 10.93.135.215:7804 (1599 secs old)
)
2018-02-22 16:21:19.430 TRACE 19074 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.430 DEBUG 19074 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting []
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-133-149-13458: mbrs=[ip-10-93-135-215-41546]
2018-02-22 16:21:19.432 TRACE 19074 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received SUSPECT message from ip-10-93-136-91-22320:
suspects=[ip-10-93-133-149-13458]
2018-02-22 16:21:19.432 DEBUG 19074 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: suspecting [ip-10-93-133-149-13458]
2018-02-22 16:21:19.433 DEBUG 19074 --- [jgroups-22,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: broadcasting unsuspect(ip-10-93-133-149-13458)
2018-02-22 16:21:19.433 TRACE 19074 --- [jgroups-27,SOM-SKS-test_SJX_080220180246_XSJ,ip-10-93-135-215-41546] org.jgroups.protocols.FD_SOCK : - - ip-10-93-135-215-41546: received UNSUSPECT message from ip-10-93-135-215-41546: mbrs=[ip-10-93-133-149-13458]