[JBoss JIRA] (JGRP-2253) FD_SOCK is not working in AWS environment
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2253?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2253:
--------------------------------
Sorry for the delay!
Well, first of all, external_addr for FD_SOCK doesn't seem necessary, since you're only using internal IP addresses.
Secondly, terminating an EC2 instance apparently doesn't (always) close the sockets of the killed process, so it is not the same as {{kill -3/-9}}. In that case, FD_ALL (or FD) acts as a second line of defense.
IIRC, AWS allows you to add a hook (script) to the termination process; in that hook, you could kill the process. But I haven't used AWS for months, so this may have changed...
The best way would be to shut down the cluster node *gracefully*, i.e. via {{JChannel.close()}}; this would quickly install a new view in the remaining members.
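To illustrate why a heartbeat-based protocol still catches a terminated EC2 instance when FD_SOCK does not: FD_SOCK reacts to a TCP connection being closed, whereas FD_ALL/FD suspect a member simply because it has gone silent. The following is a minimal, hypothetical sketch of timeout-based suspicion in the spirit of FD_ALL; it is not the JGroups implementation, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified heartbeat-timeout detector in the spirit of FD_ALL.
// Not the JGroups implementation: members are plain strings and the current
// time is passed in explicitly (in milliseconds) so the logic is easy to test.
class HeartbeatDetector {
    private final long timeoutMs;
    // Last time (ms) anything was heard from each member; concurrent because
    // heartbeats and the periodic check would normally run on different threads.
    private final Map<String, Long> lastHeard = new ConcurrentHashMap<>();

    HeartbeatDetector(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called whenever a heartbeat (or other traffic) arrives from a member.
    void heartbeatReceived(String member, long nowMs) {
        lastHeard.put(member, nowMs);
    }

    // Members that have been silent for longer than the timeout are suspected.
    // Unlike FD_SOCK, this needs no closed socket: silence alone is enough.
    List<String> suspects(long nowMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

The trade-off is latency: with the reporter's settings (FD timeout="3000" max_tries="3", VERIFY_SUSPECT timeout="3000"), and assuming FD suspects a member after max_tries missed heartbeats spaced timeout ms apart, exclusion takes roughly 3000 × 3 + 3000 = 12000 ms, about 12 s, compared with near-immediate detection when FD_SOCK sees the socket close.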
> FD_SOCK is not working in AWS environment
> -----------------------------------------
>
> Key: JGRP-2253
> URL: https://issues.jboss.org/browse/JGRP-2253
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.10
> Environment: AWS - EC2
> Reporter: Sibin Karnavar
> Assignee: Bela Ban
> Fix For: 4.0.12
>
>
> We have our failure detection defined like below.
> <FD_SOCK external_port="7804" />
> <FD timeout="3000" max_tries="3" />
> <VERIFY_SUSPECT timeout="3000" />
> Please note that we have used FD instead of FD_ALL in AWS. We will be changing it to FD_ALL later after detailed testing.
> Locally, this works perfectly: as soon as I kill a node, I can see the view change happen immediately with FD_SOCK.
> We originally didn't set external_port in FD_SOCK, but later suspected the port might be the issue, so we set it to 7804 and added that port to the security group, allowing access to it among all the nodes. So the port is not the issue.
> Can you please let us know whether any additional configuration is needed to make FD_SOCK work well in AWS?
> Thanks,
> Sibin
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 5 months
[JBoss JIRA] (WFCORE-3382) Further Enhance Elytron Permission Configuration
by Jeff Mesnil (JIRA)
[ https://issues.jboss.org/browse/WFCORE-3382?page=com.atlassian.jira.plugi... ]
Jeff Mesnil updated WFCORE-3382:
--------------------------------
Fix Version/s: 5.0.0.Alpha5
(was: 5.0.0.Alpha4)
> Further Enhance Elytron Permission Configuration
> ------------------------------------------------
>
> Key: WFCORE-3382
> URL: https://issues.jboss.org/browse/WFCORE-3382
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Security
> Reporter: Darran Lofthouse
> Priority: Blocker
> Fix For: 5.0.0.Alpha5
>
>
> This has currently been simplified to a single resource for the out-of-the-box configuration; however, this brings issues, as permissions are now duplicated, so modifications need to be replicated rather than made in a single location.
> Finding a way for the default required permissions to be defined in one location could help eliminate the duplication.
> We could also consider going one step further and having subsystems register the default permissions that should be granted.
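One way to picture "defined in one location" plus "subsystems register the default permissions" is a single registry that each subsystem contributes to, with the granted defaults computed from it. This is a purely hypothetical sketch: `DefaultPermissionRegistry` is not a WildFly Core or Elytron API, and permissions are represented as plain class-name strings for brevity.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not a WildFly Core API: one registry in which each
// subsystem declares the default permissions it needs, so the defaults are
// kept in a single place instead of being duplicated across resources.
class DefaultPermissionRegistry {
    // subsystem name -> permission class names it contributes (insertion order kept)
    private final Map<String, Set<String>> bySubsystem = new LinkedHashMap<>();

    // Called by a subsystem during registration to declare a default permission.
    void register(String subsystem, String permissionClass) {
        bySubsystem.computeIfAbsent(subsystem, k -> new LinkedHashSet<>()).add(permissionClass);
    }

    // Merged, de-duplicated view of every default permission to grant.
    Set<String> defaults() {
        Set<String> all = new LinkedHashSet<>();
        bySubsystem.values().forEach(all::addAll);
        return Collections.unmodifiableSet(all);
    }
}
```

Because every consumer reads from `defaults()`, a change to a subsystem's registration propagates everywhere without editing duplicated copies.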
[JBoss JIRA] (WFCORE-1649) RBAC constraint config modifications will fail in a mixed domain if the modified constraint is not present in the legacy slave
by Jeff Mesnil (JIRA)
[ https://issues.jboss.org/browse/WFCORE-1649?page=com.atlassian.jira.plugi... ]
Jeff Mesnil updated WFCORE-1649:
--------------------------------
Fix Version/s: 5.0.0.Alpha5
(was: 5.0.0.Alpha4)
> RBAC constraint config modifications will fail in a mixed domain if the modified constraint is not present in the legacy slave
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: WFCORE-1649
> URL: https://issues.jboss.org/browse/WFCORE-1649
> Project: WildFly Core
> Issue Type: Bug
> Components: Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Priority: Critical
> Labels: domain-mode
> Fix For: 5.0.0.Alpha5
>
>
> The management model for RBAC constraints is maintained using synthetic resources, with resources only existing for those items (SensitivityClassification and ApplicationClassification) that are registered in the current process. Operations that touch classifications unknown to that process will fail due to missing resource problems.
> This is a big problem in the following scenarios:
> 1) Mixed domain, where legacy slaves do not know about newly introduced classifications.
> 2) Slimming scenarios where slaves are ignoring unrelated parts of the domain wide config and also don't have some extension installed, resulting in classifications registered by those extensions not being present.
> A partial workaround to 1) is for the kernel to register transformers for newly introduced classifications (e.g. SERVER_SSL added in EAP 6.4.7 and EAP 7). But:
> -- that doesn't help with problem 2)
> -- only the kernel can register kernel transformers, so if extensions add new classifications there is no way for them to register the transformer.
[JBoss JIRA] (JGRP-2261) NPE in FD_ALL2
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2261?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2261.
----------------------------
Resolution: Done
This is caused by the HAS_HEADER filter not checking for null messages. Null messages can occur when MessageBatch.remove(msg) is called.
Fixed by adding a null check to the HAS_HEADER filter.
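The failure mode and the fix can be reproduced outside JGroups with a simplified stand-in for a message batch whose removed entries become null slots, and a `replaceIf()` that, as the report notes, applies the filter without checking for null. `Msg`, `Batch` and the `heartbeat` flag are hypothetical types for illustration only, not the JGroups `Message`/`MessageBatch` classes.

```java
import java.util.function.Predicate;

// Hypothetical stand-ins for Message/MessageBatch, not the JGroups classes.
class NullSafeFilterDemo {
    static final class Msg {
        final boolean heartbeat; // stands in for "has the FD_ALL2 header"
        Msg(boolean heartbeat) { this.heartbeat = heartbeat; }
    }

    static final class Batch {
        final Msg[] messages;
        Batch(Msg... messages) { this.messages = messages; }

        // Removing a message leaves a null slot in the array.
        void remove(Msg m) {
            for (int i = 0; i < messages.length; i++)
                if (messages[i] == m) messages[i] = null;
        }

        // Replaces each matching element; passes null slots to the filter unchecked.
        int replaceIf(Predicate<Msg> filter, Msg replacement) {
            int replaced = 0;
            for (int i = 0; i < messages.length; i++) {
                if (filter.test(messages[i])) {
                    messages[i] = replacement;
                    replaced++;
                }
            }
            return replaced;
        }
    }

    // Unsafe filter: throws NullPointerException on a removed (null) slot.
    static final Predicate<Msg> HAS_HEADER_UNSAFE = msg -> msg.heartbeat;

    // Fixed filter: guards against null before dereferencing the message.
    static final Predicate<Msg> HAS_HEADER = msg -> msg != null && msg.heartbeat;
}
```

Putting the guard in the filter (rather than in every caller of `replaceIf()`) keeps the batch free to contain null slots, which the reporter points out is valid for `MessageBatch`.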
> NPE in FD_ALL2
> --------------
>
> Key: JGRP-2261
> URL: https://issues.jboss.org/browse/JGRP-2261
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.10
> Environment: WildFly 12.0.0.Final
> Reporter: Rich DiCroce
> Assignee: Bela Ban
> Fix For: 4.0.12
>
>
> I'm seeing an NPE in FD_ALL2 from time to time. It's not consistent, but the reason isn't hard to see. Stack trace:
> {code}
> 16:08:06,244 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) Exception in thread "thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)" java.lang.NullPointerException
> 16:08:06,244 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.protocols.FD_ALL2.lambda$new$0(FD_ALL2.java:83)
> 16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.util.MessageBatch.replaceIf(MessageBatch.java:220)
> 16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.protocols.FD_ALL2.up(FD_ALL2.java:186)
> 16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.stack.Protocol.up(Protocol.java:372)
> 16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.stack.Protocol.up(Protocol.java:372)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.stack.Protocol.up(Protocol.java:372)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.protocols.TP.passBatchUp(TP.java:1274)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.util.SubmitToThreadPool$BatchHandler.passBatchUp(SubmitToThreadPool.java:140)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:136)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 16:08:06,247 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jboss.as.clustering.jgroups.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:52)
> 16:08:06,247 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at java.lang.Thread.run(Thread.java:748)
> {code}
> HAS_HEADER is assuming msg is non-null, but MessageBatch makes it clear that it's valid for elements of the batch to be null, and replaceIf() doesn't perform a null check.
[JBoss JIRA] (JGRP-2262) "Frozen" coordinator causes the whole cluster to hang
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2262?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2262.
----------------------------
Resolution: Done
FrozenCoordinatorTest was changed: another member C now joins after B becomes the singleton cluster coordinator, and this succeeds. I only tested with FILE_PING, but the logic is shared anyway, and all subclasses (such as JDBC_PING) inherit it.
Let me know if this works, and I can backport the changes to the 3.6 branch.
> "Frozen" coordinator causes the whole cluster to hang
> -----------------------------------------------------
>
> Key: JGRP-2262
> URL: https://issues.jboss.org/browse/JGRP-2262
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.7
> Reporter: Pietro Paolini
> Assignee: Bela Ban
> Fix For: 4.0.12
>
> Attachments: jdbc_test.xml, jgroup.zip
>
>
> This is the result of an investigation I carried out into a problem we experienced in our
> application; the scenario was re-created by pausing the JVM using a debugger.
> The discovery mechanism is JDBC_PING.
> If the coordinator's JVM gets frozen (for whatever reason) before the coordinator sets itself as the cluster coordinator, and another node is started after that, the new node will be unable to join the cluster and will hang indefinitely.
> This seems to be caused by the "continue" statement at
> https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...
> I have prepared a simple application which can help in replicating the problem.
> To replicate the problem :
> 1) Make sure the JGROUPSPING table is empty
> 2) Run the application in an IDE with a debugger attached, so that the JVM
> is paused at line Main.java:67, and wait for it.
> 3) Run the application in non-debug mode, or with Gradle using "gradle run"; it will
> hang indefinitely.
> Depending on the UUID/IP address generated/assigned, this may not happen every time, but it happened quite often in my local tests.
>