[JBoss JIRA] (JGRP-2261) NPE in FD_ALL2
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2261?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2261.
----------------------------
Resolution: Done
This is caused by the HAS_HEADER filter not checking for null messages. A batch can contain null entries after MessageBatch.remove(msg) has been called.
Fixed by adding a null check to the HAS_HEADER filter.
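For illustration only, a minimal sketch of the kind of null check described above. This is not the actual JGroups code: the Msg class, the ID constant and the countMatches() helper are hypothetical stand-ins, and a plain List with a null slot plays the role of a batch from which a message has been removed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class NullSafeHeaderFilter {
    /** Hypothetical stand-in for a message that carries headers keyed by protocol id. */
    static final class Msg {
        private final Map<Short, Object> headers = new HashMap<>();
        Msg putHeader(short id, Object hdr) { headers.put(id, hdr); return this; }
        Object getHeader(short id) { return headers.get(id); }
    }

    // Hypothetical protocol id, chosen only for this sketch.
    static final short ID = 42;

    // Unsafe variant: dereferences msg directly, so a null slot throws NullPointerException.
    static final Predicate<Msg> HAS_HEADER_UNSAFE = msg -> msg.getHeader(ID) != null;

    // Null-safe variant: a null slot simply does not match.
    static final Predicate<Msg> HAS_HEADER = msg -> msg != null && msg.getHeader(ID) != null;

    /** Applies the filter to every slot of the batch, including null ones, and counts matches. */
    static int countMatches(List<Msg> batch, Predicate<Msg> filter) {
        int matches = 0;
        for (Msg m : batch) {
            if (filter.test(m)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The middle slot is null, as it would be after a message was removed from the batch.
        List<Msg> batch = Arrays.asList(new Msg().putHeader(ID, "heartbeat"), null, new Msg());
        System.out.println(countMatches(batch, HAS_HEADER)); // prints 1
    }
}
```

Passing HAS_HEADER_UNSAFE to countMatches() with the same batch throws a NullPointerException on the null slot, which is the failure mode in the reported stack trace.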
> NPE in FD_ALL2
> --------------
>
> Key: JGRP-2261
> URL: https://issues.jboss.org/browse/JGRP-2261
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.10
> Environment: WildFly 12.0.0.Final
> Reporter: Rich DiCroce
> Assignee: Bela Ban
> Fix For: 4.0.12
>
>
> I'm seeing a NPE in FD_ALL2 from time to time. Not consistent but the reason isn't hard to see. Stack trace:
> {code}
> 16:08:06,244 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) Exception in thread "thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)" java.lang.NullPointerException
> 16:08:06,244 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.protocols.FD_ALL2.lambda$new$0(FD_ALL2.java:83)
> 16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.util.MessageBatch.replaceIf(MessageBatch.java:220)
> 16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.protocols.FD_ALL2.up(FD_ALL2.java:186)
> 16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.stack.Protocol.up(Protocol.java:372)
> 16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.stack.Protocol.up(Protocol.java:372)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.stack.Protocol.up(Protocol.java:372)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.protocols.TP.passBatchUp(TP.java:1274)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.util.SubmitToThreadPool$BatchHandler.passBatchUp(SubmitToThreadPool.java:140)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:136)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 16:08:06,247 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jboss.as.clustering.jgroups.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:52)
> 16:08:06,247 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at java.lang.Thread.run(Thread.java:748)
> {code}
> HAS_HEADER is assuming msg is non-null, but MessageBatch makes it clear that it's valid for elements of the batch to be null, and replaceIf() doesn't perform a null check.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (JGRP-2262) "Frozen" coordinator causes the whole cluster to hang
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2262?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2262.
----------------------------
Resolution: Done
FrozenCoordinatorTest was changed: another member C now joins after B becomes the singleton cluster coordinator, and the join succeeds. I only tested with FILE_PING, but the logic is in place and all subclasses (such as JDBC_PING) inherit it.
Let me know if this works, and I can backport the changes to the 3.6 branch.
> "Frozen" coordinator causes the whole cluster to hang
> -----------------------------------------------------
>
> Key: JGRP-2262
> URL: https://issues.jboss.org/browse/JGRP-2262
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.7
> Reporter: Pietro Paolini
> Assignee: Bela Ban
> Fix For: 4.0.12
>
> Attachments: jdbc_test.xml, jgroup.zip
>
>
> This is the result of an investigation I carried out into a problem we experienced within our
> application. The scenario was re-created by pausing the JVM using a debugger.
> The discovery mechanism is JDBC_PING.
> If the coordinator's JVM gets frozen (for whatever reason) before the coordinator sets itself as the cluster coordinator, and another node is started after that, the new node will be unable to join the cluster and will hang indefinitely.
> This seems to be caused by the "continue" statement at
> https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...
> I have prepared a simple application which can help in replicating the problem.
> To replicate the problem:
> 1) Make sure the JGROUPSPING table is empty
> 2) Run the application in an IDE with a debugger attached so that the JVM is
> paused at line Main.java:67, and wait for it to get there.
> 3) Run the application again in non-debug mode, or with Gradle using "gradle run"; it will
> hang indefinitely
> Depending on the UUID/IP address that is generated or assigned, this may not happen every time, but it happened quite often in my local tests.
>