[jboss-jira] [JBoss JIRA] (JGRP-1684) NPE in UNICAST2.up()

Thu Sep 5 08:39:03 EDT 2013

    [ https://issues.jboss.org/browse/JGRP-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801854#comment-12801854 ] 

Radoslav Husar commented on JGRP-1684:
--------------------------------------

Hi Bela,

I have been digging further and need your advice.

== Singleton transport

I am assuming this is the logic that adds the source address if it hasn't been set yet (is null):

https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/TP.java#L1284

However, that is done only for a non-singleton transport, and I believe we are using a singleton transport in EAP, so TP would not set it. Should it really be skipped in that case? Or is there other code that sets it?

So if we know we are going to check the sender address on receive, could we set it already in UNICAST's generation of the ack packet?

https://github.com/rhusar/JGroups/compare/belaban:Branch_JGroups_3_2...JGRP-1684_2?expand=1
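
To illustrate the idea, here is a minimal self-contained sketch. It does not use the real JGroups classes: SketchMessage, SketchReceiver and AckSourceSketch are hypothetical stand-ins, and addresses are plain strings. The only fact taken from the JDK is that ConcurrentHashMap.get(null) throws a NullPointerException, which matches the top frame of the stack trace in the issue description. The sketch assumes the receiver looks up per-sender state by the message's source address.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a message; NOT the JGroups Message class. */
final class SketchMessage {
    private String src; // sender address; stays null if nobody sets it

    String getSrc() { return src; }
    void setSrc(String src) { this.src = src; }
}

/** Hypothetical stand-in for the receiving side of a unicast protocol. */
final class SketchReceiver {
    private final ConcurrentMap<String, String> recvTable = new ConcurrentHashMap<>();

    void register(String sender, String state) { recvTable.put(sender, state); }

    /** Looks up per-sender state; ConcurrentHashMap.get(null) throws NullPointerException. */
    String up(SketchMessage msg) {
        return recvTable.get(msg.getSrc());
    }
}

final class AckSourceSketch {
    /** Builds an ack and stamps the local address on it at creation time. */
    static SketchMessage createAck(String localAddr) {
        SketchMessage ack = new SketchMessage();
        if (ack.getSrc() == null) {
            ack.setSrc(localAddr); // do not rely on a lower layer to fill this in
        }
        return ack;
    }

    /** Returns true if handing the message to the receiver fails with an NPE. */
    static boolean failsWithNpe(SketchReceiver receiver, SketchMessage msg) {
        try {
            receiver.up(msg);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

Note that this only helps if the local address is already known when the ack is created; if it is still null at that point, the ack would go out without a source all the same.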

== Timing

Looking at the startup logic, it seems as though the local_addr should be marked volatile:

https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/TP.java#L540

but that doesn't seem to be the issue hitting us now.
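
For reference, this is roughly what I mean by the volatile change, as a minimal sketch. TransportSketch is a hypothetical class, not TP itself, and the address is a plain string; the assumption is that the field is written once by a startup thread and read later by receiver threads.

```java
/** Hypothetical stand-in for a transport; NOT the JGroups TP class. */
final class TransportSketch {
    // volatile: a write by the startup thread is guaranteed to be visible
    // to any other thread that reads the field afterwards
    private volatile String localAddr;

    /** Called by the startup thread once the address is known. */
    void start(String addr) { localAddr = addr; }

    /** May be called from receiver threads; returns null before start(). */
    String getLocalAddr() { return localAddr; }
}
```

Without volatile (or some other synchronization), the Java memory model does not guarantee when a reader thread sees the value written in start().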

Looking at the other nodes' logs, I am assuming the corrupted message was sent by the node which was _shutting down_ exactly at that time:

http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbservlet-shutdown-dist-async/15/console-perf19/consoleText

Also I wonder if this could be related (haven't checked):

log [JBossINF] 11:53:25,976 WARN  [org.infinispan.topology.CacheTopologyControlCommand] (OOB-2,shared=udp) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=dist, type=REBALANCE_CONFIRM, sender=perf18/web, joinInfo=null, topologyId=13, currentCH=null, pendingCH=null, throwable=null, viewId=6}: org.infinispan.CacheException: Received invalid rebalance confirmation from perf18/web for cache dist, we don't have a rebalance in progress

Thanks

> NPE in UNICAST2.up()
> --------------------
>
>                 Key: JGRP-1684
>                 URL: https://issues.jboss.org/browse/JGRP-1684
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.2.10
>            Reporter: Radoslav Husar
>            Assignee: Radoslav Husar
>
> {noformat}
> 11:53:25,993 ERROR [org.jgroups.protocols.UDP] (OOB-13,shared=udp) failed handling incoming message: java.lang.NullPointerException
> 	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768) [rt.jar:1.6.0_45]
> 	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:432)
> 	at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:645)
> 	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147)
> 	at org.jgroups.protocols.FD.up(FD.java:253)
> 	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
> 	at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
> 	at org.jgroups.protocols.Discovery.up(Discovery.java:359)
> 	at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2610)
> 	at org.jgroups.protocols.TP.passMessageUp(TP.java:1263)
> 	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1825)
> 	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1798)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45]
> 	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]
> {noformat}
