[jboss-jira] [JBoss JIRA] Commented: (JGRP-594) Intermittently, a Null pointer exception is thrown when trying to remove non-existent entry in ReplicatedHashMap

Dipak Kothari (JIRA) jira-events at lists.jboss.org
Mon Jan 21 04:47:25 EST 2008


    [ http://jira.jboss.com/jira/browse/JGRP-594?page=comments#action_12395928 ] 
            
Dipak Kothari commented on JGRP-594:
------------------------------------

 I have been investigating this (and the NPE in JGRP-594) further and have found the following:
 
1) At start-up, infrequently, I get members starting in different groups because the initial coordinator does not respond in time.
2) The MergeTask identifies the subgroups and initiates the merge.
3) The new view and digest get installed. The application also gets a view-change callback.
4) The application, on identifying that the view is a MergeView, requests a getState from one of the members (say X) of the subgroup.
5) X returns its view and digest as they were before it received the MergeView event (the log shows that this event arrives afterwards).
6) The STATE_TRANSFER protocol handles the state response.
7) As FLUSH is not in the protocol stack, STATE_TRANSFER sends a SET_DIGEST event down the stack, which is picked up by NAKACK. NAKACK resets its digest (the digest it had was the correct merged version) and replaces it with the one that arrived with the state. However, this digest does not contain the local member itself, and so I start getting the ERROR messages mentioned above.
8) A subsequent put in ReplicatedHashMap fails with an NPE in NAKACK as it tries to add the message to the NakReceiverWindow associated with the local address, which is no longer there.
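
Steps 7 and 8 can be modelled in isolation. The following is a toy sketch, not JGroups code: `DigestResetSketch`, `SenderTable`, `Window`, `setDigest` and `send` are hypothetical names chosen for illustration, and the only assumption carried over from the analysis above is that a send first looks up the window belonging to the local address.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of steps 7 and 8. Hypothetical names throughout; this is NOT JGroups code. */
public class DigestResetSketch {

    /** Stand-in for a per-sender retransmit window. */
    static final class Window {
        private long highestSeqno;
        void add(long seqno) { highestSeqno = Math.max(highestSeqno, seqno); }
    }

    /** Stand-in for the table of per-sender windows kept by one member. */
    static final class SenderTable {
        private final String localAddress;
        private final Map<String, Window> windows = new HashMap<>();
        private long nextSeqno = 1;

        SenderTable(String localAddress) { this.localAddress = localAddress; }

        /** Drops all windows and recreates one per member named in the digest (step 7). */
        void setDigest(Set<String> digestMembers) {
            windows.clear();
            for (String member : digestMembers) {
                windows.put(member, new Window());
            }
        }

        /** A send must first be recorded in the sender's own window (step 8). */
        void send() {
            Window own = windows.get(localAddress); // null if the digest omitted us
            own.add(nextSeqno++);                   // throws NullPointerException in that case
        }
    }

    public static Set<String> setOf(String... members) {
        return new HashSet<>(Arrays.asList(members));
    }

    /**
     * Installs the merged digest, sends once, installs the digest that came with
     * the state, and reports whether the next send fails with an NPE.
     * The merged digest is assumed to contain the local address.
     */
    public static boolean sendFailsAfter(String local, Set<String> merged, Set<String> fromState) {
        SenderTable table = new SenderTable(local);
        table.setDigest(merged);
        table.send();
        table.setDigest(fromState);
        try {
            table.send();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> merged = setOf("A", "B", "C");
        System.out.println(sendFailsAfter("A", merged, setOf("B", "C"))); // stale digest without A
        System.out.println(sendFailsAfter("A", merged, merged));          // digest still contains A
    }
}
```

With the stale digest the second send fails (prints `true`); when the digest still contains the local address it succeeds (prints `false`). The sketch only shows why a digest that omits the local member is sufficient to produce the exception; it says nothing about how NAKACK should handle that case.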
 
So I tried adding the FLUSH protocol. This avoids both the digest reset and the NPE, but the same tests now result in the state differing between members, and they never seem to reach the point where the state is the same across all members of the group. 
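
One way to make "the state is the same across all members" checkable in a test is to collect a snapshot of each member's map and compare them against a reference member. The helper below is a minimal sketch under that assumption: `StateConvergenceSketch` and `divergentMembers` are hypothetical names, the snapshots are plain `java.util.Map` copies gathered by the test harness, and nothing here uses the JGroups API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical test helper; compares plain map snapshots, not live ReplicatedHashMap instances. */
public class StateConvergenceSketch {

    /**
     * Returns, in sorted order, the members whose snapshot differs from the
     * reference member's snapshot. An empty list means the state has converged.
     */
    public static List<String> divergentMembers(String reference,
                                                Map<String, Map<String, String>> snapshots) {
        Map<String, String> expected = snapshots.get(reference);
        if (expected == null) {
            throw new IllegalArgumentException("no snapshot for reference member " + reference);
        }
        List<String> divergent = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order for reporting.
        for (Map.Entry<String, Map<String, String>> entry : new TreeMap<>(snapshots).entrySet()) {
            if (!expected.equals(entry.getValue())) {
                divergent.add(entry.getKey());
            }
        }
        return divergent;
    }

    public static void main(String[] args) {
        Map<String, String> full = new HashMap<>();
        full.put("service1", "host1");
        full.put("service2", "host2");

        Map<String, String> partial = new HashMap<>();
        partial.put("service1", "host1");

        Map<String, Map<String, String>> snapshots = new HashMap<>();
        snapshots.put("A", full);
        snapshots.put("B", new HashMap<>(full));
        snapshots.put("C", partial);

        System.out.println(divergentMembers("A", snapshots)); // member C is missing service2
    }
}
```

Running such a check repeatedly after the merge would show whether the members ever converge or whether the same members stay out of step.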
 
I have looked at the bug list between 2.5 and 2.6.1 and noticed that some work has been done in the areas of flush, GMS and state transfer, so I was thinking of working on 2.6.1 now rather than 2.5.0. One point, though: without FLUSH I had the same issue with 2.6.1 as well, although it only happened once in a 24-hour soak test. 


> Intermittently, a Null pointer exception is thrown when trying to remove non-existent entry in ReplicatedHashMap
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: JGRP-594
>                 URL: http://jira.jboss.com/jira/browse/JGRP-594
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.5
>         Environment: Linux
>            Reporter: Dipak Kothari
>         Assigned To: Bela Ban
>             Fix For: 2.7
>
>         Attachments: Server2.log
>
>
> Intermittently, when an entry is removed from a ReplicatedHashMap (where the entry does not exist) the following exception is thrown:
> java.lang.RuntimeException: remove(APMExample.Services.examples.ServerA09) failed
>         at org.jgroups.blocks.ReplicatedHashMap.remove(ReplicatedHashMap.java:405)
>         at com.ubs.apm.control.service.nameservice.jgroup.JGroupNameService.unRegisterService(JGroupNameService.java:132)
>         at com.ubs.apm.control.sensors.ControlSensorManager.cleanup(ControlSensorManager.java:468)
>         at com.ubs.apm.control.sensors.ControlSensorManager.<init>(ControlSensorManager.java:125)
>         at com.ubs.apm.control.sensors.ControlSensorManager.<init>(ControlSensorManager.java:106)
>         at com.ubs.apm.control.example.ManagedServer.init(ManagedServer.java:20)
>         at com.ubs.apm.control.example.ManagedServer.main(ManagedServer.java:45)
> Caused by: java.lang.RuntimeException: failed executing request [req_id=1189617222436
> caller=14.64.61.201:6825
> 14.64.61.201:6838: sender=14.64.61.201:6838, retval=null, received=false, suspected=false
> .... many such lines ...
> 14.64.61.201:6860: sender=14.64.61.201:6860, retval=null, received=false, suspected=false
> 14.64.61.201:6847: sender=14.64.61.201:6847, retval=null, received=false, suspected=false
> request_msg: [dst: <null>, src: 14.64.61.201:6825 (2 headers), size=143 bytes]
> rsp_mode: GET_NONE
> done: true
> timeout: 5000
> expected_mbrs: 0 ([14.64.61.201:6815, 14.64.61.201:6816, 14.64.61.201:6824, 14.64.61.201:6825, 14.64.61.201:6826, 14.64.61.201:6827, 14.64.61.201:6828, 14.64.61.201:6829, 14.64.61.201:6830, 14.64.61.201:6831, 14.64.61.201:6833, 14.64.61.201:6834, 14.64.61.201:6835, 14.64.61.201:6836, 14.64.61.201:6837, 14.64.61.201:6838, 14.64.61.201:6839, 14.64.61.201:6842, 14.64.61.201:6843, 14.64.61.201:6844, 14.64.61.201:6845, 14.64.61.201:6846, 14.64.61.201:6847, 14.64.61.201:6848, 14.64.61.201:6849, 14.64.61.201:6850, 14.64.61.201:6851, 14.64.61.201:6852, 14.64.61.201:6853, 14.64.61.201:6854, 14.64.61.201:6855, 14.64.61.201:6856, 14.64.61.201:6857, 14.64.61.201:6858, 14.64.61.201:6859, 14.64.61.201:6860, 14.64.61.201:6861, 14.64.61.201:6862, 14.64.61.201:6863, 14.64.61.201:6864, 14.64.61.201:6865, 14.64.61.201:6866])]
>         at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:433)
>         at org.jgroups.blocks.RpcDispatcher.callRemoteMethods(RpcDispatcher.java:199)
>         at org.jgroups.blocks.RpcDispatcher.callRemoteMethods(RpcDispatcher.java:167)
>         at org.jgroups.blocks.RpcDispatcher.callRemoteMethods(RpcDispatcher.java:163)
>         at org.jgroups.blocks.ReplicatedHashMap.remove(ReplicatedHashMap.java:402)
>         ... 6 more
> Caused by: java.lang.RuntimeException: failure adding msg [dst: <null>, src: 14.64.61.201:6825 (2 headers), size=143 bytes] to the retransmit table for 14.64.61.201:6825
>         at org.jgroups.protocols.pbcast.NAKACK.send(NAKACK.java:636)
>         at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:438)
>         at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:317)
>         at org.jgroups.protocols.pbcast.GMS.down(GMS.java:782)
>         at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:221)
>         at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:339)
>         at org.jgroups.JChannel.downcall(JChannel.java:1240)
>         at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:752)
>         at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:301)
>         at org.jgroups.blocks.GroupRequest.doExecute(GroupRequest.java:440)
>         at org.jgroups.blocks.GroupRequest.execute(GroupRequest.java:190)
>         at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:430)
>         ... 10 more
> Caused by: java.lang.NullPointerException
>         at org.jgroups.protocols.pbcast.NAKACK.send(NAKACK.java:632)
>         ... 21 more


        


