[jboss-jira] [JBoss JIRA] (JGRP-1744) Race condition allows NullPointerException in Executing protocol

Bela Ban (JIRA) jira-events at lists.jboss.org
Fri Nov 15 09:42:06 EST 2013


     [ https://issues.jboss.org/browse/JGRP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-1744:
---------------------------

    Fix Version/s: 3.5

    
> Race condition allows NullPointerException in Executing protocol
> ----------------------------------------------------------------
>
>                 Key: JGRP-1744
>                 URL: https://issues.jboss.org/browse/JGRP-1744
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.2.7, 3.4.1
>            Reporter: Tim Gage
>            Assignee: Bela Ban
>             Fix For: 3.5
>
>
> In {{org.jgroups.protocols.Executing}}, There is a race condition between the handling of {{ExecutorEvent.TASK_SUBMIT}} in {{down()}} and {{handleConsumerFoundResponse()}}.
> This code in {{handleConsumerFoundResponse()}} can throw an NPE:
> {code}
> final Long requestId = _requestId.get(runnable);
> owner = new Owner(address, requestId);
> {code}
> The problem happens when there is no value in the _requestId map for the given runnable: An NPE is thrown when the null requestId is unboxed as it is passed to the constructor of Owner.
> We have seen this exception just twice in a test system where many hundreds of millions of items have been processed, so it's pretty rare.
> Looking at the code, the most likely scenario under which this can happen is:
> # A node has an empty queue of runnables
> # It receives a {{CONSUMER_FOUND}} message (discussed below)
> # Just as it does this, a new runnable is enqueued in {{down()}}. Then, before the new requestId is added to the map...
> # {{handleConsumerFoundResponse()}} gets the runnable from the queue, but finds no requestId in the map
> It seems like it should not be possible for a node to receive a {{CONSUMER_FOUND}} when it has no runnables (since it hasn't asked for any) but the code does handle this:
> {{handleConsumerFoundResponse()}} says:
> {code}
> if (runnable == null) {
>   owner = new Owner(address, threadId);
>   // For some reason we don't have a runnable anymore
>   // so we have to send back to the coordinator that
>   // the consumer is still available.  The runnable
>   // would be removed on a cancel
>   sendToCoordinator(Type.CONSUMER_READY, owner.getRequestId(),
>                 owner.getAddress());
> }
> {code}
> We do not believe any tasks were cancelled in the cases where we have seen the NPE, so perhaps there is some other way it can happen.
> In any case the solution should be quite simple: re-order the additions of the runnable to the queue and the requestId to the map:
> {code}
> case ExecutorEvent.TASK_SUBMIT:
>   Runnable runnable = (Runnable)evt.getArg();
>   _awaitingConsumer.add(runnable);
>   // We are limited to a number of concurrent request id's
>   // equal to 2^63-1.  This is quite large and if it 
>   // overflows it will still be positive
>   long requestId = Math.abs(counter.getAndIncrement());
>   if(requestId == Long.MIN_VALUE) {
>       counter.set(0);
>       requestId = Math.abs(counter.getAndIncrement());
>   }
>   _requestId.put(runnable, requestId);
> {code}
> We're seeing the problem in 3.2.7, but the code appears to the be the same in the master branch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


More information about the jboss-jira mailing list