[jboss-jira] [JBoss JIRA] (JGRP-1744) Race condition allows NullPointerException in Executing protocol
Tim Gage (JIRA)
jira-events at lists.jboss.org
Fri Nov 15 09:31:05 EST 2013
Tim Gage created JGRP-1744:
------------------------------
Summary: Race condition allows NullPointerException in Executing protocol
Key: JGRP-1744
URL: https://issues.jboss.org/browse/JGRP-1744
Project: JGroups
Issue Type: Bug
Affects Versions: 3.4.1, 3.2.7
Reporter: Tim Gage
Assignee: Bela Ban
In {{org.jgroups.protocols.Executing}}, There is a race condition between the handling of {{ExecutorEvent.TASK_SUBMIT}} in {{down()}} and {{handleConsumerFoundResponse()}}.
This code in {{handleConsumerFoundResponse()}} can throw an NPE:
{code}
final Long requestId = _requestId.get(runnable);
owner = new Owner(address, requestId);
{code}
The problem happens when there is no value in the _requestId map for the given runnable: An NPE is thrown when the null requestId is unboxed as it is passed to the constructor of Owner.
We have seen this exception twice in a test system.
Looking at the code, the most likely scenario under which this can happen is:
# A node has an empty queue of runnables
# It receives a {{CONSUMER_FOUND}} message (discussed below)
# Just as it does this, a new runnable is enqueued in {{down()}}. Then, before the new requestId is added to the map...
# {{handleConsumerFoundResponse()}} gets the runnable from the queue, but finds no requestId in the map
It seems like it should not be possible for a node to receive a {{CONSUMER_FOUND}} when it has no runnables (since it hasn't asked for any) but the code does handle this:
{{handleConsumerFoundResponse()}} says:
{code}
if (runnable == null) {
owner = new Owner(address, threadId);
// For some reason we don't have a runnable anymore
// so we have to send back to the coordinator that
// the consumer is still available. The runnable
// would be removed on a cancel
sendToCoordinator(Type.CONSUMER_READY, owner.getRequestId(),
owner.getAddress());
}
{code}
We do not believe any tasks were cancelled in the cases where we have seen the NPE, so perhaps there is some other way it can happen.
In any case the solution should be quite simple: re-order the additions of the runnable to the queue and the requestId to the map:
{code}
case ExecutorEvent.TASK_SUBMIT:
Runnable runnable = (Runnable)evt.getArg();
_awaitingConsumer.add(runnable);
// We are limited to a number of concurrent request id's
// equal to 2^63-1. This is quite large and if it
// overflows it will still be positive
long requestId = Math.abs(counter.getAndIncrement());
if(requestId == Long.MIN_VALUE) {
counter.set(0);
requestId = Math.abs(counter.getAndIncrement());
}
_requestId.put(runnable, requestId);
{code}
We're seeing the problem in 3.2.7, but the code appears to the be the same in the master branch.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
More information about the jboss-jira
mailing list