Hi Greg,
Thanks for the post. I'll give this a shot. It turns out that I can reproduce the issue
often enough to see whether this simple change resolves it.
Regards,
Edward
________________________________________
From: rules-dev-bounces(a)lists.jboss.org [rules-dev-bounces(a)lists.jboss.org] On Behalf Of Greg Barton [greg_barton(a)yahoo.com]
Sent: Tuesday, November 03, 2009 9:43 PM
To: Rules Dev List
Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
Well, I'm not sure how to avoid the deadlock without changing the Drools codebase. I
was, however, able to change the type of AbstractWorkingMemory.actionQueue to
java.util.concurrent.ConcurrentLinkedQueue and remove the synchronization on the queue
with no apparent ill effects. (Two drools-core tests failed, but they failed whether or
not the change was made.) Also, I don't like the fact that the current code
synchronizes on actionQueue but then exposes it outside the class through the
getActionQueue() method, where access can be unsynchronized. Changing it to a
ConcurrentLinkedQueue makes it safe to expose externally. (Not to mention that, with the
current code, the lock can be stolen externally.)
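A minimal sketch of the kind of change described above, assuming a heavily simplified stand-in class. QueuedWorkingMemory, WorkingMemoryAction, queueWorkingMemoryAction, recordExecution and runDemo are hypothetical names; this is neither the Drools source nor the attached diff. The queue is a ConcurrentLinkedQueue, producers add to it without a synchronized block, and executeQueuedActions() drains it with poll(), so no queue monitor is held while an action runs and takes the session lock.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, heavily simplified stand-in for AbstractWorkingMemory (NOT Drools code).
final class QueuedWorkingMemory {

    /** A deferred unit of work, loosely modelled on a queued working-memory action. */
    interface WorkingMemoryAction {
        void execute(QueuedWorkingMemory wm);
    }

    // Lock-free queue: no monitor exists on it, so it cannot take part in a
    // lock-order inversion and cannot be "stolen" by outside code.
    private final Queue<WorkingMemoryAction> actionQueue = new ConcurrentLinkedQueue<>();

    // Stand-in for the session lock that getGlobal() acquires.
    private final ReentrantLock lock = new ReentrantLock();
    private int executed; // guarded by lock

    void queueWorkingMemoryAction(WorkingMemoryAction action) {
        actionQueue.add(action); // thread-safe without a synchronized block
    }

    // Safe to expose: ConcurrentLinkedQueue is itself thread-safe.
    Queue<WorkingMemoryAction> getActionQueue() {
        return actionQueue;
    }

    void executeQueuedActions() {
        WorkingMemoryAction action;
        // poll() atomically removes the head or returns null; nothing is held
        // while the action runs, so the action may freely take the session lock.
        while ((action = actionQueue.poll()) != null) {
            action.execute(this);
        }
    }

    Object getGlobal(String name) {
        lock.lock();
        try {
            return "global:" + name;
        } finally {
            lock.unlock();
        }
    }

    void recordExecution() {
        lock.lock();
        try {
            executed++;
        } finally {
            lock.unlock();
        }
    }

    int executedCount() {
        lock.lock();
        try {
            return executed;
        } finally {
            lock.unlock();
        }
    }

    /** Queues perProducer actions from each of `producers` threads, then drains them. */
    static int runDemo(int producers, int perProducer) throws InterruptedException {
        QueuedWorkingMemory wm = new QueuedWorkingMemory();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    wm.queueWorkingMemoryAction(m -> {
                        m.getGlobal("policyMgr"); // takes the session lock inside an action
                        m.recordExecution();
                    });
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        wm.executeQueuedActions();
        return wm.executedCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("actions executed: " + runDemo(4, 1000));
    }
}
```

One trade-off worth checking: in the first trace the application thread holds the queue monitor while the queued actions execute, so the monitor also serialized execution. With a sketch like this, two threads calling executeQueuedActions() can run different actions at the same time, which is part of what the Drools devs would have to weigh.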
Diff attached. If you can run Drools compiled from trunk, apply the diff and see if it
resolves the deadlock. If it does, it's up to the Drools devs whether the change
should be made. I'm just hacking about. :P
--- On Tue, 11/3/09, Edward Archibald <edward.archibald(a)continuent.com> wrote:
From: Edward Archibald <edward.archibald(a)continuent.com>
Subject: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
To: "rules-dev(a)lists.jboss.org" <rules-dev(a)lists.jboss.org>
Date: Tuesday, November 3, 2009, 9:41 PM
I found the following deadlock, which is apparently due to a task for a 'delayed' rule
executing concurrently with an application thread that is attempting to access a
'global'. Any recommendations for avoiding this type of deadlock, other than not using
rules with 'duration()' etc., which cause asynchronous execution with respect to my main
application thread?
This problem is somewhat difficult to reproduce on demand, but it comes up frequently
when the 'delayed' rule "DETECT MONITORING HAS STOPPED" is activated by its trigger
conditions.
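As a rough illustration of why 'duration()' rules run asynchronously: in the second trace below, the consequence is fired from JDKTimerService through a ScheduledThreadPoolExecutor, presumably on the pool-3-thread-1 thread named in the first trace, rather than on the application thread. The sketch uses a plain ScheduledExecutorService as a stand-in for that timer; DelayedConsequenceDemo and run are hypothetical names and this is not Drools code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: a delayed job standing in for a duration(10s)
// rule's consequence. Not Drools code.
final class DelayedConsequenceDemo {

    /** Returns {callerThreadName, delayedJobThreadName}. */
    static String[] run(long delayMillis) throws Exception {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(1);
        try {
            // The "consequence" only reports which thread it ran on.
            Callable<String> consequence = () -> Thread.currentThread().getName();
            ScheduledFuture<String> fired =
                    timer.schedule(consequence, delayMillis, TimeUnit.MILLISECONDS);
            // Meanwhile the caller is free to keep working, e.g. inserting facts.
            String caller = Thread.currentThread().getName();
            return new String[] { caller, fired.get() };
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] names = run(100);
        System.out.println("caller thread:      " + names[0]);
        System.out.println("consequence thread: " + names[1]);
    }
}
```

With the default thread factory the delayed job runs on a thread named in the pool-N-thread-M style, so anything it locks competes with whatever the caller is doing at that moment.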
===================================================================================
This thread, my application's EnterprisePolicyManager thread, is attempting to access a
global, policyMgr. It owns the 'ReteooStatefulSession.actionQueue' monitor and is waiting
for the 'lock.lock' on ReteooStatefulSession.
owns: java.util.LinkedList<E> (id=207)
waited by: Thread [pool-3-thread-1] (Suspended)
owns: com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine (id=208)
sun.misc.Unsafe.park(boolean, long) line: not available [native method] [local variables unavailable]
java.util.concurrent.locks.LockSupport.park() line: 118 [local variables unavailable]
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() line: 681 [local variables unavailable]
java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) line: 711
java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) line: 1041
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() line: 184 [local variables unavailable]
java.util.concurrent.locks.ReentrantLock.lock() line: 256 [local variables unavailable]
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) line: 587
com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, org.drools.rule.Declaration[], org.drools.WorkingMemory, java.lang.Object) line: not available
org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, org.drools.WorkingMemory, java.lang.Object) line: 117
org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 180
org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, org.drools.reteoo.LeftTuple) line: 117
org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.reteoo.RightTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, boolean) line: 28
org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 175
org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 42
org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, org.drools.common.InternalWorkingMemory) line: 326
org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) line: 221
org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) line: 394
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1486
org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation) line: 158
org.drools.common.NamedEntryPoint.insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 122
org.drools.common.NamedEntryPoint.insert(java.lang.Object) line: 80
com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine.insertFact(com.continuent.tungsten.commons.cluster.resource.notification.NotificationStreamID, java.lang.Object, boolean) line: 162
com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager.run() line: 249
java.lang.Thread.run() line: 595
The rule implicated in the above thread is:
rule "IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS"
    salience 999
    when
        notification : ClusterResourceNotification() from entry-point "MONITORING"
        eval(policyMgr.getMode() == ClusterPolicyManagerMode.MAINTENANCE)
    then
        statistics.increment("IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS");
        retract(notification);
end
This other thread, apparently a scheduled thread for a rule with a 10-second duration,
is attempting to insert a fact. It owns the 'lock.lock' on ReteooStatefulSession and is
waiting for the 'ReteooStatefulSession.actionQueue' monitor.
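The two descriptions form a classic lock-order inversion: the application thread holds the actionQueue monitor and wants the session lock, while the timer thread holds the session lock and wants the actionQueue monitor. The sketch reproduces that shape outside Drools with a plain LinkedList monitor and a ReentrantLock (LockInversionDemo, reproduce and awaitQuietly are hypothetical names) and uses ThreadMXBean.findDeadlockedThreads(), which reports cycles involving both object monitors and ownable synchronizers such as ReentrantLock, to confirm the deadlock instead of hanging.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reproduction of the lock-order inversion; not Drools code.
final class LockInversionDemo {

    /** Returns how many threads the JVM reports as deadlocked (0 if none within the timeout). */
    static int reproduce(long timeoutMillis) throws InterruptedException {
        // Stand-ins for ReteooStatefulSession.actionQueue (monitor) and its ReentrantLock.
        final Object actionQueue = new LinkedList<Object>();
        final ReentrantLock sessionLock = new ReentrantLock();
        // Both threads must hold their first lock before either asks for its second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like the EnterprisePolicyManager thread: queue monitor first, then session lock.
        Thread app = new Thread(() -> {
            synchronized (actionQueue) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                sessionLock.lock();          // blocks forever: held by "timer"
                sessionLock.unlock();
            }
        }, "app");

        // Like the duration-rule timer thread: session lock first, then queue monitor.
        Thread timer = new Thread(() -> {
            sessionLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (actionQueue) { // blocks forever: held by "app"
                    System.out.println("never reached");
                }
            } finally {
                sessionLock.unlock();
            }
        }, "timer");

        // Daemon threads so the deliberately deadlocked pair cannot keep the JVM alive.
        app.setDaemon(true);
        timer.setDaemon(true);
        app.start();
        timer.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(20);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduce(5000));
    }
}
```

The same findDeadlockedThreads() call could also be run periodically from a watchdog thread in the real application, which may be easier than waiting to catch the hang under a debugger.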
owns: org.drools.common.DefaultAgenda (id=4046)
waiting for: java.util.LinkedList<E> (id=207)
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1480
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation, org.drools.reteoo.ObjectTypeConf) line: 1051
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 1001
org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object, boolean) line: 114
org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object) line: 108
com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0.consequence(org.drools.spi.KnowledgeHelper, com.continuent.tungsten.commons.cluster.resource.notification.DataServerNotification, org.drools.FactHandle, java.lang.String, org.drools.FactHandle, com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager, org.apache.log4j.Logger) line: not available
com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0ConsequenceInvoker.evaluate(org.drools.spi.KnowledgeHelper, org.drools.WorkingMemory) line: not available
org.drools.common.DefaultAgenda.fireActivation(org.drools.spi.Activation) line: 934
org.drools.common.Scheduler$DuractionJob.execute(org.drools.time.JobContext) line: 70
org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 132
org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 110
java.util.concurrent.FutureTask$Sync.innerRun() line: 269
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 123
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 65
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 168
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 650
java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 675
java.lang.Thread.run() line: 595
The rule for this task looks like:
rule "DETECT MONITORING HAS STOPPED"
    duration(10s)
    salience 1000
    when
        lastNotification : DataServerNotification($resourceName : resourceName)
            from entry-point "MONITORING"
        not (DataServerNotification(resourceName == $resourceName,
                this after [10s] lastNotification)
            from entry-point "MONITORING")
        not (ManagerFailedAlarm(expired == false, resourceName == $resourceName))
        not (DataSource(name == $resourceName,
                state == ResourceState.SHUNNED || state == ResourceState.FAILED))
    then
        Object[] params = {$resourceName};
        if (policyMgr.getMode() != ClusterPolicyManagerMode.MAINTENANCE)
        {
            lastNotification.setResourceState(ResourceState.UNKNOWN);
            ManagerFailedAlarm alarm =
                new ManagerFailedAlarm(lastNotification, "rule detected monitor stop",
                    6, AlarmSeverity.FAULT);
            logger.info(alarm.toString());
            insert(alarm);
            update(lastNotification);
        }
end
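One general remedy for this kind of inversion is a single global lock order: any code that needs both locks takes the session lock first, and no action (which may call getGlobal) ever runs while the queue monitor is held. The sketch below illustrates that rule with hypothetical names (OrderedLocking, queueAction, run); it is not a Drools patch, and whether Drools 5.0 can be arranged this way is an assumption for the Drools devs to judge.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a single global lock order; NOT a Drools patch.
// Rule 1: whoever needs both locks takes sessionLock first, then the queue monitor.
// Rule 2: never run an action (which may call getGlobal) while holding the queue monitor.
final class OrderedLocking {
    private final ReentrantLock sessionLock = new ReentrantLock();
    private final Deque<Runnable> actionQueue = new ArrayDeque<>();
    private int executed; // guarded by sessionLock

    void queueAction(Runnable action) {
        sessionLock.lock();                  // first: session lock
        try {
            synchronized (actionQueue) {     // second: queue monitor
                actionQueue.add(action);
            }
        } finally {
            sessionLock.unlock();
        }
    }

    void executeQueuedActions() {
        sessionLock.lock();
        try {
            while (true) {
                Runnable next;
                synchronized (actionQueue) { // held only long enough to dequeue
                    next = actionQueue.poll();
                }
                if (next == null) {
                    return;
                }
                next.run();                  // runs under sessionLock, not the queue monitor
            }
        } finally {
            sessionLock.unlock();
        }
    }

    Object getGlobal(String name) {
        sessionLock.lock();                  // re-entrant, so safe from inside an action
        try {
            return name;
        } finally {
            sessionLock.unlock();
        }
    }

    int executedCount() {
        sessionLock.lock();
        try {
            return executed;
        } finally {
            sessionLock.unlock();
        }
    }

    /** Two threads in the roles of the two traces; true if both finish and every action ran. */
    static boolean run(int iterations, long timeoutMillis) throws InterruptedException {
        OrderedLocking wm = new OrderedLocking();
        Runnable readGlobal = () -> {
            wm.getGlobal("policyMgr");
            wm.executed++;                   // safe: actions only run under sessionLock
        };
        Thread app = new Thread(() -> {      // like EnterprisePolicyManager inserting facts
            for (int i = 0; i < iterations; i++) {
                wm.queueAction(readGlobal);
                wm.executeQueuedActions();
            }
        });
        Thread timer = new Thread(() -> {    // like the duration-rule consequence
            for (int i = 0; i < iterations; i++) {
                wm.sessionLock.lock();       // already holds the session lock...
                try {
                    wm.queueAction(readGlobal); // ...and still takes the locks in the same order
                    wm.executeQueuedActions();
                } finally {
                    wm.sessionLock.unlock();
                }
            }
        });
        app.setDaemon(true);
        timer.setDaemon(true);
        app.start();
        timer.start();
        app.join(timeoutMillis);
        timer.join(timeoutMillis);
        return !app.isAlive() && !timer.isAlive() && wm.executedCount() == 2 * iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed without deadlock: " + run(10_000, 10_000));
    }
}
```

Because no thread ever requests the session lock while holding the queue monitor, the cycle from the two traces cannot form; the cost is that queued actions are serialized under the session lock.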
_______________________________________________
rules-dev mailing list
rules-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/rules-dev