[rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

Edward Archibald edward.archibald at continuent.com
Tue Nov 3 22:41:21 EST 2009


I found the following deadlock, which appears to be caused by a scheduled task for a 'delayed' rule
running concurrently with an application thread that is trying to access a 'global'.  Are there any recommendations for avoiding this type of deadlock, other than not using rules with 'duration()' and similar attributes that cause asynchronous execution with respect to my main application thread?

This problem is difficult to reproduce on demand, but it comes up frequently when the 'delayed' rule "DETECT MONITORING HAS STOPPED" is activated.

===================================================================================

This thread, my application's EnterprisePolicyManager thread, is inserting a fact and, while
evaluating a rule condition, is attempting to get access to a global, policyMgr.

It owns the 'ReteooStatefulSession.actionQueue' (the LinkedList, id=207, below)
and is waiting for the 'lock.lock' on ReteooStatefulSession.

owns: java.util.LinkedList<E>  (id=207)
waited by: Thread [pool-3-thread-1] (Suspended)
owns: com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine  (id=208)
sun.misc.Unsafe.park(boolean, long) line: not available [native method] [local variables unavailable]
java.util.concurrent.locks.LockSupport.park() line: 118 [local variables unavailable]
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() line: 681 [local variables unavailable]
java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) line: 711
java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) line: 1041
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() line: 184 [local variables unavailable]
java.util.concurrent.locks.ReentrantLock.lock() line: 256 [local variables unavailable]
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) line: 587
com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, org.drools.rule.Declaration[], org.drools.WorkingMemory, java.lang.Object) line: not available
org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, org.drools.WorkingMemory, java.lang.Object) line: 117
org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 180
org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, org.drools.reteoo.LeftTuple) line: 117
org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.reteoo.RightTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, boolean) line: 28
org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 175
org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 42
org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, org.drools.common.InternalWorkingMemory) line: 326
org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) line: 221
org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) line: 394
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1486
org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation) line: 158
org.drools.common.NamedEntryPoint.insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 122
org.drools.common.NamedEntryPoint.insert(java.lang.Object) line: 80
com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine.insertFact(com.continuent.tungsten.commons.cluster.resource.notification.NotificationStreamID, java.lang.Object, boolean) line: 162
com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager.run() line: 249
java.lang.Thread.run() line: 595

The rule implicated in the above thread is:

rule "IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS"
salience 999
  when
    notification : ClusterResourceNotification() from entry-point "MONITORING"
    eval(policyMgr.getMode() == ClusterPolicyManagerMode.MAINTENANCE)
  then
    statistics.increment("IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS");
    retract(notification);
end



This other thread, apparently a scheduled thread for a rule with a 10 second duration,
is attempting to insert a fact and owns the 'lock.lock' on ReteooStatefulSession and
is waiting for the 'ReteooStatefulSession.actionQueue'.

owns: org.drools.common.DefaultAgenda  (id=4046)
waiting for: java.util.LinkedList<E>  (id=207)
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1480
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation, org.drools.reteoo.ObjectTypeConf) line: 1051
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 1001
org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object, boolean) line: 114
org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object) line: 108
com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0.consequence(org.drools.spi.KnowledgeHelper, com.continuent.tungsten.commons.cluster.resource.notification.DataServerNotification, org.drools.FactHandle, java.lang.String, org.drools.FactHandle, com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager, org.apache.log4j.Logger) line: not available
com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0ConsequenceInvoker.evaluate(org.drools.spi.KnowledgeHelper, org.drools.WorkingMemory) line: not available
org.drools.common.DefaultAgenda.fireActivation(org.drools.spi.Activation) line: 934
org.drools.common.Scheduler$DuractionJob.execute(org.drools.time.JobContext) line: 70
org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 132
org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 110
java.util.concurrent.FutureTask$Sync.innerRun() line: 269
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 123
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 65
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 168
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 650
java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 675
java.lang.Thread.run() line: 595

The rule for this task looks like:
rule "DETECT MONITORING HAS STOPPED"
duration(10s)
salience 1000
  when
    lastNotification : DataServerNotification($resourceName : resourceName)
                  from entry-point "MONITORING"

    not (DataServerNotification(resourceName == $resourceName,
                                 this after [10s] lastNotification)
                  from entry-point "MONITORING")
   
    not (ManagerFailedAlarm(expired == false,
                   resourceName == $resourceName))

    not (DataSource(name == $resourceName,
                   state == ResourceState.SHUNNED ||
                   state == ResourceState.FAILED))

  then 
    Object[] params = {$resourceName};
    if (policyMgr.getMode() != ClusterPolicyManagerMode.MAINTENANCE)
    {
      lastNotification.setResourceState(ResourceState.UNKNOWN);
      ManagerFailedAlarm alarm =
          new ManagerFailedAlarm(lastNotification, "rule detected monitor stop",
                                 6, AlarmSeverity.FAULT);
      logger.info(alarm.toString());
      insert(alarm);
      update(lastNotification);
    }
end 
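
Taken together, the two traces look like a classic lock-order inversion: one thread holds the queue monitor and wants the session lock, while the other holds the session lock and wants the queue monitor. To make the cycle concrete, here is a minimal Java sketch of that pattern. It does not use Drools at all; 'sessionLock', 'actionQueue' and the class name are stand-ins I made up for the session's 'lock' and 'actionQueue', and the application side uses a timed tryLock() instead of lock() only so that the demo terminates rather than hanging.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {
    // Hypothetical stand-ins for the session's 'lock' and 'actionQueue'.
    static final ReentrantLock sessionLock = new ReentrantLock();
    static final LinkedList<Object> actionQueue = new LinkedList<Object>();

    // Block until both threads hold their first lock.
    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if the "application" thread could not obtain sessionLock
    // while holding the actionQueue monitor, i.e. lock() would have hung.
    static boolean inversionDetected() {
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        final boolean[] timedOut = {false};

        // Plays the scheduled rule task: session lock first, then the queue.
        Thread timerThread = new Thread(() -> {
            sessionLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (actionQueue) { // blocks while appThread owns the monitor
                    actionQueue.clear();
                }
            } finally {
                sessionLock.unlock();
            }
        });

        // Plays the application thread: queue first, then the session lock.
        Thread appThread = new Thread(() -> {
            synchronized (actionQueue) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                try {
                    // Timed attempt so the demo ends; a plain lock() would wait forever.
                    if (sessionLock.tryLock(200, TimeUnit.MILLISECONDS)) {
                        sessionLock.unlock();
                    } else {
                        timedOut[0] = true;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        timerThread.start();
        appThread.start();
        try {
            timerThread.join();
            appThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timedOut[0];
    }

    public static void main(String[] args) {
        System.out.println("inversion detected: " + inversionDetected());
    }
}
```

The latch forces both threads to hold their first lock before either asks for the second, which is the interleaving that seems to occur only occasionally in my application. In this sketch the timed attempt gives up, the application thread releases the queue monitor, and the other thread can then finish; with an unconditional lock(), as in the getGlobal() frame above, neither thread would ever proceed.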








