I'll poke around.
--- On Wed, 11/4/09, Michael Neale <michael.neale(a)gmail.com> wrote:
From: Michael Neale <michael.neale(a)gmail.com>
Subject: Re: [rules-dev] Deadlock in the Drools core - suggested patch appears to resolve the issue
To: "Rules Dev List" <rules-dev(a)lists.jboss.org>
Date: Wednesday, November 4, 2009, 9:01 PM
That is great news. If there are other places we can move to
java.util.concurrent in older code, that would also be welcome;
j.u.concurrent is awesome magic which saves so much pain.
On Thu, Nov 5, 2009 at 12:06 PM, Edward Archibald <edward.archibald(a)continuent.com> wrote:
> Hello Michael and Greg,
>
> I have pulled the drools head, made the patch that Greg suggested
> (thanks Greg!) and deployed the drools-core jar with my app. Prior to
> this change, I was able to reproduce the deadlock - verified in the
> debugger and in exactly the same place as my earlier post - roughly
> 50% of the time. I have tried the same test scenario, now, 10 times
> with no failures.
>
> From what I can tell, this problem will easily happen every time that
> I am already in the 'delayed execution' code created by the rule with
> the 'duration()' qualifier and, at the same time, I get a new 'fact'
> and attempt to insert it. Knowing this, I can probably come up with a
> simpler test case. I'll give it a shot.
>
> Thanks, again, to you both for the quick responses.
>
> Edward
>
> -----Original Message-----
> From: rules-dev-bounces(a)lists.jboss.org [mailto:rules-dev-bounces@lists.jboss.org] On Behalf Of Michael Neale
> Sent: Wednesday, November 04, 2009 2:26 AM
> To: Rules Dev List
> Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
>
> ha - was just musing with someone the other day about who uses
> "duration" anymore ;) I guess it's still useful to people!
>
> I would say that the "duration" codebase is probably fairly "old" - in
> the sense that it probably pre-dates the availability of
> j.u.concurrent (which was Java 5, I think?) - so please try out that
> patch; if it works, we can probably pull it in (hoping Edson can take
> a look).
>
> On Wed, Nov 4, 2009 at 4:43 PM, Greg Barton <greg_barton(a)yahoo.com> wrote:
>> Well, I'm not sure how to avoid the deadlock without changing the
>> drools codebase. I was, however, able to change the type of
>> AbstractWorkingMemory.actionQueue to
>> java.util.concurrent.ConcurrentLinkedQueue and remove the
>> synchronization over the queue with no apparent ill effects. (Two
>> tests failed for drools-core, but they failed whether the change was
>> made or not.) Also I don't like the fact that the current code
>> synchronizes on actionQueue, but then exposes it outside the class
>> through the getActionQueue() method, where access can be
>> unsynchronized. Changing it to ConcurrentLinkedQueue makes it safe to
>> expose externally. (Not to mention that the lock can be stolen
>> externally with the current code.)
>>
>> Diff attached. If you can run drools compiled from trunk, apply the
>> diff and see if it resolves the deadlock. If it does, it's up to the
>> drools devs as to whether the change should be made. I'm just
>> hacking about. :P
>>
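For readers without the attachment, the shape of the change described above can be sketched roughly as follows. This is not the attached diff and not Drools code: ActionQueueSketch, queueAction() and demo() are hypothetical names, and Runnable stands in for whatever action type the real queue holds. Only actionQueue, getActionQueue() and executeQueuedActions() are names taken from this thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a session that owns an action queue.
class ActionQueueSketch {
    // A lock-free queue: no synchronized block is needed around add/poll.
    private final Queue<Runnable> actionQueue = new ConcurrentLinkedQueue<>();

    void queueAction(Runnable action) {
        actionQueue.add(action);
    }

    // Exposing the queue hands out no monitor that a caller could hold.
    Queue<Runnable> getActionQueue() {
        return actionQueue;
    }

    // Drains the queue without holding any lock while an action runs.
    int executeQueuedActions() {
        int executed = 0;
        Runnable action;
        while ((action = actionQueue.poll()) != null) {
            action.run();
            executed++;
        }
        return executed;
    }

    // Four producer threads enqueue concurrently, then one thread drains.
    static int demo() {
        ActionQueueSketch session = new ActionQueueSketch();
        AtomicInteger ran = new AtomicInteger();
        Thread[] producers = new Thread[4];
        for (int i = 0; i < producers.length; i++) {
            producers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    session.queueAction(ran::incrementAndGet);
                }
            });
            producers[i].start();
        }
        try {
            for (Thread producer : producers) {
                producer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        session.executeQueuedActions();
        return ran.get();
    }

    public static void main(String[] args) {
        System.out.println("actions executed: " + demo());
    }
}
```

The point of the sketch is that draining with poll() never holds a monitor while an action runs, so an action that needs another lock cannot be caught holding the queue's monitor at the same time.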
>> --- On Tue, 11/3/09, Edward Archibald <edward.archibald(a)continuent.com> wrote:
>>
>>> From: Edward Archibald <edward.archibald(a)continuent.com>
>>> Subject: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
>>> To: "rules-dev(a)lists.jboss.org" <rules-dev(a)lists.jboss.org>
>>> Date: Tuesday, November 3, 2009, 9:41 PM
>>>
>>> I found the following deadlock which is, apparently, due to the
>>> concurrent execution of a task for a 'delayed' rule with a
>>> concurrently executing application thread attempting to get access
>>> to a 'global'. Any recommendations for avoiding this type of
>>> deadlock besides not using rules with 'duration()' etc. which cause
>>> asynchronous execution with respect to my main application thread?
>>>
>>> This problem is somewhat difficult to reproduce on demand but it
>>> does come up frequently when the 'delayed' rule "DETECT MONITORING
>>> HAS STOPPED" is activated as a result of the trigger conditions.
>>>
>>> ===================================================================================
>>>
>>> This thread, my application's EnterprisePolicyManager thread, is
>>> attempting to get access to a global, policyMgr, and is waiting for
>>> the 'lock.lock' on ReteooStatefulSession.
>>>
>>> It owns the 'ReteooStatefulSession.actionQueue' and is waiting for
>>> the ReteooStatefulSession.lock.lock
>>>
>>> owns: java.util.LinkedList<E> (id=207)
>>> waited by: Thread [pool-3-thread-1] (Suspended)
>>> owns: com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine (id=208)
>>> sun.misc.Unsafe.park(boolean, long) line: not available [native method] [local variables unavailable]
>>> java.util.concurrent.locks.LockSupport.park() line: 118 [local variables unavailable]
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() line: 681 [local variables unavailable]
>>> java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) line: 711
>>> java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) line: 1041
>>> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() line: 184 [local variables unavailable]
>>> java.util.concurrent.locks.ReentrantLock.lock() line: 256 [local variables unavailable]
>>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) line: 587
>>> com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, org.drools.rule.Declaration[], org.drools.WorkingMemory, java.lang.Object) line: not available
>>> org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, org.drools.WorkingMemory, java.lang.Object) line: 117
>>> org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 180
>>> org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, org.drools.reteoo.LeftTuple) line: 117
>>> org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.reteoo.RightTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, boolean) line: 28
>>> org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 175
>>> org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 42
>>> org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, org.drools.common.InternalWorkingMemory) line: 326
>>> org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) line: 221
>>> org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) line: 394
>>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1486
>>> org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation) line: 158
>>> org.drools.common.NamedEntryPoint.insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 122
>>> org.drools.common.NamedEntryPoint.insert(java.lang.Object) line: 80
>>> com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine.insertFact(com.continuent.tungsten.commons.cluster.resource.notification.NotificationStreamID, java.lang.Object, boolean) line: 162
>>> com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager.run() line: 249
>>> java.lang.Thread.run() line: 595
>>>
>>> The rule implicated in the above thread is:
>>>
>>> rule "IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS"
>>>     salience 999
>>>     when
>>>         notification : ClusterResourceNotification() from entry-point "MONITORING"
>>>         eval(policyMgr.getMode() == ClusterPolicyManagerMode.MAINTENANCE)
>>>     then
>>>         statistics.increment("IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS");
>>>         retract(notification);
>>> end
>>>
>>>
>>>
>>> This other thread, apparently a scheduled thread for a rule with a
>>> 10 second duration, is attempting to insert a fact and owns the
>>> 'lock.lock' on ReteooStatefulSession and is waiting for the
>>> 'ReteooStatefulSession.actionQueue'.
>>>
>>> owns: org.drools.common.DefaultAgenda (id=4046)
>>> waiting for: java.util.LinkedList<E> (id=207)
>>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1480
>>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation, org.drools.reteoo.ObjectTypeConf) line: 1051
>>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 1001
>>> org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object, boolean) line: 114
>>> org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object) line: 108
>>> com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0.consequence(org.drools.spi.KnowledgeHelper, com.continuent.tungsten.commons.cluster.resource.notification.DataServerNotification, org.drools.FactHandle, java.lang.String, org.drools.FactHandle, com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager, org.apache.log4j.Logger) line: not available
>>> com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0ConsequenceInvoker.evaluate(org.drools.spi.KnowledgeHelper, org.drools.WorkingMemory) line: not available
>>> org.drools.common.DefaultAgenda.fireActivation(org.drools.spi.Activation) line: 934
>>> org.drools.common.Scheduler$DuractionJob.execute(org.drools.time.JobContext) line: 70
>>> org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 132
>>> org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 110
>>> java.util.concurrent.FutureTask$Sync.innerRun() line: 269
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 123
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 65
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 168
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 650
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 675
>>> java.lang.Thread.run() line: 595
>>>
>>> The rule for this task looks like:
>>> rule "DETECT MONITORING HAS STOPPED"
>>>     duration(10s)
>>>     salience 1000
>>>     when
>>>         lastNotification : DataServerNotification($resourceName : resourceName)
>>>             from entry-point "MONITORING"
>>>         not (DataServerNotification(resourceName == $resourceName,
>>>                 this after [10s] lastNotification)
>>>             from entry-point "MONITORING")
>>>         not (ManagerFailedAlarm(expired == false,
>>>                 resourceName == $resourceName))
>>>         not (DataSource(name == $resourceName,
>>>                 state == ResourceState.SHUNNED ||
>>>                 state == ResourceState.FAILED))
>>>     then
>>>         Object[] params = {$resourceName};
>>>         if (policyMgr.getMode() != ClusterPolicyManagerMode.MAINTENANCE)
>>>         {
>>>             lastNotification.setResourceState(ResourceState.UNKNOWN);
>>>             ManagerFailedAlarm alarm =
>>>                 new ManagerFailedAlarm(lastNotification, "rule detected monitor stop",
>>>                     6, AlarmSeverity.FAULT);
>>>             logger.info(alarm.toString());
>>>             insert(alarm);
>>>             update(lastNotification);
>>>         }
>>> end
>>>
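Read together, the two traces above look like a lock-order inversion: the application thread holds the actionQueue monitor (in executeQueuedActions()) and then wants the session lock (in getGlobal()), while the timer thread holds the session lock and then wants the actionQueue monitor. The following is a minimal sketch of that pattern only, under stated assumptions: it contains no Drools code, all names are hypothetical, and the actionQueue monitor is modelled as a second ReentrantLock so that tryLock() with a timeout can report the conflict instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the two threads in the traces; not Drools code.
class LockOrderInversionSketch {

    // Takes 'first', waits until the other thread also holds its first lock,
    // then tries 'second' with a timeout. Returns true if 'second' was acquired.
    static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            // Keep 'first' held until the other thread has finished its attempt.
            bothTried.countDown();
            bothTried.await();
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    // Returns true when neither thread could take its second lock.
    static boolean ordersConflict() {
        ReentrantLock queueLock = new ReentrantLock();   // stands in for the actionQueue monitor
        ReentrantLock sessionLock = new ReentrantLock(); // stands in for the session's lock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];

        // "Application" thread: queue first, then session lock.
        Thread app = new Thread(() ->
                acquired[0] = holdThenTry(queueLock, sessionLock, bothHoldFirst, bothTried));
        // "Timer" thread: session lock first, then queue.
        Thread timer = new Thread(() ->
                acquired[1] = holdThenTry(sessionLock, queueLock, bothHoldFirst, bothTried));
        app.start();
        timer.start();
        try {
            app.join();
            timer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !acquired[0] && !acquired[1];
    }

    public static void main(String[] args) {
        System.out.println("conflicting lock order: " + ordersConflict());
    }
}
```

With plain lock() or synchronized in place of the timed tryLock(), the same interleaving would block both threads indefinitely, which matches what the debugger shows. Taking the two locks in one consistent order, or removing one of them altogether, avoids the cycle.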
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> rules-dev mailing list
>>> rules-dev(a)lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/rules-dev
>>>
>>
>>
>>
>>
>
>
>
> --
> Michael D Neale
> home: www.michaelneale.net
> blog: michaelneale.blogspot.com
>
--
Michael D Neale
home: www.michaelneale.net
blog: michaelneale.blogspot.com