[rules-users] PLEASE HELP: Possible concurrency issue in Drools

Tue Aug 17 19:49:03 EDT 2010

Hi All,

I've posted on this topic twice and logged a JIRA ticket 
(https://jira.jboss.org/browse/JBRULES-2651) as well.  I've received no 
responses and the bug hasn't been updated since I logged it.

This is a serious issue as it causes my production system to freeze up and it 
has to be restarted.  It's consistently reproducible (usually takes a few 
days).  

Can someone please take a quick look at the code?  Does the call to 
SingleThreadedObjectStore.addHandle in NamedEntryPoint.insert need to be 
preceded by acquiring the lock?

Thanks again for your help.

Norman

________________________________
From: Norman C <rent_my_time at yahoo.com>
To: rules-users at lists.jboss.org
Sent: Wed, August 4, 2010 11:23:55 PM
Subject: Re: Possible concurrency issue in Drools

I've run into this issue a few more times.  Should I log a JIRA ticket for 
this?  Any advice would be appreciated.

Thanks,
Norman

________________________________
From: Norman C <rent_my_time at yahoo.com>
To: rules-users at lists.jboss.org
Sent: Sat, July 31, 2010 9:56:26 PM
Subject: Re: Possible concurrency issue in Drools

All,

Just wanted to mention, I'm using version 5.0.1 of Drools.

Thanks,
Norman

________________________________
From: Norman C <rent_my_time at yahoo.com>
To: rules-users at lists.jboss.org
Sent: Sat, July 31, 2010 9:50:19 PM
Subject: Possible concurrency issue in Drools

Hi All,

I recently ran into an issue which I believe might point to a concurrency 
issue.  My server stopped processing new requests, so I did a thread dump.  In 
examining the dump, I found that all of the processing threads, save two, were 
blocking while trying to acquire the lock in NamedEntryPoint.insert.  Both of 
the other two threads appeared to be infinitely looping in the 
NamedEntryPoint.insert method.  Here are snippets of the stack traces:

"ActiveMQ Session Task" prio=10 tid=0x00002aab0003b000 nid=0x7b98 runnable [0x000000004c086000..0x000000004c087c90]
   java.lang.Thread.State: RUNNABLE
    at org.drools.util.ObjectHashMap.remove(ObjectHashMap.java:121)
    at org.drools.common.SingleThreadedObjectStore.removeHandle(SingleThreadedObjectStore.java:150)
    at org.drools.common.NamedEntryPoint.retract(NamedEntryPoint.java:296)
    at org.drools.common.NamedEntryPoint.retract(NamedEntryPoint.java:245)
    at org.drools.reteoo.ReteooWorkingMemory$WorkingMemoryReteExpireAction.execute(ReteooWorkingMemory.java:350)
    at org.drools.common.AbstractWorkingMemory.executeQueuedActions(AbstractWorkingMemory.java:1488)
    at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:158)
    at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:122)
    at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:80)
    at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:28)
    at

"ActiveMQ Session Task" prio=10 tid=0x000000005a35cc00 nid=0xdf6 runnable [0x000000004a268000..0x000000004a269a90]
   java.lang.Thread.State: RUNNABLE
    at org.drools.util.AbstractHashTable.resize(AbstractHashTable.java:115)
    at org.drools.util.ObjectHashMap.put(ObjectHashMap.java:78)
    at org.drools.common.SingleThreadedObjectStore.addHandle(SingleThreadedObjectStore.java:136)
    at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:113)
    at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:80)
    at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:28)
    at

So it seems that while the first thread is holding the lock and attempting to 
remove an object handle from the object store in NamedEntryPoint, the other 
thread is trying to resize that same object store in response to an addHandle 
call that puts it over the threshold.  I haven't worked out exactly how these 
concurrent accesses to the same object store by two different threads cause an 
infinite loop in both threads, but it seems like the call to 
SingleThreadedObjectStore.addHandle should be preceded by acquiring the lock.

Is this correct?  I can imagine that resizing a large hash map could potentially 
take a long time and thus synchronizing this call could impact performance, but 
somehow, the action of resizing the table must be protected in some way from 
adversely impacting other operations on the table.
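
To illustrate the kind of protection I have in mind, here is a minimal sketch.  
It is not the Drools code: GuardedStore is a hypothetical stand-in for the 
object store, and java.util.HashMap plays the role of ObjectHashMap (both are 
assumptions made purely for illustration).  Every put and remove takes the same 
ReentrantLock, so a resize triggered by an add can never overlap with a removal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an object store; NOT the actual Drools
// SingleThreadedObjectStore. HashMap substitutes for ObjectHashMap.
class GuardedStore {
    // HashMap is not thread-safe on its own; an unguarded resize can
    // corrupt its internal buckets when another thread mutates it.
    private final Map<Object, Object> handles = new HashMap<>();
    private final Lock lock = new ReentrantLock();

    // Insert a handle; the put (and any resize it triggers) runs under the lock.
    public void addHandle(Object key, Object handle) {
        lock.lock();
        try {
            handles.put(key, handle);
        } finally {
            lock.unlock();
        }
    }

    // Remove a handle under the same lock, so it never overlaps with a resize.
    public Object removeHandle(Object key) {
        lock.lock();
        try {
            return handles.remove(key);
        } finally {
            lock.unlock();
        }
    }

    public int size() {
        lock.lock();
        try {
            return handles.size();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedStore store = new GuardedStore();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int base = t * 10_000; // disjoint key range per thread
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    store.addHandle(base + i, "handle");
                    if (i % 2 == 0) {
                        store.removeHandle(base + i); // drop every other handle
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        // Each of the 4 threads leaves 5000 handles behind.
        System.out.println(store.size()); // prints 20000
    }
}
```

The cost is that inserts are serialized on the lock, including the occasional 
long resize, which is the performance trade-off mentioned above.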

Any help would be appreciated.

thanks,
Norman
