Hi All,
I recently ran into a problem that I believe points to a concurrency
issue. My server stopped processing new requests, so I took a thread
dump. Examining the dump, I found that all of the processing threads,
save two, were blocked trying to acquire the lock in
NamedEntryPoint.insert. The other two threads both appeared to be
looping indefinitely inside the NamedEntryPoint.insert method. Here
are snippets of the stack traces:

"ActiveMQ Session Task" prio=10 tid=0x00002aab0003b000 nid=0x7b98 runnable [0x000000004c086000..0x000000004c087c90]
   java.lang.Thread.State: RUNNABLE
        at org.drools.util.ObjectHashMap.remove(ObjectHashMap.java:121)
        at org.drools.common.SingleThreadedObjectStore.removeHandle(SingleThreadedObjectStore.java:150)
        at org.drools.common.NamedEntryPoint.retract(NamedEntryPoint.java:296)
        at org.drools.common.NamedEntryPoint.retract(NamedEntryPoint.java:245)
        at org.drools.reteoo.ReteooWorkingMemory$WorkingMemoryReteExpireAction.execute(ReteooWorkingMemory.java:350)
        at org.drools.common.AbstractWorkingMemory.executeQueuedActions(AbstractWorkingMemory.java:1488)
        at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:158)
        at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:122)
        at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:80)
        at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:28)
        at ...

"ActiveMQ Session Task" prio=10 tid=0x000000005a35cc00 nid=0xdf6 runnable [0x000000004a268000..0x000000004a269a90]
   java.lang.Thread.State: RUNNABLE
        at org.drools.util.AbstractHashTable.resize(AbstractHashTable.java:115)
        at org.drools.util.ObjectHashMap.put(ObjectHashMap.java:78)
        at org.drools.common.SingleThreadedObjectStore.addHandle(SingleThreadedObjectStore.java:136)
        at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:113)
        at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:80)
        at org.drools.common.NamedEntryPoint.insert(NamedEntryPoint.java:28)
        at ...

So it seems that while the first thread is holding the lock and
attempting to remove an object handle from the object store in
NamedEntryPoint, the other thread is trying to resize that same
object store in response to an addHandle call that puts it over the
threshold. I haven't worked out exactly how these concurrent accesses
to the same object store by two different threads cause an infinite
loop in both threads, but it seems like the call to
SingleThreadedObjectStore.addHandle should be preceded by acquiring
the lock.
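
To make the suggestion concrete, here is a minimal sketch of the idea,
not the actual Drools code. LockedHandleStore and LockedHandleStoreDemo
are hypothetical names, and a plain java.util.HashMap stands in for the
Drools ObjectHashMap. The only point it illustrates is that the add path
(which may trigger a resize) and the remove path take the same lock:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an object store; this is NOT the Drools class.
class LockedHandleStore {
    private final ReentrantLock lock = new ReentrantLock();
    // HashMap is not thread-safe: put() may resize the underlying table.
    private final Map<Object, Object> handles = new HashMap<>();

    void addHandle(Object key, Object handle) {
        lock.lock();                 // same lock as removeHandle
        try {
            handles.put(key, handle);
        } finally {
            lock.unlock();
        }
    }

    void removeHandle(Object key) {
        lock.lock();
        try {
            handles.remove(key);
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return handles.size();
        } finally {
            lock.unlock();
        }
    }
}

class LockedHandleStoreDemo {
    // Each worker adds its own keys and then removes them again.
    // Returns the number of handles left in the store afterwards.
    static int run(int threads, int perThread) {
        LockedHandleStore store = new LockedHandleStore();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String key = id + ":" + i;
                    store.addHandle(key, key);
                }
                for (int i = 0; i < perThread; i++) {
                    store.removeHandle(id + ":" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("final size: " + run(4, 10000)); // prints "final size: 0"
    }
}
```

With every access going through the one lock, a resize triggered by
addHandle can never overlap with a removeHandle on the same table.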
Is this correct? I can imagine that resizing a large hash map could
take a long time, so synchronizing this call could hurt performance,
but the resize must be protected in some way from interfering with
other operations on the table.
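
For comparison, one way to avoid holding a single coarse lock across a
resize is to use a map that is designed for concurrent access. The
sketch below swaps in java.util.concurrent.ConcurrentHashMap from the
JDK, which coordinates table growth internally. This is only an
illustration of the trade-off, under the assumption that such a map
could replace the store; it is not how Drools implements its object
store, and ConcurrentHandleStoreDemo is a hypothetical name:

```java
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentHandleStoreDemo {
    // Each worker inserts perThread keys (forcing the table to grow)
    // and then removes the even-numbered ones, with no external lock.
    // Returns the number of entries left in the map.
    static int run(int threads, int perThread) {
        // Small initial capacity so that resizes happen during the run.
        ConcurrentHashMap<String, String> handles = new ConcurrentHashMap<>(16);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String key = id + ":" + i;
                    handles.put(key, key);
                }
                for (int i = 0; i < perThread; i += 2) {
                    handles.remove(id + ":" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return handles.size();
    }

    public static void main(String[] args) {
        // 4 threads x 10000 keys, half removed per thread
        System.out.println("remaining: " + run(4, 10000)); // prints "remaining: 20000"
    }
}
```

Whether this fits depends on what else the insert path needs to do
atomically; if other state must change together with the store, a
shared lock around the whole operation is still required.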
Any help would be
appreciated.
thanks,
Norman