We’re running DROOLS 6.1.0-SNAPSHOT and are seeing unbounded memory growth for certain rules. I’ve put together an example that reproduces the problem using a rule based on (but not identical to) one for which we are seeing memory leaks. I’m not sure whether the leaks are caused by poorly written rules or by something else.

The attached example can be built and run using Maven, i.e. “mvn install exec:exec”. It runs DROOLS in stream mode using a real-time clock and EQUALITY equals behavior. The inputs to the rules are events that are identified by their eventId (a long value) and a state enumeration with 3 possible values: ACTIVE, INACTIVE and ACCEPTED. The rules are supposed to work like this:

1. If an event is received with a state of ACTIVE and no event having that eventId has been captured, then capture the event data in a new fact (CapturedEvent) and delete the event.
2. If an event is received with a state of INACTIVE and an event having the same eventId has previously been captured, then delete the CapturedEvent fact.
3. If, after 10 seconds, the CapturedEvent fact is still there, then set its state to ACCEPTED.
4. All other event facts are deleted.

Not a really useful example, but it does demonstrate either a memory leak or a poorly written set of rules.
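To make the intended behavior of the four rules concrete, here is a minimal plain-Java sketch of the state machine they are meant to implement. This is not the DRL from the attached project and does not use DROOLS at all; the class and method names (IntendedRulesModel, Captured, onEvent, tick) are made up for illustration, and the 10-second check is driven by the caller's timestamps rather than by a real clock.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the INTENDED behavior only; hypothetical names, not Drools code.
class IntendedRulesModel {
    enum State { ACTIVE, INACTIVE, ACCEPTED }

    static final long ACCEPT_AFTER_MS = 10_000;

    // Hypothetical stand-in for the CapturedEvent fact.
    static final class Captured {
        final long capturedAtMs;
        State state = State.ACTIVE;
        Captured(long capturedAtMs) { this.capturedAtMs = capturedAtMs; }
    }

    private final Map<Long, Captured> captured = new HashMap<>();

    // Every incoming event is consumed here, mirroring "all events are deleted".
    void onEvent(long nowMs, long eventId, State state) {
        tick(nowMs);
        if (state == State.ACTIVE && !captured.containsKey(eventId)) {
            captured.put(eventId, new Captured(nowMs)); // rule 1: capture
        } else if (state == State.INACTIVE) {
            captured.remove(eventId);                   // rule 2: drop the captured fact, if any
        }
        // rule 4: in every other case the event is simply discarded
    }

    // Rule 3: a captured fact that has survived 10 seconds becomes ACCEPTED.
    void tick(long nowMs) {
        for (Captured c : captured.values()) {
            if (c.state == State.ACTIVE && nowMs - c.capturedAtMs >= ACCEPT_AFTER_MS) {
                c.state = State.ACCEPTED;
            }
        }
    }

    int capturedCount() { return captured.size(); }

    // Returns null when no fact is captured for this eventId.
    State stateOf(long eventId) {
        Captured c = captured.get(eventId);
        return c == null ? null : c.state;
    }
}
```

In this model, an ACTIVE event that arrives after an INACTIVE one for the same eventId is captured again, which is the behavior I expect from the rules.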

After creating the KieSession in stream mode, the demo starts inserting event facts every 5ms. There are 1000 different eventIds, and each iteration of events being inserted sends one event with each eventId. Initially the events’ state is set to ACTIVE. Periodically the events’ state is set to INACTIVE for one iteration, then back to ACTIVE. The intent is that 1000 different events will, 10 seconds after they’re first inserted, transition from ACTIVE to ACCEPTED. Then, shortly afterwards, the INACTIVE events are sent and all 1000 CapturedEvent facts are deleted. The cycle then repeats until 500,000 events have been inserted into the session. Because all events are deleted by one of the rules, at the end of the run there should be 1000 facts of type CapturedEvent in the session since the final iteration of events all have a state of ACTIVE.

For example, the sequence of events looks something like this:

Time      ID   State
0.000s    0    ACTIVE
0.005s    1    ACTIVE
0.010s    2    ACTIVE
...
4.995s    999  ACTIVE
5.000s    0    ACTIVE
5.005s    1    ACTIVE
...
9.995s    999  ACTIVE
10.000s   0    ACTIVE
10.005s   1    ACTIVE
...
14.995s   999  ACTIVE
15.000s   0    INACTIVE
15.005s   1    INACTIVE
15.010s   2    INACTIVE
...
19.995s   999  INACTIVE
20.000s   0    ACTIVE

and so on until a total of 500,000 events are inserted. Each iteration of 1000 events takes 5 seconds. So, for the first three iterations the events are ACTIVE, in the fourth iteration they are INACTIVE, followed by three more ACTIVE iterations, and so on.
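The schedule above can be expressed as a small function of the event's sequence number. The following is a sketch only, assuming a strict repeating pattern of three ACTIVE iterations followed by one INACTIVE iteration; the exact cadence in the attached demo may differ, and the names (EventSchedule, row, and so on) are hypothetical rather than taken from the project.

```java
import java.util.Locale;

// Sketch of the insertion schedule; assumes a strict 3x ACTIVE, 1x INACTIVE cycle.
class EventSchedule {
    enum State { ACTIVE, INACTIVE }

    static final int IDS = 1000;          // distinct eventIds per iteration
    static final long INTERVAL_MS = 5;    // one insert every 5ms
    static final int CYCLE = 4;           // iterations per ACTIVE/INACTIVE cycle

    // Offset from the start of the run at which the n-th event (0-based) is inserted.
    static long timeMs(int n) { return n * INTERVAL_MS; }

    static long eventId(int n) { return n % IDS; }

    // Every fourth iteration (0-based iteration 3, 7, 11, ...) is INACTIVE.
    static State state(int n) {
        int iteration = n / IDS;
        return iteration % CYCLE == CYCLE - 1 ? State.INACTIVE : State.ACTIVE;
    }

    // Formats one row in the same "Time ID State" layout as the listing above.
    static String row(int n) {
        long t = timeMs(n);
        return String.format(Locale.ROOT, "%d.%03ds %d %s", t / 1000, t % 1000, eventId(n), state(n));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 999, 1000, 2999, 3000, 3999, 4000};
        for (int n : samples) {
            System.out.println(row(n));
        }
    }
}
```

With these assumptions, event number 3000 is the first INACTIVE event (15.000s, ID 0) and event number 4000 is the first ACTIVE event of the next cycle (20.000s, ID 0), matching the listing.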

What actually happens is that the rules work, but only once for each ID. So, for example, the first insert of event ID 0 with a state of ACTIVE matches the “capture active” rule as expected. The second insert of event ID 0 with a state of ACTIVE matches the “delete events” rule. 10 seconds after the first event with ID 0 arrives, the “ensure active” rule matches. The third ACTIVE event also matches the “delete events” rule. When the fourth event with ID 0 and state INACTIVE is inserted, it matches the “restart timer” rule, which deletes the CapturedEvent fact. When the next event with ID 0 and state ACTIVE is inserted, I expected the “capture active” rule to fire because the CapturedEvent fact was deleted. But instead the “delete events” rule matches, and from this point on all events trigger the “delete events” rule.

After garbage collection, analysis of the heap dump shows there are 248,000 instances of org.drools.core.reteoo.RightTuple objects and the same number of org.drools.core.common.PhreakPropagationContext objects. Also, there are 124,000 instances of org.drools.core.common.EventFactHandle objects, even though the reported fact count is 0.

The demo project is attached.