[jboss-jira] [JBoss JIRA] (JBRULES-3100) Long.MAX_VALUE duration for "A and not(B after A)" type rules causes invalid session clock time in rule RHS when running with pseudo clock

Wed Jan 4 15:37:10 EST 2012

    [ https://issues.jboss.org/browse/JBRULES-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653888#comment-12653888 ] 

Edson Tirelli commented on JBRULES-3100:
----------------------------------------

Wow, excellent analysis. This is indeed a bad oversight.

As you noticed already, though, this is more than a technical issue; it is also a conceptual one. So, ignoring the code bug for a moment and focusing on the conceptual aspect: in the temporal reasoning algorithm, Long.MAX_VALUE and Long.MIN_VALUE are special values that represent positive and negative infinity. If we take a strict mathematical approach, any finite number added to or subtracted from infinity still results in infinity. So, for any A and any B, the logical formula:

A and not( B after A )

when considered in a dynamic world (i.e., stream mode), would never fire AND would retain A for infinity, waiting to check if B will ever arrive. In a static world (cloud mode), this would not be a problem as the result of the formula would be known immediately.

Now, I am not sure about the best way to handle this situation:

1) maintain reasoning consistency and raise a compile time/runtime warning stating the rule will never fire in stream mode

2) handle this as a special case and fire the rule right away instead of scheduling it

IMHO, (2) does not sound correct, as it will yield incorrect results in some situations and will certainly surprise most users, but I am open to discussion. OTOH, in the case of (1), we should not retain A in memory for a rule that will never match. I will do some research on the topic and come back, but would like to hear your opinion.

In any case, the bug needs to be fixed by properly handling the special values Long.MAX_VALUE/Long.MIN_VALUE in all calculations and also preventing the overflow/underflow of the time counter.
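
To make the arithmetic concrete, here is a minimal sketch of what "infinity-aware" addition could look like. It is only an illustration under stated assumptions: TemporalArithmetic and addDuration are hypothetical names, not existing Drools classes or methods, and the real fix may well take a different shape. The sketch treats Long.MAX_VALUE and Long.MIN_VALUE as sticky infinities and saturates instead of wrapping around.

```java
// Hypothetical helper, not part of Drools: illustrates infinity-aware,
// saturating addition of a duration to a timestamp.
final class TemporalArithmetic {
    private TemporalArithmetic() {}

    static long addDuration(long time, long duration) {
        // Long.MAX_VALUE stands for +infinity: anything added to it stays +infinity.
        // (The indeterminate case +infinity + -infinity resolves to +infinity here,
        // which is an arbitrary choice made for this sketch.)
        if (time == Long.MAX_VALUE || duration == Long.MAX_VALUE) {
            return Long.MAX_VALUE;
        }
        // Long.MIN_VALUE stands for -infinity.
        if (time == Long.MIN_VALUE || duration == Long.MIN_VALUE) {
            return Long.MIN_VALUE;
        }
        long sum = time + duration;
        // Overflow happened iff both operands have the same sign
        // and the sign of the sum differs from it.
        if (((time ^ sum) & (duration ^ sum)) < 0) {
            return time > 0 ? Long.MAX_VALUE : Long.MIN_VALUE;
        }
        return sum;
    }

    public static void main(String[] args) {
        long now = 1000L; // any post-epoch clock value
        // Plain addition wraps around to a large negative number.
        System.out.println("naive:     " + (now + Long.MAX_VALUE));
        // The saturating version keeps the result at +infinity.
        System.out.println("saturated: " + addDuration(now, Long.MAX_VALUE));
    }
}
```

With a helper along these lines, a trigger time computed as "current time + Long.MAX_VALUE" would remain Long.MAX_VALUE regardless of the sign of the current time, so the clock would never be set to a wrapped-around value. Whether such an activation should be scheduled at all is the conceptual question above.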

Thanks for the analysis and let me know your thoughts about the subject.

> Long.MAX_VALUE duration for "A and not(B after A)" type rules causes invalid session clock time in rule RHS when running with pseudo clock
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: JBRULES-3100
>                 URL: https://issues.jboss.org/browse/JBRULES-3100
>             Project: Drools
>          Issue Type: Bug
>      Security Level: Public(Everyone can see) 
>          Components: drools-core, drools-core (fusion)
>    Affects Versions: 5.2.0.Final
>            Reporter: Richard Calmbach
>            Assignee: Edson Tirelli
>            Priority: Critical
>             Fix For: 5.4.0.Beta2
>
>
> Ok, this is a subtle one. How did I run into this one? I need dynamic timers that are started in the RHS because Drools LHS syntax does not support dynamic timers. In order to be able to unit test my rules even with dynamic timers, I rely on the session clock (aka TimerService) to schedule and execute these manually defined jobs. Works beautifully with the real-time clock for all my rules, and also works with the pseudo clock for rules that don't have a negated unbounded "after" clause. However, the pseudo clock breaks the timers that are scheduled in the RHS of rules that have a LHS of type "A and not(B after A)". The reason is extremely subtle, and understanding it required an excursion into the inner workings of the rule engine, in particular its mechanism for scheduling activations of rules with non-zero duration. (Good thing that javadoc and sources are now available via Maven repositories for all Drools artifacts. Thank you, Drools team and the m2eclipse "Download Artifact Sources" checkbox!)
> In PseudoClockScheduler.runCallBacks(), the pseudo clock checks whether any scheduled jobs need to be executed by comparing their trigger times to the current time (this.timer). For every job that is due, e.g., a scheduled rule activation, the pseudo clock saves this.timer to a local variable (savedTimer) and sets this.timer temporarily to the trigger time of the job (keeping this.timer at that value for the duration of the job execution, e.g., the firing of the scheduled activation). Now, for rules with a bounded duration (e.g., "A and not(B after[0s, 10s] A)"), this trigger time is a reasonable value (current time + duration), and everything works out. However, rules of type "A and not(B after A)" (i.e., with negated unbounded after) have a duration value of Long.MAX_VALUE. When DurationTimer.createTrigger() constructs a new PointInTimeTrigger object, it sets the trigger time to "current time + duration", and with a post-epoch current time (i.e., greater than zero) and duration == Long.MAX_VALUE, this causes wrap-around and sets the trigger time to a very large negative number. Up to this point, I think the behavior of the code is intentional (albeit hacky), to ensure that "negated unbounded after" type rules get activated right away (as they should). However, this approach has unintended, buggy consequences, as we shall see. When such a scheduled rule fires, then for the duration of its RHS execution, the session clock has an invalid value (very large negative number). The only way you would know that is if you queried the session clock, say because you're trying to schedule a dynamic timer because Drools LHS syntax only supports static timers. The upshot is that the dynamic timer job is scheduled with "very large negative number + my trigger delay" as the trigger time, which PseudoClockScheduler.runCallBacks() promptly interprets as a job that is due for execution, causing premature and therefore incorrect triggering of my dynamic timer, thereby breaking my unit tests, while the real-time clock does not exhibit this problem. And that's how I spent my Friday.
> I can work around this bug by rewriting the "A and not(B after A)" construct so that I'm explicitly comparing the timestamp fields instead of using "after". However, the current Drools implementation is broken in more than one way: Say, you're simulating the events at the original Woodstock (or any other events pre-dating the Unix epoch), and you're setting the pseudo clock to a negative value so that Date.toString() gives you the actual dates of the simulated events. As long as current time is a non-positive value, adding duration == Long.MAX_VALUE will yield a time much, much later than current time. This means that "A and not(B after A)" type rules will not fire when they should because their trigger time is in a future far, far away that will never be reached during the simulation. If I interpret this correctly, this also affects rule activation with the real-time clock if it is set to a negative value (there goes your time traveler market share...).
> I'm not sure what the best fix for these bugs is. A real fix probably has to rework how "negated unbounded after" type rule activations get scheduled. The duration == Long.MAX_VALUE hack isn't going to cut it. Maybe these rules don't need scheduling at all since their activations should be created right away? Anyway, tricky stuff.
