[rules-dev] Determining if a class is an event or not

Fri Nov 9 09:26:55 EST 2007

Edson, 

> We always tried to avoid requiring users to change their domain model to use
Drools and "contaminating" it with Drools-specific classes/interfaces. This way,
unlike other engines, we work with simple POJOs as facts. There are also
other reasons for that, like wanting to support other fact types, such as
ontologies, but I still think the main one is the one above.
>     So, we can have an Event interface or an annotation that the user may use
to explicitly mark a Java class as an event, but IMO, we should offer the Event
interface as an alternative way of doing it, and not as the only way.

The event interface was the first thing that came to my mind since it's pretty
straightforward. However, I see your point. So if there is a way to maintain
the POJO philosophy without making too many awkward allowances, I'm up for it :).

>     A simpler solution we could go for is to require that the user declares
all event classes/interfaces with a declare-like statement in the DRL
instead of using the "import event" syntax. Something like this:
>     declare event a.b.c.Foo;
>     So the engine can load the class/interface a.b.c.Foo and check each
asserted fact to see if it is an a.b.c.Foo instance. That would be easier from
an engine perspective, but would be verbose from a user's perspective, since we
would not be able to allow wildcards in such declarations. Would that be better?
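
To make the proposal concrete, here is a minimal sketch of how the engine side
of such a declaration might look. DeclaredEventTypes and its methods are
hypothetical names of mine, not part of the Drools API; the only assumption is
that each "declare event" statement ends up registering one class or interface.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical registry of declared event types; not part of the Drools API.
final class DeclaredEventTypes {
    private final Set<Class<?>> eventTypes = new LinkedHashSet<Class<?>>();

    // Corresponds to one "declare event a.b.c.Foo;" statement in the DRL.
    void declare(String className) throws ClassNotFoundException {
        declare(Class.forName(className));
    }

    void declare(Class<?> type) {
        eventTypes.add(type);
    }

    // True if the fact is an instance of any declared class or interface.
    boolean isEvent(Object fact) {
        for (Class<?> type : eventTypes) {
            if (type.isInstance(fact)) {
                return true;
            }
        }
        return false;
    }
}

final class DeclaredEventTypesDemo {
    public static void main(String[] args) {
        DeclaredEventTypes types = new DeclaredEventTypes();
        types.declare(Number.class);               // stands in for a.b.c.Foo
        System.out.println(types.isEvent(42));     // an Integer is a Number
        System.out.println(types.isEvent("text")); // a String is not
    }
}
```

Since the check uses Class.isInstance, declaring an interface or a superclass
covers all of its implementations, which softens the missing wildcard support
a little.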

Actually, that sounds like a good compromise to me. The question is how many
different event classes a user would define. As long as there are not
too many, manually declaring them the way you suggested shouldn't be a big
deal. I don't know what the most common approach is, but IMO it makes sense to
have one generic event class, where different events differ only in their
name and the attributes they carry. To give an example: the use case I'm
working on has lots of different kinds of events, yet there
will be exactly one primitive event class/interface. A primitive event will have
a name, a parent id (i.e. some device it refers to) and a list or map
of the parameters that have changed. I might have to add another
interface for composite events (i.e. events composed of primitive or other
composite events by means of temporal relationships), but that should be it.
However, other people might prefer a different event class for every
kind of event (an alarm, a status change, a confirmation, etc.); for them, not
being able to use wildcards and having to declare all the event classes manually
may become awkward. In that case I can't think of another way to do it than the
one you initially suggested, although it may be costly...
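
Just to illustrate what I mean by a generic event class, a rough sketch (the
class name, field names and the sample values are made up for this mail, and
the real thing would probably be an interface):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generic event: one class for all kinds of primitive events.
final class PrimitiveEvent {
    private final String name;                    // e.g. "alarm", "statusChange"
    private final String parentId;                // the device the event refers to
    private final Map<String, Object> parameters; // the parameters that changed

    PrimitiveEvent(String name, String parentId, Map<String, Object> parameters) {
        this.name = name;
        this.parentId = parentId;
        // Defensive copy so the event cannot be changed after creation.
        this.parameters = Collections.unmodifiableMap(
                new LinkedHashMap<String, Object>(parameters));
    }

    String getName() { return name; }
    String getParentId() { return parentId; }
    Map<String, Object> getParameters() { return parameters; }
}

final class PrimitiveEventDemo {
    public static void main(String[] args) {
        Map<String, Object> changed = new LinkedHashMap<String, Object>();
        changed.put("temperature", 21.5);
        PrimitiveEvent event = new PrimitiveEvent("statusChange", "device-17", changed);
        System.out.println(event.getName() + " on " + event.getParentId()
                + ": " + event.getParameters());
    }
}
```

With a model like this, a single declaration for PrimitiveEvent would be enough,
and rules would distinguish kinds of events by constraining the name.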

>     The other question you asked is really interesting and I also don't have a
definitive answer, but I have some ideas:
> > In this regard, another question arises for me: Without making any
restrictions on the structure of a user-defined event class, how do you make
sure it has all the required attributes of an event (which in my opinion must
include a timestamp, at least) and how do you access them (necessary for
temporal relationships)?
>     In the paper "Discarding unused temporal information in a Production
System", by Dan Teodosiu and Günter Pollak, they state that "Events carry an
additional *implicit* attribute that represents the timestamp. The value of this
attribute is automatically set upon creation of an event to the current value of
the system-clock."
>     I agree with them, except for the "system-clock" part. Replace
"system-clock" by "engine-clock" and I think we get the best compromise. We
know that most real-world events are generated with an explicit timestamp
attribute, but since we may be handling events coming from different sources
without any guarantee that the sources have synchronized clocks, we cannot rely
on the explicit timestamp attribute for reasoning in the engine. Not
relying on it also gives us the ability to handle events that do not contain
an explicit timestamp attribute.

I completely agree. Relying only on the explicit timestamp attribute could cause
problems, not only because of unsynchronized clocks: network delay can also
make events arrive in an order different from the order in which they
originally occurred.

>     What I am thinking is:
> 1. In the current Drools implementation, each fact
asserted into the rules engine is wrapped in a FactHandle implementation. This
FactHandle implementation contains metadata attributes that are used by the
engine during fact manipulation and reasoning. What I'm suggesting is we derive
a new EventFactHandle implementation that would contain all the implicit event
attributes and metadata we need, including the timestamp. This would allow us to
also support different semantic models for the timestamp, and that is why I
asked you about the point vs. interval semantics. The easiest one would be the
point-in-time semantic, obviously.

As you may have noticed, I shirked from answering the question of which timestamp
representation to use. I didn't find a paper making the decision for me; most of
the time it's just a comparison between the two approaches without clear
arguments for or against either. IMO, the interval-based representation seems to
be more general, i.e. more expressive; at the same time, its operators are more
complex and harder to use. Perhaps the most comprehensive analysis can be found
in the paper "Point- Versus Interval-based Temporal Data Models" by Bohlen,
Busatto and Jensen; another short summary is given in chapter 3, paragraph
"Timestamps" in "Unified Semantics for Event Correlation Over Time and Space in
Hybrid Network Environments" by Yoneki and Bacon. Let me add that the set of
temporal relationships (between two timestamps) in interval-based semantics is
much more extensive than in point-based semantics (13 vs. 2).
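
To give an impression of the extra complexity on the interval side, here is a
small sketch that classifies two intervals into one of the 13 relationships.
The relation names follow what is usually called Allen's interval algebra;
using those names, the long-valued time points and the restriction to intervals
with start < end are my assumptions, not something taken from the papers above.

```java
// The 13 possible relationships between two proper intervals A and B.
enum IntervalRelation {
    BEFORE, MEETS, OVERLAPS, STARTS, DURING, FINISHES, EQUALS,
    FINISHED_BY, CONTAINS, STARTED_BY, OVERLAPPED_BY, MET_BY, AFTER
}

final class IntervalRelations {
    private IntervalRelations() {}

    // Returns the relation of A = [aStart, aEnd] to B = [bStart, bEnd].
    static IntervalRelation relate(long aStart, long aEnd, long bStart, long bEnd) {
        if (aStart >= aEnd || bStart >= bEnd) {
            throw new IllegalArgumentException("intervals must satisfy start < end");
        }
        // A and B do not share more than one point.
        if (aEnd < bStart) return IntervalRelation.BEFORE;
        if (aEnd == bStart) return IntervalRelation.MEETS;
        if (bEnd < aStart) return IntervalRelation.AFTER;
        if (bEnd == aStart) return IntervalRelation.MET_BY;
        // From here on the intervals share more than a single point.
        if (aStart == bStart) {
            if (aEnd == bEnd) return IntervalRelation.EQUALS;
            return aEnd < bEnd ? IntervalRelation.STARTS : IntervalRelation.STARTED_BY;
        }
        if (aEnd == bEnd) {
            return aStart > bStart ? IntervalRelation.FINISHES : IntervalRelation.FINISHED_BY;
        }
        if (aStart < bStart) {
            return aEnd < bEnd ? IntervalRelation.OVERLAPS : IntervalRelation.CONTAINS;
        }
        return aEnd < bEnd ? IntervalRelation.DURING : IntervalRelation.OVERLAPPED_BY;
    }
}

final class IntervalRelationsDemo {
    public static void main(String[] args) {
        System.out.println(IntervalRelations.relate(1, 5, 3, 8));
        System.out.println(IntervalRelations.relate(4, 6, 3, 8));
    }
}
```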

As you mentioned, the more intuitive and straightforward approach to represent
timestamps would be point-in-time semantics. However, since we (or at least I
:)) wanna process composite events (which consist of several primitive events),
I'm not quite sure whether this will be enough. Primitive events in general are
instantaneous; however, they could be modelled as intervals with identical start
and end points. Composite events definitely will last longer than an instant,
i.e. from the occurrence time of the first until the last constituent event. Of
course, detecting them may only require detecting the single constituent events;
however, you might wanna store them as a composite event and insert them into
working memory to use them later on. What time(point) do you assign them? The
time of the first constituent, or of the last, or some time in between? Not to
mention that the constituent events could be composite events again.
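
With interval semantics the question would more or less answer itself. A rough
sketch of what I have in mind (TimedEvent, InstantEvent and CompositeEvent are
hypothetical names, and timestamps are plain long values here):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical common view: every event occupies an interval [start, end].
interface TimedEvent {
    long getStart();
    long getEnd();
}

// A primitive, instantaneous event: interval with identical start and end.
final class InstantEvent implements TimedEvent {
    private final long timestamp;

    InstantEvent(long timestamp) { this.timestamp = timestamp; }

    public long getStart() { return timestamp; }
    public long getEnd() { return timestamp; }
}

// A composite event lasts from its earliest to its latest constituent.
final class CompositeEvent implements TimedEvent {
    private final List<TimedEvent> constituents;

    CompositeEvent(List<? extends TimedEvent> constituents) {
        if (constituents.isEmpty()) {
            throw new IllegalArgumentException("at least one constituent required");
        }
        this.constituents = new ArrayList<TimedEvent>(constituents);
    }

    public long getStart() {
        long start = Long.MAX_VALUE;
        for (TimedEvent e : constituents) start = Math.min(start, e.getStart());
        return start;
    }

    public long getEnd() {
        long end = Long.MIN_VALUE;
        for (TimedEvent e : constituents) end = Math.max(end, e.getEnd());
        return end;
    }
}

final class CompositeEventDemo {
    public static void main(String[] args) {
        TimedEvent inner = new CompositeEvent(
                Arrays.<TimedEvent>asList(new InstantEvent(10), new InstantEvent(25)));
        // Constituents may themselves be composite events.
        TimedEvent outer = new CompositeEvent(
                Arrays.<TimedEvent>asList(inner, new InstantEvent(40)));
        System.out.println(outer.getStart() + " .. " + outer.getEnd());
    }
}
```

Nesting works without special cases because a composite event only ever asks
its constituents for their start and end.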

> 2. We will also need to create the engine clock abstraction, and, as we
discussed previously, we will probably need 4 implementations:
> - "System Clock": a clock that periodically synchronizes with the machine clock
> - "Event based clock": a clock that is updated every time a given event arrives
> - "Pseudo Clock": a generic implementation that is arbitrarily updated by
application code
> - "Event Attribute Clock": a clock that is updated by a configured event
attribute
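
A minimal sketch of how I picture this abstraction. All names are hypothetical
and several details are my own assumptions: the system clock simply reads the
machine clock on every call instead of synchronizing periodically, the
event-based clock counts arrivals, and the event attribute clock never moves
backwards.

```java
import java.util.function.ToLongFunction;

// Hypothetical engine clock abstraction; not an existing Drools interface.
interface EngineClock {
    long currentTime();
}

// "System Clock": simplified here to read the machine clock on every call.
final class SystemEngineClock implements EngineClock {
    public long currentTime() { return System.currentTimeMillis(); }
}

// "Event based clock": assumed to tick once per arriving event.
final class EventBasedClock implements EngineClock {
    private long ticks;

    void onEvent() { ticks++; }
    public long currentTime() { return ticks; }
}

// "Pseudo Clock": moved arbitrarily by application code.
final class PseudoClock implements EngineClock {
    private long time;

    void advance(long amount) { time += amount; }
    public long currentTime() { return time; }
}

// "Event Attribute Clock": follows a configured attribute of incoming events.
final class EventAttributeClock<E> implements EngineClock {
    private final ToLongFunction<E> attribute;
    private long time;

    EventAttributeClock(ToLongFunction<E> attribute) { this.attribute = attribute; }

    // Assumption: out-of-order events must not turn the clock back.
    void onEvent(E event) { time = Math.max(time, attribute.applyAsLong(event)); }
    public long currentTime() { return time; }
}

final class EngineClockDemo {
    public static void main(String[] args) {
        PseudoClock pseudo = new PseudoClock();
        pseudo.advance(500);
        System.out.println(pseudo.currentTime());
    }
}
```

The pseudo clock is probably the most useful one for testing rules, since the
test code decides when time passes.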

> 3. Every time a new event is inserted into the engine, we would then populate
the "implicit" attributes by using a configurable strategy. For instance, if a
user wants to use the explicit value of the object timestamp as the event
timestamp, the configurable strategy would simply copy the value of the
attribute to the implicit event attribute. If the user wants to use the engine
clock timestamp at assertion time as the event timestamp, the configurable
strategy would simply request the current value from the engine clock and
populate the event timestamp with that value.
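
For illustration, the two strategies described above could look roughly like
this. Everything here is a hypothetical sketch: EventHandle only stands in for
the proposed EventFactHandle, and falling back to the engine clock when an
object has no explicit timestamp is my assumption.

```java
import java.util.function.ToLongFunction;

// Hypothetical engine clock; only the current time is needed here.
interface EngineClock {
    long currentTime();
}

// Stand-in for the proposed EventFactHandle: the fact plus its implicit timestamp.
final class EventHandle {
    final Object event;
    final long timestamp;

    EventHandle(Object event, long timestamp) {
        this.event = event;
        this.timestamp = timestamp;
    }
}

// The configurable strategy that populates the implicit timestamp.
interface TimestampStrategy {
    long timestampFor(Object event, EngineClock clock);
}

// Uses the engine clock value at assertion time.
final class EngineClockStrategy implements TimestampStrategy {
    public long timestampFor(Object event, EngineClock clock) {
        return clock.currentTime();
    }
}

// Copies an explicit attribute of the event object into the implicit timestamp.
final class ExplicitAttributeStrategy<E> implements TimestampStrategy {
    private final Class<E> type;
    private final ToLongFunction<E> attribute;

    ExplicitAttributeStrategy(Class<E> type, ToLongFunction<E> attribute) {
        this.type = type;
        this.attribute = attribute;
    }

    public long timestampFor(Object event, EngineClock clock) {
        if (type.isInstance(event)) {
            return attribute.applyAsLong(type.cast(event));
        }
        // Assumption: objects without the attribute get the engine clock value.
        return clock.currentTime();
    }
}

final class EventInserter {
    private final TimestampStrategy strategy;
    private final EngineClock clock;

    EventInserter(TimestampStrategy strategy, EngineClock clock) {
        this.strategy = strategy;
        this.clock = clock;
    }

    // Wraps the event in a handle and populates the implicit timestamp.
    EventHandle insert(Object event) {
        return new EventHandle(event, strategy.timestampFor(event, clock));
    }
}
```

Because the temporal operators would only ever read the handle's timestamp, the
event class itself needs no particular structure, which keeps the POJO approach
intact.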

Makes perfect sense. I don't know why this "different strategy" thing didn't
come to my mind, but that's the way we should do it. Preferably the explicit
timestamp should be used, but in case there's none or the user simply wants to
use the engine's timestamp, the engine clock value will be taken.
Maybe one could even think of a mixed strategy, i.e. maintaining the order of
the events (from possibly different sources) by means of their "local"
timestamps, but adapting them to one global clock (that could be the engine
clock), for instance by compensating for the different delays of the different
sources. Just an idea :)...

Cheers,
Matthias