Resending...
---------- Forwarded message ----------
Karen and Matthias,
Thanks for the suggestions. A few items that we need to solve before we
decide on a syntax:
1. The qualification of the units: we need to be able (as a CEP engine) to
allow the definition of arbitrary time windows, like seconds, minutes or
hours or compositions of them. So, a few notations could be used: 30 sec, 30
min, 30 hours (or variations of that), 25:10:23 (25 hours and 10 minutes and
23 seconds), etc. I would really like to avoid hard coding the unit parsing
in the DRL parser, since the parser is one of the more delicate pieces of
the code to change. We can avoid that if we use delimiters to declare the
"parameters" of the window, as the parser can simply read it all as a String
chunk and let each window handler deal with its own units. So, the
suggestion is to use () or a similar delimiter to pass the parameters to
the window handlers, like this:
window:time( 30 min ) or time-window( 30 min ) or another variation of that
window:events( 60 ) or events-window( 60 ) or another variation of that
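To make item 1 concrete, here is a minimal sketch (in Python, just for illustration) of the split I have in mind: the "parser" side only extracts the window kind and the raw parameter string, and each handler interprets its own units. All the names here (split_window, parse_time_seconds, HANDLERS, the unit table) are hypothetical and are not part of Drools.

```python
import re

# Hypothetical unit table owned by the time-window handler, not by the parser.
_UNIT_SECONDS = {"s": 1, "sec": 1, "min": 60, "h": 3600, "hour": 3600, "hours": 3600}


def split_window(decl):
    """Parser side: return (kind, raw_params) without interpreting the parameters."""
    m = re.fullmatch(r"\s*window:([\w-]+)\(\s*(.*?)\s*\)\s*", decl)
    if m is None:
        raise ValueError("not a window declaration: %r" % decl)
    return m.group(1), m.group(2)


def parse_time_seconds(raw):
    """Time-window handler: accepts '30 min' style or 'HH:MM:SS' style parameters."""
    if ":" in raw:
        hours, minutes, seconds = (int(part) for part in raw.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    m = re.fullmatch(r"(\d+)\s*([a-zA-Z]+)", raw.strip())
    if m is None:
        raise ValueError("cannot parse time window: %r" % raw)
    return int(m.group(1)) * _UNIT_SECONDS[m.group(2).lower()]


def parse_event_count(raw):
    """Length-window handler: the parameter is just a number of events."""
    return int(raw)


# Each window kind registers its own parameter parser.
HANDLERS = {"time": parse_time_seconds, "events": parse_event_count}


def read_window(decl):
    kind, raw = split_window(decl)
    return kind, HANDLERS[kind](raw)
```

With this split, read_window("window:time( 25:10:23 )") gives ("time", 90623) and read_window("window:events( 60 )") gives ("events", 60). Adding a new unit notation only touches the handler, never the parser.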
2. The smallest "unit" in a rule definition is a Pattern, and right now, a
Pattern looks like this:
[<var-bind> : ] <Pattern-Type>( [<constraints>] ) [from <source>]
<> : means must be replaced by the corresponding data
[] : means optional
Now, what we need is a way to declare window constraints for patterns. The
accumulate example I gave you was not a good one, because accumulates have 2
or more patterns (the inner patterns and the accumulate result pattern). You
can even apply different window constraints for each of the patterns.
So, thinking about a single pattern, how would you define a window for it?
Given the current grammar and the concept of pattern, I see only two
options: either we define the window constraints inside the pattern
declaration ( i.e., near regular constraints), or we create another keyword
to introduce them (like we have the "from" keyword).
option 1: [<var-bind> : ] <Pattern-Type>( [<window-constraints> ,] [<constraints>] ) [from <source>]
option 2: [<var-bind> : ] <Pattern-Type>( [<constraints>] ) [from <source>] [with <window-constraints>]
Note: replace the "with" above with whichever keyword we choose.
Just as a real-world example, if you look at ESPER queries, they do event
correlation like this:
select *
from TxnEventA.win:time(30 minutes) A
join TxnEventC.win:time(60 minutes) C on A.transactionId = C.transactionId
join TxnEventB.win:time(30 minutes) B on B.transactionId = C.transactionId
where C.transactionId is null
One example of event aggregation is this:
select min(latencyAC) as minLatencyAC, max(latencyAC) as maxLatencyAC,
avg(latencyAC) as avgLatencyAC
from CombinedEvent.win:time(30 minutes)
So, you see that the time windows are defined for each pattern and they may
be different among joined patterns.
Accumulate may have multiple inner patterns, and so, we need to be able to
define windows for each one.
In our initial suggestions, we defined the window constraints inside
the pattern, since that makes it much easier to avoid grammar ambiguities. The use
of square brackets would avoid the need to create new keywords,
but if we decide that it is worth creating at least one keyword
(like "window", for instance), we can drop the []:
TemperatureReading( window:time(30 sec), temp > 50 ) from streamA
TemperatureReading( window:events(60), window:time(1 min), temp > 50 ) from streamB
Something closer to what you suggested would be (replace "with" with a
chosen keyword) :
TemperatureReading( temp > 50 ) from streamB with window:events(60), window:time(1 min)
My preference is to keep that inside the pattern, but I would like to hear
from you (including whether the change proposed in item #1 is ok) and from Mark and
Michael.
3. Regarding "distinct", "group by" and "order by", we want to support them,
especially "distinct". From a syntax perspective, I would like to use a
solution similar to the one we choose for the windows, so that we keep the
language easier to learn, but if it is clearer to write them like
"distinct PatternA(...)", I'm ok with that too. Unfortunately, we can't
allow distinct directly on attributes because, unlike in SQL, we need to
declare all the patterns that make up a tuple. So, instead of saying something
like:
Person( distinct address )
we need to say something like one of these:
distinct Address() from $person.address
Address( distinct ) from $person.address
Anyway, item #3 is of minor importance right now.
Sorry for the long e-mail.
Edson
2007/11/27, kw14(a)mail.inf.tu-dresden.de <kw14(a)mail.inf.tu-dresden.de>:
Hi Mark, hi Edson,
in the following you'll find our suggestions regarding Drools syntax
extensions for definition of windows and other node restrictions.
Edson showed us an example of what you had in mind:
Number( avg : doubleValue ) from accumulate(
TemperatureReading( [ window:events( 30 ), window:time( 60 sec ), distinct ], $temp : temperature ),
average( $temp ) )
It calculates the average of distinct temperature readings in 60 seconds
or over 30 events (whatever comes first).
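A minimal sketch of the semantics we read into this example, written in Python only for illustration. It assumes a sliding window bounded by both constraints at once (at most 30 events, and nothing as old as 60 seconds relative to the newest reading); that interpretation, the class name CombinedWindow and its methods are our assumptions, not existing Drools behaviour.

```python
from collections import deque


class CombinedWindow:
    """Sliding window limited by both an event count and a time span (assumed semantics)."""

    def __init__(self, max_events, max_seconds):
        self.max_events = max_events
        self.max_seconds = max_seconds
        self._items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Insert a reading; timestamps are assumed to arrive in non-decreasing order."""
        self._items.append((timestamp, value))
        # Length constraint: keep at most max_events readings.
        while len(self._items) > self.max_events:
            self._items.popleft()
        # Time constraint: drop readings that are max_seconds old or older.
        while self._items and timestamp - self._items[0][0] >= self.max_seconds:
            self._items.popleft()

    def average_distinct(self):
        """Average over the distinct values currently in the window, or None if empty."""
        values = {value for _, value in self._items}
        return sum(values) / len(values) if values else None
```

For example, with CombinedWindow(30, 60), two readings of 50.0 and one of 70.0 within the same minute average to 60.0, because the duplicate 50.0 counts once.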
Your notation has the advantage of being short, easy to use and orthogonal
to the current grammar.
In contrast to the definition using square brackets, we'd prefer a more
explicit way of defining windows and other constraints. This increases
readability and makes it easier for novices to define rules. You've also
used a similar notation in your talk at Synasc 2007.
So our version would look like this:
Number( avg : doubleValue ) from accumulate within|in|over 60s or 30 events(
distinct TemperatureReading( $temp : temperature ), average( $temp ) )
It allows using nested 'accumulates' while it is clear to which 'accumulate'
a window definition belongs. The choice of "in", "within" or "over" as the
keyword is up to your preference. The "distinct" keyword could be used in
front of either an object or an attribute definition, similar to
"select distinct *" vs. "select distinct fieldname" in SQL. We think it's clearer
to have it in front of the object/attribute it refers to.
For collect, a definition would look similar, e.g.:
ArrayList( ) from collect over 60s ( TemperatureReading( orderBy temperature > 0 , distinct id!=null ) )
In this case, we also used an "orderBy" keyword similar to distinct.
However, it could also be used like a function as is currently done with
aggregation functions in 'accumulate':
ArrayList( ) from collect over 60s ( TemperatureReading( $temp:temperature > 0 , distinct id!=null ), orderBy($temp) )
For us, orderBy and groupBy only make sense with the set operator
'collect'. They could, however, also be used in 'accumulate' if the order of
values is relevant to the result of a user-defined aggregation function or
if different result sets are created based on a grouping attribute. Are
you planning to support this?
Window definitions can also make sense for the 'exists' and 'not'
operators or a complete rule as already suggested by Edson.
An alternative way of describing the windows and other constructs would be
the one below. However, it would require additional brackets to show the
scope of windows for nested accumulates and was therefore excluded from
our considerations.
Number( avg : doubleValue ) from accumulate( distinct TemperatureReading( $temp : temperature ),
average( $temp ) ) over 60s or 30 events
Okay, we are aware that these suggestions would require more extensions in
the Drools grammar, but we believe the resulting rules are easier to
understand than the ones using the square brackets. So, that's open for
discussion. :)
Cheers, Karen
--
Edson Tirelli
JBoss Drools Core Development
Office: +55 11 3529-6000
Mobile: +55 11 9287-5646
JBoss, a division of Red Hat @
www.jboss.com