[rules-users] To accumulate or not to accumulate

Wolfgang Laun wolfgang.laun at gmail.com
Fri Aug 19 10:05:35 EDT 2011


How would you write "buildwindows", given that its "action" method would be
called once for each Sentence, in random order?

It's very simple to write a very small set of rules to construct all
SentenceWindow facts of size 1 and then to extend them to any desired size,
depending on some parameter.
1. Given a Sentence and no Window beginning with it, create a Window of
length 1.
2. Given a Window of size n < desiredSize and given a Sentence immediately
following it, extend the Window to one of size n+1.
3a. For any Window of desiredSize, inspect it for "closely situated
ManualAnnotations".
3b. If ManualAnnotations have been associated with their containing
Sentences up front, you can add them to the Window in the RHS of rule 2
above; then you just need to find Windows with more than one
ManualAnnotation.
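
To make steps 1 to 3 concrete, here is a minimal plain-Java sketch of the same idea. It is not DRL and not an accumulate function; it only mimics what the rules would do to the facts. The class name WindowSketch, the int fields start/end/size, and the includes() method are assumptions made for illustration and do not come from the original model. Unlike the rules, it emits only Windows that reach desiredSize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class WindowSketch {

    /** Hypothetical Sentence fact: character offsets into the document. */
    static final class Sentence {
        final int start, end;
        Sentence(int start, int end) { this.start = start; this.end = end; }
    }

    /** Hypothetical ManualAnnotation fact, also a span of offsets. */
    static final class ManualAnnotation {
        final int start, end;
        ManualAnnotation(int start, int end) { this.start = start; this.end = end; }
    }

    /** Hypothetical SentenceWindow fact covering `size` consecutive sentences. */
    static final class SentenceWindow {
        final int start, end, size;
        SentenceWindow(int start, int end, int size) {
            this.start = start; this.end = end; this.size = size;
        }
        /** True if the annotation lies entirely within this window. */
        boolean includes(ManualAnnotation a) {
            return start <= a.start && a.end <= end;
        }
    }

    /** Steps 1 and 2: build every window of exactly desiredSize sentences. */
    static List<SentenceWindow> buildWindows(Collection<Sentence> sentences, int desiredSize) {
        if (desiredSize < 1) throw new IllegalArgumentException("desiredSize must be >= 1");
        // Facts arrive in no particular order, so sort by position first.
        List<Sentence> sorted = new ArrayList<>(sentences);
        sorted.sort(Comparator.comparingInt((Sentence s) -> s.start));
        List<SentenceWindow> windows = new ArrayList<>();
        for (int i = 0; i + desiredSize <= sorted.size(); i++) {
            Sentence first = sorted.get(i);
            // Step 1: a window of length 1 beginning with this sentence.
            SentenceWindow w = new SentenceWindow(first.start, first.end, 1);
            // Step 2: extend with the immediately following sentence until full.
            while (w.size < desiredSize) {
                Sentence next = sorted.get(i + w.size);
                w = new SentenceWindow(w.start, next.end, w.size + 1);
            }
            windows.add(w);
        }
        return windows;
    }

    /** Step 3: keep the windows that contain more than one annotation. */
    static List<SentenceWindow> windowsWithCloseAnnotations(
            List<SentenceWindow> windows, Collection<ManualAnnotation> annotations) {
        List<SentenceWindow> hits = new ArrayList<>();
        for (SentenceWindow w : windows) {
            int count = 0;
            for (ManualAnnotation a : annotations) {
                if (w.includes(a)) count++;
            }
            if (count > 1) hits.add(w);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sentences deliberately listed out of order.
        List<Sentence> sentences = Arrays.asList(
            new Sentence(22, 30), new Sentence(0, 10),
            new Sentence(32, 40), new Sentence(12, 20));
        List<ManualAnnotation> annotations = Arrays.asList(
            new ManualAnnotation(3, 5), new ManualAnnotation(14, 16),
            new ManualAnnotation(35, 37));
        for (SentenceWindow w : windowsWithCloseAnnotations(buildWindows(sentences, 2), annotations)) {
            System.out.println("close annotations in window " + w.start + ".." + w.end);
        }
    }
}
```

In a rule base the sort disappears: "immediately following" becomes a constraint in the LHS of rule 2, and the engine does the matching as facts are inserted.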

-W


2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>

> Hi Wolfgang,
>
> Thanks for your answer.
> Sentences are not contiguous (there might be some space characters in
> between), but manual annotations cannot overlap sentences (interpret
> "overlap" in terms of Drools Fusion terminology).
> If I had an "inside" operator, do you think the following accumulate option
> would be better?
>
> when
>     $result : ArrayList() from accumulate( $s : Sentence(), buildwindows($s) )
>     $w : SentenceWindow() from $result
>     a1 : ManualAnnotation( this inside $w )
>     a2 : ManualAnnotation( this != a1, this inside $w )
> then
>     ... do something with a1 and a2 since they are "close" to each other
> end
>
> Does anyone know something about accumulator parametrization (looking at
> the source code, it does not seem to be possible, though)?
> Maybe a syntax inspired by operator parametrization could be nice:
>
>     $result : ArrayList() from accumulate( $s : Sentence(), buildwindows[3]($s) )
>
> Best regards,
>
> Bruno.
>
> On 19/08/2011 13:55, Wolfgang Laun wrote:
>
> There are some details that one should consider before deciding on a
> particular implementation technique.
>
>    - Are all Sentences contiguous, i.e., s1.end = pred( s2.start )?
>    - Can a ManualAnnotation start on one Sentence and end in the next or
>    any further successor?
>
> As in all problems where constraints depend on an order between facts,
> performance is going to be a problem with increasing numbers of Sentences
> and ManualAnnotations.
>
> Your accumulate plan could be a very inefficient approach. Creating O(N*N)
> pairs and then looking for an overlapping window is much worse than looking
> at each window, for instance. But it depends on the expected numbers for
> both.
>
> -W
>
>
>
> 2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>
>>  Hello,
>>
>> I am trying to implement rules handling "Sentence" and "ManualAnnotation"
>> objects (imagine someone highlighting words of the document). Basically,
>> "Sentence" objects have "start" and "end" positions (fields) in the text
>> of a document, and they are Comparable according to their location in the
>> document.
>>
>> I need to write rules using the notion "window of consecutive sentences".
>>
>> Basically, I am not very interested in those "SentenceWindow" objects; I
>> just need them to define a kind of proximity between "ManualAnnotation"
>> objects.
>> What I eventually need in the "when" of my rule is something like:
>>
>> when
>>     ... maybe something creating the windows
>>     a1 : ManualAnnotation ()
>>     a2 : ManualAnnotation (this != a1)
>>     SentenceWindow (this includes a1, this includes a2)
>> then
>>     ... do something with a1 and a2 since they are "close" to each other
>> end
>>
>> As I don't know the "internals" of Drools, I would like to have your
>> opinion about which is the best "idiom":
>>
>>    - create all SentenceWindow objects and insert them into working
>>    memory, then write rules against all the facts (SentenceWindow and
>>    ManualAnnotation)
>>    - implement an accumulator that will create a list of SentenceWindow
>>    objects
>>
>>
>> The first option could look like:
>>
>> rule "Create sentence windows"
>>    when
>>       # find 3 consecutive sentences
>>       s1 : Sentence()
>>       s2 : Sentence(this > s1)
>>       s3 : Sentence(this > s2)
>>       not Sentence(this != s2 && > s1 && < s3)
>>    then
>>       SentenceWindow swindow = new SentenceWindow();
>>       swindow.setStart(s1.getStart());
>>       swindow.setTheend(s3.getEnd());
>>       insert(swindow);
>> end
>>
>> ... Then use the first rule "as is".
>>
>> The accumulator option could look like this (I am not really sure the
>> syntax is correct):
>>
>> when
>>     $result : ArrayList() from accumulate( $s : Sentence(), buildwindows($s) )
>>     a1 : ManualAnnotation ()
>>     a2 : ManualAnnotation (this != a1)
>>     SentenceWindow (this includes a1, this includes a2) from $result
>> then
>>     ... do something with a1 and a2 since they are "close" to each other
>> end
>>
>> Is it possible to decide whether one way is better than the other?
>>
>> And one last question: is it possible to "parametrize" an accumulator (in
>> order to provide the number of sentences that should be put in the windows)?
>> I mean something like:
>>
>> when
>>     $result : ArrayList() from accumulate( $s : Sentence(), buildwindows(3, $s) )
>>
>>
>> Thanks in advance for your insights,
>>
>> Best regards,
>>
>> Bruno.
>>
>> _______________________________________________
>> rules-users mailing list
>> rules-users at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/rules-users
>>
>>
>