[rules-users] To accumulate or not to accumulate

Bruno Freudensprung bruno.freudensprung at temis.com
Fri Aug 19 10:41:50 EDT 2011


I am not sure I understand what you mean by "random order" but I guess 
it has to do with my ArrayList result type.
What I had in mind is to put all sentences in a TreeSet during the 
"action" method, and finally issue an ArrayList result object by 
iterating over the TreeSet and grouping sentences.
My first guess was that such an accumulator might be faster than a 
construction of windows using rules.
However I admit your suggestion is very elegant, and I thank you for 
that! I am probably still too imperative-minded...
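
To make the TreeSet idea concrete, here is a minimal sketch in plain Java. The names (Sentence, SentenceWindow, WindowBuilder, action, result) and the window size passed to the constructor are assumptions made for illustration only. The sketch deliberately does not implement the Drools accumulate function interface; it only mirrors the "collect in any order, then group at the end" shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the Sentence fact: ordered by start position.
class Sentence implements Comparable<Sentence> {
    final int start;
    final int end;

    Sentence(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public int compareTo(Sentence other) {
        // Assumes sentences never share a start position (they do not overlap).
        return Integer.compare(start, other.start);
    }
}

// Hypothetical stand-in for the SentenceWindow fact.
class SentenceWindow {
    final int start;
    final int end;

    SentenceWindow(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

// Collects sentences in arbitrary order, then groups them into windows.
class WindowBuilder {
    private final int size; // number of consecutive sentences per window
    private final TreeSet<Sentence> sorted = new TreeSet<>();

    WindowBuilder(int size) {
        this.size = size;
    }

    // Called once per Sentence, in any order; the TreeSet keeps them sorted.
    void action(Sentence s) {
        sorted.add(s);
    }

    // Slides a window of `size` sentences over the sorted sequence.
    List<SentenceWindow> result() {
        List<Sentence> ordered = new ArrayList<>(sorted);
        List<SentenceWindow> windows = new ArrayList<>();
        for (int i = 0; i + size <= ordered.size(); i++) {
            Sentence first = ordered.get(i);
            Sentence last = ordered.get(i + size - 1);
            windows.add(new SentenceWindow(first.start, last.end));
        }
        return windows;
    }
}
```

Since the TreeSet sorts on insertion, the order in which "action" receives the sentences does not matter; the cost is that all windows are rebuilt each time the result is requested.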

Regards,

Bruno.

On 19/08/2011 16:05, Wolfgang Laun wrote:
> How would you write "buildwindows", given that its "action" method 
> would be called once for each Sentence, in random order?
>
> It's very simple to write a very small set of rules to construct all 
> SentenceWindow facts of size 1 and then to extend them to any desired 
> size, depending on some parameter.
> 1. Given a Sentence and no Window beginning with it, create a Window 
> of length 1.
> 2. Given a Window of size n < desiredSize and given a Sentence 
> immediately following it, extend the Window to one of size n+1.
> 3a. For any Window of desiredSize, inspect it for "closely situated 
> ManualAnnotations".
> 3b. If ManualAnnotations have been associated with their containing 
> Sentences up-front, you just need to find Windows with more than 1 
> ManualAnnotation, adding them in the RHS of rule 2 above.
>
> -W
>
>
> 2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>
>
>     Hi Wolfgang,
>
>     Thanks for your answer.
>     Sentences are not contiguous (might be some space characters in
>     between) but manual annotations cannot overlap sentences
>     (interpret "overlap" in terms of Drools Fusion terminology).
>     If I had an "inside" operator, do you think the following
>     accumulate option could be better?
>
>     when
>         $result : ArrayList() from accumulate ( $s: Sentence(),
>                                                 buildwindows($s))
>         $w : SentenceWindow () from $result
>         a1 : ManualAnnotation (this inside $w)
>         a2 : ManualAnnotation (this != a1, this inside $w)
>     then
>         ... do something with a1 and a2 since they are "close" to each other
>     end
>
>     Does anyone know whether accumulators can be parametrized
>     (looking at the source code, it does not seem to be possible)?
>     Maybe a syntax inspired by operator parametrization would be nice:
>
>         $result : ArrayList() from accumulate ( $s: Sentence(),
>                                                 buildwindows[3]($s))
>
>     Best regards,
>
>     Bruno.
>
>     On 19/08/2011 13:55, Wolfgang Laun wrote:
>>     There are some details that one should consider before deciding
>>     on a particular implementation technique.
>>
>>         * Are all Sentences contiguous, i.e., s1.end = pred( s2.start )?
>>         * Can a ManualAnnotation start on one Sentence and end in the
>>           next or any further successor?
>>
>>     As in all problems where constraints depend on an order between
>>     facts, performance is going to be a problem with increasing
>>     numbers of Sentences and ManualAnnotations.
>>
>>     Your accumulate plan could be a very inefficient approach.
>>     Creating O(N*N) pairs and then looking for an overlapping window
>>     is much worse than looking at each window, for instance. But it
>>     depends on the expected numbers for both.
>>
>>     -W
>>
>>
>>
>>     2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>>
>>         Hello,
>>
>>         I am trying to implement rules handling "Sentence" and
>>         "ManualAnnotation" objects (imagine someone highlighting
>>         words of the document). Basically, "Sentence" objects have
>>         "start" and "end" positions (fields) in the text of a
>>         document, and they are Comparable according to their location
>>         in the document.
>>
>>         I need to write rules using the notion of a "window of
>>         consecutive sentences".
>>
>>         I am not really interested in those "SentenceWindow" objects
>>         themselves; I just need them to define a kind of proximity
>>         between "ManualAnnotation" objects.
>>         What I eventually need in the "when" of my rule is something
>>         like:
>>
>>         when
>>             ... maybe something creating the windows
>>             a1 : ManualAnnotation ()
>>             a2 : ManualAnnotation (this != a1)
>>             SentenceWindow (this includes a1, this includes a2)
>>         then
>>             ... do something with a1 and a2 since they are "close" to
>>         each other
>>         end
>>
>>         As I don't know the "internals" of Drools, I would like to
>>         have your opinion about which "idiom" is best:
>>
>>             * create all SentenceWindow objects and insert them into
>>               the working memory, then write rules against all the
>>               facts (SentenceWindow and ManualAnnotation)
>>             * implement an accumulator that will create a list of
>>               SentenceWindow objects
>>
>>
>>         The first option could look like:
>>
>>         rule "Create sentence windows"
>>            when
>>               # find 3 consecutive sentences
>>               s1 : Sentence()
>>               s2 : Sentence(this > s1)
>>               s3 : Sentence(this > s2)
>>               not Sentence(this != s2 && > s1 && < s3)
>>            then
>>               SentenceWindow swindow = new SentenceWindow();
>>               swindow.setStart(s1.getStart());
>>               swindow.setTheend(s3.getEnd());
>>               insert(swindow);
>>         end
>>
>>         ... Then use the first rule "as is".
>>
>>         The accumulator option could look like (I am not really sure
>>         the syntax is correct) :
>>
>>         when
>>             $result : ArrayList() from accumulate ( $s: Sentence(),
>>                                                     buildwindows($s))
>>             a1 : ManualAnnotation ()
>>             a2 : ManualAnnotation (this != a1)
>>             SentenceWindow (this includes a1, this includes a2) from $result
>>         then
>>             ... do something with a1 and a2 since they are "close" to each other
>>         end
>>
>>         Is it possible to decide whether one way is better than the
>>         other?
>>
>>         And one last question: is it possible to "parametrize" an
>>         accumulator (in order to provide the number of sentences that
>>         should be put in the windows)?
>>         I mean something like:
>>
>>         when
>>             $result : ArrayList() from accumulate ( $s: Sentence(),
>>                                                     buildwindows(3, $s))
>>
>>
>>         Thanks in advance for your insights,
>>
>>         Best regards,
>>
>>         Bruno.
>>
>>         _______________________________________________
>>         rules-users mailing list
>>         rules-users at lists.jboss.org <mailto:rules-users at lists.jboss.org>
>>         https://lists.jboss.org/mailman/listinfo/rules-users
