[rules-users] To accumulate or not to accumulate

Bruno Freudensprung bruno.freudensprung at temis.com
Mon Aug 22 09:29:28 EDT 2011


Hi Wolfgang,

I finally decided to test different implementations:

    * first based on an accumulation function
    * second (your suggestion), relying on Drools to 1) build all
      SentenceWindows, then 2) locate ManualAnnotations inside those
      Windows
    * third (also your suggestion), relying on Drools to 1) build only
      the SentenceWindows that might be interesting (those containing one
      of the ManualAnnotations I am looking for), then 2) locate
      ManualAnnotations inside those Windows

The first implementation performs quite well, but I am stuck on the
parametrization (I need to define build2windows, build3windows,
build4windows... functions): 47 ms on 100 sentences, 94 ms on 1000
sentences.
The second implementation is, of course, suboptimal since it creates
many useless windows: 125 ms on 100 sentences, 14400 ms on 1000
sentences.
The third implementation is very versatile and its performance is
comparable to the accumulator solution: 93 ms on 100 sentences, 125 ms
on 1000 sentences.
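The parametrization problem of the first implementation (one build2windows, build3windows, ... per size) and Wolfgang's question further down (how "buildwindows" copes with its action being called once per Sentence, in random order) can both be handled in plain Java by passing the window size to a constructor and letting a TreeSet restore document order, which is what Bruno proposes in a later message. The following is a minimal sketch under stated assumptions: `WindowBuilder`, its `add`/`result` methods and the `Sentence`/`SentenceWindow` records are hypothetical stand-ins, not the poster's classes and not Drools' accumulate-function interface. Whether a size parameter can be passed to a custom accumulate function from DRL is a separate question that this sketch does not answer. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: a single window builder parametrized by its size,
// replacing separate build2windows, build3windows, ... functions.
public class WindowBuilder {

    // Stand-in for the poster's Sentence: start/end offsets in the text.
    public record Sentence(int start, int end) {}

    // Stand-in for SentenceWindow: spans from the first sentence's start
    // to the last sentence's end.
    public record SentenceWindow(int start, int end) {}

    private final int size;

    // Sentences may arrive in any order; the TreeSet keeps them sorted by
    // start offset (assumes no two sentences share a start position).
    private final TreeSet<Sentence> sentences =
            new TreeSet<>(Comparator.comparingInt(Sentence::start));

    public WindowBuilder(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        this.size = size;
    }

    // Called once per Sentence, in arbitrary order.
    public void add(Sentence s) {
        sentences.add(s);
    }

    // Groups consecutive sentences into windows of exactly `size` sentences.
    public List<SentenceWindow> result() {
        List<Sentence> ordered = new ArrayList<>(sentences);
        List<SentenceWindow> windows = new ArrayList<>();
        for (int i = 0; i + size <= ordered.size(); i++) {
            windows.add(new SentenceWindow(ordered.get(i).start(),
                                           ordered.get(i + size - 1).end()));
        }
        return windows;
    }

    public static void main(String[] args) {
        WindowBuilder builder = new WindowBuilder(3);
        // Deliberately added out of document order.
        builder.add(new Sentence(20, 29));
        builder.add(new Sentence(0, 9));
        builder.add(new Sentence(31, 40));
        builder.add(new Sentence(10, 19));
        System.out.println(builder.result());
    }
}
```

With the four sentences above and a size of 3, the builder yields two windows, covering offsets 0 to 29 and 10 to 40, regardless of insertion order; a size of 4 or 5 needs no new function.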

So thanks again for your suggestion; it was definitely useful :-).

Regards,

Bruno.

Le 19/08/2011 17:25, Wolfgang Laun a écrit :
> 2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>
>
>     I am not sure I understand what you mean by "random order" but I
>     guess it has to do with my ArrayList result type.
>     What I had in mind is to put all sentences in a TreeSet during the
>     "action" method, and finally issue an ArrayList result object by
>     iterating over the TreeSet and grouping sentences.
>
>
> Heh :) I clean forgot that I had done this sort of thing not too long ago.
>
>     My first guess was that such an accumulator might be faster than a
>     construction of windows using rules.
>     However I admit your suggestion is very elegant, and I thank you
>     for that! I am probably still too imperative-minded...
>
>
> Well, a procedural solution would be a reasonable alternative for this 
> problem.
>
> -W
>
>
>     Regards,
>
>     Bruno.
>
>     Le 19/08/2011 16:05, Wolfgang Laun a écrit :
>>     How would you write "buildwindows", given that its "action"
>>     method would be called once for each Sentence, in random order?
>>
>>     It's very simple to write a very small set of rules to construct
>>     all SentenceWindow facts of size 1 and then to extend them to any
>>     desired size, depending on some parameter.
>>     1. Given a Sentence and no Window beginning with it, create a
>>     Window of length 1.
>>     2. Given a Window of size n < desiredSize and given a Sentence
>>     immediately following it, extend the Window to one of size n+1.
>>     3a. For any Window of desiredSize, inspect it for "closely
>>     situated ManualAnnotations".
>>     3b. If ManualAnnotations have been associated with their
>>     containing Sentences up-front, you just need to find Windows with
>>     more than 1 ManualAnnotation, adding them in the RHS of rule 2 above.
>>
>>     -W
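As Wolfgang remarks in a later message (quoted above), a procedural solution is a reasonable alternative for this problem. As an illustration only, the numbered rules translate into the Java loop below. Assumptions: sentences are identified by their index in document order, and `annotationsPerSentence[i]` is the number of ManualAnnotations already associated with sentence i (the up-front association of 3b, which works because, as Bruno confirms in his reply, annotations cannot cross sentence boundaries). Rules 3a and 3b are folded into a single check on windows that reach `desiredSize`. `WindowRules`, `Window` and `closeAnnotationWindows` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical procedural rendering of rules 1, 2 and 3 above.
public class WindowRules {

    // Consecutive sentences first .. first + length - 1, together with the
    // number of ManualAnnotations they contain.
    public record Window(int first, int length, int annotations) {}

    // annotationsPerSentence[i]: annotations already associated with sentence i.
    public static List<Window> closeAnnotationWindows(int[] annotationsPerSentence,
                                                      int desiredSize) {
        int n = annotationsPerSentence.length;
        ArrayDeque<Window> agenda = new ArrayDeque<>();

        // Rule 1: a Window of length 1 for every Sentence.
        for (int i = 0; i < n; i++) {
            agenda.add(new Window(i, 1, annotationsPerSentence[i]));
        }

        List<Window> hits = new ArrayList<>();
        while (!agenda.isEmpty()) {
            Window w = agenda.poll(); // FIFO, so hits come out ordered by `first`
            if (w.length() == desiredSize) {
                // Rule 3: a full-size Window is interesting only if it holds
                // more than one ManualAnnotation.
                if (w.annotations() > 1) {
                    hits.add(w);
                }
                continue;
            }
            // Rule 2: extend a Window of size < desiredSize with the Sentence
            // immediately following it, if there is one.
            int next = w.first() + w.length();
            if (next < n) {
                agenda.add(new Window(w.first(), w.length() + 1,
                        w.annotations() + annotationsPerSentence[next]));
            }
        }
        return hits;
    }
}
```

With `annotationsPerSentence = {1, 0, 0, 1, 1, 0}` and `desiredSize = 3`, the windows starting at sentences 2 and 3 are reported. Both contain the annotations of sentences 3 and 4, so, as with the rule-based version, the same pair can be found once per overlapping window.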
>>
>>
>>     2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>>
>>
>>         Hi Wolfgang,
>>
>>         Thanks for your answer.
>>         Sentences are not contiguous (might be some space characters
>>         in between) but manual annotations cannot overlap sentences
>>         (interpret "overlap" in terms of Drools Fusion terminology).
>>         If I had an "inside" operator, do you think the following
>>         accumulate option could be better?
>>
>>         when
>>             $result : ArrayList() from accumulate ( $s : Sentence(),
>>                                                     buildwindows( $s ) )
>>             $w : SentenceWindow() from $result
>>             a1 : ManualAnnotation( this inside $w )
>>             a2 : ManualAnnotation( this != a1, this inside $w )
>>         then
>>             // ... do something with a1 and a2 since they are "close"
>>             // to each other
>>         end
>>
>>         Does anyone know something about accumulator parametrization
>>         (looking at the source code it does not seem to be possible,
>>         though)?
>>         Maybe a syntax inspired by operator parametrization would be
>>         nice:
>>
>>             $result : ArrayList() from accumulate ( $s : Sentence(),
>>                                                     buildwindows[3]( $s ) )
>>
>>         Best regards,
>>
>>         Bruno.
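Since there is no "inside" operator to hand (as Bruno's "if I had" implies), its semantics for positions in a text have to be pinned down somewhere. A minimal sketch, assuming inclusive containment on character offsets (an annotation is inside a window when both of its ends lie within the window); `Inside`, `Span`, `inside` and `closePairs` are hypothetical names, and this is plain Java, not a custom Drools operator. Whether shared endpoints count is a design choice: Fusion's `includes`/`during` are defined on event timestamps, so their exact boundary behaviour should be checked before reusing them for text offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an "inside" test on character offsets.
public class Inside {

    // A [start, end] range of character offsets (annotation or window).
    public record Span(int start, int end) {}

    // Inclusive containment: both ends of the annotation lie within the window.
    public static boolean inside(Span annotation, Span window) {
        return window.start() <= annotation.start()
                && annotation.end() <= window.end();
    }

    // Every unordered pair of distinct annotations lying inside the window.
    public static List<List<Span>> closePairs(List<Span> annotations, Span window) {
        List<Span> in = new ArrayList<>();
        for (Span a : annotations) {
            if (inside(a, window)) {
                in.add(a);
            }
        }
        List<List<Span>> pairs = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            for (int j = i + 1; j < in.size(); j++) {
                pairs.add(List.of(in.get(i), in.get(j)));
            }
        }
        return pairs;
    }
}
```

Note that the DRL pattern `a2 : ManualAnnotation( this != a1, ... )` matches each pair in both orders, whereas `closePairs` reports each unordered pair once.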
>>
>>         Le 19/08/2011 13:55, Wolfgang Laun a écrit :
>>>         There are some details that one should consider before
>>>         deciding on a particular implementation technique.
>>>
>>>             * Are all Sentences contiguous, i.e., s1.end = pred(
>>>               s2.start )
>>>             * Can a ManualAnnotation start on one Sentence and end
>>>               in the next or any further successor?
>>>
>>>         As in all problems where constraints depend on an order
>>>         between facts, performance is going to be a problem with
>>>         increasing numbers of Sentences and ManualAnnotations.
>>>
>>>         Your accumulate plan could be a very inefficient approach.
>>>         Creating O(N*N) pairs and then looking for an overlapping
>>>         window is much worse than looking at each window, for
>>>         instance. But it depends on the expected numbers for both.
>>>
>>>         -W
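Wolfgang's warning can be made concrete with a back-of-the-envelope count of naive checks, ignoring any indexing the engine may apply. Let M be the number of ManualAnnotations, N the number of Sentences and k the window size (symbols introduced here for illustration). The patterns `a1 : ManualAnnotation()` and `a2 : ManualAnnotation(this != a1)` produce M(M-1) ordered pairs, and there are N-k+1 windows of k consecutive sentences (for N ≥ k):

```latex
\begin{aligned}
\text{pairs first:}   &\quad M(M-1)\,(N-k+1) \ \text{pair/window tests} \\
\text{windows first:} &\quad M\,(N-k+1) \ \text{annotation/window tests}
\end{aligned}
```

With hypothetical values M = 100, N = 1000 and k = 3, that is 9,880,200 versus 99,800 tests, before counting the pairs formed inside each window. As Wolfgang says, the real difference depends on the expected numbers of both.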
>>>
>>>
>>>
>>>         2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>>>
>>>             Hello,
>>>
>>>             I am trying to implement rules handling "Sentence" and
>>>             "ManualAnnotation" objects (imagine someone highlighting
>>>             words of the document). Basically, "Sentence" objects
>>>             have "start" and "end" positions (fields) in the text
>>>             of a document, and they are Comparable according to
>>>             their location in the document.
>>>
>>>             I need to write rules using the notion "window of
>>>             consecutive sentences".
>>>
>>>             Basically, I am not very interested in those
>>>             "SentenceWindow" objects; I just need them to define a
>>>             kind of proximity between "ManualAnnotation" objects.
>>>             What I eventually need in the "when" of my rule is
>>>             something like:
>>>
>>>             when
>>>                 ... maybe something creating the windows
>>>                 a1 : ManualAnnotation ()
>>>                 a2 : ManualAnnotation (this != a1)
>>>                 SentenceWindow (this includes a1, this includes a2)
>>>             then
>>>                 ... do something with a1 and a2 since they are
>>>             "close" to each other
>>>             end
>>>
>>>             As I don't know the "internals" of Drools, I would like
>>>             to have your opinion about what the best "idiom" is:
>>>
>>>                 * create all SentenceWindow objects and insert them
>>>                   in the working memory, then write rules against
>>>                   all the facts (SentenceWindow and ManualAnnotation)
>>>                 * implement an accumulator that will create a list
>>>                   of SentenceWindow objects
>>>
>>>
>>>             The first option could look like:
>>>
>>>             rule "Create sentence windows"
>>>                when
>>>                   // find 3 consecutive sentences
>>>                   s1 : Sentence()
>>>                   s2 : Sentence(this > s1)
>>>                   s3 : Sentence(this > s2)
>>>                   not Sentence(this != s2 && > s1 && < s3)
>>>                then
>>>                   SentenceWindow swindow = new SentenceWindow();
>>>                   swindow.setStart(s1.getStart());
>>>                   swindow.setTheend(s3.getEnd());
>>>                   insert(swindow);
>>>             end
>>>
>>>             ... Then use the first rule "as is".
>>>
>>>             The accumulator option could look like this (I am not
>>>             really sure the syntax is correct):
>>>
>>>             when
>>>                 $result : ArrayList() from accumulate ( $s : Sentence(),
>>>                                                         buildwindows( $s ) )
>>>                 a1 : ManualAnnotation()
>>>                 a2 : ManualAnnotation( this != a1 )
>>>                 SentenceWindow( this includes a1, this includes a2 )
>>>                     from $result
>>>             then
>>>                 // ... do something with a1 and a2 since they are
>>>                 // "close" to each other
>>>             end
>>>
>>>             Is it possible to decide whether one way is better than the other?
>>>
>>>             And one last question: is it possible to "parametrize"
>>>             an accumulator (in order to provide the number of
>>>             sentences that should be put in the windows)?
>>>             I mean something like:
>>>
>>>             when
>>>                 $result : ArrayList() from accumulate ( $s : Sentence(),
>>>                                                         buildwindows( 3, $s ) )
>>>
>>>
>>>             Thanks in advance for your insights,
>>>
>>>             Best regards,
>>>
>>>             Bruno.
>>>
>>>             _______________________________________________
>>>             rules-users mailing list
>>>             rules-users at lists.jboss.org
>>>             <mailto:rules-users at lists.jboss.org>
>>>             https://lists.jboss.org/mailman/listinfo/rules-users
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
>
>
