[rules-users] To accumulate or not to accumulate
Bruno Freudensprung
bruno.freudensprung at temis.com
Mon Aug 22 09:29:28 EDT 2011
Hi Wolfgang,
I finally decided to test different implementations:
* first based on an accumulation function
* second (your suggestion) relying on Drools to 1) build all
SentenceWindows, then 2) locate ManualAnnotations inside those
Windows
* third (your suggestion as well) relying on Drools to 1) build only
the SentenceWindows that might be interesting (containing one of the
ManualAnnotations I am looking for), then 2) locate
ManualAnnotations inside those Windows
The first implementation performs quite well, but I am stuck on the
parametrization (I need to define build2windows, build3windows,
build4windows... functions): 47 ms on 100 sentences, 94 ms
on 1000 sentences
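
As a rough illustration of the accumulator idea (not the code that was
measured), the init/action/result cycle can be sketched in plain Java.
The names Span and WindowAccumulator and the "size" constructor
argument are hypothetical, and the class deliberately does not
implement the Drools accumulate-function interface, so it does not by
itself solve the parametrization problem inside DRL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a Sentence or a SentenceWindow: a [start, end] span in the text.
final class Span implements Comparable<Span> {
    final int start;
    final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public int compareTo(Span other) {
        int c = Integer.compare(start, other.start);
        return c != 0 ? c : Integer.compare(end, other.end);
    }
}

// Mimics the action/result steps of an accumulate function (assumption: not the real Drools API).
final class WindowAccumulator {
    private final int size; // number of sentences per window
    private final TreeSet<Span> sorted = new TreeSet<>();

    WindowAccumulator(int size) {
        if (size < 1) throw new IllegalArgumentException("size must be >= 1");
        this.size = size;
    }

    // "action": called once per sentence, in any order; the TreeSet restores document order.
    void action(Span sentence) {
        sorted.add(sentence);
    }

    // "result": slide a window of `size` consecutive sentences over the sorted sentences.
    List<Span> result() {
        List<Span> sentences = new ArrayList<>(sorted);
        List<Span> windows = new ArrayList<>();
        for (int i = 0; i + size <= sentences.size(); i++) {
            windows.add(new Span(sentences.get(i).start, sentences.get(i + size - 1).end));
        }
        return windows;
    }
}
```

Because the window size is a constructor argument here, one class
covers windows of 2, 3, 4... sentences; inside DRL the same effect
still needs one registered function per size.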
The second implementation is of course suboptimal since it creates many
useless windows: 125 ms on 100 sentences, 14400 ms on 1000 sentences
The third implementation is very versatile and its performance is
comparable to the accumulator solution: 93 ms on 100 sentences, 125 ms
on 1000 sentences
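
The reason the third variant stays cheap can be shown with a small
procedural sketch in plain Java: only windows that cover a sentence
holding an annotation are ever created. This is an illustration under
assumptions, not the rules that were measured; Range,
InterestingWindows and build are hypothetical names, and sentences
are assumed to be sorted and non-overlapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical [start, end] character range, used for sentences, annotations and windows.
final class Range {
    final int start;
    final int end;

    Range(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // True when `other` lies entirely inside this range.
    boolean includes(Range other) {
        return start <= other.start && other.end <= end;
    }
}

final class InterestingWindows {
    // `sentences` must be sorted by position; `size` is the window length in sentences.
    static List<Range> build(List<Range> sentences, List<Range> annotations, int size) {
        if (size < 1) throw new IllegalArgumentException("size must be >= 1");
        TreeSet<Integer> firstIndexes = new TreeSet<>();
        for (Range a : annotations) {
            for (int i = 0; i < sentences.size(); i++) {
                if (!sentences.get(i).includes(a)) continue;
                // Every window of `size` sentences covering sentence i starts in [i - size + 1, i].
                int lo = Math.max(0, i - size + 1);
                int hi = Math.min(i, sentences.size() - size);
                for (int s = lo; s <= hi; s++) firstIndexes.add(s);
            }
        }
        // Materialize each distinct window once, in document order.
        List<Range> windows = new ArrayList<>();
        for (int s : firstIndexes) {
            windows.add(new Range(sentences.get(s).start, sentences.get(s + size - 1).end));
        }
        return windows;
    }
}
```

With few annotations, the number of windows grows with the number of
annotations rather than with the number of sentences.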
So thanks again for your suggestion; it was definitely useful :-).
Regards,
Bruno.
On 19/08/2011 17:25, Wolfgang Laun wrote:
> 2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>
>
> I am not sure I understand what you mean by "random order" but I
> guess it has to do with my ArrayList result type.
> What I had in mind is to put all sentences in a TreeSet during the
> "action" method, and finally issue an ArrayList result object by
> iterating over the TreeSet and grouping sentences.
>
>
> Heh :) I clean forgot that I had done this sort of thing not too long ago.
>
> My first guess was that such an accumulator might be faster than a
> construction of windows using rules.
> However I admit your suggestion is very elegant, and I thank you
> for that! I am probably still too imperative-minded...
>
>
> Well, a procedural solution would be a reasonable alternative for this
> problem.
>
> -W
>
>
> Regards,
>
> Bruno.
>
> On 19/08/2011 16:05, Wolfgang Laun wrote:
>> How would you write "buildwindows", given that its "action"
>> method would be called once for each Sentence, in random order?
>>
>> It's very simple to write a very small set of rules to construct
>> all SentenceWindow facts of size 1 and then to extend them to any
>> desired size, depending on some parameter.
>> 1. Given a Sentence and no Window beginning with it, create a
>> Window of length 1.
>> 2. Given a Window of size n < desiredSize and given a Sentence
>> immediately following it, extend the Window to one of size n+1.
>> 3a. For any Window of desiredSize, inspect it for "closely
>> situated ManualAnnotations".
>> 3b. If ManualAnnotations have been associated with their
>> containing Sentences up-front, you just need to find Windows with
>> more than 1 ManualAnnotation, adding them in the RHS of rule 2 above.
>>
>> -W
>>
>>
>> 2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>>
>>
>> Hi Wolfgang,
>>
>> Thanks for your answer.
>> Sentences are not contiguous (there might be some space characters
>> in between), but manual annotations cannot overlap sentences
>> (interpret "overlap" in terms of Drools Fusion terminology).
>> If I had an "inside" operator, do you think the following
>> accumulate option could be better?
>>
>> when
>>     $result : ArrayList() from accumulate ( $s: Sentence(),
>>                                             buildwindows($s))
>>     $w : SentenceWindow () from $result
>>     a1 : ManualAnnotation (this inside $w)
>>     a2 : ManualAnnotation (this != a1, this inside $w)
>> then
>>     ... do something with a1 and a2 since they are "close" to
>>     each other
>> end
>>
>> Does anyone know something about accumulator parametrization
>> (looking at the source code it does not seem to be possible,
>> though)?
>> Maybe a syntax inspired by operator parametrization could be
>> nice:
>>
>> $result : ArrayList() from accumulate ( $s: Sentence(),
>>                                         buildwindows[3]($s))
>>
>> Best regards,
>>
>> Bruno.
>>
>> On 19/08/2011 13:55, Wolfgang Laun wrote:
>>> There are some details that one should consider before
>>> deciding on a particular implementation technique.
>>>
>>> * Are all Sentences contiguous, i.e., s1.end = pred(
>>> s2.start )
>>> * Can a ManualAnnotation start on one Sentence and end
>>> in the next or any further successor?
>>>
>>> As in all problems where constraints depend on an order
>>> between facts, performance is going to be a problem with
>>> increasing numbers of Sentences and ManualAnnotations.
>>>
>>> Your accumulate plan could be a very inefficient approach.
>>> Creating O(N*N) pairs and then looking for an overlapping
>>> window is much worse than looking at each window, for
>>> instance. But it depends on the expected numbers for both.
>>>
>>> -W
>>>
>>>
>>>
>>> 2011/8/19 Bruno Freudensprung <bruno.freudensprung at temis.com>
>>>
>>> Hello,
>>>
>>> I am trying to implement rules handling "Sentence" and
>>> "ManualAnnotation" objects (imagine someone highlighting
>>> words of the document). Basically, "Sentence" objects
>>> have "start" and "end" positions (fields) in the text
>>> of a document, and they are Comparable according to
>>> their location in the document.
>>>
>>> I need to write rules using the notion "window of
>>> consecutive sentences".
>>>
>>> Basically, I am not very interested in those
>>> "SentenceWindow" objects; I just need them to define a
>>> kind of proximity between "ManualAnnotation" objects.
>>> What I eventually need in the "when" of my rule is
>>> something like:
>>>
>>> when
>>> ... maybe something creating the windows
>>> a1 : ManualAnnotation ()
>>> a2 : ManualAnnotation (this != a1)
>>> SentenceWindow (this includes a1, this includes a2)
>>> then
>>> ... do something with a1 and a2 since they are
>>> "close" to each other
>>> end
>>>
>>> As I don't know the "internals" of Drools, I would like
>>> to have your opinion about which is the best "idiom":
>>>
>>> * create all SentenceWindow objects and insert them
>>> in the working memory, then write rules against
>>> all the facts (SentenceWindow and ManualAnnotation)
>>> * implement an accumulator that will create a list
>>> of SentenceWindow objects
>>>
>>>
>>> The first option could look like:
>>>
>>> rule "Create sentence windows"
>>> when
>>>     # find 3 consecutive sentences
>>>     s1 : Sentence()
>>>     s2 : Sentence(this > s1)
>>>     s3 : Sentence(this > s2)
>>>     not Sentence(this != s2 && > s1 && < s3)
>>> then
>>>     SentenceWindow swindow = new SentenceWindow();
>>>     swindow.setStart(s1.getStart());
>>>     swindow.setTheend(s3.getEnd());
>>>     insert(swindow);
>>> end
>>>
>>> ... Then use the first rule "as is".
>>>
>>> The accumulator option could look like (I am not really
>>> sure the syntax is correct) :
>>>
>>> when
>>>     $result : ArrayList() from accumulate ( $s: Sentence(),
>>>                                             buildwindows($s))
>>>     a1 : ManualAnnotation ()
>>>     a2 : ManualAnnotation (this != a1)
>>>     SentenceWindow (this includes a1, this includes a2)
>>>         from $result
>>> then
>>>     ... do something with a1 and a2 since they are
>>>     "close" to each other
>>> end
>>>
>>> Is it possible to decide if one way is better than the other?
>>>
>>> And one last question: is it possible to "parametrize"
>>> an accumulator (in order to provide the number of
>>> sentences that should be put in the windows)?
>>> I mean something like:
>>>
>>> when
>>>     $result : ArrayList() from accumulate ( $s: Sentence(),
>>>                                             buildwindows(3, $s))
>>>
>>>
>>> Thanks in advance for your insights,
>>>
>>> Best regards,
>>>
>>> Bruno.
>>>
>>> _______________________________________________
>>> rules-users mailing list
>>> rules-users at lists.jboss.org
>>> <mailto:rules-users at lists.jboss.org>
>>> https://lists.jboss.org/mailman/listinfo/rules-users