Hi Wolfgang,
I am very interested in using Drools for common NLP tasks and would really like to have
something to start with, since I am still a Drools novice.
Do you mind if I ask, is the code you posted part of some library/project?
Thanks & best,
Milan
--- On Fri, 4/9/09, Wolfgang Laun <wolfgang.laun(a)gmail.com> wrote:
From: Wolfgang Laun <wolfgang.laun(a)gmail.com>
Subject: Re: [rules-users] Drools as Lexer / Parser (sequential data processing)
To: "Rules Users List" <rules-users(a)lists.jboss.org>
Date: Friday, 4 September, 2009, 8:16
The rules for a lexer are bound to be all of
the same standard pattern:
rule FirstDigit
salience 0
when
    $i : Input( $ch : character matches "\d" )
    $l : Lexer( state == State.IDLE, $buf : buffer )
then
    $buf.setLength( 0 ); $buf.append( $ch );
    modify( $l ){ setState( State.NUMBER ) }
    modify( $i ){ next() }
end
Rather boring.
If you prefer, you can write a single rule exercising the
FSM mechanism (match input, execute action, change state)
that is driven by a set of (static and immutable) facts
defining the individual state/symbol/action/transition
quadruples.
rule Lex
salience 0
when
    $k : CharClass( $pattern : pattern )   // such as "\d" or "\w"
    $i : Input( $ch : character matches $pattern )
    $l : Lexer( $state : state )           // also containing a StringBuilder buffer
    $t : StateTransition( state == $state, charClass == $k,
                          $action : action, $succ : succ )
then
    $action.execute( $i );
    modify( $l ){ setState( $succ ) }
    modify( $i ){ next() }
end
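For anyone who wants to try the Lex rule, the fact classes it relies on could look roughly like the sketch below. Only the property names (pattern, character, state, buffer, charClass, action, succ) and the calls next() and setState() are taken from the rule; everything else is an assumption made for illustration, in particular the NUL end-of-input marker, the non-generic types, and a three-argument StateTransition constructor that keeps the current state.

```java
// Hypothetical fact classes for the Lex rule; not the poster's actual code.
// In a real project each class would be public and live in its own file.

enum State { IDLE, NUMBER }

class CharClass {
    private final String pattern;   // regex matching a single character
    CharClass(String pattern) { this.pattern = pattern; }
    public String getPattern() { return pattern; }
}

class Input {
    private final String text;
    private int pos = 0;
    Input(String text) { this.text = text; }
    // Current character as a String, so that "matches" can apply a regex to it.
    // Assumption: a NUL character marks the end of the input.
    public String getCharacter() {
        return pos < text.length() ? String.valueOf(text.charAt(pos)) : "\0";
    }
    public void next() { pos++; }
}

class Lexer {
    private State state = State.IDLE;
    private final StringBuilder buffer = new StringBuilder();
    public State getState() { return state; }
    public void setState(State state) { this.state = state; }
    public StringBuilder getBuffer() { return buffer; }
}

// What to do with the current input character (save, skip, emit, halt).
interface Action {
    void execute(Input input);
}

// One immutable state/symbol/action/successor quadruple.
class StateTransition {
    private final State state;
    private final CharClass charClass;
    private final Action action;
    private final State succ;

    StateTransition(State state, CharClass charClass, Action action, State succ) {
        this.state = state;
        this.charClass = charClass;
        this.action = action;
        this.succ = succ;
    }

    // Assumption: without an explicit successor the lexer stays in the same state.
    StateTransition(State state, CharClass charClass, Action action) {
        this(state, charClass, action, state);
    }

    public State getState() { return state; }
    public CharClass getCharClass() { return charClass; }
    public Action getAction() { return action; }
    public State getSucc() { return succ; }
}
```

The getters follow the JavaBean naming that Drools uses to resolve properties such as `state` or `charClass` in a pattern.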
An additional rule catches errors. An advantage of this
approach is that any lexer can now be configured by a set of
fact definitions, e.g., this one for a sequence of
identifiers, numbers and quoted strings:
CharClass letter = new CharClass( "[A-Za-z_]" );
CharClass digit  = new CharClass( "\\d" );
CharClass escape = new CharClass( "\\\\" );
CharClass space  = new CharClass( "\\s" );
CharClass quote  = new CharClass( "\"" );
CharClass empty  = new CharClass( "\u0000" );
ins( letter, digit, escape, space, quote, empty );

Action<State,TokenType> save = new SaveAction( this );
Action<State,TokenType> skip = new SkipAction( this );
Action<State,TokenType> halt = new HaltAction( this );
Action<State,TokenType> emit = new EmitAction( this );

StateTransition<State,TokenType> t1 =
    new StateTransition<State,TokenType>( State.S0, letter, save, State.IDENT );
StateTransition<State,TokenType> t2 =
    new StateTransition<State,TokenType>( State.S0, digit, save, State.NUMBER );
StateTransition<State,TokenType> t3 =
    new StateTransition<State,TokenType>( State.S0, quote, skip, State.STRING );
StateTransition<State,TokenType> t4 =
    new StateTransition<State,TokenType>( State.S0, space, skip );
StateTransition<State,TokenType> t5 =
    new StateTransition<State,TokenType>( State.S0, empty, halt );
ins( t1, t2, t3, t4, t5 );

StateTransition<State,TokenType> u1 =
    new StateTransition<State,TokenType>( State.IDENT, letter, save );
StateTransition<State,TokenType> u2 =
    new StateTransition<State,TokenType>( State.IDENT, digit, save );
StateTransition<State,TokenType> u3 =
    new StateTransition<State,TokenType>( State.IDENT, space, emit, State.S0 );
StateTransition<State,TokenType> u4 =
    new StateTransition<State,TokenType>( State.IDENT, empty, halt );
ins( u1, u2, u3, u4 );

StateTransition<State,TokenType> v1 =
    new StateTransition<State,TokenType>( State.NUMBER, digit, save );
StateTransition<State,TokenType> v2 =
    new StateTransition<State,TokenType>( State.NUMBER, space, emit, State.S0 );
StateTransition<State,TokenType> v3 =
    new StateTransition<State,TokenType>( State.NUMBER, empty, halt );
ins( v1, v2, v3 );

StateTransition<State,TokenType> w1 =
    new StateTransition<State,TokenType>( State.STRING, letter, save );
StateTransition<State,TokenType> w2 =
    new StateTransition<State,TokenType>( State.STRING, digit, save );
StateTransition<State,TokenType> w3 =
    new StateTransition<State,TokenType>( State.STRING, space, save );
StateTransition<State,TokenType> w4 =
    new StateTransition<State,TokenType>( State.STRING, quote, emit, State.S0 );
StateTransition<State,TokenType> w5 =
    new StateTransition<State,TokenType>( State.STRING, escape, save, State.ESCAPE );
StateTransition<State,TokenType> w6 =
    new StateTransition<State,TokenType>( State.STRING, empty, halt );
ins( w1, w2, w3, w4, w5, w6 );

StateTransition<State,TokenType> x1 =
    new StateTransition<State,TokenType>( State.ESCAPE, letter, save, State.STRING );
StateTransition<State,TokenType> x2 =
    new StateTransition<State,TokenType>( State.ESCAPE, digit, save, State.STRING );
StateTransition<State,TokenType> x3 =
    new StateTransition<State,TokenType>( State.ESCAPE, space, save, State.STRING );
// an escaped quote is part of the string, so it is saved rather than emitted
StateTransition<State,TokenType> x4 =
    new StateTransition<State,TokenType>( State.ESCAPE, quote, save, State.STRING );
StateTransition<State,TokenType> x5 =
    new StateTransition<State,TokenType>( State.ESCAPE, escape, save, State.STRING );
StateTransition<State,TokenType> x6 =
    new StateTransition<State,TokenType>( State.ESCAPE, empty, halt );
ins( x1, x2, x3, x4, x5, x6 );
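
To see how such a transition table drives a lexer, the same idea can be sketched in plain Java without Drools. This is a minimal sketch covering only the S0, IDENT and NUMBER states; the class FsmLexerSketch, its string-valued tokens, and the assumption that halt flushes a pending token before stopping are all hypothetical and not taken from the posted code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch of a table-driven lexer (no Drools involved).
class FsmLexerSketch {
    enum State { S0, IDENT, NUMBER }
    enum Act { SAVE, SKIP, EMIT, HALT }

    // One state/symbol/action/successor quadruple.
    static final class Transition {
        final State state;
        final Pattern charClass;
        final Act action;
        final State succ;
        Transition(State state, String regex, Act action, State succ) {
            this.state = state;
            this.charClass = Pattern.compile(regex);
            this.action = action;
            this.succ = succ;
        }
    }

    static final String LETTER = "[A-Za-z_]", DIGIT = "\\d", SPACE = "\\s", EMPTY = "\\x00";

    static final List<Transition> TABLE = Arrays.asList(
        new Transition(State.S0, LETTER, Act.SAVE, State.IDENT),
        new Transition(State.S0, DIGIT, Act.SAVE, State.NUMBER),
        new Transition(State.S0, SPACE, Act.SKIP, State.S0),
        new Transition(State.S0, EMPTY, Act.HALT, State.S0),
        new Transition(State.IDENT, LETTER, Act.SAVE, State.IDENT),
        new Transition(State.IDENT, DIGIT, Act.SAVE, State.IDENT),
        new Transition(State.IDENT, SPACE, Act.EMIT, State.S0),
        new Transition(State.IDENT, EMPTY, Act.HALT, State.S0),
        new Transition(State.NUMBER, DIGIT, Act.SAVE, State.NUMBER),
        new Transition(State.NUMBER, SPACE, Act.EMIT, State.S0),
        new Transition(State.NUMBER, EMPTY, Act.HALT, State.S0));

    // Find the transition for the current state and character, or null if none applies.
    static Transition find(State state, String ch) {
        for (Transition t : TABLE) {
            if (t.state == state && t.charClass.matcher(ch).matches()) return t;
        }
        return null;
    }

    // Move the buffered text, if any, into the token list, tagged with the current state.
    static void flush(State state, StringBuilder buf, List<String> tokens) {
        if (buf.length() > 0) {
            tokens.add(state + ":" + buf);
            buf.setLength(0);
        }
    }

    static List<String> lex(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        State state = State.S0;
        String input = text + (char) 0;   // assumption: NUL marks the end of the input
        for (int pos = 0; pos < input.length(); pos++) {
            String ch = String.valueOf(input.charAt(pos));
            Transition t = find(state, ch);
            if (t == null) {              // plays the role of the error-catching rule
                throw new IllegalStateException("no transition from " + state + " at " + pos);
            }
            switch (t.action) {
                case SAVE: buf.append(ch); break;
                case SKIP: break;
                case EMIT: flush(state, buf, tokens); break;
                case HALT: flush(state, buf, tokens); return tokens;
            }
            state = t.succ;
        }
        return tokens;
    }
}
```

With this table, FsmLexerSketch.lex("foo 42 bar_1") returns the three tokens IDENT:foo, NUMBER:42 and IDENT:bar_1, while an input such as "12ab" raises an exception because NUMBER has no transition for a letter.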
-W
On Wed, Sep 2, 2009 at 11:54 PM, André Thieme <address.good.until.2009.dec.14(a)justmail.de> wrote:
Hello group, I recently had the idea:
"A rule system (like Drools) is ideal for making programs with complex
rules simpler. Writing a lexer or parser can be non-trivial. So, is it
possible and also meaningful to express such a task with rules?"
Anyone here who maybe tried that already?
The two big questions for me are:
1) how easy is it to express a lexer with rules?
2) how bad (good?) will it perform?
If you happen to have a good idea of how to do it, could you please give
me an example for a simple lexer?
Let's say it will get natural language (a string, such as this email) as
input and should return a sequence (say, ArrayList) of Tokens, which may
look like this:
public class Token {
    public String value;
    public String category;
    Token(String value, String category) {
        this.value = value;
        this.category = category;
    }
}
We could have three categories: "word", "numeric" and "whitespace".
An input String could be:
"We can see 500 cars"
And it should produce an ArrayList with the contents:
[
    Token("We", "word"),
    Token(" ", "whitespace"),
    Token("can", "word"),
    Token(" ", "whitespace"),
    Token("see", "word"),
    Token(" ", "whitespace"),
    Token("500", "numeric"),
    Token(" ", "whitespace"),
    Token("cars", "word")
]
At the moment I have difficulty seeing if/how this could be achieved.
If you find this easy, please post a solution.
I am aware that JavaCC is really good for such tasks and will also
perform extremely well.
Greetings,
André
_______________________________________________
rules-users mailing list
rules-users(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/rules-users