[rules-users] Drools as Lexer / Parser (sequential data processing)

Clandes Tino clandestino_bgd at yahoo.co.uk
Fri Oct 16 19:55:37 EDT 2009


Hi Wolfgang,
I am very interested in using Drools for common NLP tasks and would really like to have something to start with, since I am still a Drools novice.

May I ask whether the code you posted is part of some library or project?

Thanks & best,
Milan


--- On Fri, 4/9/09, Wolfgang Laun <wolfgang.laun at gmail.com> wrote:

> From: Wolfgang Laun <wolfgang.laun at gmail.com>
> Subject: Re: [rules-users] Drools as Lexer / Parser (sequential data processing)
> To: "Rules Users List" <rules-users at lists.jboss.org>
> Date: Friday, 4 September, 2009, 8:16
> The rules for a lexer are bound to all follow the same standard
> pattern:
> 
> rule FirstDigit
>     salience 0
>     when
>         // the first digit of a number starts a new token
>         $i : Input( $ch : character matches "\\d" )
>         $l : Lexer( state == State.IDLE, $buf : buffer )
>     then
>         $buf.setLength( 0 ); $buf.append( $ch );
>         modify( $l ){ setState( State.NUMBER ) }
>         modify( $i ){ next() }
> end
> 
> Rather boring.
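For comparison, here is roughly what that "one rule per state and character class" pattern amounts to in plain Java. This is a sketch, not Drools code: each branch plays the part of one rule such as FirstDigit, and the class name `HandLexer`, the `IDLE`/`NUMBER` states, the hypothetical "NextDigit"/"NumberEnd" rule names and the list-of-strings result are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-coded counterpart of rules like FirstDigit: one branch
// per (state, character class) pair, just as one would write one rule per pair.
final class HandLexer {
    enum State { IDLE, NUMBER }

    static boolean isDigit(char ch) {
        return ch >= '0' && ch <= '9';                       // same set as the regex \d
    }

    static List<String> numbers(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        State state = State.IDLE;
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (state == State.IDLE && isDigit(ch)) {          // like rule FirstDigit
                buf.setLength(0);
                buf.append(ch);
                state = State.NUMBER;
            } else if (state == State.NUMBER && isDigit(ch)) { // a "NextDigit" rule
                buf.append(ch);
            } else if (state == State.NUMBER) {                // a "NumberEnd" rule
                out.add(buf.toString());
                state = State.IDLE;
            }
            // IDLE and not a digit: the character is simply skipped
        }
        if (state == State.NUMBER) {
            out.add(buf.toString());                           // flush at end of input
        }
        return out;
    }
}
```

Every additional token kind adds another handful of such branches (or rules), which is what makes the approach repetitive.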
> 
> If you prefer, you can write a single rule exercising the FSM
> mechanism (match input, execute action, change state) that is driven
> by a set of (static and immutable) facts defining the individual
> state/symbol/action/transition quadruples.
> 
> rule Lex
>     salience 0
>     when
>         $k : CharClass( $pattern : pattern )        // such as "\d" or "\w"
>         $i : Input( $ch : character matches $pattern )
>         $l : Lexer( $state : state )                // also containing a StringBuilder buffer
>         $t : StateTransition( state == $state, charClass == $k,
>                               $action : action, $succ : succ )
>     then
>         $action.execute( $i );
>         modify( $l ){ setState( $succ ) }
>         modify( $i ){ next() }
> end
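The single Lex rule boils down to a table lookup: find the transition whose state equals the current state and whose character class matches the current character, execute its action, then move to its successor. A minimal plain-Java sketch of that loop, outside Drools, might look like this. The `Transition` record, the `LexAction` enum standing in for the action objects, and the NUL sentinel appended to the input are assumptions for illustration (records and arrow-form `switch` need Java 16 or later). When no transition matches, it throws, which is the job Wolfgang gives to a separate error rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified plain-Java stand-ins for the facts used by rule Lex.
enum LexState { S0, WORD, NUMBER }
enum LexAction { SAVE, SKIP, EMIT, HALT }
record Transition(LexState state, Pattern charClass, LexAction action, LexState succ) {}

final class TableLexer {
    private final List<Transition> table;

    TableLexer(List<Transition> table) {
        this.table = table;
    }

    List<String> lex(String text) {
        String input = text + '\u0000';               // end-of-input sentinel
        List<String> tokens = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        LexState state = LexState.S0;
        for (int i = 0; i < input.length(); i++) {
            String ch = String.valueOf(input.charAt(i));
            Transition t = find(state, ch);           // the "when" part of rule Lex
            switch (t.action()) {                     // $action.execute( $i )
                case SAVE -> buf.append(ch);
                case SKIP -> { }
                case EMIT -> {
                    tokens.add(buf.toString());
                    buf.setLength(0);
                }
                case HALT -> {
                    if (buf.length() > 0) tokens.add(buf.toString());
                    return tokens;
                }
            }
            state = t.succ();                         // modify( $l ){ setState( $succ ) }
        }
        return tokens;
    }

    private Transition find(LexState state, String ch) {
        for (Transition t : table) {
            if (t.state() == state && t.charClass().matcher(ch).matches()) {
                return t;
            }
        }
        // No matching transition: what the separate error rule would catch.
        throw new IllegalStateException("no transition from " + state + " on '" + ch + "'");
    }

    // A small example table: words, numbers, whitespace between them.
    static TableLexer wordsAndNumbers() {
        Pattern letter = Pattern.compile("[A-Za-z]");
        Pattern digit = Pattern.compile("\\d");
        Pattern space = Pattern.compile("\\s");
        Pattern end = Pattern.compile("\\u0000");
        return new TableLexer(List.of(
            new Transition(LexState.S0, letter, LexAction.SAVE, LexState.WORD),
            new Transition(LexState.S0, digit, LexAction.SAVE, LexState.NUMBER),
            new Transition(LexState.S0, space, LexAction.SKIP, LexState.S0),
            new Transition(LexState.S0, end, LexAction.HALT, LexState.S0),
            new Transition(LexState.WORD, letter, LexAction.SAVE, LexState.WORD),
            new Transition(LexState.WORD, space, LexAction.EMIT, LexState.S0),
            new Transition(LexState.WORD, end, LexAction.HALT, LexState.S0),
            new Transition(LexState.NUMBER, digit, LexAction.SAVE, LexState.NUMBER),
            new Transition(LexState.NUMBER, space, LexAction.EMIT, LexState.S0),
            new Transition(LexState.NUMBER, end, LexAction.HALT, LexState.S0)));
    }
}
```

For example, `TableLexer.wordsAndNumbers().lex("We can 500 cars")` returns the four tokens `We`, `can`, `500` and `cars`. Changing the lexer means changing the list of `Transition` values, not the loop.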
> 
> An additional rule catches errors. An advantage of this approach is
> that any lexer can now be configured by a set of fact definitions,
> e.g., this one for a sequence of identifiers, numbers and quoted
> strings:
> 
>         CharClass letter = new CharClass( "[A-Za-z_]" );
>         CharClass digit  = new CharClass( "\\d" );
>         CharClass escape = new CharClass( "\\\\" );   // a single backslash
>         CharClass space  = new CharClass( "\\s" );
>         CharClass quote  = new CharClass( "\"" );
>         CharClass empty  = new CharClass( "\u0000" ); // NUL, used as the end-of-input marker
>         // ins(...) presumably inserts the facts into working memory (not shown)
>         ins( letter, digit, escape, space, quote, empty );
> 
>         Action<State,TokenType> save = new SaveAction( this );
>         Action<State,TokenType> skip = new SkipAction( this );
>         Action<State,TokenType> halt = new HaltAction( this );
>         Action<State,TokenType> emit = new EmitAction( this );
> 
>         // S0: between tokens (a transition without a successor keeps the state)
>         StateTransition<State,TokenType> t1 = new StateTransition<State,TokenType>( State.S0, letter, save, State.IDENT );
>         StateTransition<State,TokenType> t2 = new StateTransition<State,TokenType>( State.S0, digit,  save, State.NUMBER );
>         StateTransition<State,TokenType> t3 = new StateTransition<State,TokenType>( State.S0, quote,  skip, State.STRING );
>         StateTransition<State,TokenType> t4 = new StateTransition<State,TokenType>( State.S0, space,  skip );
>         StateTransition<State,TokenType> t5 = new StateTransition<State,TokenType>( State.S0, empty,  halt );
>         ins( t1, t2, t3, t4, t5 );
> 
>         // IDENT: inside an identifier
>         StateTransition<State,TokenType> u1 = new StateTransition<State,TokenType>( State.IDENT, letter, save );
>         StateTransition<State,TokenType> u2 = new StateTransition<State,TokenType>( State.IDENT, digit,  save );
>         StateTransition<State,TokenType> u3 = new StateTransition<State,TokenType>( State.IDENT, space,  emit, State.S0 );
>         StateTransition<State,TokenType> u4 = new StateTransition<State,TokenType>( State.IDENT, empty,  halt );
>         ins( u1, u2, u3, u4 );
> 
>         // NUMBER: inside a number
>         StateTransition<State,TokenType> v1 = new StateTransition<State,TokenType>( State.NUMBER, digit,  save );
>         StateTransition<State,TokenType> v2 = new StateTransition<State,TokenType>( State.NUMBER, space,  emit, State.S0 );
>         StateTransition<State,TokenType> v3 = new StateTransition<State,TokenType>( State.NUMBER, empty,  halt );
>         ins( v1, v2, v3 );
> 
>         // STRING: inside a quoted string
>         StateTransition<State,TokenType> w1 = new StateTransition<State,TokenType>( State.STRING, letter, save );
>         StateTransition<State,TokenType> w2 = new StateTransition<State,TokenType>( State.STRING, digit,  save );
>         StateTransition<State,TokenType> w3 = new StateTransition<State,TokenType>( State.STRING, space,  save );
>         StateTransition<State,TokenType> w4 = new StateTransition<State,TokenType>( State.STRING, quote,  emit, State.S0 );
>         StateTransition<State,TokenType> w5 = new StateTransition<State,TokenType>( State.STRING, escape, save, State.ESCAPE );
>         StateTransition<State,TokenType> w6 = new StateTransition<State,TokenType>( State.STRING, empty,  halt );
>         ins( w1, w2, w3, w4, w5, w6 );
> 
>         // ESCAPE: the character following a backslash inside a string
>         StateTransition<State,TokenType> x1 = new StateTransition<State,TokenType>( State.ESCAPE, letter, save, State.STRING );
>         StateTransition<State,TokenType> x2 = new StateTransition<State,TokenType>( State.ESCAPE, digit,  save, State.STRING );
>         StateTransition<State,TokenType> x3 = new StateTransition<State,TokenType>( State.ESCAPE, space,  save, State.STRING );
>         // an escaped quote is part of the string, so it is saved, not emitted
>         StateTransition<State,TokenType> x4 = new StateTransition<State,TokenType>( State.ESCAPE, quote,  save, State.STRING );
>         StateTransition<State,TokenType> x5 = new StateTransition<State,TokenType>( State.ESCAPE, escape, save, State.STRING );
>         StateTransition<State,TokenType> x6 = new StateTransition<State,TokenType>( State.ESCAPE, empty,  halt );
>         ins( x1, x2, x3, x4, x5, x6 );
> 
> -W
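To see what a fact table like the one above does with identifiers, numbers and quoted strings, here is a hypothetical plain-Java rendering of the same transitions, driven by a simple lookup loop instead of rule Lex. Wolfgang's `SaveAction`, `EmitAction` and friends are not shown in the mail, so their behaviour is assumed here: save appends the character to the buffer, skip drops it, emit outputs the buffer as a token typed by the current state, and halt flushes any pending token and stops. An escaped quote is treated as part of the string (saved, not emitted), and the backslash itself is kept, as transition w5 suggests. All class and method names are made up for this sketch (Java 16+).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical plain-Java rendering of the fact table; not Drools code.
final class ConfiguredLexer {
    enum State { S0, IDENT, NUMBER, STRING, ESCAPE }
    enum Act { SAVE, SKIP, EMIT, HALT }
    record Trans(State state, Pattern cls, Act act, State succ) {}
    record Tok(State type, String text) {}     // token type = state at emit time

    static final Pattern LETTER = Pattern.compile("[A-Za-z_]");
    static final Pattern DIGIT = Pattern.compile("\\d");
    static final Pattern BACKSLASH = Pattern.compile("\\\\");
    static final Pattern SPACE = Pattern.compile("\\s");
    static final Pattern QUOTE = Pattern.compile("\"");
    static final Pattern EMPTY = Pattern.compile("\\u0000");
    static final List<Trans> TABLE = new ArrayList<>();

    // Without a successor, a transition stays in its current state.
    static void t(State s, Pattern c, Act a) { t(s, c, a, s); }
    static void t(State s, Pattern c, Act a, State succ) { TABLE.add(new Trans(s, c, a, succ)); }

    static {
        t(State.S0, LETTER, Act.SAVE, State.IDENT);
        t(State.S0, DIGIT, Act.SAVE, State.NUMBER);
        t(State.S0, QUOTE, Act.SKIP, State.STRING);
        t(State.S0, SPACE, Act.SKIP);
        t(State.S0, EMPTY, Act.HALT);
        t(State.IDENT, LETTER, Act.SAVE);
        t(State.IDENT, DIGIT, Act.SAVE);
        t(State.IDENT, SPACE, Act.EMIT, State.S0);
        t(State.IDENT, EMPTY, Act.HALT);
        t(State.NUMBER, DIGIT, Act.SAVE);
        t(State.NUMBER, SPACE, Act.EMIT, State.S0);
        t(State.NUMBER, EMPTY, Act.HALT);
        t(State.STRING, LETTER, Act.SAVE);
        t(State.STRING, DIGIT, Act.SAVE);
        t(State.STRING, SPACE, Act.SAVE);
        t(State.STRING, QUOTE, Act.EMIT, State.S0);
        t(State.STRING, BACKSLASH, Act.SAVE, State.ESCAPE);
        t(State.STRING, EMPTY, Act.HALT);
        for (Pattern any : List.of(LETTER, DIGIT, SPACE, QUOTE, BACKSLASH)) {
            t(State.ESCAPE, any, Act.SAVE, State.STRING);  // escaped char is kept
        }
        t(State.ESCAPE, EMPTY, Act.HALT);
    }

    static List<Tok> lex(String text) {
        String input = text + '\u0000';                    // end-of-input sentinel
        List<Tok> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        State state = State.S0;
        for (int i = 0; i < input.length(); i++) {
            String ch = String.valueOf(input.charAt(i));
            Trans tr = find(state, ch);
            switch (tr.act()) {
                case SAVE -> buf.append(ch);
                case SKIP -> { }
                case EMIT -> {
                    out.add(new Tok(state, buf.toString()));
                    buf.setLength(0);
                }
                case HALT -> {
                    // Assumption: HALT flushes a pending token, then stops.
                    if (buf.length() > 0) out.add(new Tok(state, buf.toString()));
                    return out;
                }
            }
            state = tr.succ();
        }
        return out;
    }

    static Trans find(State state, String ch) {
        for (Trans tr : TABLE) {
            if (tr.state() == state && tr.cls().matcher(ch).matches()) return tr;
        }
        throw new IllegalStateException("no transition from " + state + " on '" + ch + "'");
    }
}
```

For the input `id1 42 "a b\"c" ` this yields an IDENT token `id1`, a NUMBER token `42` and a STRING token `a b\"c`; an input such as `12a` has no transition for a letter in state NUMBER and is rejected.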
> 
> On Wed, Sep 2, 2009 at 11:54 PM, André Thieme
> <address.good.until.2009.dec.14 at justmail.de> wrote:
> 
> Hello group, I recently had the idea:
> 
> "A rule system (like Drools) is ideal for making programs with complex
>   rules simpler. Writing a lexer or parser can be non-trivial. So, is it
>   possible and also meaningful to express such a task with rules?"
> 
> Has anyone here already tried that?
> 
> The two big questions for me are:
> 1) How easy is it to express a lexer with rules?
> 2) How well (or badly) will it perform?
> 
> If you happen to have a good idea of how to do it, could you please
> give me an example of a simple lexer?
> 
> Let's say it gets natural language (a string, such as this email) as
> input and should return a sequence (say, an ArrayList) of Tokens,
> which may look like this:
> 
> public class Token {
>     public String value;
>     public String category;
> 
>     Token(String value, String category) {
>         this.value = value;
>         this.category = category;
>     }
> }
> 
> We could have three categories: "word", "numeric" and "whitespace".
> 
> An input String could be:
> 
> "We can   see 500 cars"
> 
> And it should produce an ArrayList with the contents:
> 
> [
>   Token("We", "word"),
>   Token(" ", "whitespace"),
>   Token("can", "word"),
>   Token("   ", "whitespace"),
>   Token("see", "word"),
>   Token(" ", "whitespace"),
>   Token("500", "numeric"),
>   Token(" ", "whitespace"),
>   Token("cars", "word")
> ]
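Purely as a reference point (not as a rule-based answer to André's question), the expected list can be produced in plain Java by grouping maximal runs of characters that fall into the same category. The sketch reuses the `Token` class from the mail, adding a `toString()`; the `RunLexer` name and the category tests (`Character.isLetter`, ASCII digits, `Character.isWhitespace`) are assumptions, and any other character, such as punctuation, is rejected, so real e-mail text would need more categories. A rule-based lexer could be checked against its output.

```java
import java.util.ArrayList;
import java.util.List;

// André's Token class, with a toString() added for printing.
class Token {
    public String value;
    public String category;

    Token(String value, String category) {
        this.value = value;
        this.category = category;
    }

    @Override
    public String toString() {
        return "Token(\"" + value + "\", \"" + category + "\")";
    }
}

// Hypothetical reference lexer: maximal runs of one category form one token.
final class RunLexer {
    static String categoryOf(char c) {
        if (Character.isLetter(c)) return "word";
        if (c >= '0' && c <= '9') return "numeric";
        if (Character.isWhitespace(c)) return "whitespace";
        throw new IllegalArgumentException("unexpected character '" + c + "'");
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            String cat = categoryOf(input.charAt(start));
            int end = start + 1;
            // extend the token while the next character has the same category
            while (end < input.length() && categoryOf(input.charAt(end)).equals(cat)) {
                end++;
            }
            tokens.add(new Token(input.substring(start, end), cat));
            start = end;
        }
        return tokens;
    }
}
```

Calling `System.out.println(RunLexer.tokenize("We can   see 500 cars"))` prints the nine tokens in the notation used above.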
> 
> At the moment I have difficulty seeing whether and how this could be
> achieved. If you find this easy, please post a solution.
> 
> I am aware that JavaCC is really good for such tasks and will also
> perform extremely well.
> 
> Greetings,
> André
> 
> _______________________________________________
> rules-users mailing list
> rules-users at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/rules-users


      



