Drools as Lexer / Parser (sequential data processing)

Wednesday, 2 September 2009

Hello group, I recently had an idea:
"A rule system (like Drools) is ideal for making programs with complex
  rules simpler. Writing a lexer or parser can be non-trivial. So, is it
  possible and also meaningful to express such a task with rules?"

Has anyone here tried that already?
The two big questions for me are:
1) How easy is it to express a lexer with rules?
2) How badly (or well) will it perform?

If you happen to have a good idea of how to do it, could you please give
me an example of a simple lexer?
Let's say it will get natural language (a string, such as this email) as
input and should return a sequence (say, ArrayList) of Tokens, which may
look like this:

public class Token {
   public String value;
   public String category;

   Token(String value, String category) {
     this.value = value;
     this.category = category;
   }
}

We could have three categories:
"word",  "numeric"  and  "whitespace".

An input String could be:
"We can   see 500 cars"
And it should produce an ArrayList with the contents:
[
  Token("We", "word"),
  Token(" ", "whitespace"),
  Token("can", "word"),
  Token("   ", "whitespace"),
  Token("see", "word"),
  Token(" ", "whitespace"),
  Token("500", "numeric"),
  Token(" ", "whitespace"),
  Token("cars", "word")
]
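
For comparison, the expected behaviour can be sketched in plain Java. This
is not a rule-based (Drools) solution, only a baseline that shows what the
rules would have to reproduce. The class name SimpleLexer, the helper
categoryOf and the toString method are hypothetical names introduced for
this sketch; Token is nested only so the example compiles on its own. It
also assumes that any character that is neither a digit nor whitespace
counts as part of a "word", since the three categories above do not say
what to do with punctuation.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleLexer {

    // Mirrors the Token class from the post; nested here so the sketch is self-contained.
    public static class Token {
        public final String value;
        public final String category;

        Token(String value, String category) {
            this.value = value;
            this.category = category;
        }

        @Override
        public String toString() {
            return "Token(\"" + value + "\", \"" + category + "\")";
        }
    }

    // Assumption: digits are "numeric", whitespace is "whitespace",
    // and every other character (letters, punctuation) is "word".
    static String categoryOf(char c) {
        if (Character.isDigit(c)) return "numeric";
        if (Character.isWhitespace(c)) return "whitespace";
        return "word";
    }

    // Groups each maximal run of same-category characters into one Token.
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            String category = categoryOf(input.charAt(start));
            int end = start + 1;
            while (end < input.length()
                    && categoryOf(input.charAt(end)).equals(category)) {
                end++;
            }
            tokens.add(new Token(input.substring(start, end), category));
            start = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("We can   see 500 cars")) {
            System.out.println(t);
        }
    }
}
```

A rule-based version would presumably have to express the same two
decisions as rules: how a single character is categorised, and when
adjacent characters of the same category are merged into one Token.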

At the moment I have difficulty seeing if/how this could be achieved.
If you find this easy, please post a solution.
I am aware that JavaCC is really good for such tasks and will also
perform extremely well.

Greetings,
André
