As promised, here's my report on investigating Soundex and related
algorithms.
(1) MVEL2 has a utility that returns a "key" String from a "word" String;
the key is close to what the original Soundex algorithm is supposed to
return. Not being exactly the same doesn't matter much as long as you only
compare results from the same algorithm.
(2) The National Archives and Records Administration has issued a modified
Soundex algorithm, which is supposedly slightly better.
(3) Then, there is the New York State Identification and Intelligence System
(NYSIIS) Phonetic Encoder, which is reported to be 2.7% better than Soundex.
(4) A modified version of NYSIIS has also been defined; for both of them,
see http://www.dropby.com/NYSIIS.html. (I have some doubts whether this page
reflects a correct implementation of the original NYSIIS algorithm; e.g.,
"Bahr" returns "B", which can't be correct.)
All of the above are only useful for English pronunciations of proper
names.
(5) For German, there is something called "Kölner Phonetik".
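To make the difference between (1) and (2) a bit more concrete, here is a
minimal Soundex sketch. It is not the MVEL2 code and not the class I
mention below; the class name SoundexSketch and the naraRule flag are made
up for illustration. I am assuming that the relevant difference is the NARA
rule that H and W do not separate two consonants with the same code, and
that a null word simply maps to an empty key.

```java
import java.util.Locale;

public class SoundexSketch {
    // Digit codes for A..Z; '0' marks letters without a code (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    // naraRule == true: H and W are skipped, so equal codes around them merge.
    // naraRule == false: H and W act like vowels and separate equal codes.
    static String encode(String word, boolean naraRule) {
        if (word == null) {
            return ""; // assumption: a null word yields an empty key, no NPE
        }
        StringBuilder key = new StringBuilder();
        char last = '0';
        for (char raw : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (raw < 'A' || raw > 'Z') {
                continue; // ignore anything that is not a plain ASCII letter
            }
            char code = CODES.charAt(raw - 'A');
            if (key.length() == 0) {
                key.append(raw); // the first letter is kept as it is
                last = code;
                continue;
            }
            if (naraRule && (raw == 'H' || raw == 'W')) {
                continue; // does not reset the previous code
            }
            if (code != '0' && code != last) {
                key.append(code);
            }
            last = code;
            if (key.length() == 4) {
                break;
            }
        }
        if (key.length() == 0) {
            return "";
        }
        while (key.length() < 4) {
            key.append('0'); // pad short keys with zeros
        }
        return key.toString();
    }
}
```

With this sketch, "Robert" and "Rupert" get the same key under either rule,
while "Ashcraft" gets different keys depending on the flag, which is why keys
should only be compared when they come from the same variant.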
I have implemented (1), (2), (3) and (5) in a class
SoundsLikeEvaluatorDefinition, which implements EvaluatorDefinition. It
provides the operator soundsLike (note the capital 'L') in these variants:
soundsLike or soundsLike[orig] for (1), soundsLike[us] for (2),
soundsLike[ny] for (3) and soundsLike[de] for (5).
All of this has been an interesting (for me) exercise, but I really don't
know whether any of this should go into Drools. (There is the issue of
fixing an NPE with the current implementation that calls the MVEL2 code,
though.)
It's up to you, Team, to vote on this. I can contribute the aforementioned
class, with "soundsLike" renamed to "soundslike", as a replacement for the
current implementation. It does not require the MVEL2 utility, and it adds
the option of using the various operator parameters.
Cheers
Wolfgang