Great. I guess if it shifts away from the "fixed" Soundex, we should probably try to find out who is using it, to ensure there are no surprises. I can't imagine it is widely used.

On 10 October 2010 23:41, Michael Neale <michael.neale@gmail.com> wrote:
> I think you should clean room implement it (or reuse some old code of yours
> if it is safe to do so). From what I have seen of the algorithm - it isn't
> huge - and it would make sense to have it re-implemented. As an alternative
> - consider taking a look at the MVEL soundex code and rewriting that - and
> we will see if we can make it upstream.

I just re-implemented this according to the algorithm I found in
http://en.wikipedia.org/wiki/Soundex
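For reference, here is a minimal Java sketch of Soundex under the NARA rules as described on that page. The class `NaraSoundex` and its methods are hypothetical; they are not MVEL's org.mvel2.util.Soundex and not a Drools API. Assumptions: only the ASCII letters A to Z are kept (spaces and umlauts are dropped), vowels including Y separate letters with equal codes while H and W do not, and input without letters yields "" rather than null.

```java
import java.util.Locale;

/** Hypothetical sketch of American Soundex following the NARA rules. */
final class NaraSoundex {

    // Digit for each letter A..Z. '0' marks the vowels A, E, I, O, U and Y,
    // which separate equal codes; '-' marks H and W, which do not.
    private static final String CODES = "0123012-02245501262301-202";

    private NaraSoundex() {
    }

    /** Returns a code such as "R163"; never null, "" if there are no letters. */
    public static String encode(String word) {
        if (word == null) {
            return "";
        }
        // Assumption: keep only A..Z; spaces, digits, umlauts, punctuation are dropped.
        StringBuilder letters = new StringBuilder();
        for (char ch : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (ch >= 'A' && ch <= 'Z') {
                letters.append(ch);
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        StringBuilder code = new StringBuilder(4);
        code.append(letters.charAt(0));
        // The first letter's digit counts as "previous", so "Pfister" gives P236.
        char previous = CODES.charAt(letters.charAt(0) - 'A');
        for (int i = 1; i < letters.length() && code.length() < 4; i++) {
            char digit = CODES.charAt(letters.charAt(i) - 'A');
            if (digit == '-') {
                continue; // H or W: equal codes on either side are coded once
            }
            if (digit != '0' && digit != previous) {
                code.append(digit);
            }
            previous = digit; // a vowel resets, so equal codes around it are coded twice
        }
        while (code.length() < 4) {
            code.append('0'); // pad short codes, e.g. "Lee" -> L000
        }
        return code.toString();
    }

    /** Null-safe comparison; two inputs without letters both map to "" and match. */
    public static boolean soundsLike(String a, String b) {
        return encode(a).equals(encode(b));
    }
}
```

With these assumptions, encode("Ashcraft") gives A261 (S and C are coded once because only H separates them), encode("Pfister") gives P236, and "Britney Spears" and "bewährten Superzicke" both come out as B635.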
I've also consulted a CPAN module to learn what was intended by the
MVEL implementation, but it's undecidable (possibly due to omissions or
bugs).

> I would say it is just slightly
> neglected - it's not well known that it lives there. Using the MVEL one was
> just opportunistic for drools.
> I didn't know that it could return null, that is bad. I guess if it is null
> - that would mean that you just do a literal case insensitive compare?

A correct implementation never returns null. An empty word might be the one
exception, but for our purpose "" would be preferable.

> Also - AFAIK - soundex is only for English right?

Certainly.

> Is there an equivalent for other languages?

Soundex is coarse even for English. I've found the atrocious example that
the Soundex for "Britney Spears" is the same as for
"bewährten Superzicke" (~ "proven super-b*"). NYSIIS is supposed
to be better.
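
For German, there is an equivalent: "Kölner Phonetik" (Cologne phonetics). It
might make sense to provide this through an operator such as "soundex[de]".
(All names matching /M[ae][iy]e?r/, i.e. Mair, Mayr, Meir, Meyr, Maier, Mayer,
Meier and Meyer, sound alike in German, and all exist as proper names.)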
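To make the German option concrete, here is a minimal Java sketch of Kölner Phonetik, following the rules as they are commonly published. The class name `KoelnerPhonetik` is hypothetical. Assumptions: Ä, Ö and Ü are treated as plain vowels, ß is upper-cased to SS, and all other non-letters are dropped. Unlike Soundex, the code has no fixed length.

```java
import java.util.Locale;

/** Hypothetical sketch of Kölner Phonetik (Cologne phonetics). */
final class KoelnerPhonetik {

    private KoelnerPhonetik() {
    }

    // True if c is a real letter (not the 0 used for "no neighbour") contained in set.
    private static boolean isOneOf(char c, String set) {
        return c != 0 && set.indexOf(c) >= 0;
    }

    /** Returns the digit code of a word; never null, "" if it has no letters. */
    public static String encode(String word) {
        if (word == null) {
            return "";
        }
        // Assumption: upper-case (ß becomes SS), umlauts as plain vowels, drop the rest.
        StringBuilder s = new StringBuilder();
        for (char ch : word.toUpperCase(Locale.GERMAN).toCharArray()) {
            if (ch == '\u00c4') {                // Ä
                s.append('A');
            } else if (ch == '\u00d6') {         // Ö
                s.append('O');
            } else if (ch == '\u00dc') {         // Ü
                s.append('U');
            } else if (ch >= 'A' && ch <= 'Z') {
                s.append(ch);
            }
        }
        // Step 1: code each letter; C, D, P, T and X depend on their neighbours.
        StringBuilder raw = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            char prev = i > 0 ? s.charAt(i - 1) : 0;
            char next = i + 1 < s.length() ? s.charAt(i + 1) : 0;
            switch (c) {
                case 'A': case 'E': case 'I': case 'J': case 'O': case 'U': case 'Y':
                    raw.append('0'); break;
                case 'H':
                    break; // H is never coded
                case 'B':
                    raw.append('1'); break;
                case 'P':
                    raw.append(next == 'H' ? '3' : '1'); break;
                case 'D': case 'T':
                    raw.append(isOneOf(next, "CSZ") ? '8' : '2'); break;
                case 'F': case 'V': case 'W':
                    raw.append('3'); break;
                case 'G': case 'K': case 'Q':
                    raw.append('4'); break;
                case 'C': {
                    // Initial C is "hard" before more letters than a non-initial C.
                    boolean hard = (i == 0)
                            ? isOneOf(next, "AHKLOQRUX")
                            : isOneOf(next, "AHKOQUX") && !isOneOf(prev, "SZ");
                    raw.append(hard ? '4' : '8');
                    break;
                }
                case 'X':
                    raw.append(isOneOf(prev, "CKQ") ? "8" : "48"); break;
                case 'L':
                    raw.append('5'); break;
                case 'M': case 'N':
                    raw.append('6'); break;
                case 'R':
                    raw.append('7'); break;
                case 'S': case 'Z':
                    raw.append('8'); break;
                default:
                    break;
            }
        }
        // Step 2: collapse runs of equal digits, then drop '0' except at the start.
        StringBuilder out = new StringBuilder();
        char last = 0;
        for (int i = 0; i < raw.length(); i++) {
            char d = raw.charAt(i);
            if (d != last && (d != '0' || out.length() == 0)) {
                out.append(d);
            }
            last = d;
        }
        return out.toString();
    }
}
```

Under these rules every spelling matched by /M[ae][iy]e?r/ encodes to 67, "Schmidt" to 862, and "Müller-Lüdenscheidt" to 65752682.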
I have also found one link to an implementation adapted for French.
Soundex is aimed at the pronunciation of proper names. There might be some
leeway for that even in a language like Hungarian, which is pronounced exactly
as written.
I think Drools should drop the MVEL version and go for a flexible approach,
possibly even something better than Soundex/NARA for English. I'll research this
some more, and report back before I commit anything ;-)
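One possible shape for such a flexible approach, sketched in Java under loud assumptions: a hypothetical `PhoneticEncoder` interface and a hypothetical `PhoneticRegistry` keyed by language tag, so that an operator like "soundex[de]" would look up the "de" encoder. None of these names exist in Drools or MVEL; this only illustrates how encoders could be plugged in per language.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical plug-in point: turns a word into a phonetic key. */
interface PhoneticEncoder {
    /** Must never return null; "" is the key for input without letters. */
    String encode(String word);
}

/** Hypothetical registry behind an operator such as "soundex[de]". */
final class PhoneticRegistry {

    private final Map<String, PhoneticEncoder> encoders = new ConcurrentHashMap<>();
    private final String defaultLanguage;

    PhoneticRegistry(String defaultLanguage) {
        this.defaultLanguage = normalize(Objects.requireNonNull(defaultLanguage));
    }

    private static String normalize(String language) {
        return language.toLowerCase(Locale.ROOT);
    }

    /** Registers (or replaces) the encoder for a language tag such as "en" or "de". */
    public void register(String language, PhoneticEncoder encoder) {
        encoders.put(normalize(language), Objects.requireNonNull(encoder));
    }

    /** Null-safe comparison; a null language selects the default encoder. */
    public boolean soundsLike(String language, String a, String b) {
        String key = language == null ? defaultLanguage : normalize(language);
        PhoneticEncoder encoder = encoders.get(key);
        if (encoder == null) {
            throw new IllegalArgumentException("No phonetic encoder registered for: " + key);
        }
        // Treat null operands as "" so that a missing value never causes an NPE.
        String ka = encoder.encode(a == null ? "" : a);
        String kb = encoder.encode(b == null ? "" : b);
        return Objects.equals(ka, kb);
    }
}
```

An unknown language tag throws instead of silently falling back, so a misspelled tag in a rule surfaces early; whether that is the right trade-off for Drools is an open design question.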
-W
> If so, perhaps having it in the drools codebase makes sense
> and opens the way for people to plug in their own soundex.
> On Mon, Oct 11, 2010 at 2:54 AM, Wolfgang Laun <wolfgang.laun@gmail.com>
> wrote:
>>
>> The implementation of "soundslike" is broken in more than one respect.
>> The conversion of a word to a Soundex string is provided by
>> org.mvel2.util.Soundex.
>> * There are words where Soundex.soundex returns null, so that the
>> calling code, in Drools, crashes with an NPE.
>> * The algorithm implemented in Soundex is erroneous. I'm not sure which
>> Soundex algorithm it is supposed to implement, but it just doesn't meet the
>> basic requirements.
>>
>> I have implemented, correctly, the National Archives and Records
>> Administration (NARA) rule set, i.e. the official implementation of
>> Soundex used by the U.S. Government.
>>
>> Do we wait for MVEL to correct this bug, or do we just replace it with a
>> correct implementation?
>>
>> Regards
>> Wolfgang
_______________________________________________
rules-dev mailing list
rules-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/rules-dev