<span id="goog_1452677146"></span><span id="goog_1452677147"></span><a href="/"></a>On 10 October 2010 23:41, Michael Neale <<a href="mailto:michael.neale@gmail.com">michael.neale@gmail.com</a>> wrote:<br>> I think you should clean-room implement it (or reuse some old code of yours<br>
> if it is safe to do so). From what I have seen of the algorithm, it isn't<br>> huge, and it would make sense to have it re-implemented. As an alternative,<br>> consider taking a look at the MVEL Soundex code and rewriting that, and<br>
> we will see if we can get it upstream.<br><br>I have just re-implemented this following the algorithm described at<br><a href="http://en.wikipedia.org/wiki/Soundex">http://en.wikipedia.org/wiki/Soundex</a>.<br>I also consulted a CPAN module to learn what was intended by the<br>
MVEL implementation, but it is impossible to tell (possibly due to omissions or<br>bugs).<br><br>> I would say it is just slightly<br>> neglected; it's not well known that it lives there. Using the MVEL one was<br>> just opportunistic for Drools.<br>
> I didn't know that it could return null; that is bad. I guess if it is null,<br>> that would mean that you just do a literal case-insensitive compare?<br><br>A correct implementation never returns null. An empty word might be mapped to null, but for<br>
our purposes "" would be preferable.<br><br>> Also, AFAIK, Soundex is only for English, right?<br>Certainly.<br><br>> Is there an equivalent for other languages?<br>Soundex is coarse even for English. I found the appalling example that<br>
the Soundex for "Britney Spears" is the same as for<br>"bewährten Superzicke" (~ "proven super-b*"): both encode to B635 S162. <a href="http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System" title="New York State Identification and Intelligence System">NYSIIS</a> is supposed<br>
to be better.<br><br>For German, there is an equivalent: "Kölner Phonetik" (Cologne phonetics). It might<br>make sense to provide this via an operator such as "soundex[de]". (All spellings matching<br>/M[ae][iy]e?r/ sound alike in German, and all exist as proper names.)<br>
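A minimal Java sketch of Kölner Phonetik, to make the "soundex[de]" idea concrete. It follows the commonly published rule table: code each letter (with context rules for C, D, T, P and X), collapse runs of equal digits, then drop every 0 except a leading one. The class name KoelnerPhonetik is hypothetical and not part of Drools or MVEL. Assumptions: umlauts are folded to A, O, U, characters outside A to Z are skipped, and H is skipped without separating repeated digits (some implementations treat H differently).

```java
import java.util.Locale;

/** Hypothetical sketch of Koelner Phonetik (Cologne phonetics). */
final class KoelnerPhonetik {
    private KoelnerPhonetik() {}

    private static boolean in(char c, String set) {
        return set.indexOf(c) >= 0;
    }

    /** Returns the digit code, e.g. "67" for "Meyer"; "" if there are no letters. Never null. */
    static String encode(String word) {
        if (word == null) {
            return "";
        }
        // Upper-case (Java maps "ß" to "SS"), fold umlauts, keep only A-Z (assumption).
        StringBuilder s = new StringBuilder();
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c == '\u00C4') s.append('A');        // A umlaut
            else if (c == '\u00D6') s.append('O');   // O umlaut
            else if (c == '\u00DC') s.append('U');   // U umlaut
            else if (c >= 'A' && c <= 'Z') s.append(c);
        }
        // Step 1: code every letter; P, D, T, C and X depend on their neighbours.
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            char prev = i > 0 ? s.charAt(i - 1) : ' ';
            char next = i + 1 < s.length() ? s.charAt(i + 1) : ' ';
            switch (c) {
                case 'A': case 'E': case 'I': case 'J': case 'O': case 'U': case 'Y': digits.append('0'); break;
                case 'H': break; // assumption: no digit, does not separate repeats
                case 'B': digits.append('1'); break;
                case 'P': digits.append(next == 'H' ? '3' : '1'); break;
                case 'D': case 'T': digits.append(in(next, "CSZ") ? '8' : '2'); break;
                case 'F': case 'V': case 'W': digits.append('3'); break;
                case 'G': case 'K': case 'Q': digits.append('4'); break;
                case 'C':
                    if (i == 0) digits.append(in(next, "AHKLOQRUX") ? '4' : '8');
                    else if (in(prev, "SZ")) digits.append('8');
                    else digits.append(in(next, "AHKOQUX") ? '4' : '8');
                    break;
                case 'X': digits.append(in(prev, "CKQ") ? "8" : "48"); break;
                case 'L': digits.append('5'); break;
                case 'M': case 'N': digits.append('6'); break;
                case 'R': digits.append('7'); break;
                case 'S': case 'Z': digits.append('8'); break;
                default: break;
            }
        }
        // Steps 2 and 3: collapse runs of equal digits, then drop '0' except in first position.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            char d = digits.charAt(i);
            if (i > 0 && d == digits.charAt(i - 1)) continue;
            if (d != '0' || i == 0) out.append(d);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("M\u00FCller-L\u00FCdenscheidt")); // 65752682
        for (String name : new String[] {"Mair", "Mayr", "Meir", "Meyr", "Maier", "Mayer", "Meier", "Meyer"}) {
            System.out.println(name + " -> " + encode(name)); // all 67
        }
    }
}
```

All eight /M[ae][iy]e?r/ spellings come out as 67, which is the behaviour a German "soundex[de]" operator would want.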
<br>I have also found a link to an implementation adapted for French.<br><br>Soundex is aimed at the pronunciation of proper names. There might be some<br>room for that even in a language like Hungarian, which is pronounced exactly<br>
as written.<br><br>I think Drools should drop the MVEL version and go for a flexible approach,<br>possibly even something better than Soundex/NARA for English. I'll research this<br>some more and report back before I commit anything ;-)<br>
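One way to read the "flexible approach" is a small registry that maps a language tag (as in the proposed "soundex[de]") to a pluggable encoder, and that never lets a null code reach the comparison. This is a hypothetical Java sketch: PhoneticRegistry, register and soundsLike are not existing Drools or MVEL APIs, and falling back to a literal case-insensitive comparison follows Michael's suggestion in this thread rather than any agreed behaviour.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical registry behind operators such as "soundslike" and "soundex[de]". */
final class PhoneticRegistry {
    private final Map<String, UnaryOperator<String>> encoders = new HashMap<>();

    /** Plugs in an encoder (word to phonetic code) for a language tag such as "en" or "de". */
    void register(String language, UnaryOperator<String> encoder) {
        encoders.put(language, encoder);
    }

    /**
     * Compares two words by their phonetic codes. Null words never match. If the
     * encoder yields null or "" for either word, falls back to a literal
     * case-insensitive comparison instead of throwing a NullPointerException.
     */
    boolean soundsLike(String a, String b, String language) {
        UnaryOperator<String> encoder = encoders.get(language);
        if (encoder == null) {
            throw new IllegalArgumentException("no phonetic encoder for language: " + language);
        }
        if (a == null || b == null) {
            return false;
        }
        String codeA = encoder.apply(a);
        String codeB = encoder.apply(b);
        if (codeA == null || codeB == null || codeA.isEmpty() || codeB.isEmpty()) {
            return a.equalsIgnoreCase(b);
        }
        return codeA.equals(codeB);
    }

    public static void main(String[] args) {
        PhoneticRegistry registry = new PhoneticRegistry();
        // Stand-in encoders for illustration only; real ones would be Soundex, NYSIIS, Koelner Phonetik, ...
        registry.register("ci", w -> w.toLowerCase(Locale.ROOT));
        registry.register("broken", w -> null); // mimics an encoder that returns null
        System.out.println(registry.soundsLike("Meyer", "MEYER", "ci"));     // true
        System.out.println(registry.soundsLike("Smith", "smith", "broken")); // true, via fallback
        System.out.println(registry.soundsLike("Smith", "Smyth", "broken")); // false
    }
}
```

An unknown language tag throws rather than silently falling back, so a typo such as "soundex[dd]" surfaces as a configuration error instead of a wrong match.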
<br>-W<br><br>> If so, perhaps having it in the Drools codebase makes sense<br>> and opens the way for people to plug in their own soundex.<br>> On Mon, Oct 11, 2010 at 2:54 AM, Wolfgang Laun <<a href="mailto:wolfgang.laun@gmail.com">wolfgang.laun@gmail.com</a>><br>
> wrote:<br>>><br>>> The implementation of "soundslike" is broken in more than one respect.<br>>> The conversion of a word to a Soundex string is provided by<br>>> org.mvel2.util.Soundex.<br>
>> (.) There are words for which Soundex.soundex returns null, so the<br>>> calling code in Drools crashes with an NPE.<br>>> (.) The algorithm implemented in Soundex is erroneous. I'm not sure which<br>
>> Soundex algorithm it is supposed to implement, but it just doesn't meet the<br>>> basic requirements.<br>>><br>>> I have implemented a correct version of the National Archives and<br>>> Records Administration (NARA) rule set, the official implementation of<br>>> Soundex used by the U.S. Government.<br>>><br>>> Do we wait for MVEL to correct this bug, or do we just replace it with a<br>
>> correct implementation?<br>>><br>>> Regards<br>>> Wolfgang<br><br>
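For comparison with org.mvel2.util.Soundex, a minimal Java sketch of the NARA rules as given in the Wikipedia article linked above: keep the first letter; code B F P V as 1, C G J K Q S X Z as 2, D T as 3, L as 4, M N as 5, R as 6; code adjacent letters with the same digit once, also when they are separated only by H or W; let vowels and Y separate repeats; pad or truncate to four characters. The class name NaraSoundex is hypothetical, and this is not the code Wolfgang wrote. As discussed in the thread, it returns "" instead of null when the input has no letters; skipping characters outside A to Z (such as ä) is an additional assumption.

```java
import java.util.Locale;

/** Hypothetical sketch of American Soundex following the NARA rules. */
final class NaraSoundex {
    // Digit for each letter A..Z. '0' = vowel or Y (separates repeated digits),
    // '-' = H or W (ignored, does NOT separate repeated digits).
    private static final String CODES = "0123012-02245501262301-202";

    private NaraSoundex() {}

    /** Returns a four-character code such as "R163", or "" if the word has no letters A-Z. Never null. */
    static String encode(String word) {
        if (word == null) {
            return "";
        }
        StringBuilder letters = new StringBuilder();
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                letters.append(c); // assumption: non-ASCII letters and punctuation are skipped
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        char first = letters.charAt(0);
        StringBuilder code = new StringBuilder(4).append(first);
        // Start from the first letter's digit, so a following letter with the same digit is dropped.
        char previous = CODES.charAt(first - 'A');
        for (int i = 1; i < letters.length() && code.length() < 4; i++) {
            char digit = CODES.charAt(letters.charAt(i) - 'A');
            if (digit == '-') {
                continue; // H, W: skipped, 'previous' is kept
            }
            if (digit != '0' && digit != previous) {
                code.append(digit);
            }
            previous = digit; // a vowel resets 'previous' to '0'
        }
        while (code.length() < 4) {
            code.append('0');
        }
        return code.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"}) {
            System.out.println(w + " -> " + encode(w));
        }
    }
}
```

Seeding previous with the first letter's digit is what makes "Pfister" come out as P236 rather than P123, and letting H and W keep previous is what makes "Ashcraft" A261 rather than A226.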