[rules-dev] soundslike: report on phonetic matching

jschmied nabble at juergenschmied.de
Sat Oct 16 04:37:57 EDT 2010


Hi!

There are two groups of functions:

- first, there are functions that build a key (Soundex, DoubleMetaphone). You
can store this key in a database and search on it with a regular index. That
gives you a set of likely hits, but usually far too many.
- second, you must filter these hits. To do this, you calculate a distance
between the word you are looking for and each hit from the first step, using
a function like Levenshtein.
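The two steps can be sketched in Java. This is a minimal illustration under stated assumptions, not a recommended implementation: it uses American Soundex for the key and plain Levenshtein for the filter only because they are the simplest to write out, an in-memory `HashMap` stands in for the database index, and the class and method names (`PhoneticSearch`, `buildIndex`, `search`) as well as the distance threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class PhoneticSearch {

    // American Soundex digit for a letter; vowels, Y, H and W map to '0'.
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K': case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0';
        }
    }

    // Step 1: the phonetic key (first letter plus three digits).
    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) return "";
        StringBuilder key = new StringBuilder().append(s.charAt(0));
        char prev = code(s.charAt(0));
        for (int i = 1; i < s.length() && key.length() < 4; i++) {
            char c = s.charAt(i);
            // H and W are skipped without resetting prev, so equal codes around them are coded once.
            if (c == 'H' || c == 'W') continue;
            char d = code(c);
            if (d != '0' && d != prev) key.append(d);
            prev = d; // a vowel ('0') resets prev, so a repeated code after it is kept
        }
        while (key.length() < 4) key.append('0');
        return key.toString();
    }

    // Step 2: edit distance, using two rolling rows of the dynamic-programming table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical in-memory stand-in for a database table indexed on the phonetic key.
    static Map<String, List<String>> buildIndex(List<String> names) {
        Map<String, List<String>> index = new HashMap<>();
        for (String name : names) {
            index.computeIfAbsent(soundex(name), k -> new ArrayList<>()).add(name);
        }
        return index;
    }

    // Look up the key bucket, then keep only candidates within maxDistance edits.
    static List<String> search(Map<String, List<String>> index, String query, int maxDistance) {
        List<String> hits = new ArrayList<>();
        for (String candidate : index.getOrDefault(soundex(query), List.of())) {
            if (levenshtein(query.toLowerCase(Locale.ROOT), candidate.toLowerCase(Locale.ROOT)) <= maxDistance) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Smith", "Smyth", "Schmidt", "Snead", "Smithers", "Jones");
        Map<String, List<String>> index = buildIndex(names);
        System.out.println(index.get("S530"));         // [Smith, Smyth, Schmidt, Snead]
        System.out.println(search(index, "Smith", 2)); // [Smith, Smyth]
    }
}
```

The key lookup for "Smith" returns four candidates sharing the code S530; the distance filter with a threshold of 2 keeps only Smith and Smyth. In a real system the first step would be a query against an indexed key column rather than a map lookup.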

You can do this inside a rule engine, but I don't think it's the right tool
for such search tasks. Usually you search over several hundred thousand
records or more and have to answer within milliseconds.

By the way: Soundex and Levenshtein are the simplest, but also the worst, choices.

For the first step:
http://en.wikipedia.org/wiki/Phonetic_algorithm
(one error there: Double Metaphone is for many languages, not only for English)

For the second step:
http://en.wikipedia.org/wiki/SimMetrics is quite a comprehensive Java
library.
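As one example of a second-step measure that is often preferred over Levenshtein for personal names, here is a hand-written Jaro-Winkler similarity. It is a sketch, not the SimMetrics API: the class name `JaroWinklerSketch` is hypothetical, and the prefix scale of 0.1 and the 4-character prefix cap are Winkler's conventional defaults. Unlike Levenshtein it returns a similarity in [0, 1] (1 means identical), so a filter would keep hits above a threshold rather than below one.

```java
import java.util.Locale;

final class JaroWinklerSketch {

    // Jaro similarity: characters that match within a window, penalized for transpositions.
    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) return 1.0;
        int window = Math.max(0, Math.max(s1.length(), s2.length()) / 2 - 1);
        boolean[] matched1 = new boolean[s1.length()];
        boolean[] matched2 = new boolean[s2.length()];
        int matches = 0;
        for (int i = 0; i < s1.length(); i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(s2.length() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Walk both match sequences in order; half the number of mismatched pairs is t.
        int k = 0;
        int outOfOrder = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2.0;
        return (m / s1.length() + m / s2.length() + (m - t) / m) / 3.0;
    }

    // Winkler's boost: reward a common prefix of up to 4 characters, scaled by 0.1.
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%.3f", jaroWinkler("MARTHA", "MARHTA"))); // 0.961
        System.out.println(String.format(Locale.ROOT, "%.3f", jaroWinkler("DWAYNE", "DUANE")));   // 0.840
    }
}
```

The prefix boost reflects the observation that spelling variants of names tend to agree at the start, which is one reason this family of measures is popular for name matching.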

I've done a lot of work on name matching; if you have any questions, just ask.

juergen




-- 
View this message in context: http://drools-java-rules-engine.46999.n3.nabble.com/soundslike-report-on-phonetic-matching-tp1707485p1713444.html
Sent from the Drools - Dev mailing list archive at Nabble.com.

