Multilingual Did-You-Mean Algorithm
Good news:
The Hebrew ‘Did You Mean’ algorithm has been abstracted and now supports virtually any language!!!
How?!
Both the keyboard-typos module and the Hebrew phonetic/Soundex algorithm have been separated from the ‘did you mean’ algorithm. The ‘did you mean’ web service now accepts text in any language and returns its suggestions in that same language. You can still pass the text through the keyboard-typos and Soundex stages before it reaches the ‘did you mean’ module, so that all three run in one shot.
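To make the idea of optional pre-processing stages concrete, here is a minimal sketch, not the actual service code. The names `fix_layout` and `suggest` are hypothetical, the layout map covers only a handful of Hebrew keys for illustration, and Python's `difflib.get_close_matches` merely stands in for the real ‘did you mean’ module.

```python
from difflib import get_close_matches
from typing import Callable, Iterable, Sequence

Stage = Callable[[str], str]

# Illustrative subset only: Hebrew letters and the Latin letters that share
# the same physical keys on a common Hebrew keyboard layout.
HEBREW_TO_LATIN = {"ע": "g", "ם": "o", "ך": "l", "ק": "e"}


def fix_layout(text: str) -> str:
    """Hypothetical keyboard-typos stage: undo typing in the wrong layout."""
    return "".join(HEBREW_TO_LATIN.get(ch, ch) for ch in text)


def suggest(text: str, vocabulary: Sequence[str], stages: Iterable[Stage] = ()) -> str:
    """Run the optional stages, then ask the stand-in matcher for a suggestion."""
    for stage in stages:
        text = stage(text)
    # Stand-in for the language-agnostic 'did you mean' module.
    matches = get_close_matches(text, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else text


if __name__ == "__main__":
    vocabulary = ["google", "calendar"]
    # 'google' typed while the keyboard was switched to Hebrew.
    print(suggest("עםםעךק", vocabulary, stages=[fix_layout]))
    # No pre-processing stages: the matcher works on the raw text.
    print(suggest("gooogel", vocabulary))
```

Because the stages are plain functions, a caller can skip them, reorder them, or add a phonetic stage without touching the matcher itself.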
As stated in my initial project, I recently worked on adding more native support for composite terms (two or more search terms). That was the hardest part, since it required generating an N-gram corpus from all the data. The algorithm can now identify ‘encyclo pedia’ as ‘encyclopedia’ and ‘brownsugar’ as ‘brown sugar’ and, of course, correct misspelled phrases such as ‘Gooogel Kalenderr’ into ‘Google Calendar’.
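The joining and splitting of composite terms can be sketched roughly as follows. This is only an illustration under my own assumptions: the counts in `UNIGRAMS` and `BIGRAMS` are made up, the function names are hypothetical, and the real algorithm also handles misspellings, which this sketch does not.

```python
from collections import Counter

# Made-up counts, as if harvested from an N-gram pass over the indexed data.
UNIGRAMS = Counter({"encyclopedia": 40, "brown": 120, "sugar": 90,
                    "google": 300, "calendar": 150})
BIGRAMS = Counter({("brown", "sugar"): 25, ("google", "calendar"): 60})


def join_tokens(tokens):
    """Merge two adjacent tokens when their concatenation is a known word."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] + tokens[i + 1] in UNIGRAMS:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def split_token(token):
    """Split an unknown token at the point that yields the most frequent bigram."""
    if token in UNIGRAMS:
        return [token]
    pairs = ((token[:k], token[k:]) for k in range(1, len(token)))
    best = max(pairs, key=lambda pair: BIGRAMS.get(pair, 0), default=None)
    if best is not None and BIGRAMS.get(best, 0) > 0:
        return list(best)
    return [token]  # no known split: leave the token unchanged


def normalize(query):
    """Join first, then split whatever is still unknown."""
    result = []
    for token in join_tokens(query.lower().split()):
        result.extend(split_token(token))
    return " ".join(result)


if __name__ == "__main__":
    print(normalize("encyclo pedia"))  # encyclopedia
    print(normalize("brownsugar"))     # brown sugar
```

Using bigram frequency to pick the split point is what keeps a token from being cut into two words that never actually appear together in the data.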
One last note:
A ‘did you mean’ feature is usually built as a byproduct of users correcting themselves, as in Google, or on top of language- or jargon-specific dictionaries. My algorithm uses neither method and can provide best matches from day 1, with no dependency on user behavior or external dictionaries.
See it in action:
Please contact me for an online demo.