
MetaSoundex Phonetic Matching for English and Spanish


Affiliations
1 Department of Computer Science, Sam Houston State University, TX 77341, United States
Abstract

Researchers face major problems when searching large, imprecise databases, because entries are often misspelled or are not spelled the way the searcher expects. As a result, they cannot find the word they are looking for. Over the years, matching words by their pronunciation has become one effective way to address this problem. The technique of retrieving words based on how they sound is known as “Phonetic Matching”. Soundex was the first such algorithm; others, such as Metaphone, Caverphone, Double Metaphone (DMetaphone), and Phonex, are also used for information retrieval in different environments. The main contribution of this paper is to analyze and implement the newly proposed MetaSoundex algorithm for correcting ill-defined data in English and Spanish. MetaSoundex addresses the limitations of two well-known phonetic matching techniques, Metaphone and Soundex. Specifically, the new algorithm produced more accurate results than both Soundex and Metaphone and achieved higher precision than Soundex, thus reducing noise in the matched results.
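To make the idea of phonetic matching and its precision trade-off concrete, here is a minimal Python sketch of the classic American Soundex code, together with a toy lookup and a precision calculation. It is not the authors' MetaSoundex algorithm, whose rules the abstract does not give. The helper names `soundex`, `phonetic_matches` and `precision`, as well as the word list, are hypothetical and chosen only for illustration. Letters outside A to Z (for example the Spanish "ñ") are simply left uncoded, which is an assumption of this sketch rather than part of any Spanish-specific variant.

```python
"""Minimal American Soundex sketch with a toy precision check (illustrative only)."""

# Soundex digit for each coded consonant; vowels, Y, H, W (and, in this
# sketch, any non-A-Z letter such as 'Ñ') carry no digit.
_SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )
    for letter in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word` ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    code = [first]                       # the first letter is kept as-is
    prev = _SOUNDEX_DIGITS.get(first, "")
    for ch in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:      # adjacent letters with the same digit are coded once
            code.append(digit)
        if ch not in "HW":               # H and W do not separate equal digits;
            prev = digit                 # vowels (digit "") do, so a repeat after them is kept
    return ("".join(code) + "000")[:4]   # pad with zeros / truncate to 4 characters


def phonetic_matches(query: str, dictionary: list[str]) -> list[str]:
    """Hypothetical helper: every dictionary word whose Soundex code equals the query's."""
    key = soundex(query)
    return [w for w in dictionary if soundex(w) == key]


def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved words that are relevant (0.0 if nothing was retrieved)."""
    if not retrieved:
        return 0.0
    return sum(w in relevant for w in retrieved) / len(retrieved)


if __name__ == "__main__":
    # Hypothetical toy data: "robbert" is a misspelling of "robert".
    words = ["robert", "rupert", "rubin", "ashcraft"]
    found = phonetic_matches("robbert", words)
    print(found)                         # ['robert', 'rupert']
    print(precision(found, {"robert"}))  # 0.5
```

Because "robert" and "rupert" share the code R163, the misspelled query "robbert" retrieves one correct and one spurious candidate, so precision on this toy example is 0.5. Spurious candidates of this kind are the noise that a more precise phonetic key is meant to reduce.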

Keywords

Information Retrieval, Metaphone, MetaSoundex, Misspelled Words, Phonetic Matching, Soundex.

Authors

K. Koneru
Department of Computer Science, Sam Houston State University, TX 77341, United States
C. Varol
Department of Computer Science, Sam Houston State University, TX 77341, United States

DOI: https://doi.org/10.18311/gjeis%2F2018%2F19822