
Supervised Alias Name Validation Using Statistical Similarity Coefficients


Affiliations
1 Department of Computer Science and Engineering, Manonmaniam Sundaranar University, India
     



An alias name is an alternative name for a known name. Extracting and validating alias names is an active research topic in natural language processing, with applications in information extraction, information retrieval, sentiment analysis, and question answering. Alias name validation is the task of deciding whether a candidate name is a genuine alias of a given name. In this work, seven statistical similarity coefficients are computed for each name-alias pair and used as features to train a classifier. The trained classifier is then used to decide whether a name-alias pair is valid. Experiments were conducted on an Indian name-alias dataset covering 15 persons and 35 name-alias pairs. The results show that an SVM classifier with a Radial Basis Function (RBF) kernel outperforms all the other classifiers tested in terms of overall accuracy.
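The feature-construction step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the seven coefficients, so the five co-occurrence measures below (Jaccard, Dice, overlap, cosine, and pointwise mutual information) are assumptions chosen for illustration only, as are the count-based inputs and the function names `similarity_features` and `rbf_kernel`. The second function shows the standard RBF kernel formula applied to two feature vectors; a complete SVM would normally come from a machine learning library and is not shown here.

```python
import math


def similarity_features(n_name, n_alias, n_both, n_total):
    """Return co-occurrence similarity coefficients for one name-alias pair.

    n_name, n_alias: number of documents containing the name / the alias
    n_both:          number of documents containing both
    n_total:         size of the document collection (needed for PMI)

    All inputs are hypothetical counts; the paper's actual data source
    and coefficient set are not given in the abstract.
    """
    if n_both == 0 or n_name == 0 or n_alias == 0:
        # No co-occurrence evidence: every coefficient is set to zero.
        return [0.0, 0.0, 0.0, 0.0, 0.0]
    jaccard = n_both / (n_name + n_alias - n_both)
    dice = 2 * n_both / (n_name + n_alias)
    overlap = n_both / min(n_name, n_alias)
    cosine = n_both / math.sqrt(n_name * n_alias)
    pmi = math.log2(n_both * n_total / (n_name * n_alias))
    return [jaccard, dice, overlap, cosine, pmi]


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2) between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical counts for two name-alias pairs.
pair_a = similarity_features(n_name=100, n_alias=100, n_both=50, n_total=400)
pair_b = similarity_features(n_name=100, n_alias=100, n_both=0, n_total=400)
print(pair_a)
print(rbf_kernel(pair_a, pair_b, gamma=0.5))
```

Each name-alias pair is reduced to a fixed-length numeric vector, which is the form a supervised classifier expects; the RBF kernel then scores two pairs as similar when their coefficient vectors are close.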

Keywords

Alias Name Extraction, Information Extraction, Web Mining.



Authors

A. Suruliandi
Department of Computer Science and Engineering, Manonmaniam Sundaranar University, India
P. Selvaperumal
Department of Computer Science and Engineering, Manonmaniam Sundaranar University, India
T. Dhiliphan Rajkumar
Department of Computer Science and Engineering, Manonmaniam Sundaranar University, India

