
Knowledge based Approach for English-Malayalam Parallel Corpus Generation


Affiliations
1 Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore - 641112, Tamil Nadu, India
 

Objective: This paper gives an overview of one part of Natural Language Generation: parallel sentence generation, which produces an English sentence together with its Malayalam translation. Methods/Analysis: A template-based sentence generation approach is followed. The proposed system takes its input from a manually created bilingual dictionary and fills the slots in each template to generate parallel sentences. Findings: Using the proposed method, we generated a total of 25,208 parallel sentences. This corpus can be used in a bilingual Machine Translation dictionary. Application/Improvement: The proposed system uses only four templates, but increasing the number of templates and updating the dictionary would increase the size of the parallel corpus that can be generated.
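The slot-filling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the abstract does not specify the four templates or the dictionary contents, so the `LEXICON` entries, the `TEMPLATE` layout, and the `generate_pairs` function below are all hypothetical, chosen only to show how one template and a small bilingual dictionary can yield aligned sentence pairs. The sketch assumes subject-verb-object order for English and subject-object-verb order for Malayalam.

```python
from itertools import product

# Hypothetical bilingual dictionary (not from the paper):
# slot name -> list of (English, Malayalam) word pairs.
LEXICON = {
    "SUBJ": [("He", "അവൻ"), ("She", "അവൾ")],
    "VERB": [("reads", "വായിക്കുന്നു"), ("writes", "എഴുതുന്നു")],
    "OBJ": [("a book", "പുസ്തകം"), ("a letter", "കത്ത്")],
}

# Hypothetical template: the same slots, ordered SVO for English
# and SOV for Malayalam.
TEMPLATE = {
    "en": ["SUBJ", "VERB", "OBJ"],
    "ml": ["SUBJ", "OBJ", "VERB"],
}


def generate_pairs(template, lexicon):
    """Yield (English, Malayalam) sentence pairs for every slot combination."""
    slots = sorted(set(template["en"]))
    # Each combination picks one dictionary entry per slot; the same entry
    # fills both language sides, which keeps the two sentences aligned.
    for choice in product(*(lexicon[slot] for slot in slots)):
        filled = dict(zip(slots, choice))
        en = " ".join(filled[slot][0] for slot in template["en"]) + "."
        ml = " ".join(filled[slot][1] for slot in template["ml"]) + "."
        yield en, ml


pairs = list(generate_pairs(TEMPLATE, LEXICON))
for en, ml in pairs:
    print(en, "|", ml)
```

With two entries per slot, this single template produces 2 × 2 × 2 = 8 sentence pairs. The count grows multiplicatively with the dictionary size and additively with the number of templates, which is consistent with the abstract's remark that more templates and a larger dictionary enlarge the corpus.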

Keywords

Bilingual, English-Malayalam, Machine Translation, Parallel sentence, Templates.

Authors

Sooraj Sudhakaran
Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore - 641112, Tamil Nadu, India
Shimil Jose
Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore - 641112, Tamil Nadu, India
M. Anand Kumar
Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore - 641112, Tamil Nadu, India
K. P. Soman
Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore - 641112, Tamil Nadu, India




DOI: https://doi.org/10.17485/ijst%2F2016%2Fv9i45%2F128542