
A Probabilistic Smoothing Approach for Language Models Applied to Protein Sequence Data









Authors

Gopal Suresh
Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli-627012, India
Chellapa Vijayalakshmi
Department of Mathematics, Sathyabama University, Chennai, India

Abstract


Most modern statistical language-modeling techniques are widely applied in domains such as speech recognition, machine translation, and information retrieval. A language model is probabilistic at its core: it estimates a probability distribution over strings, typically sentences. One of the core problems a language model must address is smoothing, whose primary goal is to improve model accuracy by adjusting the maximum likelihood estimates of the probabilities. To meet this challenge, this paper applies a well-known smoothing technique, Good-Turing, to a bioinformatics task on protein sequence data. The computational procedure is implemented as an R program that estimates the bigram and trigram probabilities of the language models for the protein sequence. Experimental results show that exponential and linear smoothing curves fit the bigram and trigram sequences, respectively, with very high model accuracy.
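
Since the computations are carried out in R, a minimal R sketch of the Good-Turing adjustment for bigram counts over a protein sequence is given below. The toy sequence and the bare frequency-of-frequencies estimator are illustrative assumptions, not the authors' actual program: the adjusted count is c* = (c + 1) N_{c+1} / N_c, and the mass N_1 / N is reserved for unseen bigrams.

    # Minimal Good-Turing sketch over protein bigrams (illustrative only;
    # the sequence below is a hypothetical example, not the paper's data)
    seq_str  <- "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    residues <- strsplit(seq_str, "")[[1]]

    # Overlapping bigrams of adjacent amino-acid residues
    bigrams <- paste0(residues[-length(residues)], residues[-1])
    counts  <- table(bigrams)
    N       <- sum(counts)

    # Frequency of frequencies: N_c = number of distinct bigrams seen c times
    fof <- table(as.numeric(counts))
    Nc  <- function(c) {
      v <- fof[as.character(c)]
      if (is.na(v)) 0 else as.numeric(v)
    }

    # Good-Turing adjusted count c* = (c + 1) * N_{c+1} / N_c;
    # keep the raw count where N_{c+1} = 0
    gt_count <- function(c) {
      if (Nc(c) > 0 && Nc(c + 1) > 0) (c + 1) * Nc(c + 1) / Nc(c) else c
    }

    adjusted <- sapply(as.numeric(counts), gt_count)
    probs    <- setNames(adjusted / N, names(counts))

    p_unseen <- Nc(1) / N   # probability mass reserved for unseen bigrams
    print(round(sort(probs, decreasing = TRUE)[1:5], 4))
    print(p_unseen)

Where N_{c+1} is zero the raw count is kept in this sketch; in practice a curve fitted over the frequency-of-frequencies, such as the exponential and linear fits the paper reports for bigrams and trigrams, fills those gaps.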

Keywords


Bigram Model, Language Model, Smoothing, N-Gram Model, Trigram Model.