A protein's family must be identified before the protein can be modified and controlled for use in tasks such as identifying drug-target interactions and predicting protein structure. Protein families are identified using the similarity between protein sequences, and alignment-free approaches apply machine learning (ML) techniques to protein family prediction. In this study, two novel ML-based models have been developed for better identification of protein families: a stacked framework of random forests, and a stacked framework of random forest, decision tree, and naive Bayes. Both models outperform state-of-the-art methods, achieving accuracies of 98.21% and 98.49%, respectively. The proposed models also give better results on twilight-zone protein datasets.
Keywords
Alignment Free Method, Machine Learning, Protein Family Prediction, Stacked Framework, Twilight-Zone Proteins.
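To make the idea of a stacked framework of random forest, decision tree, and naive Bayes more concrete, a minimal sketch follows. The abstract does not specify the feature representation, the meta-learner, or the stacking configuration, so everything here is an assumption: it uses scikit-learn's `StackingClassifier`, dipeptide (k = 2) composition as the alignment-free feature vector, and logistic regression as the meta-learner. The helper names (`kmer_composition`, `build_stacked_model`, `synthetic_family`) and the motif-based toy data are hypothetical illustrations, not the authors' method or data, and the sketch produces no results comparable to the accuracies reported above.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq, k=2):
    """Assumed alignment-free features: normalized frequency of every k-mer."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing non-standard residues
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def build_stacked_model(seed=0):
    """Hypothetical stack: RF, DT and NB base learners, logistic-regression meta-learner."""
    base = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("nb", GaussianNB()),
    ]
    # cv=3: the meta-learner is trained on out-of-fold base-learner predictions
    return StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )


def synthetic_family(motif, n, length, rng):
    """Hypothetical toy 'family': random residues with a motif planted every 6 positions."""
    seqs = []
    for _ in range(n):
        chars = list(rng.choice(list(AMINO_ACIDS), size=length))
        for pos in range(0, length - len(motif), 6):
            chars[pos:pos + len(motif)] = motif
        seqs.append("".join(chars))
    return seqs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seqs = synthetic_family("GAGA", 30, 60, rng) + synthetic_family("KLKL", 30, 60, rng)
    labels = [0] * 30 + [1] * 30
    X = np.array([kmer_composition(s) for s in seqs])
    model = build_stacked_model().fit(X, labels)
    print(model.predict([kmer_composition("GAGA" * 15)]))
```

In this sketch the first model described in the abstract (a stack built only from random forests) would correspond to replacing the three base learners with several `RandomForestClassifier` instances; how the authors actually configured either stack is not stated here.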