A comparative analysis of amino acid encoding schemes for the prediction of flexible length linear B-cell epitopes

Tanmaya Kumar Sahu; Prabina Kumar Meher; Nalini Kanta Choudhury; Atmakuri Ramakrishna Rao

KRISHI

ICAR RESEARCH DATA REPOSITORY FOR KNOWLEDGE MANAGEMENT
(An Institutional Publication and Data Inventory Repository)

Please use this identifier to cite or link to this item: http://krishi.icar.gov.in/jspui/handle/123456789/76967

Title:	A comparative analysis of amino acid encoding schemes for the prediction of flexible length linear B-cell epitopes
Other Titles:	Not Available
Authors:	Tanmaya Kumar Sahu; Prabina Kumar Meher; Nalini Kanta Choudhury; Atmakuri Ramakrishna Rao
ICAR Data Use Licence:	http://krishi.icar.gov.in/PDF/ICAR_Data_Use_Licence.pdf
Author's Affiliated institute:	ICAR::Indian Agricultural Statistics Research Institute
Published/ Complete Date:	2022-08-23
Project Code:	Not Available
Keywords:	epitope prediction; machine learning; peptide encoding; random forest; vaccine designing; linear B-cell epitopes
Publisher:	Briefings in Bioinformatics
Citation:	Not Available
Series/Report no.:	Not Available
Abstract/Description:	Linear B-cell epitopes have a prominent role in the development of peptide-based vaccines and disease diagnosis. High variability in the length of these epitopes is a major reason for low accuracy in their prediction. Most B-cell epitope prediction methods consider epitope sequences of fixed length and achieve good accuracy. Although a number of tools are available for the prediction of flexible length linear B-cell epitopes with reasonable accuracy, further improvement in prediction performance is still expected. Here, we analyzed the performance of machine learning approaches (MLAs) with 18 different amino acid encoding schemes in the prediction of flexible length linear B-cell epitopes. We considered B-cell epitope sequences of variable lengths (11–56 amino acids) from well-established public resources and evaluated the performance of the machine learning algorithms on the encoded epitope sequence datasets. Feasible combinations of encoding schemes were also explored and analyzed. The results revealed that amino-acid composition (AC) and distribution component of composition–transition–distribution encoding schemes are suitable for heterogeneous epitope data, whereas amino-acid-anchoring-pair-composition (APC), dipeptide-composition and amino-acids-pair-propensity-scale (APP) are more appropriate for homogeneous data. Further, two combinations of peptide encoding schemes, i.e. APC + AC and APC + APP, with a random forest classifier were identified as having improved performance over the state-of-the-art tools for flexible length linear B-cell epitope prediction. The study also revealed better performance of random forest than the other MLAs considered in the prediction of flexible length linear B-cell epitopes.
Description:	Not Available
ISSN:	Not Available
Type(s) of content:	Research Paper
Sponsors:	Not Available
Language:	English
Name of Journal:	Briefings in Bioinformatics
Impact Factor:	13.99
Volume No.:	23(5)
Page Number:	bbac356
Name of the Division/Regional Station:	Not Available
Source, DOI or any other URL:	https://doi.org/10.1093/bib/bbac356
URI:	http://krishi.icar.gov.in/jspui/handle/123456789/76967
Appears in Collections:	AEdu-IASRI-Publication
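
The abstract above names amino-acid composition (AC) and dipeptide composition among the encoding schemes compared. As a rough illustration of why such encodings suit flexible length epitopes, the sketch below uses the textbook definitions of both schemes: each peptide, whatever its length, is mapped to a fixed-length frequency vector (20 values for AC, 400 for dipeptide composition). The function names and the two example peptides are hypothetical and are not taken from the paper or its datasets; the authors' exact implementation may differ.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(peptide):
    """Return a 20-value vector: the fraction of each standard residue."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    # Non-standard symbols (e.g. 'X') are not counted, so the vector
    # sums to less than 1 when they are present.
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Return a 400-value vector: the fraction of each adjacent residue pair."""
    peptide = peptide.upper()
    n_pairs = len(peptide) - 1
    if n_pairs < 1:
        raise ValueError("peptide must have at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = peptide[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    return [counts[dp] / n_pairs for dp in DIPEPTIDES]


# Hypothetical peptides of different lengths (11 and 15 residues).
short_peptide = "ACDEFGHIKLM"
long_peptide = "MKTAYIAKQRQISFV"

for seq in (short_peptide, long_peptide):
    ac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    print(len(seq), len(ac), len(dpc))
```

Because both peptides yield vectors of the same size, they can be stacked into one feature matrix and passed to a standard classifier, for example scikit-learn's `RandomForestClassifier`. Combined encodings such as those mentioned in the abstract could be approximated by concatenating the individual vectors, although that is an assumption about how the combinations were built.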

Files in This Item:

There are no files associated with this item.
