Record Details

An alignment-free method for classification of protein sequences

DSpace at IIT Bombay

Field Value
 
Title An alignment-free method for classification of protein sequences
 
Creator DESHMUKH, S
KHAITAN, S
DAS, D
GUPTA, M
WANGIKAR, PP
 
Subject hidden Markov models
support vector machines
database
search
similarity
alignment free classification
remote homology detection
protein sequence classification
amino acid association rules
dipeptide frequency
 
Description Protein sequences vary in length and are therefore not readily amenable to conventional data mining techniques, which require mapping into a fixed-dimensional space. Thus, the majority of current methods for protein sequence classification are based on alignment of the query sequence either with a sequence or with a profile of the sequence family. We present a method for mapping protein sequences into a fixed-dimensional descriptor space. Descriptors such as amino acid content and amino acid pair association rules were used along with routinely available classification methods. An experiment on one hundred Pfam families showed a classification accuracy of 98% with a support vector machine classifier. Information-gain-based feature selection helped simplify the model and improve accuracy. Interestingly, a large number of the selected features were based on association rules of glycine or aspartic acid residues, suggesting their role in the conserved loops among evolutionarily related proteins. Further, in another experiment, the approach was tested for classification of proteins from 39 Pfam families of protein kinases, where the support vector machine classifier provided an accuracy of approximately 96%. The method provides an alternative to conventional profile-based methods for protein sequence classification.
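The idea of mapping variable-length sequences into a fixed-dimensional descriptor space can be illustrated with a minimal Python sketch. It assumes the descriptors are the 20 amino acid fractions (composition) plus the 400 ordered dipeptide fractions, giving a 420-dimensional vector for any sequence length. All function names (`composition`, `dipeptide_frequency`, `descriptor_vector`, `information_gain`) are hypothetical, and the record's amino acid pair association-rule descriptors are not reproduced here. The `information_gain` helper scores one feature by a single threshold split. This is one common way to compute information gain for a continuous feature, not necessarily the authors' exact procedure.

```python
from __future__ import annotations

import math
from itertools import product

# The 20 standard amino acids (one-letter codes); other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so every vector lines up.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in seq (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_frequency(seq: str) -> list[float]:
    """Fraction of each adjacent ordered residue pair (400 values).

    Pairs containing a non-standard symbol are skipped (an assumption).
    """
    s = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def descriptor_vector(seq: str) -> list[float]:
    """Fixed-length (20 + 400 = 420) descriptor vector for any sequence."""
    return composition(seq) + dipeptide_frequency(seq)


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for lab in set(labels):
        p = labels.count(lab) / n
        result -= p * math.log2(p)
    return result


def information_gain(values: list[float], labels: list[str], threshold: float) -> float:
    """Entropy reduction from splitting samples on value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional


if __name__ == "__main__":
    vec = descriptor_vector("MKTAYIAKQR")
    print(len(vec))  # 420, independent of sequence length
```

Vectors produced this way could then be passed to an off-the-shelf classifier such as scikit-learn's `sklearn.svm.SVC`, with features ranked by a score like `information_gain` to select a subset. The sketch makes no attempt to reproduce the accuracies reported above.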
 
Publisher BENTHAM SCIENCE PUBL LTD
 
Date 2011-07-18T23:34:47Z
2011-12-26T12:50:53Z
2011-12-27T05:37:03Z
2007
 
Type Article
 
Identifier PROTEIN AND PEPTIDE LETTERS, 14(7), 647-657
0929-8665
http://dspace.library.iitb.ac.in/xmlui/handle/10054/5101
http://hdl.handle.net/10054/5101
 
Language en