An alignment-free method for classification of protein sequences
DSpace at IIT Bombay
Field | Value |
Title |
An alignment-free method for classification of protein sequences
|
|
Creator |
DESHMUKH, S; KHAITAN, S; DAS, D; GUPTA, M; WANGIKAR, PP |
|
Subject |
hidden markov-models; support vector machines; database search; similarity; alignment free classification; remote homology detection; protein sequence classification; amino acid association rules; dipeptide frequency |
|
Description |
Protein sequences vary in length and are therefore not readily amenable to conventional data mining techniques, which require mapping into a fixed-dimensional space. Thus, the majority of current methods for protein sequence classification are based on alignment of the query sequence with either a single sequence or a profile of the sequence family. We present a method for mapping protein sequences into a fixed-dimensional descriptor space. Descriptors such as amino acid content and amino acid pair association rules were used along with routinely available classification methods. An experiment on one hundred Pfam families showed a classification accuracy of 98% with a support vector machine classifier. Information gain based feature selection helped simplify the model and improve accuracy. Interestingly, a large number of the selected features were based on the association rules of glycine or aspartic acid residues, suggesting their role in the conserved loops among evolutionarily related proteins. In a second experiment, the approach was tested on the classification of proteins from 39 Pfam families of protein kinases, where the support vector machine classifier provided an accuracy of approximately 96%. The method provides an alternative to conventional profile-based methods for protein sequence classification.
|
|
Publisher |
BENTHAM SCIENCE PUBL LTD
|
|
Date |
2011-07-18T23:34:47Z; 2011-12-26T12:50:53Z; 2011-12-27T05:37:03Z; 2007 |
|
Type |
Article
|
|
Identifier |
PROTEIN AND PEPTIDE LETTERS, 14(7), 647-657; 0929-8665; http://dspace.library.iitb.ac.in/xmlui/handle/10054/5101; http://hdl.handle.net/10054/5101 |
|
Language |
en
|
|
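The Description above turns on one idea: mapping protein sequences of any length into a fixed-dimensional descriptor space so that standard classifiers can be applied. The sketch below illustrates that idea only. It is not the authors' implementation: it assumes plain amino acid composition (20 values) plus adjacent-pair (dipeptide) frequencies (400 values) as descriptors, whereas the record mentions amino acid pair association rules, whose exact definition is not given here. The names `AMINO_ACIDS`, `DIPEPTIDES`, and `descriptor` are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard residues are used as features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def descriptor(sequence):
    """Map a protein sequence of any length to a 420-dimensional vector.

    The first 20 entries are amino acid fractions; the remaining 400 are
    frequencies of adjacent residue pairs. Illustrative stand-in for the
    descriptors named in the record, not the published feature set.
    """
    seq = sequence.upper()

    # Count single residues, ignoring non-standard symbols such as X.
    single = Counter(c for c in seq if c in AMINO_ACIDS)
    n_single = sum(single.values())

    # Count adjacent pairs in which both residues are standard.
    pairs = Counter(
        a + b
        for a, b in zip(seq, seq[1:])
        if a in AMINO_ACIDS and b in AMINO_ACIDS
    )
    n_pairs = sum(pairs.values())

    composition = [single[a] / n_single if n_single else 0.0 for a in AMINO_ACIDS]
    dipeptide = [pairs[d] / n_pairs if n_pairs else 0.0 for d in DIPEPTIDES]
    return composition + dipeptide


# Two sequences of different length yield vectors of the same length.
short_vec = descriptor("MGDG")
long_vec = descriptor("MKVLAAGIVGDGSSGKT")
print(len(short_vec), len(long_vec))
```

Vectors of this kind could then be passed to any routinely available classifier, such as a support vector machine, with a feature selection step applied beforehand if desired.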