Fast prediction of protein domain boundaries using conserved local patterns
DSpace at IIT Bombay
Field | Value |
Title |
Fast prediction of protein domain boundaries using conserved local patterns
|
|
Creator |
JOSHI, RR
SAMANT, VV |
|
Subject |
assignment; propainor; protein structures; protein domain boundary points; nonparametric statistics; dgs; psipred |
|
Description |
We have found certain conserved motifs and secondary structural patterns in the vicinity of interior domain boundary points (dbps). They were identified by a data-driven approach, without any a priori constraint on the type and number of such features and without any requirement of sequence homology. We have used these motifs and patterns to rerank the solutions obtained by the well-known domain guess by size (DGS) algorithm. Overall, we predict five solutions. Our method, domain boundary prediction using conserved patterns (DPCP), raises the average accuracy of the top five solutions of DGS from 71.74 to 82.88 % for two-continuous-domain proteins and from 21.38 to 80.56 % for two-discontinuous-domain proteins. Considering only the top solution, the accuracy rises from 0 to 72.74 % for two-continuous-domain proteins with chain lengths up to 300 residues, and from 0 to 62.85 % for those with up to 400 residues. In the case of discontinuous domains, the top_min solutions of DPCP (the minimum number of solutions required for predicting all dbps of a protein) improve the average accuracy of DGS prediction from 12.5 to 76.3 % for proteins with chain lengths up to 300 residues, and from 13.33 to 70.84 % for proteins with up to 400 residues. In our validation experiments, DPCP also outperformed domain identification from secondary structure element alignment (DomSSEA), the best method reported so far for efficient prediction of domain boundaries using predicted secondary structure. For continuous domains, the average accuracies of the topmost solution of DomSSEA are 61 and 52 % for proteins with up to 300 and 400 residues, respectively; the corresponding accuracies for the discontinuous case are 28 and 21 %.
|
|
Publisher |
SPRINGER
|
|
Date |
2011-08-29T12:54:32Z
2011-12-26T12:58:36Z
2011-12-27T05:48:49Z
2006 |
|
Type |
Article
|
|
Identifier |
JOURNAL OF MOLECULAR MODELING, 12(6), 943-952
1610-2940
http://dx.doi.org/10.1007/s00894-006-0116-0
http://dspace.library.iitb.ac.in/xmlui/handle/10054/12089
http://hdl.handle.net/10054/12089 |
|
Language |
en
|
|
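The Description field above outlines two ideas that a short sketch may make more concrete: reranking candidate boundary positions by the local patterns found near them, and the top_min count. The Python below is only an illustration under stated assumptions, not the authors' DPCP implementation. The motif lists, the window size, the residue tolerance, and the function names `pattern_score`, `rerank`, and `top_min` are all hypothetical, since the record does not specify them. Each solution is also simplified to a single boundary position.

```python
from typing import List, Optional, Sequence

# Hypothetical local patterns: the record does not list the actual motifs.
SEQ_MOTIFS = ("GP", "PG")    # illustrative residue motifs
SS_PATTERNS = ("HC", "EC")   # illustrative transitions (H helix, E strand, C coil)


def pattern_score(seq: str, ss: str, boundary: int, window: int = 10) -> int:
    """Count motif and pattern occurrences within +/- window of a boundary."""
    lo = max(0, boundary - window)
    hi = min(len(seq), boundary + window + 1)
    seq_hits = sum(seq[lo:hi].count(m) for m in SEQ_MOTIFS)
    ss_hits = sum(ss[lo:hi].count(p) for p in SS_PATTERNS)
    return seq_hits + ss_hits


def rerank(candidates: Sequence[int], seq: str, ss: str,
           top_n: int = 5, window: int = 10) -> List[int]:
    """Reorder candidate boundaries by descending pattern score.

    sorted() is stable, so ties keep the order of the input ranking
    (for example, the order produced by a DGS-like method).
    """
    ranked = sorted(candidates,
                    key=lambda b: -pattern_score(seq, ss, b, window))
    return ranked[:top_n]


def top_min(ranked: Sequence[int], true_boundaries: Sequence[int],
            tolerance: int = 20) -> Optional[int]:
    """Smallest k such that the top-k solutions cover every true boundary.

    A true boundary counts as covered when some prediction lies within
    `tolerance` residues of it (an assumed criterion). Returns None if
    even the full list does not cover all boundaries.
    """
    for k in range(1, len(ranked) + 1):
        top_k = ranked[:k]
        if all(any(abs(p - t) <= tolerance for p in top_k)
               for t in true_boundaries):
            return k
    return None


if __name__ == "__main__":
    # Toy 40-residue chain with one motif and one helix-to-coil transition
    # near position 20.
    seq = "A" * 20 + "GP" + "A" * 18
    ss = "H" * 20 + "C" * 20
    ranked = rerank([5, 35, 20], seq, ss, window=3)
    print(ranked, top_min(ranked, [21], tolerance=2))
```

In this toy input the candidate at position 20 is the only one with patterns in its window, so it moves to the front and a single solution is enough to cover the boundary at 21. A real scoring function would presumably weight patterns by how strongly they are associated with boundaries rather than counting them equally.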