Record Details

Fast prediction of protein domain boundaries using conserved local patterns

DSpace at IIT Bombay

 
Field Value
 
Title Fast prediction of protein domain boundaries using conserved local patterns
 
Creator JOSHI, RR
SAMANT, VV
 
Subject assignment
propainor
protein structures
protein domain boundary points
nonparametric statistics
dgs
psipred
 
Description Using a data-driven approach, with no a priori constraint on the type or number of features and no requirement of sequence homology, we have found conserved motifs and secondary structural patterns in the vicinity of interior domain boundary points (dbps). We use these motifs and patterns to rerank the solutions obtained by the well-known domain guess by size (DGS) algorithm, and we predict five solutions overall. Our method, domain boundary prediction using conserved patterns (DPCP), improves the average accuracy of the top five DGS solutions from 71.74 to 82.88 % for two-continuous-domain proteins and from 21.38 to 80.56 % for two-discontinuous-domain proteins. Considering only the top solution, accuracy rises from 0 to 72.74 % for two-continuous-domain proteins with chain lengths up to 300 residues, and from 0 to 62.85 % for those with up to 400 residues. For discontinuous domains, the top_min solutions of DPCP (the minimum number of solutions required to predict all dbps of a protein) improve the average accuracy of DGS prediction from 12.5 to 76.3 % for proteins with chain lengths up to 300 residues, and from 13.33 to 70.84 % for proteins with up to 400 residues. In our validation experiments, DPCP also outperformed domain identification from secondary structure element alignment (DomSSEA), the best method reported so far for efficient prediction of domain boundaries using predicted secondary structure. The average accuracies of the topmost DomSSEA solution are 61 and 52 % for proteins with up to 300 and 400 residues, respectively, in the case of continuous domains; the corresponding accuracies for the discontinuous case are 28 and 21 %.
 
Publisher SPRINGER
 
Date 2011-08-29T12:54:32Z
2011-12-26T12:58:36Z
2011-12-27T05:48:49Z
2006
 
Type Article
 
Identifier JOURNAL OF MOLECULAR MODELING, 12(6), 943-952
1610-2940
http://dx.doi.org/10.1007/s00894-006-0116-0
http://dspace.library.iitb.ac.in/xmlui/handle/10054/12089
http://hdl.handle.net/10054/12089
 
Language en