Protein structure prediction
While genome-wide sequencing projects produce torrents of sequence information about genes, transcripts and proteins, experimental methods for determining protein tertiary and quaternary structure lag considerably behind this need and face multiple technical difficulties. This gap necessitates prediction methods that provide access to reliable macromolecular models and offer insights into the structural features of proteins without solving their structures experimentally. Prediction can be simplified by projecting a 3D structure onto strings of per-residue structural assignments, which allows it to be cast as a classification problem. For example, one can assign a secondary structure state to each residue, or a number describing that residue's solvent accessibility. Such strings of per-residue assignments are essentially one-dimensional (Figure 1). These 1D predictions are often the first step toward gaining insight into the 3D structure and function of a protein.
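The following sketch illustrates such a projection. It turns a per-residue secondary structure string into the three-state H/E/C alphabet. It assumes the input uses the eight DSSP state codes and applies one common reduction: H, G and I map to H; E and B map to E; everything else maps to C. Other reduction schemes exist, and the name `reduce_dssp` is hypothetical.

```python
# Hypothetical helper: one common mapping from eight DSSP states to three classes.
# H (alpha helix), G (3-10 helix), I (pi helix) -> H; E (strand), B (bridge) -> E.
DSSP_TO_3STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(ss8: str) -> str:
    """Project an 8-state DSSP string onto the 3-state H/E/C alphabet."""
    # Any code not in the table (T, S, '-', blank, ...) becomes coil 'C'.
    return "".join(DSSP_TO_3STATE.get(state, "C") for state in ss8)


if __name__ == "__main__":
    print(reduce_dssp("HHHGGGIEEBTTS-"))  # prints HHHHHHHEEECCCC
```

The output keeps one character per residue, so the result stays aligned with the amino acid sequence.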
Accurate prediction of the relative solvent accessibility of amino acid residues in proteins can facilitate protein structure prediction and functional annotation. Toward that goal, we have developed a novel method, SABLE, for improved prediction of relative solvent accessibility (RSA) (Ref 1). Unlike other machine learning-based methods in the literature, we do not cast the problem as classification with arbitrary boundaries between classes. Rather, we seek a continuous approximation of the real-valued RSA using non-linear regression with several feed-forward and recurrent neural networks, which are combined into a consensus predictor.
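A few lines of code show the contrast between the classification and regression formulations. RSA is commonly defined as a residue's solvent-accessible surface area divided by a maximal value for that residue type. Several details below are illustrative assumptions, not SABLE's actual parameters or combination scheme:

- the 25% cutoff
- the "buried"/"exposed" labels
- the maximal-area value in the example
- the plain averaging used for the consensus

```python
from statistics import mean


def relative_accessibility(asa: float, max_asa: float) -> float:
    """RSA in percent: observed accessible area over the residue type's maximal area."""
    return 100.0 * asa / max_asa


def two_state_label(rsa: float, cutoff: float = 25.0) -> str:
    """Classification view: an arbitrary cutoff splits residues into two classes."""
    return "buried" if rsa < cutoff else "exposed"


def consensus(predictions: list[float]) -> float:
    """Regression view: average the real-valued outputs of several predictors,
    then clip to the valid 0-100 range (simple averaging is an assumption)."""
    return max(0.0, min(100.0, mean(predictions)))


if __name__ == "__main__":
    # Hypothetical maximal area of 200 A^2 for some residue type.
    print(relative_accessibility(50.0, 200.0))  # prints 25.0
    # Residues only 2 percentage points apart land in different classes...
    print(two_state_label(24.0), two_state_label(26.0))  # prints buried exposed
    # ...whereas a real-valued target keeps such residues numerically close.
    print(consensus([20.0, 30.0, 28.0]))  # prints 26.0
```

The regression target preserves how far a residue lies from any cutoff. Class labels discard that information.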
SABLE's RSA predictions proved accurate in validation on a large set of non-redundant proteins (with low or no sequence homology to the training set or to one another). Incorporating these predictions was found to improve the performance of 1D prediction methods for other structural characteristics of proteins. By adding predicted RSA to the model, we were able to reduce the number of features used and improve generalization. Specifically, we showed that predicted RSA can increase the accuracy of predicting secondary structure (Ref 2), transmembrane domains (Ref 3), and disordered regions and phosphorylation sites (Ref 4).
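Predicted RSA can enter a downstream 1D predictor as an extra per-residue input in a sliding window around the target residue. The following sketch shows one way to do this. It assumes a generic per-residue feature row (for example, a sequence profile column) and zero padding at the chain termini. The function name, window width and rescaling of RSA to 0-1 are hypothetical choices, not the encoding used in the cited work.

```python
from typing import Sequence


def window_features(
    profile: Sequence[Sequence[float]],
    predicted_rsa: Sequence[float],
    center: int,
    half_width: int = 2,
) -> list[float]:
    """Build the input vector for residue `center`.

    Each position in the window contributes its feature row followed by its
    predicted RSA rescaled from percent to 0-1; positions beyond the chain
    termini are padded with zeros.
    """
    n = len(profile)
    row_dim = len(profile[0]) + 1  # feature row plus one RSA value
    features: list[float] = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < n:
            features.extend(profile[i])
            features.append(predicted_rsa[i] / 100.0)
        else:
            features.extend([0.0] * row_dim)
    return features


if __name__ == "__main__":
    profile = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 3-residue, 2-feature rows
    rsa = [10.0, 50.0, 90.0]  # toy predicted RSA values in percent
    print(window_features(profile, rsa, center=0, half_width=1))
    # prints [0.0, 0.0, 0.0, 1.0, 0.0, 0.1, 0.0, 1.0, 0.5]
```

Each window position adds a single number for RSA. This keeps the input compact compared with encoding the same information through many raw features.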
Figure 1.
Projection of structural aspects of proteins onto strings. For example, α-helices are assigned 'H', β-strands 'E', and unstructured loops 'C'. The area exposed to solvent can be represented by relative solvent accessibility expressed as a percentage. Transmembrane regions and soluble parts can be mapped as 'T' and 'N', respectively (TM regions are highlighted in yellow).
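Simple encoders can produce string projections like those in Figure 1 from per-residue values. The following sketch relies on two illustrative assumptions. First, RSA values in percent are binned into ten 10% intervals written as the digits 0-9. Second, transmembrane segments are given as hypothetical 0-based, inclusive (start, end) index pairs.

```python
def rsa_to_digits(rsa_values: list[float]) -> str:
    """Encode each RSA percentage as one digit: 0 for [0, 10), ..., 9 for 90 and above."""
    return "".join(str(max(0, min(int(v // 10), 9))) for v in rsa_values)


def tm_mask(length: int, segments: list[tuple[int, int]]) -> str:
    """Mark transmembrane residues 'T' and soluble residues 'N'.

    `segments` holds 0-based, inclusive (start, end) index pairs.
    """
    mask = ["N"] * length
    for start, end in segments:
        for i in range(start, end + 1):
            mask[i] = "T"
    return "".join(mask)


if __name__ == "__main__":
    print(rsa_to_digits([0.0, 9.9, 10.0, 55.0, 100.0]))  # prints 00159
    print(tm_mask(8, [(2, 4)]))  # prints NNTTTNNN
```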
References
- Adamczak R, Porollo A, Meller J. Accurate prediction of solvent accessibility using neural networks-based regression. Proteins. 2004 Sep 1;56(4):753-67. PubMed PMID: 15281128.
- Adamczak R, Porollo A, Meller J. Combining prediction of secondary structure and solvent accessibility in proteins. Proteins. 2005 May 15;59(3):467-75. PubMed PMID: 15768403.
- Cao B, Porollo A, Adamczak R, Jarrell M, Meller J. Enhanced recognition of protein transmembrane domains with prediction-based structural profiles. Bioinformatics. 2006 Feb 1;22(3):303-9. Epub 2005 Nov 17. PubMed PMID: 16293670.
- Swaminathan K, Adamczak R, Porollo A, Meller J. Enhanced prediction of conformational flexibility and phosphorylation in proteins. Adv Exp Med Biol. 2010;680:307-19. PubMed PMID: 20865514.