Summary: This module introduces students to the problem of Protein Folding, its importance, and current computational methods used to attack its complexity.
We have already convinced ourselves that most of the activities in living organisms are regulated by proteins. All proteins start out on a ribosome as a linear sequence of amino acids. This linear sequence must fold during and after synthesis so that the protein can take up its native conformation. Recall that the native conformation of a protein is a stable three-dimensional structure that strongly determines the protein's biological function. The native conformation is only marginally stable, however, so it is sensitive to the environment: modest changes in the environment can cause structural changes in the protein and thus affect its function. Proteins are adapted to the conditions inside the cell, so environmental conditions that differ from those in the cell can result in structural changes. When a protein loses its biological function as a result of a loss of three-dimensional structure, we say that the protein has undergone denaturation. Proteins can be denatured not only by heat but also by extremes of pH, since both conditions disrupt the weak interactions, including hydrogen bonds, that are mainly responsible for a protein's three-dimensional structure. It is important to understand that the denatured state of a protein is not the same as a completely unfolded chain with a randomized conformation. Denatured proteins actually exist in a set of partially folded states that are currently poorly understood.
The folding pathway of a large polypeptide chain is very complicated, and not all the principles that guide the process have been worked out. Several plausible models of protein folding have nevertheless been proposed. One model views folding as a hierarchical process in which local secondary structures form first. Under this model,
[Figure: Free Energy Funnel]
It has been experimentally confirmed that not all proteins fold spontaneously in the cell. For many proteins, folding is facilitated by specialized proteins known as chaperones. Molecular chaperones are proteins that interact with partially folded or improperly folded polypeptides, either facilitating correct folding pathways or providing microenvironments in which folding can occur. Chaperones are not the only proteins that facilitate protein folding. Two enzymes, protein disulfide isomerase (PDI) and peptide prolyl cis-trans isomerase (PPI), catalyze isomerization reactions and are required for the folding pathways of a number of proteins.
There are three major theoretical methods for predicting protein structure: Comparative Modelling, Fold Recognition (also called Threading), and ab initio Prediction.
Comparative modelling exploits the fact that evolutionarily related proteins with similar sequences have similar structures. Sequence similarity is commonly measured as the percentage of identical residues at aligned positions, where the alignment is based on an optimal structural superposition. Structural similarity is very high in the so-called "core regions", which typically consist of secondary structure elements such as α-helices and β-sheets.
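As a small illustration of the identity measure described above, the sketch below computes percent identity for two sequences that are assumed to be already aligned (equal length, gaps written as `-`). Skipping gapped columns is one common convention among several. The function name and the example sequences are hypothetical, and the alignment step itself (the structural superposition) is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over aligned, ungapped columns.

    Assumes both sequences are already aligned (same length, gaps as '-').
    Columns where either sequence has a gap are skipped; this is one
    common convention, not the only one.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned pair (one-letter amino-acid codes).
print(percent_identity("MKT-AYIAK", "MKSLAYLAK"))  # 6 of 8 ungapped columns match
```

Here 8 columns are compared (the gapped column is skipped) and 6 of them are identical, giving 75.0%.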
Threading uses a database of known three-dimensional structures to match sequences of unknown structure with protein folds. This is done with a scoring function that assesses how well a sequence fits a given fold. These scoring functions are usually derived from a database of known structures and generally include pairwise atom-contact and solvation terms. Threading is similar to comparative modelling in that it compares a target sequence against a library of structural templates, producing a list of scores. The scores are then ranked, and the fold with the best score is assumed to be the one adopted by the sequence. The methods used to fit a sequence against a library of folds can be computationally elaborate. Examples include double dynamic programming, Gibbs sampling using a database of threading cores, branch-and-bound heuristics, and sequence alignment methods based on hidden Markov models. For an example of a scoring function used in threading, please read An empirical energy function for threading protein sequence through the folding motif.
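The score-then-rank step of threading can be sketched with a deliberately simplified model. In this hypothetical example, each template fold is reduced to a string of per-position environments (`B` for buried, `E` for exposed). The score is a toy solvation-like term that rewards hydrophobic residues in buried positions and polar residues in exposed ones, and the sequence is threaded without gaps. Real threading potentials use statistically derived pairwise contact and solvation terms and allow gaps, so treat this only as an outline of the idea. All names, templates and the query sequence are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hydrophobic set (one-letter codes) for the toy score below.
HYDROPHOBIC = set("AVILMFWC")


def environment_score(residue: str, env: str) -> int:
    """Toy solvation-like term: +1 if the residue sits in a favourable
    environment ('B' = buried, 'E' = exposed), -1 otherwise."""
    is_hydrophobic = residue in HYDROPHOBIC
    if env == "B":
        return 1 if is_hydrophobic else -1
    return -1 if is_hydrophobic else 1


def thread(sequence: str, template_env: str) -> Tuple[int, int]:
    """Ungapped threading: try every offset of the sequence along the
    template and return (best_score, best_offset)."""
    if len(sequence) > len(template_env):
        raise ValueError("template is shorter than the sequence")
    best_score: Optional[int] = None
    best_offset = -1
    for offset in range(len(template_env) - len(sequence) + 1):
        # zip stops at the end of the sequence, so only aligned positions count.
        score = sum(
            environment_score(res, env)
            for res, env in zip(sequence, template_env[offset:])
        )
        if best_score is None or score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


def rank_folds(sequence: str, library: Dict[str, str]) -> List[Tuple[str, int]]:
    """Score the sequence against every template and sort best-first."""
    scored = [(name, thread(sequence, env)[0]) for name, env in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical template library and query sequence.
library = {
    "fold_1": "EEBBBBEEBBEE",
    "fold_2": "BBEEEEBBEEBB",
    "fold_3": "EEEEEEEEEEEE",
}
query = "KDLIVAEKLLEK"
for name, score in rank_folds(query, library):
    print(name, score)
```

In this toy run `fold_1` ranks first because its buried positions line up exactly with the hydrophobic residues of the query, `fold_3` (all exposed) is neutral, and `fold_2` is the worst fit.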
The ab initio approach is a mixture of science and engineering. The science lies in understanding how proteins attain their three-dimensional structure. The engineering lies in deducing the three-dimensional structure from the sequence. The major challenge of the folding problem lies in ab initio prediction, which can be broken down into two components: devising a scoring function that can distinguish correct (native or native-like) structures from incorrect (non-native) ones, and devising a search method to explore the conformational space. In many ab initio methods the two components are coupled, so that the search method drives, and is driven by, the scoring function to find native-like structures. Currently there is no reliable and general scoring function that always drives a search to the native fold. There is also no reliable and general search method that samples conformational space adequately enough to guarantee a significant fraction of near-native structures (less than 3.0 Å RMSD from the experimental structure). Methods for ab initio prediction include molecular dynamics (MD) simulations of proteins and Monte Carlo (MC) simulations, which do not use forces but instead compare energies. They also include genetic algorithms, which try to improve on the sampling and convergence of MC approaches. For a more detailed discussion, please visit Ab initio protein structure modeling methods.
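To make the phrase "compare energies" concrete, here is a minimal Metropolis Monte Carlo sketch. It assumes a hypothetical toy energy over a handful of torsion angles that is lowest when each angle matches an assumed "native" value. This is not a real force field. The step size, temperature (`kT`) and step count are arbitrary illustration choices. Each step perturbs one angle and accepts or rejects the move using only the energy difference, so no forces are ever computed.

```python
import math
import random
from typing import List, Tuple


def toy_energy(angles: List[float], target: List[float]) -> float:
    """Hypothetical smooth energy: zero when every torsion angle equals its
    assumed native value in `target`, positive otherwise."""
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, target))


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves; accept uphill
    moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo(target: List[float], steps: int = 5000, kT: float = 0.1,
                max_step: float = 0.3, seed: int = 0
                ) -> Tuple[List[float], float, float]:
    """Return (best_angles, best_energy, initial_energy)."""
    rng = random.Random(seed)
    # Start from a random conformation (angles in radians).
    angles = [rng.uniform(-math.pi, math.pi) for _ in target]
    energy = toy_energy(angles, target)
    initial_energy = energy
    best_angles, best_energy = list(angles), energy
    for _ in range(steps):
        # Propose a move: perturb one randomly chosen torsion angle.
        i = rng.randrange(len(angles))
        trial = list(angles)
        trial[i] += rng.uniform(-max_step, max_step)
        trial_energy = toy_energy(trial, target)
        # Only the energy difference decides whether the move is kept.
        if metropolis_accept(trial_energy - energy, kT, rng):
            angles, energy = trial, trial_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
    return best_angles, best_energy, initial_energy


native = [-1.0, 2.0, 0.5, -2.5, 1.2]  # hypothetical native torsions
_, best, start = monte_carlo(native)
print(f"start energy {start:.3f} -> best energy {best:.3f}")
```

Uphill moves are sometimes accepted, so the current energy fluctuates. For that reason the sketch keeps the lowest-energy conformation seen so far, which is a common bookkeeping choice in MC searches.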
Folding@Home uses novel computational methods and large-scale distributed computing to simulate protein folding and to study folding-related diseases. Please visit Folding@Home to learn more about this distributed computing project.
It is very important for proteins to achieve their native conformation, since failure to do so can seriously impair their biological function. Defects in protein folding may be the molecular cause of a range of human genetic disorders. For example, cystic fibrosis is caused by defects in a membrane-bound protein called the cystic fibrosis transmembrane conductance regulator (CFTR), which serves as a channel for chloride ions. The most common cystic fibrosis-causing mutation is the deletion of a Phe residue at position 508 of CFTR, which causes improper folding of the protein. Many of the disease-related mutations in collagen also cause defective folding. A misfolded protein known as a prion appears to be the agent of a number of rare degenerative brain diseases in mammals, such as mad cow disease (bovine spongiform encephalopathy). Related diseases include kuru and Creutzfeldt-Jakob disease. These diseases are sometimes referred to as spongiform encephalopathies, so named because the brain becomes riddled with holes. In its normal, correctly folded form, the prion protein is a constituent of brain tissue in all mammals, although its function is not yet known. A complete understanding of prion diseases awaits new information about how the prion protein affects brain function, as well as more detailed structural information about the protein. An improved understanding of protein folding may therefore lead to new therapies for cystic fibrosis, Creutzfeldt-Jakob disease, and many other diseases.