ExPASy Proteomics Tools II

Module by: Susan Cates. E-mail the author

Summary: This is part II in a two-part series describing the many proteomics tools available from the ExPASy website. This module introduces tools for protein topology prediction, primary structure analysis, secondary structure prediction and tertiary structure prediction and visualization.

This module continues the tutorial on the proteomics tools accessible at the ExPASy (Expert Protein Analysis System) website (1). Once a protein sequence has been determined through proteomics techniques, bioinformatics can be used to predict certain types of topology. Topology is the sequence of secondary structure elements within a protein. The most basic secondary structure elements are the alpha helix, the beta sheet and the random coil. Some algorithms also predict topological features that are closely related to in vivo localization, such as signal sequences and transmembrane helices.

On the ExPASy Proteomics Tools page, scroll down to the section entitled "topology prediction". This section contains tools that predict localization and sorting signals, as well as transmembrane regions within proteins. PSORT (2) is a computer program for the prediction of protein localization. It takes an amino acid sequence and its source organism as input and searches for known, organism-specific protein sorting signals. It returns a list of candidate localization sites, each accompanied by a score indicating the probability that the protein encoded by the input sequence would be localized to that site. To explore the use of PSORT, click on the PSORT link on the ExPASy tools page, then choose the "PSORT II" link. Cut and paste the following sequence for diacylglycerol kinase from Rattus norvegicus into the query box and click "Submit".


MEPRDPSPEARSSDSESASASSSGSERDADPEPDKAPRRLTKRRFPGLRLFGHRKAITKSGLQHLAPPPP
TPGAPCGESERQIRSTVDWSESAAYGEHIWFETNVSGDFCYVGEQYCVAKMLPKSAPRRKCAACKIVVHT
PCIGQLEKINFRCKPSFRESGSRNVREPTFVRHHWVHRRRQDGKCRHCGKGFQQKFTFHSKEIVAISCSW
CKQAYHSKVSCFMLQQIEEPCSLGVHAAVVIPPTWILRARRPQNTLKASKKKKRASFKRRSSKKGPEEGR
WRPFIIRPTPSPLMKPLLVFVNPKSGGNQGAKIIQSFLWYLNPRQVFDLSQGGPREALEMYRKVHNLRIL
ACGGDGTVGWILSTLDQLRLKPPPPVAILPLGTGNDLARTLNWGGGYTDEPVSKILSHVEEGNVVQLDRW
DLRAEPNPEAGPEERDDGATDRLPLDVFNNYFSLGFDAHVTLEFHESREANPEKFNSRFRNKMFYAGTAF
SDFLMGSSKDLAKHIRVVCDGMDLTPKIQDLKPQCIVFLNIPRYCAGTMPWGHPGEHHDFEPQRHDDGYL
EVIGFTMTSLAALQVGGHGERLTQCREVLLTTAKAIPVQVDGEPCKLAASRIRIALRNQATMVQKAKRRS
TAPLHSDQQPVPEQLRIQVSRVSMHDYEALHYDKEQLKEASVPLGTVVVPGDSDLELCRAHIERLQQEPD
GAGAKSPMCHPLSSKWCFLDATTASRFYRIDRAQEHLNYVTEIAQDEIYILDPELLGASARPDLPTPTSP
LPASPCSPTPGSLQGDAALPQGEELIEAAKRNDFCKLQELHRAGGDLMHRDHQSRTLLHHAVSTGSKEVV
RYLLDHAPPEILDAVEENGETCLHQAAALGQRTICHYIVEAGASLMKTDQQGDTPRQRAEKAQDTELAAY
LENRQHYQMIQREDQETAV
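
If the sequence is to be reused in a script rather than pasted into a web form, it helps to join the lines and check for stray characters first. The following is a minimal sketch; the function name `clean_sequence` is a hypothetical helper, not part of any ExPASy tool, and only the first two lines of the sequence are shown in the example call.

```python
# The 20 standard amino acids in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Join a pasted, multi-line sequence into one uppercase string.

    Raises ValueError if anything other than the 20 standard
    one-letter amino acid codes remains after whitespace is removed.
    """
    seq = "".join(raw.split()).upper()
    unexpected = sorted(set(seq) - VALID_RESIDUES)
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {unexpected}")
    return seq


# Example with the first two lines of the query sequence.
pasted = """
    MEPRDPSPEARSSDSESASASSSGSERDADPEPDKAPRRLTKRRFPGLRLFGHRKAITKSGLQHLAPPPP
    TPGAPCGESERQIRSTVDWSESAAYGEHIWFETNVSGDFCYVGEQYCVAKMLPKSAPRRKCAACKIVVHT
"""
fragment = clean_sequence(pasted)
print(fragment[:10], len(fragment))
```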


First, view the "k-NN" results by scrolling to the bottom of the page. The k-nearest neighbor (k-NN) algorithm combines the output of the many subprograms to estimate the probability of localization at each candidate site within the cell. Q1: What is the probability that the sequence encodes a protein that is (a) secreted by vesicles? (b) localized to the endoplasmic reticulum? (c) cytoplasmic? (d) localized to the nucleus? Now, scroll through the results of the subprograms. Clicking on the links will reveal a brief description of the algorithm each subprogram uses. Q2: What are the localization prediction and reliability score produced by the NNCN subprogram (Reinhardt's method for cytoplasmic/nuclear discrimination)? The first two subprograms, PSG and GvH, predict N-terminal signal peptide sequences. Just after their results, a statement summarizes whether or not an N-terminal signal peptide has been predicted for the query sequence. Q3: Do these subprograms predict an N-terminal signal peptide for the diacylglycerol kinase query? Q4: After looking over all the results, what is the most likely localization of our query protein? Read the title and abstract for this article on the rat diacylglycerol kinase used for the query sequence. Q5: Was PSORT able to predict the correct localization using the sequence information alone?
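
The idea behind a k-nearest neighbor classifier can be sketched in a few lines. This is only a toy illustration, not PSORT's implementation: the two-number feature vectors, the training examples and the function name `knn_site_probabilities` are all invented for the example. Each query is compared with proteins of known localization, and the fraction of the k closest neighbors at each site is reported as the estimated probability for that site.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_site_probabilities(query, training, k=3):
    """Estimate site probabilities as vote fractions among the k nearest neighbors.

    query    -- feature vector for the protein of interest
    training -- list of (feature_vector, known_site) pairs
    """
    nearest = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(site for _, site in nearest)
    return {site: count / len(nearest) for site, count in votes.items()}


# Hypothetical training set: in a real system each vector would hold
# the scores produced by the individual subprograms.
TRAINING = [
    ((0.9, 0.1), "nuclear"),
    ((0.8, 0.2), "nuclear"),
    ((0.7, 0.3), "cytoplasmic"),
    ((0.2, 0.9), "secreted"),
    ((0.1, 0.8), "secreted"),
]

probabilities = knn_site_probabilities((0.85, 0.15), TRAINING, k=3)
for site, p in sorted(probabilities.items(), key=lambda kv: -kv[1]):
    print(f"{site}: {p:.2f}")
```

Sites that receive no votes among the k neighbors are simply absent from the result, which corresponds to an estimated probability of zero.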

Return to the ExPASy tools, and scroll to the section entitled "primary structure analysis". Click on the link for the ProtParam tool. ProtParam is a suite of programs designed to predict various chemical and physical properties of a protein from its sequence. ProtParam reports an estimated extinction coefficient at selected wavelengths (3), an estimate of the in vivo half-life of the protein (4, 5, 6, 7), an instability index (8), an aliphatic index (9), and an average value for hydropathicity (10). Cut and paste the rat diacylglycerol kinase sequence above into the query box and click on "compute parameters". Q6: What is the molecular weight computed from the sequence? Q7: What does the amino acid composition analysis show as the most common amino acid in this protein? (Is that unusual?) Q8: What is the chemical formula for the query protein? Q9: What is the predicted extinction coefficient at 280 nm, in 6 M guanidinium HCl, 0.02 M phosphate, pH 6.5 buffer, assuming all cysteines appear as half-cystines? Q10: In what way could it be helpful to know the extinction coefficient? Q11: According to the instability index, is this protein classified as stable or unstable?
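
Several of these parameters are simple functions of the amino acid counts, which the sketch below tries to make concrete. It is not ProtParam's code. The constants are assumptions taken from commonly quoted values: average residue masses, per-residue 280 nm coefficients in the style of reference (3) (Trp 5690, Tyr 1280, cystine 120), the aliphatic index weights of reference (9), and the hydropathy scale of reference (10). The live tool may use different constants, so results need not agree to the last digit.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528

# Kyte-Doolittle hydropathy values.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Count of each residue type in the sequence."""
    return Counter(seq)


def molecular_weight(seq):
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def extinction_coefficient_280(seq, cys_as_cystine=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Assumed coefficients: Trp 5690, Tyr 1280, cystine 120.
    With cys_as_cystine=True every pair of Cys counts as one cystine.
    """
    counts = Counter(seq)
    n_cystine = counts["C"] // 2 if cys_as_cystine else 0
    return 5690 * counts["W"] + 1280 * counts["Y"] + 120 * n_cystine


def aliphatic_index(seq):
    """Ala + 2.9 * Val + 3.9 * (Ile + Leu), each as mole percent."""
    counts = Counter(seq)

    def pct(aa):
        return 100.0 * counts[aa] / len(seq)

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(HYDROPATHY[aa] for aa in seq) / len(seq)


# Example on the first ten residues of the query sequence.
peptide = "MEPRDPSPEA"
print(composition(peptide).most_common(3))
print(round(molecular_weight(peptide), 2))
print(extinction_coefficient_280(peptide), aliphatic_index(peptide), gravy(peptide))
```

The instability index is omitted here because it depends on a table of weights for all 400 dipeptides, which is too long for a short sketch.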

Return again to the ExPASy tools. Notice there are two sections dealing with structure prediction: secondary structure prediction tools, and tertiary structure prediction and visualization tools. Given a sequence, the secondary structure prediction tools predict features such as the helical content, the beta sheet formations, and the turns, loops, and coil regions within a protein. Tertiary structure prediction tools match the query sequence with sequences, or partial sequences, of proteins whose 3-D structure has been published in the Protein Data Bank (PDB). These tools produce a model of the query protein by piecing together the structural regions from the best matches in the PDB and threading the query sequence through the predicted structure. For more detailed explanations of available 3-D structure prediction software, view the Swiss-Model demo page and the Geno3D reference page. Both tools search for templates among existing PDB entries, but they do so in different ways. Q12: What program does Swiss-Model use to match the query sequence with sequences of known structures? Q13: What program does Geno3D use to match the query sequence with sequences of known structures? Notice that the template selection and model refinement processes also differ between these two programs.
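
One simple way to think about "best matches" is percent identity between the query and each candidate template once the two have been aligned. The sketch below assumes the alignments are already available as equal-length strings with "-" for gaps. It does not perform the alignment itself and is not how Swiss-Model or Geno3D select templates; the template names and sequences are made up for the example.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of aligned, non-gap positions where the residues are identical."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def rank_templates(alignments):
    """Sort templates by identity to the query, best first.

    alignments -- dict mapping template name to (aligned_query, aligned_template)
    """
    scored = [(name, percent_identity(q, t)) for name, (q, t) in alignments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical pre-computed alignments against two made-up templates.
ALIGNMENTS = {
    "template_A": ("ACDEFGHIK", "ACDEFGHLK"),
    "template_B": ("ACDEF-GHIK", "TCDQFAGH-R"),
}

for name, identity in rank_templates(ALIGNMENTS):
    print(f"{name}: {identity:.1f}% identity")
```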

Finally, in the tertiary structure section of the ExPASy tools page, Swiss PDB Viewer is a graphical tool for the visualization, comparison and analysis of 3-D coordinate files. Swiss PDB Viewer can superimpose 3-D structures by finding the rotation and translation that most closely aligns the two protein structures. It can also mutate amino acids, predict hydrogen bonds, and calculate angles and distances between atoms. Best of all, Swiss PDB Viewer is freeware and available for many different platforms, including Macintosh, PC, SGI IRIX, and Linux. Q14: View this supplemental SPDBV web page. What other function does Swiss PDB Viewer have when used in conjunction with other applications such as OpenGL or POV-Ray?
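
The distance and angle measurements mentioned above reduce to basic vector arithmetic on atomic coordinates. The following sketch uses only the Python standard library and is independent of Swiss PDB Viewer; the function names and the coordinates in the example are illustrative. The sign convention for the torsion angle is the one produced by this particular formula and may differ from what a given viewer reports.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _norm(a):
    return math.sqrt(_dot(a, a))


def distance(p, q):
    """Distance between two atoms, in the units of the coordinates."""
    return _norm(_sub(p, q))


def angle(a, b, c):
    """Angle a-b-c in degrees, measured at the middle atom b."""
    u, v = _sub(a, b), _sub(c, b)
    cos_theta = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    length = _norm(b1)
    b1 = tuple(x / length for x in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Example with made-up coordinates (angstroms).
print(distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print(angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)))
```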

ExPASy provides an excellent library of tools for proteomics as well as other bioinformatics applications. BONUS QUESTION (1 point): Explore the secondary structure tools independently, and submit the diacylglycerol kinase sequence above to any of the available secondary structure prediction tools. At the time of this writing, all of these tools email the results, with at least a 20-minute delay between submission and receipt of results. Forward a results summary to the instructor, outlining the predictions made by the program of your choice.

References

  1. Appel R.D., Bairoch A., Hochstrasser D.F. (1994). A new generation of information retrieval tools for biologists: the example of the ExPASy WWW server. Trends Biochem. Sci., 19:258-260.
  2. Horton P., Nakai K. (1997). Better Prediction of Protein Cellular Localization Sites with the k Nearest Neighbors Classifier. Intelligent Systems for Molecular Biology, 5:147-152.
  3. Gill S.C., von Hippel P.H. (1989). Calculation of protein extinction coefficients from amino acid sequence data. Anal. Biochem., 182:319-326.
  4. Bachmair A., Finley D., Varshavsky A. (1986). In vivo half-life of a protein is a function of its amino-terminal residue. Science, 234:179-186.
  5. Gonda D.K., Bachmair A., Wunning I., Tobias J.W., Lane W.S., Varshavsky A. (1989). Universality and structure of the N-end rule. J. Biol. Chem., 264:16700-16712.
  6. Tobias J.W., Shrader T.E., Rocap G., Varshavsky A. (1991). The N-end rule in bacteria. Science, 254:1374-1377.
  7. Ciechanover A., Schwartz A.L. (1989). How are substrates recognized by the ubiquitin-mediated proteolytic system? Trends Biochem. Sci., 14:483-488.
  8. Guruprasad K., Reddy B.V.B., Pandit M.W. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering, 4:155-161.
  9. Ikai A. (1980). Thermostability and aliphatic index of globular proteins. J. Biochem., 88:1895-1898.
  10. Kyte J., Doolittle R.F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 157:105-132.
