Molecular Shapes and Surfaces

Module by: Lydia E. Kavraki.

Summary: This module introduces students to a family of algorithms for assessing molecular shape, volume, surface area, and negative space (i.e., pockets and cavities).

Introduction

Many problems in structural biology require a researcher to understand the shape of a protein. At first glance, this may seem trivial: by opening a molecular visualizer, one can easily see the shape of a protein. But what about calculating the surface area or volume of the protein? What about performing analyses of the surface, such as looking for concave pockets in a protein that might be binding sites for other molecules? What about calculating the volume and shape of those empty binding pockets, in order to find molecules that might fit in them? What about determining whether a particular small molecule can fit in a binding pocket?

All of these problems require some formal notion of the shape of a protein. A protein structure file usually provides no more information than a list of atom locations in space and their types. It will be assumed that for any given application, a radius may be defined for each atom type. This leads to the space filling representation of a protein, in which each atom is treated as an impenetrable sphere.

Figure 1: HIV-1 Protease (hiv.png). A space filling representation of HIV-1 protease (yellow) with an inhibitory drug (red) blocking its binding site.

This representation allows for visualization, but it brings us no closer to being able to computationally decide which parts of which atoms are on the surface of the protein and which are buried inside the structure. Some additional tool is needed to capture notions of interior and exterior and spatial adjacency.
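
The space filling representation described above can be sketched in a few lines. This is only an illustration: the radius table uses commonly quoted Van der Waals radii (in Å) as an assumption, since the text leaves the choice of radii to the application, and the names `Sphere`, `VDW_RADIUS`, and `space_filling`, as well as the three-atom fragment, are hypothetical.

```python
from dataclasses import dataclass

# Assumed radii in angstroms (commonly quoted Van der Waals values);
# a given application may define different ones per atom type.
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


@dataclass(frozen=True)
class Sphere:
    x: float
    y: float
    z: float
    radius: float


def space_filling(atoms, radii=VDW_RADIUS):
    """Turn (element, x, y, z) records into impenetrable spheres."""
    return [Sphere(x, y, z, radii[element]) for element, x, y, z in atoms]


# Hypothetical three-atom fragment, coordinates in angstroms.
atoms = [("N", 0.0, 0.0, 0.0), ("C", 1.46, 0.0, 0.0), ("O", 2.0, 1.2, 0.0)]
spheres = space_filling(atoms)
```

The result is exactly the information a structure file plus a radius table provides: a list of spheres, with no notion yet of which parts are exposed or buried.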

Representing Shape

Using the sphere model for atoms, one way to define the shape of a molecule is as the union of (possibly overlapping) balls in R^3.

Figure 2: Space Filling Diagram (vdw-spheres.png). The space filling diagram models each atom as a sphere in 3D.

Since proteins inside our cells are in an aqueous environment, modeling them appropriately requires considering their interactions with solvent molecules, particularly water. Recall that one of the phenomena that determines the structure of a protein is the hydrophobic effect: some amino acid residues are stabilized by the presence of water, and others are repelled. The extent of the interaction of a protein with the surrounding water depends on the surface area of the protein that can be reached by water molecules. Therefore, quantitative modeling of the strength of interaction with solvent often involves computing the solvent accessible surface area (SASA).

SASA can be computed by regarding each solvent molecule as a sphere of fixed radius. This is of course a simplification, since water molecules are not spherical. As this sphere rolls over the molecule, its center traces out the solvent accessible surface. Equivalently, one can think of the solvent accessible surface as the result of growing each atom sphere by the radius of the solvent sphere. If we instead take the surface swept out by the front of the solvent sphere (the side facing the molecule), we obtain the molecular surface (MS) model of the molecule. Alternatively, the MS can be obtained by removing a layer of solvent radius depth from the SASA model.
Figure 3: Representations of Molecular Shape. Two different notions and representations of the surface of a molecule.
(a) VDW Representation (vdw-disks.png). Each atom can be modeled as a Van der Waals sphere in three dimensions. The union of the spheres gives the Van der Waals surface.
(b) Accessible Surface Area (sa-ms.png). Not all of the Van der Waals surface is accessible to solvent, due to the existence of small cavities. Rolling a solvent ball over the Van der Waals spheres traces out the surface area experienced by the solvent. Solvent accessible surface area (SASA) is a very important measure for quantitatively determining the behavior and interaction tendencies of a protein.

The surface determined by SASA analysis depends on the size of a typical solvent molecule. The larger the solvent, the less contoured the resulting surface will appear, because a larger probe molecule would not be able to fit into some of the interatomic spaces that a smaller one would.
Figure 4: Solvent Accessible Surface Area. Solvent-accessible surface area (SASA) for two different solvent radii.
(a) Probing the surface area with a solvent ball of radius 1.4 Å (1_4_probe.png). Typically, solvent is modeled as a ball of radius 1.4 Å. This delineates the solvent accessible surface shown.
(b) Probing the surface area with a solvent ball of radius 1.5 Å (1_5_probe.png). Increasing the radius of the solvent ball reduces the solvent accessible surface area because there are more cavities that a bulkier ball cannot penetrate.
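
The "grow each atom by the probe radius" view of SASA lends itself to a simple numerical estimate. The sketch below is not the α-shape method developed later in this module; it is a point-sampling approximation (in the spirit of the Shrake-Rupley approach) that places test points on each expanded sphere and counts those not buried inside any other expanded sphere. The function names, the number of sample points, and the input format `(x, y, z, radius)` are assumptions made for illustration.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden_angle * i),
                    r * math.sin(golden_angle * i), z))
    return pts


def sasa(atoms, probe=1.4, n_points=500):
    """Estimate per-atom solvent accessible area.

    atoms: list of (x, y, z, radius) tuples, lengths in angstroms.
    Each atom is grown by the probe radius; a sample point on the grown
    sphere counts as accessible if it lies inside no other grown sphere.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas
```

For an isolated atom every sample point is accessible, so the estimate equals the full area of the grown sphere. The cost of this naive version is quadratic in the number of atoms, which is one reason analytical methods are attractive for whole proteins.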

Alpha-Shapes

Part of the problem with defining the shape of a protein is that we start with nothing but a point set, and the "shape" of a set of discrete points is poorly defined. What do we mean by shape? As you saw above, the shape of a molecule depends on what is being used to measure it. To handle this ambiguity, we will introduce a method of shape calculation based on a parameter, α, which determines the radius of a spherical probe that defines the surface. The method defines a class of shapes, called α-shapes [4], for any given point set. It allows fast and accurate calculations of volume and surface area.

α-shapes are a generalization of the convex hull. Consider a point set S. Define an α-ball as a sphere of radius α. An α-ball is empty if it contains no points in S. For any α between zero and infinity, the α-hull of S is the complement of the union of all empty α-balls.

  • For α of infinity, the α-shape is the convex hull of S.
  • For α smaller than half the smallest distance between two points in S, the α-shape is S itself.
  • For any α in between, one can think of the α-hull as the largest polygon (polyhedron) or set thereof whose vertices are in the point set and whose edges are of length less than 2α. The presence of an edge indicates that a probe of radius α cannot pass between the edge endpoints.
Figure 5: Two-Dimensional α-Shapes (alpha_illustration.jpg). Some α-shapes are shown for a point set and various values of α. On the left, α is 0 or slightly more, such that an α-ball can fit between any two points in the set. The α-shape is therefore the original point set. On the right, α is infinity, so an α-ball can be approximated locally by a line. α on this scale yields the convex hull of the point set. The middle image shows the α-shape for α equal to the radius of the ball shown. This yields two disjoint boundaries, one of which has a significant indentation. Voids, or empty pockets completely enclosed by the α-shape, are also possible, for instance if the α-shape is ring-like (in 2D) or forms a hollow shell (in 3D).
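
The empty α-ball idea can be tested directly in two dimensions. The following sketch assumes the usual characterization of boundary edges: two points are joined by an edge of the α-shape when some empty disk of radius α has both points on its boundary, so that a probe of that size rests against both without being able to pass between them. The function names are hypothetical, and a small tolerance is an added assumption to absorb rounding error.

```python
import math


def alpha_disk_centers(p, q, alpha):
    """Centers of the disks of radius alpha whose boundary passes through p and q."""
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    d = math.hypot(dx, dy)
    if d == 0.0 or d > 2.0 * alpha:
        return []  # the points are too far apart for one alpha-disk to touch both
    mx, my = (px + qx) / 2.0, (py + qy) / 2.0
    h = math.sqrt(alpha * alpha - (d / 2.0) ** 2)  # midpoint-to-center distance
    nx, ny = -dy / d, dx / d                       # unit normal to segment pq
    return [(mx + h * nx, my + h * ny), (mx - h * nx, my - h * ny)]


def is_alpha_exposed(p, q, points, alpha, tol=1e-9):
    """True if some empty alpha-disk touches both p and q."""
    for cx, cy in alpha_disk_centers(p, q, alpha):
        if all(math.hypot(x - cx, y - cy) >= alpha - tol for x, y in points):
            return True
    return False
```

For the corners of a unit square plus its center point, a very large α exposes the four sides (the convex hull) but not the diagonals, while an α below half the side length exposes no edges at all, matching the two limiting cases in the list above.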

Computing the Alpha-Shape: Delaunay Triangulation

A triangulation of a three-dimensional point set S is any decomposition of the convex hull of S into non-overlapping tetrahedra whose vertices are the points of S (triangles for two-dimensional point sets). The Delaunay triangulation of S is the triangulation satisfying the additional requirement that no sphere circumscribing a tetrahedron in the triangulation contains any point of S in its interior; it is unique when the points are in general position (in three dimensions, no five points on a common sphere). Although it is incidental to α-shapes, it is worth noting that in two dimensions the Delaunay triangulation maximizes the smallest angle over all triangles. In other words, it favors relatively even-sided triangles over sharp and stretched ones.

Figure 6: Two-Dimensional Delaunay Triangulation (delaunaytriangle.png). The Delaunay triangulation of the four points given is shown on the right. Note that the circumscribing circles on the left each contain one point of S, whereas the circles on the right do not. The transition from the triangulation on the left to that on the right is called an edge flip, and is the basic operation of constructing a two-dimensional Delaunay triangulation. Face flipping is the analogous procedure for five points in three dimensions.
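
The empty-circle condition that drives an edge flip can be written as a sign test on a 3x3 determinant. The sketch below is a two-dimensional illustration under the assumption that the three triangle vertices are not collinear; it uses plain floating-point arithmetic, whereas robust implementations typically use exact predicates. The names `in_circle` and `needs_flip` and the four sample points are hypothetical.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circle(a, b, c, d):
    """True if d lies strictly inside the circle through a, b, c.

    Assumes a, b, c are not collinear.
    """
    if orient(a, b, c) < 0:
        b, c = c, b  # the determinant test below expects counter-clockwise order
    m = []
    for p in (a, b, c):
        x, y = p[0] - d[0], p[1] - d[1]
        m.append((x, y, x * x + y * y))
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return det > 0


def needs_flip(a, b, c, d):
    """Triangles (a, b, c) and (a, c, d) share edge ac; flip it if d violates
    the empty-circle condition for triangle (a, b, c)."""
    return in_circle(a, b, c, d)


# Hypothetical convex quadrilateral: the diagonal A-C is not Delaunay.
A, B, C, D = (0.0, 0.0), (2.0, -0.5), (4.0, 0.0), (2.0, 3.0)
print(needs_flip(A, B, C, D))  # True: replace edge A-C with edge B-D
print(needs_flip(B, C, D, A))  # False: after the flip, edge B-D can stay
```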
The Delaunay triangulation of a point set is usually calculated by an incremental flip algorithm as follows:
  • The points of S are sorted on one coordinate (x, y, or z). This step is not strictly necessary but makes the algorithm run faster than if the points were in arbitrary order.
  • Each point is added in sorted order. Upon adding a point:
    • The point is connected to previously added points that are "visible" to it, that is, to points to which it can be connected by a line segment without passing through a face of a tetrahedron.
    • Any new tetrahedra formed are checked and flipped if necessary.
    • Any tetrahedra adjacent to flipped tetrahedra are checked and flipped. This continues until further flipping is unnecessary, which is guaranteed to occur.
This algorithm runs in worst case O(n^2) time, but expected O(n^(3/2)) time. Without the sort in the first step, the expected case would be O(n log n). A full description and analysis of Delaunay triangulation algorithms is given in [1], chapter 9.

From the Delaunay triangulation the α-shape is computed by removing all edges, triangles, and tetrahedra that have circumscribing spheres with radius greater than α. Formally, the part of the Delaunay triangulation that remains after this removal is the α-complex, and the α-shape is the boundary of the α-complex.
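
The filtering step can be illustrated in two dimensions. The sketch below makes several simplifying assumptions: it builds the Delaunay triangulation by brute force (testing every triple of points for an empty circumcircle) rather than with the incremental flip algorithm, it assumes no four points are cocircular, and it keeps only triangles, ignoring edges and isolated points that would also belong to the full α-complex. All function names and the sample points are hypothetical.

```python
import math
from itertools import combinations


def circumcircle(a, b, c):
    """Center and radius of the circle through a, b, c (None if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)


def delaunay_brute_force(points):
    """All index triples whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        cc = circumcircle(points[i], points[j], points[k])
        if cc is None:
            continue
        (ux, uy), r = cc
        if all(math.hypot(px - ux, py - uy) >= r - 1e-9
               for n, (px, py) in enumerate(points) if n not in (i, j, k)):
            triangles.append((i, j, k, r))
    return triangles


def alpha_complex_triangles(points, alpha):
    """Delaunay triangles whose circumradius does not exceed alpha."""
    return [(i, j, k) for i, j, k, r in delaunay_brute_force(points) if r <= alpha]


def boundary_edges(triangles):
    """Edges that belong to exactly one of the given triangles."""
    count = {}
    for i, j, k in triangles:
        for edge in ((i, j), (j, k), (i, k)):
            edge = tuple(sorted(edge))
            count[edge] = count.get(edge, 0) + 1
    return sorted(edge for edge, c in count.items() if c == 1)


# Hypothetical point set: a tight triangle plus one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.8), (5.0, 0.3)]
print(alpha_complex_triangles(points, 1.0))   # only the tight triangle survives
print(boundary_edges(alpha_complex_triangles(points, 3.0)))
```

With α = 1 the long, thin triangle reaching to the distant point is removed, while with α = 3 both Delaunay triangles remain and the boundary is the outline of their union.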

Pockets [3] can be detected by comparing the α-shape to the whole Delaunay triangulation. Missing tetrahedra represent indentations, concavity, and generally negative space in the overall volume occupied by the protein. Particularly large or deep pockets may indicate a substrate binding site.

Weighted Alpha Shapes

Regular α-shapes can be extended to deal with varying weights (i.e., spheres with different radii, such as different types of atoms) [2]. The formal definitions become complicated, but the key idea is to use a pseudo distance measure that uses the weights. Suppose we have two atoms at positions p1 and p2 with weights w1 and w2. Then the pseudo distance is defined as the square of the Euclidean distance between p1 and p2 minus the two weights w1 and w2. The pseudo distance is zero if and only if two spheres centered at p1 and p2 with radii equal to sqrt(w1) and sqrt(w2) intersect at right angles, that is, when the squared distance between their centers equals w1 + w2.

Figure 7: Pseudo distance to account for atoms of different sizes. (alpha_relation2.png)
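
The pseudo distance is short enough to state as code. This is a minimal sketch of the definition given above, with the weight of each atom taken to be its squared radius; the function name and the sample values are hypothetical.

```python
def pseudo_distance(p1, w1, p2, w2):
    """Squared Euclidean distance between p1 and p2 minus both weights."""
    squared = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return squared - w1 - w2


# Radii 3 and 4 correspond to weights 9 and 16.
print(pseudo_distance((0, 0, 0), 9, (5, 0, 0), 16))  # 25 - 9 - 16 = 0
print(pseudo_distance((0, 0, 0), 9, (7, 0, 0), 16))  # 49 - 9 - 16 = 24
print(pseudo_distance((0, 0, 0), 0, (2, 0, 0), 0))   # zero weights: 4
```

With zero weights the measure reduces to the ordinary squared distance, which is how the weighted construction generalizes the unweighted one. The value decreases, and eventually becomes negative, as the two spheres overlap more deeply.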

Calculating Molecular Volume Using α-Shapes

The volume of a molecule can be approximated using the space-filling model, in which each atom is modeled as a ball whose radius is α, where α is selected depending on the model being used: Van der Waals surface, molecular surface, solvent accessible surface, etc. Unfortunately, calculating the volume is not as simple as taking the sum of the ball volumes, because the balls may overlap. If two balls overlap, the volume is the sum of the volumes of the balls minus the volume of the overlap, which was counted twice. If three overlap, the volume is the sum of the ball volumes, minus the volume of each pairwise overlap, plus the volume of the three-way overlap, which was subtracted one too many times in accounting for the pairwise overlaps. In the general case, all pairwise, three-way, four-way and so on to n-way intersections (assuming there are n atoms) must be considered, and the number of such intersection terms can grow exponentially with n. Proteins generally have thousands or tens of thousands of atoms, so the general n-way case is computationally expensive and may introduce numerical error.

Figure 8: Three overlapping discs (balls if three dimensional). (circles.jpg) Calculating the total area (volume if three dimensional) of the balls requires summing the areas of each ball, then subtracting out the pairwise intersection areas, since each was counted once for each ball it is inside. Then the intersection area of the three balls must be added back because, although it was added three times initially, it was also subtracted once in each of the three pairwise intersections. In the general case, with n balls, all of which may overlap, intersections of odd numbers of balls are added, and intersections of even numbers of balls subtracted, to calculate the total area or volume.
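
The simplest case of this inclusion-exclusion, two overlapping balls, can be worked through with the closed-form volume of a sphere-sphere intersection (the lens formula tabulated in [6]). The sketch below is an illustration only; the function names are hypothetical, and the three-way and higher terms, which need more involved formulas, are not covered.

```python
import math


def ball_volume(r):
    return 4.0 / 3.0 * math.pi * r ** 3


def overlap_volume(r1, r2, d):
    """Volume common to two balls of radii r1 and r2 whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0                       # disjoint (or just touching)
    if d <= abs(r1 - r2):
        return ball_volume(min(r1, r2))  # one ball lies inside the other
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * d))


def union_volume_two(r1, r2, d):
    """Inclusion-exclusion for two balls: add both, subtract the overlap once."""
    return ball_volume(r1) + ball_volume(r2) - overlap_volume(r1, r2, d)
```

For two unit balls whose centers are one unit apart, the overlap is 5π/12, so the union has volume 8π/3 − 5π/12 = 9π/4 rather than the naive sum 8π/3.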

α-shapes provide a way around this undesirable combinatorial complexity [2], and this issue has been one of the motivating factors for introducing α-shapes. To calculate the volume of a protein, we take the sum of all ball volumes, then subtract only those pairwise intersections for which a corresponding edge exists in the α-complex. Only those three-way intersections for which the corresponding triangle is in the α-complex must then be added back. Finally, only four-way intersections corresponding to tetrahedra in the α-complex need to be subtracted. No higher-order intersections are necessary, and the number of volume calculations necessary corresponds directly to the complexity of the α-complex, which is O(n log n) in the number of atoms.
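
The bookkeeping of the restricted sum can be sketched as follows. The function assumes that the simplices of the α-complex and the volumes of the corresponding ball intersections have already been computed elsewhere; it only applies the alternating signs described above. The name `alpha_complex_volume` and the dictionary layout are hypothetical, not taken from the cited article.

```python
import math


def alpha_complex_volume(simplex_volumes):
    """Signed sum of intersection volumes over the simplices of the α-complex.

    simplex_volumes maps a tuple of atom indices (vertex, edge, triangle, or
    tetrahedron of the α-complex) to the volume of the common intersection of
    the corresponding balls. Vertices and triangles are added; edges and
    tetrahedra are subtracted.
    """
    total = 0.0
    for simplex, volume in simplex_volumes.items():
        if not 1 <= len(simplex) <= 4:
            raise ValueError("only vertices, edges, triangles, and tetrahedra are used")
        sign = 1.0 if len(simplex) % 2 == 1 else -1.0
        total += sign * volume
    return total


# Two unit balls with centers one unit apart: the α-complex has two vertices
# and one edge, and the pairwise overlap of the balls has volume 5*pi/12.
example = {
    (0,): 4.0 / 3.0 * math.pi,
    (1,): 4.0 / 3.0 * math.pi,
    (0, 1): 5.0 * math.pi / 12.0,
}
print(alpha_complex_volume(example))  # equals 9*pi/4
```

Because no simplex has more than four vertices, the number of terms is bounded by the size of the α-complex rather than by the number of subsets of atoms.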

An example of how this approach works is given on page 4 of the Liang et al. article in the Recommended Reading section below. A proof of correctness and derivation is also provided in the article. Surface area calculations, such as solvent-accessible surface area, which is often used to estimate the strength of interactions between a protein and the solvent molecules surrounding it, are made by a similar use of the α-complex.

Recommended Reading

  • H. Edelsbrunner, D. Kirkpatrick, and R. Seidel. "On the Shape of a Set of Points in the Plane." IEEE Transactions on Information Theory, 29(4):551-559, 1983. This is the original α-shapes paper (caution: the definition of α is different from that used in later papers; it is the negative reciprocal of α as presented above).
  • H. Edelsbrunner and E. P. Mücke. "Three-dimensional Alpha Shapes." Workshop on Volume Visualization, Boston, MA, pp. 75-82, 1992. This article shows how to extend α-shapes to three-dimensional point sets.
  • J. Liang, H. Edelsbrunner, P. Fu, P. V. Sudhakar, and S. Subramaniam. "Analytical shape computation of macromolecules: I. molecular area and volume through alpha shape." Proteins: Structure, Function, and Genetics, 33:1-17, 1998. This is a paper on using α-shapes to speed up volume and surface area calculations for molecular models.
  • H. Edelsbrunner, M. Facello, and J. Liang. "On the definition and the construction of pockets in macromolecules." Discrete Applied Mathematics, 88:83-102, 1998.

References

  1. M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. (2000). Computational Geometry: Algorithms and Applications. (2nd ed.). Springer.
  2. J. Liang, H. Edelsbrunner, P. Fu, P. V. Sudhakar, and S. Subramaniam. (1998). Analytical shape computation of macromolecules: I. molecular area and volume through alpha shape. [http://www3.interscience.wiley.com/cgi-bin/abstract/36315/ABSTRACT]. Proteins: Structure, Function, and Genetics, 33, 1-17.
  3. H. Edelsbrunner, M. Facello, and J. Liang. (1998). On the definition and the construction of pockets in macromolecules. [http://dx.doi.org/10.1016/S0166-218X(98)00067-5]. Discrete Applied Mathematics, 88, 83-102.
  4. H. Edelsbrunner and E. P. Mücke. (1994). Three-Dimensional Alpha Shapes. [http://portal.acm.org/citation.cfm?id=156635]. ACM Transactions on Graphics, 13, 43-72.
  5. E. W. Weisstein. (2005). Circle-Circle Intersection. [http://mathworld.wolfram.com/Circle-CircleIntersection.html]. MathWorld.
  6. E. W. Weisstein. (2005). Sphere-Sphere Intersection. [http://mathworld.wolfram.com/Sphere-SphereIntersection.html]. MathWorld.
  7. E. W. Weisstein. (2005). Cayley-Menger Determinant. [http://mathworld.wolfram.com/Cayley-MengerDeterminant.html]. MathWorld.
