# Connexions

#### Affiliated with

This content is either by members of the organizations listed or about topics related to the organizations listed.
• Rice Digital Scholarship

This module is included in a lens by Digital Scholarship at Rice University, as part of the collection "Geometric Methods in Structural Computational Biology".

#### Also in these lenses

• eScience, eResearch and Computational Problem Solving

This module is included in the lens "eScience, eResearch and Computational Problem Solving" by Jan E. Odegard, as part of the collection "Geometric Methods in Structural Computational Biology".


# Molecular Shapes and Surfaces

Module by: Lydia E. Kavraki

Summary: This module introduces students to a family of algorithms for assessing molecular shape, volume, surface area, and negative space (i.e., pockets and cavities).

## Introduction

Many problems in structural biology require a researcher to understand the shape of a protein. At first glance, this may seem trivial: by opening a molecular visualizer, one can easily see the shape of a protein. But what about calculating the surface area or volume of the protein? What about analyzing the surface, for example looking for concave pockets in a protein that might be binding sites for other molecules? What about calculating the volume and shape of those empty binding pockets, in order to find molecules that might fit in them? What about determining whether a particular small molecule can fit in a binding pocket?

All of these problems require some formal notion of the shape of a protein. A protein structure file usually provides no more information than a list of atom locations in space and their types. We assume that, for any given application, a radius can be defined for each atom type. This leads to the space-filling representation of a protein, in which each atom is treated as an impenetrable sphere.

This representation allows for visualization, but it brings us no closer to being able to computationally decide which parts of which atoms are on the surface of the protein and which are buried inside the structure. Some additional tool is needed to capture notions of interior and exterior and spatial adjacency.

## Representing Shape

Using the sphere model for atoms, one way to define the shape of a molecule is as the union of (possibly overlapping) balls in $\mathbb{R}^3$.

Proteins inside our cells sit in an aqueous environment. Modeling their interactions with solvent molecules, particularly water, is therefore essential. Recall that one of the phenomena that determines the structure of a protein is the hydrophobic effect: some amino acid residues are stabilized by contact with water, and others are repelled by it. The extent of a protein's interaction with the surrounding water depends on how much of the protein's surface water molecules can reach. Therefore, quantitative modeling of the strength of interaction with solvent often involves computing the solvent accessible surface area (SASA).

To compute the SASA, each solvent molecule is regarded as a probe sphere of fixed radius. This is a simplification, since water molecules are not spherical. As this probe sphere rolls over the molecule, its center traces out the solvent accessible surface. Equivalently, the SASA is the surface area obtained by growing each atom sphere by the probe radius. If we instead take the surface swept out by the front of the probe sphere (the part facing the molecule), we obtain the molecular surface (MS) model of the molecule. Alternatively, the MS can be obtained by removing a layer of probe-radius depth from the solvent accessible model. Both surfaces depend on the size of a typical solvent molecule. The larger the probe, the less contoured the resulting surface will appear, because a larger probe cannot fit into some of the interatomic spaces that a smaller one can.
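The paragraph above defines the SASA geometrically but does not say how to compute it. One common numerical approach is the Shrake-Rupley method. It is not the α-shape method this module builds toward. The method samples points on each atom sphere after growing it by the probe radius, then counts the points not buried inside any other grown sphere.

The sketch below is a minimal illustration and rests on several assumptions:

- Atoms are given as center coordinates and radii in Å.
- The function names `sphere_points` and `sasa` are hypothetical.
- The 1.4 Å default is the probe radius commonly used for water.
- Placing sample points on a golden-section spiral is an illustrative choice.

```python
import math

def sphere_points(n):
    """Return n roughly uniform points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n      # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)         # radius of the circle at height z
        theta = golden_angle * k
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

def sasa(centers, radii, probe=1.4, n_points=960):
    """Shrake-Rupley estimate of the solvent accessible area of each atom.

    centers: list of (x, y, z); radii: atom radii; probe: solvent radius.
    """
    unit = sphere_points(n_points)
    grown = [r + probe for r in radii]     # atom spheres grown by the probe
    areas = []
    for i, (ci, Ri) in enumerate(zip(centers, grown)):
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + Ri * ux, ci[1] + Ri * uy, ci[2] + Ri * uz)
            # A sample point is buried if it lies inside any other grown sphere.
            if not any(math.dist(p, cj) < Rj
                       for j, (cj, Rj) in enumerate(zip(centers, grown)) if j != i):
                exposed += 1
        # Exposed fraction of the grown sphere's area.
        areas.append(4.0 * math.pi * Ri * Ri * exposed / n_points)
    return areas
```

For an isolated atom, for example `sasa([(0.0, 0.0, 0.0)], [1.7])`, no sample point is buried, so the result is the full area 4π(1.7 + 1.4)^2. As written, the cost grows quadratically with the number of atoms. Practical implementations compare each atom only against its spatial neighbors.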

## Alpha-Shapes

Part of the problem with defining the shape of a protein is that we start with nothing but a point set, and the "shape" of a set of discrete points is poorly defined. As you saw above, the shape of a molecule depends on what is being used to measure it. To handle this ambiguity, we introduce a method of shape calculation based on a parameter, α, which determines the radius of a spherical probe that defines the surface. The method defines a family of shapes, called α-shapes [4], for any given point set. It allows fast and accurate calculation of volume and surface area.

α-shapes are a generalization of the convex hull. Consider a point set S. Define an α-ball as a ball of radius α. An α-ball is empty if it contains no point of S. For any α between zero and infinity, the α-hull of S is the complement of the union of all empty α-balls.

• As α approaches infinity, the α-shape becomes the convex hull of S.
• For α smaller than half the smallest distance between two points in S, the α-shape is S itself.
• For any α in between, one can think of the α-shape as the largest polygon (polyhedron), or set of polygons, whose vertices are in the point set and whose edges are shorter than 2α. The presence of an edge indicates that a probe of radius α cannot pass between the edge endpoints.
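In the plane, the definition can be checked directly. An edge pq lies on the α-shape exactly when some empty open disk of radius α has both p and q on its boundary. Such a disk exists only if |pq| ≤ 2α. When it does, its center is one of two points on the perpendicular bisector of pq.

The sketch below is a brute-force O(n^3) illustration of that 2D test only:

- The function name `alpha_shape_edges` is hypothetical.
- The tolerance `eps` is an assumption.
- It is not meant for protein-sized inputs.

```python
import math
from itertools import combinations

def alpha_shape_edges(points, alpha, eps=1e-12):
    """Edges (i, j) of the planar alpha-shape: point pairs lying on the
    boundary of some empty open disk of radius alpha (brute force)."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        d = math.dist(points[i], points[j])
        if d == 0 or d > 2 * alpha:
            continue  # no disk of radius alpha passes through both points
        # The two candidate centers lie on the perpendicular bisector,
        # at distance h from the midpoint of the edge.
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        h = math.sqrt(max(alpha * alpha - (d / 2) ** 2, 0.0))
        ux, uy = -(y2 - y1) / d, (x2 - x1) / d  # unit normal to the edge
        for sign in (1, -1):
            cx, cy = mx + sign * h * ux, my + sign * h * uy
            # Empty: no other point strictly inside the disk.
            empty = all(math.dist((cx, cy), points[k]) >= alpha - eps
                        for k in range(len(points)) if k not in (i, j))
            if empty:
                edges.append((i, j))
                break
    return edges
```

For the unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]`, α = 1 yields the four sides, and so does a very large α (the convex hull). α = 0.4 yields no edges, because 2α is shorter than every side.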

### Computing the Alpha-Shape: Delaunay Triangulation

A triangulation of a three-dimensional point set S is a decomposition of the convex hull of S into tetrahedra (triangles for two-dimensional point sets) whose vertices are points of S and which meet only along shared faces. The Delaunay triangulation of S is the triangulation in which no sphere circumscribing a tetrahedron contains any point of S in its interior. It is unique when S is in general position (no five points on a common sphere, or, in the plane, no four points on a common circle). Although it is incidental to α-shapes, it is worth noting that, in the plane, the Delaunay triangulation maximizes the smallest angle over all triangulations of S. In other words, it favors relatively even-sided triangles over sharp and stretched ones.
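The empty-circumsphere definition can be turned directly into a (very slow) brute-force procedure. The planar sketch below keeps every triangle whose circumcircle contains no other input point. It assumes points in general position, meaning no four are cocircular.

The emptiness test uses the standard in-circle determinant. The function names are illustrative. At O(n^4) cost, this is only useful for checking small examples.

```python
from itertools import combinations

def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circle(a, b, c, d):
    """> 0 if d is strictly inside the circumcircle of abc, < 0 if outside,
    0 if on it. Works for either orientation of abc."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a0, a1, a2), (b0, b1, b2), (c0, c1, c2) = rows
    det = (a0 * (b1 * c2 - b2 * c1)
           - a1 * (b0 * c2 - b2 * c0)
           + a2 * (b0 * c1 - b1 * c0))
    # The determinant's sign flips for clockwise triangles; correct for it.
    return det if orient(a, b, c) > 0 else -det

def delaunay_brute_force(points, eps=1e-12):
    """All triangles (i, j, k) whose circumcircle has no other input point
    strictly inside it. O(n^4); assumes no four points are cocircular."""
    tris = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if abs(orient(a, b, c)) < eps:
            continue  # collinear triple: no circumcircle
        if all(in_circle(a, b, c, points[m]) <= eps
               for m in range(len(points)) if m not in (i, j, k)):
            tris.append((i, j, k))
    return tris
```

For the points `[(0, 0), (4, 0), (0, 4), (1, 1)]`, the large outer triangle is rejected because (1, 1) lies inside its circumcircle. The three triangles fanning around (1, 1) are returned instead.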

The Delaunay triangulation of a point set is usually calculated by an incremental flip algorithm as follows:
• The points of S are sorted on one coordinate (x, y, or z). This step is not strictly necessary. It does guarantee (barring ties) that each new point lies outside the triangulation built so far, which simplifies connecting it to the existing points.
• Each point is added in sorted order. Upon adding a point:
  • The point is connected to previously added points that are "visible" to it, that is, to points to which it can be connected by a line segment without passing through a face of a tetrahedron.
  • Any new tetrahedra formed are checked against the empty-sphere condition and flipped if necessary.
  • Any tetrahedra adjacent to flipped tetrahedra are checked and flipped in turn. This continues until no further flips are needed, which is guaranteed to happen.

In the worst case this algorithm runs in O(n^2) time; its expected running time is O(n^(3/2)). If the points are instead inserted in random order, without the sort, the expected running time for points in the plane is O(n log n). A full description and analysis of Delaunay triangulation algorithms is given in [1], chapter 9.
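The check-and-flip step has a simple planar analogue, usually called the Lawson flip. Let edge ab be shared by triangles abc and abd, with c and d on opposite sides of ab. The edge is illegal, and must be replaced by edge cd, when the two angles opposite ab sum to more than π. This is equivalent to d lying inside the circumcircle of abc.

The sketch below is a 2D illustration only, with hypothetical function names. The 3D algorithm described above works on tetrahedra, uses an empty-sphere test, and needs more kinds of flips.

```python
import math

def angle_at(p, q, r):
    """Interior angle (radians) at vertex p of triangle pqr."""
    v1 = (q[0] - p[0], q[1] - p[1])
    v2 = (r[0] - p[0], r[1] - p[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.atan2(abs(cross), dot)

def needs_flip(a, b, c, d, eps=1e-12):
    """Edge ab is shared by triangles abc and abd (c and d on opposite
    sides). Return True if ab is illegal and should be flipped to cd:
    the angles at c and d opposite ab sum to more than pi."""
    return angle_at(c, a, b) + angle_at(d, a, b) > math.pi + eps
```

Consider a flat rhombus with vertices (0, 0), (2, -0.2), (4, 0), (2, 0.2). The long diagonal from (0, 0) to (4, 0) produces two sliver triangles and is reported as illegal. The short diagonal from (2, -0.2) to (2, 0.2) is legal.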

From the Delaunay triangulation, the α-shape is computed by removing all edges, triangles, and tetrahedra whose smallest circumscribing sphere has radius greater than α. Formally, the α-complex is the part of the Delaunay triangulation that remains after this removal. In particular, no edge longer than 2α survives, because the smallest sphere through an edge's endpoints has the edge as its diameter. The α-shape is the boundary of the α-complex.
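The planar version of this filtering step can be sketched as follows. It assumes a 2D Delaunay triangulation is already available as index triples, from any Delaunay routine. The rules it applies are:

- A triangle is kept if its circumradius is at most α. Delaunay circumcircles are already empty, so only the radius needs checking.
- An edge that is not a side of any kept triangle is kept only if its diametral circle has radius at most α and contains no other point.

That second rule follows a common formulation of the α-complex and is stated here as an assumption, not taken from the text. The function names are hypothetical, and isolated vertices are ignored.

```python
import math
from itertools import combinations

def circumradius(a, b, c):
    """Circumradius R = |ab| |bc| |ca| / (4 * area) of triangle abc."""
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    area2 = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
    return ab * bc * ca / (2.0 * area2)  # area2 is twice the area

def alpha_complex_boundary(points, triangles, alpha):
    """Return (kept triangles, boundary edges) of the planar alpha-complex
    built from a Delaunay triangulation given as index triples."""
    kept = [t for t in triangles
            if circumradius(*(points[i] for i in t)) <= alpha]
    # Count how many kept triangles use each Delaunay edge.
    count = {}
    for t in triangles:
        for e in combinations(sorted(t), 2):
            count.setdefault(e, 0)
    for t in kept:
        for e in combinations(sorted(t), 2):
            count[e] += 1
    boundary = []
    for (i, j), n in sorted(count.items()):
        if n == 2:
            continue  # shared by two kept triangles: interior edge
        if n == 1:
            boundary.append((i, j))  # side of exactly one kept triangle
            continue
        # Dangling edge: keep it only if its diametral circle is small
        # enough and has no other point strictly inside it.
        p, q = points[i], points[j]
        mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
        r = math.dist(p, q) / 2
        if r <= alpha and all(math.dist(mid, points[k]) >= r
                              for k in range(len(points)) if k not in (i, j)):
            boundary.append((i, j))
    return kept, boundary
```

Take the points `[(0, 0), (4, 0), (0, 4), (1, 1)]` with their three Delaunay triangles. At α = 3, the two small triangles are kept and their outline is returned. At α = 4, the boundary is the convex hull. At α = 1, only the short edge between (0, 0) and (1, 1) survives.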

Pockets [3] can be detected by comparing the α-shape to the whole Delaunay triangulation. Missing tetrahedra represent indentations, concavities, and negative space in general within the overall volume occupied by the protein. Particularly large or deep pockets may indicate a substrate binding site.
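As a deliberately crude planar illustration of this comparison, one can collect the Delaunay triangles missing from the α-complex. These are then grouped into connected components, where two triangles are adjacent if they share an edge. Each component is a candidate region of negative space.

This is not the pocket definition of [3], which uses a more refined criterion to decide which empty regions count as pockets. The function name `missing_groups` is hypothetical.

```python
from itertools import combinations

def missing_groups(triangles, kept):
    """Group Delaunay triangles that are NOT in the alpha-complex into
    connected components (adjacent = sharing an edge)."""
    kept_set = set(kept)
    missing = [t for t in triangles if t not in kept_set]
    # Map each edge to the indices of missing triangles that contain it.
    by_edge = {}
    for idx, t in enumerate(missing):
        for e in combinations(sorted(t), 2):
            by_edge.setdefault(e, []).append(idx)
    seen, groups = set(), []
    for start in range(len(missing)):
        if start in seen:
            continue
        # Depth-first search over edge-adjacent missing triangles.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            cur = stack.pop()
            comp.append(missing[cur])
            for e in combinations(sorted(missing[cur]), 2):
                for nb in by_edge[e]:
                    if nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        groups.append(comp)
    return groups
```

In three dimensions, the same grouping would run over tetrahedra sharing triangular faces. One might then rank the groups by volume or depth.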

### Weighted Alpha Shapes

Regular α-shapes can be extended to handle weighted points (i.e., spheres with different radii, such as different types of atoms) [2]. The formal definitions become complicated, but the key idea is to use a pseudo distance measure that incorporates the weights. Suppose we have two atoms at positions p1 and p2 with weights w1 and w2. Then the pseudo distance is defined as the square of the Euclidean distance minus the weights, |p1 - p2|^2 - w1 - w2. The pseudo distance is zero if and only if the two spheres centered at p1 and p2 with radii sqrt(w1) and sqrt(w2) intersect orthogonally, that is, they cross at a right angle. It is negative when the spheres overlap more deeply than that and positive when they are farther apart.
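A small numerical check of the pseudo distance, with each weight taken as the squared radius as in the text. The function names are hypothetical.

The example shows that the zero case is orthogonal intersection, not tangency. Spheres of radii 3 and 4 whose centers are 5 apart give exactly zero, while two touching unit spheres give a positive value.

```python
import math

def power_distance(p1, w1, p2, w2):
    """Weighted pseudo distance |p1 - p2|^2 - w1 - w2 (weights = radius^2)."""
    return math.dist(p1, p2) ** 2 - w1 - w2

def sphere_relation(p1, r1, p2, r2, tol=1e-9):
    """Classify two spheres by the sign of their pseudo distance."""
    value = power_distance(p1, r1 * r1, p2, r2 * r2)
    if abs(value) <= tol:
        return "orthogonal"
    return "closer than orthogonal" if value < 0 else "farther than orthogonal"
```

For example, `sphere_relation((0, 0, 0), 3.0, (5, 0, 0), 4.0)` reports orthogonal spheres. `power_distance((0, 0, 0), 1.0, (2, 0, 0), 1.0)` is 2.0 for two tangent unit spheres.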

## Calculating Molecular Volume Using α-Shapes

The volume of a molecule can be approximated using the space-filling model, in which each atom is modeled as a ball whose radius is selected according to the model being used: van der Waals surface, molecular surface, solvent accessible surface, etc. Unfortunately, calculating the volume is not as simple as taking the sum of the ball volumes, because the balls may overlap. If two balls overlap, the volume is the sum of the ball volumes minus the volume of the overlap, which was counted twice. If three overlap, the volume is the sum of the ball volumes, minus the volume of each pairwise overlap, plus the volume of the three-way overlap, which was subtracted one too many times in accounting for the pairwise overlaps. In the general case, all pairwise, three-way, four-way, and higher intersections up to n-way (assuming there are n atoms) must be considered, 2^n - 1 terms in all. Proteins generally have thousands or tens of thousands of atoms, so this general inclusion-exclusion formula is computationally prohibitive and prone to numerical error.
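The two-ball case of this inclusion-exclusion formula can be written out exactly. The sketch below uses the closed-form lens volume of two intersecting balls, as given in the sphere-sphere intersection entry cited as [6]. It handles the disjoint and nested cases separately. The function names are hypothetical.

```python
import math

def ball_volume(r):
    return 4.0 / 3.0 * math.pi * r ** 3

def lens_volume(r1, r2, d):
    """Volume of the intersection of two balls with radii r1, r2 whose
    centers are d apart."""
    if d >= r1 + r2:
        return 0.0                        # disjoint or externally tangent
    if d <= abs(r1 - r2):
        return ball_volume(min(r1, r2))   # smaller ball lies inside the larger
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2 * d * r2 - 3 * r2 * r2
               + 2 * d * r1 + 6 * r1 * r2 - 3 * r1 * r1)
            / (12 * d))

def union_volume_two(c1, r1, c2, r2):
    """Inclusion-exclusion for two balls: V1 + V2 - V(overlap)."""
    d = math.dist(c1, c2)
    return ball_volume(r1) + ball_volume(r2) - lens_volume(r1, r2, d)
```

Two unit balls whose centers are 1 apart overlap in a lens of volume 5π/12, so their union has volume 8π/3 - 5π/12 = 27π/12.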

α-shapes provide a way around this undesirable combinatorial complexity [2], and this issue has been one of the motivating factors for introducing α-shapes. To calculate the volume of a protein, we take the sum of all ball volumes, then subtract only those pairwise intersections for which a corresponding edge exists in the α-complex. Only those three-way intersections for which the corresponding triangle is in the α-complex must then be added back. Finally, only four-way intersections corresponding to tetrahedra in the α-complex need to be subtracted. No higher-order intersections are necessary, and the number of volume calculations necessary corresponds directly to the complexity of the α-complex, which is O(n log n) in the number of atoms.
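The bookkeeping described above can be sketched as follows. Intersection terms are added and subtracted with alternating signs, but only for the simplices present in the α-complex.

Computing the three-way and four-way intersection volumes analytically is the hard part, which [2] works out. Here those volumes are supplied by the caller through placeholder functions `pair_vol`, `triple_vol`, and `quad_vol`. These, like `alpha_complex_volume` itself, are hypothetical and not an API from [2].

```python
import math

def alpha_complex_volume(radii, edges, triangles, tetrahedra,
                         pair_vol, triple_vol, quad_vol):
    """Union volume of balls via inclusion-exclusion restricted to the
    simplices of the alpha-complex. The *_vol arguments are caller-supplied
    functions returning the volume of the corresponding ball intersection."""
    total = sum(4.0 / 3.0 * math.pi * r ** 3 for r in radii)   # all balls
    total -= sum(pair_vol(i, j) for i, j in edges)              # edges
    total += sum(triple_vol(i, j, k) for i, j, k in triangles)  # triangles
    total -= sum(quad_vol(i, j, k, l) for i, j, k, l in tetrahedra)
    return total
```

The number of terms equals the number of simplices in the α-complex rather than 2^n - 1. For two unit balls 1 apart, the α-complex contains the single edge (0, 1). Supplying the lens volume 5π/12 for that edge reproduces the union volume 27π/12.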

An example of how this approach works is given on page 4 of the Liang et al. article in the Recommended Reading section below. A proof of correctness and derivation is also provided in the article. Surface area calculations, such as solvent-accessible surface area, which is often used to estimate the strength of interactions between a protein and the solvent molecules surrounding it, are made by a similar use of the α-complex.

## Recommended Reading

• H. Edelsbrunner, D. Kirkpatrick, and R. Seidel. "On the Shape of a Set of Points in the Plane." IEEE Transactions on Information Theory, 29(4):551-559, 1983. This is the original α-shapes paper. Caution: its definition of α differs from that used in later papers; it is the negative reciprocal of α as presented above.
• H. Edelsbrunner and E. P. Mücke. "Three-dimensional Alpha Shapes." Workshop on Volume Visualization, Boston, MA, pp. 75-82, 1992. This article shows how to extend α-shapes to three-dimensional point sets.
• J. Liang, H. Edelsbrunner, P. Fu, P. V. Sudhakar, and S. Subramaniam. "Analytical shape computation of macromolecules: I. Molecular area and volume through alpha shape." Proteins: Structure, Function, and Genetics, 33:1-17, 1998. This paper uses α-shapes to speed up volume and surface area calculations for molecular models.
• H. Edelsbrunner, M. Facello, and J. Liang. "On the definition and the construction of pockets in macromolecules." Discrete Applied Mathematics, 88:83-102, 1998.

## References

1. M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. (2000). Computational Geometry: Algorithms and Applications (2nd ed.). Springer.
2. J. Liang, H. Edelsbrunner, P. Fu, P. V. Sudhakar, and S. Subramaniam. (1998). Analytical shape computation of macromolecules: I. Molecular area and volume through alpha shape. [http://www3.interscience.wiley.com/cgi-bin/abstract/36315/ABSTRACT]. Proteins: Structure, Function, and Genetics, 33, 1-17.
3. H. Edelsbrunner, M. Facello, and J. Liang. (1998). On the definition and the construction of pockets in macromolecules. [http://dx.doi.org/10.1016/S0166-218X(98)00067-5]. Discrete Applied Mathematics, 88, 83-102.
4. H. Edelsbrunner and E. P. Mücke. (1994). Three-Dimensional Alpha Shapes. [http://portal.acm.org/citation.cfm?id=156635]. ACM Transactions on Graphics, 13, 43-72.
5. E. W. Weisstein. (2005). Circle-Circle Intersection. [http://mathworld.wolfram.com/Circle-CircleIntersection.html]. MathWorld.
6. E. W. Weisstein. (2005). Sphere-Sphere Intersection. [http://mathworld.wolfram.com/Sphere-SphereIntersection.html]. MathWorld.
7. E. W. Weisstein. (2005). Cayley-Menger Determinant. [http://mathworld.wolfram.com/Cayley-MengerDeterminant.html]. MathWorld.
