5. Solvent-Accessible Surfaces
Michael L. Connolly
A major advance in the study of protein surfaces was initiated by Fred Richards' group at Yale University. His group included B.K. Lee, who moved to the University of Kansas and is now at the NIH; Tim Richmond, who moved to the MRC and is now at the ETH in Zürich; and Jonathan Greer, who moved to Columbia and is now at Abbott Laboratories. Fred Richards has remained at Yale, in the Department of Molecular Biophysics and Biochemistry and at the Center for Structural Biology.
The motivation was the study of the protein-folding problem (Anfinsen, 1973) and hydrophobicity (Tanford, 1980). The model of a folded protein with the hydrophobic amino acid side chains in the interior forming an oil drop (Kauzmann, 1959) implied that as the protein folded the hydrophobic side chains were preferentially buried away from the external solvent. In order to quantitate hydrophobic burial (Chothia, 1974), B.K. Lee and Fred Richards introduced the solvent-accessible surface (Lee and Richards, 1971). The accessible surface is traced out by the center of a water-sized probe sphere as it rolls over the van der Waals surface of the protein. It is a kind of expanded van der Waals surface. If you increase each atom's van der Waals radius by the probe radius, you get so-called expanded-atom radii. The union of the expanded atoms is what Tim Richmond (1984) calls the solvent-excluded volume. It is the region enclosed by the accessible surface.
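The expanded-atom construction can be illustrated with a minimal Python sketch. The probe radius of 1.4 angstroms, the atom coordinates, and the function names are assumptions made for illustration; none of them come from the original papers. A point lies inside the region enclosed by the accessible surface exactly when it falls inside at least one expanded atom.

```python
import math

PROBE_RADIUS = 1.4  # assumed water-sized probe radius, in angstroms


def expanded_radius(vdw_radius, probe_radius=PROBE_RADIUS):
    """Expanded-atom radius: van der Waals radius plus probe radius."""
    return vdw_radius + probe_radius


def probe_center_excluded(point, atoms, probe_radius=PROBE_RADIUS):
    """Return True if `point` lies inside the union of expanded atoms.

    `atoms` is a list of ((x, y, z), vdw_radius) pairs (hypothetical layout).
    A probe centered at such a point would overlap at least one atom.
    """
    return any(
        math.dist(point, center) < expanded_radius(radius, probe_radius)
        for center, radius in atoms
    )


# Two made-up atoms, 3 angstroms apart.
atoms = [((0.0, 0.0, 0.0), 1.7), ((3.0, 0.0, 0.0), 1.5)]
print(probe_center_excluded((1.5, 0.0, 0.0), atoms))  # point between the atoms
print(probe_center_excluded((0.0, 6.0, 0.0), atoms))  # point far from both atoms
```

The boundary of the set where this test returns True is the accessible surface for the chosen probe radius.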
Lee and Richards (1971) computed the accessible areas of each atom in both the folded and extended state, and found that the decrease in accessible area in going from the unfolded to the folded state is greater for hydrophobic atoms than it is for hydrophilic atoms.
These ideas were further refined by Richmond and Richards (1978), who introduced the contact surface. The contact surface is the part of the van der Waals surface that can be touched by a water-sized probe sphere.
Soon afterwards, Richards introduced the reentrant surface, which together with the contact surface forms the "molecular surface" (Richards, 1977). The molecular surface is the surface traced by the inward-facing surface of the probe sphere. The reentrant surface consists of the inward-facing part of the probe sphere when it is in contact with more than one atom. In the diagram below the contact surface is in green and the reentrant surface is in blue.
In my own work, I use the term solvent-excluded volume to mean the volume enclosed by the molecular surface. It is the volume that the probe sphere is excluded from. In the diagram above the magenta region at the left is the solvent-excluded volume. On the right the van der Waals volume is in red and the interstitial volume is in magenta. The solvent-excluded volume = the van der Waals volume + the interstitial volume.
The molecular surface was first computed by Jonathan Greer and Bruce Bush (1978), after Greer moved from Yale to Columbia. (Bruce Bush later moved to Merck in New Jersey). Greer and Bush (1978) applied this method to the inter-subunit interface of hemoglobin, where they were able to both visualize the protein surfaces involved in the interface, and quantitate the amount of void volume between the two surfaces. The diagram below illustrates their molecular surface method. Probe spheres are rained down upon the atoms from above, stopping just before a collision (van der Waals overlap) would occur. A finer set of lines is used for the next step. The lowest intersection point between each of these finer lines and the bottom of the lowest sphere defines a point of the molecular surface. In three dimensions this produces a grid. The method works well for fairly flat surfaces, but cannot handle irregular topographies with overhangs.
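The two steps of the surface-net idea described above can be sketched in Python. This is only a reading of the prose description, not the Greer and Bush code: the function names, the data layout, and the choice of vertical drop lines on a regular grid are assumptions. A probe dropped along a vertical line comes to rest at the highest height where it touches an atom, and each finer line then takes the lowest point where it meets the underside of a resting probe.

```python
import math


def probe_rest_height(x, y, atoms, probe_radius):
    """Height of the probe center after dropping it along the vertical line (x, y).

    The probe stops where it first touches an atom, i.e. at the highest
    tangent position. Returns None if the line misses every atom.
    """
    best = None
    for (ax, ay, az), radius in atoms:
        reach = radius + probe_radius
        d2 = (x - ax) ** 2 + (y - ay) ** 2
        if d2 < reach * reach:
            z = az + math.sqrt(reach * reach - d2)
            if best is None or z > best:
                best = z
    return best


def net_height(x, y, probe_centers, probe_radius):
    """Lowest point where the vertical line (x, y) meets the underside of a probe."""
    lowest = None
    for px, py, pz in probe_centers:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 <= probe_radius * probe_radius:
            z = pz - math.sqrt(probe_radius * probe_radius - d2)
            if lowest is None or z < lowest:
                lowest = z
    return lowest


def surface_net(atoms, probe_radius, xs, ys, fine_xs, fine_ys):
    """Drop probes on the coarse grid, then sample the net on the finer grid."""
    probes = []
    for x in xs:
        for y in ys:
            z = probe_rest_height(x, y, atoms, probe_radius)
            if z is not None:
                probes.append((x, y, z))
    return {
        (x, y): net_height(x, y, probes, probe_radius)
        for x in fine_xs
        for y in fine_ys
    }


# One made-up atom of radius 1.7 at the origin, probe radius 1.4.
atoms = [((0.0, 0.0, 0.0), 1.7)]
net = surface_net(atoms, 1.4, [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [0.0], [0.0])
print(net[(0.0, 0.0)])
```

Because the net stores a single height for each (x, y) line, it is a height field, which is why the approach cannot represent surface under an overhang.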
Further diagrams of solvent-accessible surfaces can be found at the University of Leeds (Ligand Design Software), and the University of York (An Introduction To Molecular Surfaces).
The next advance in molecular surface computation occurred at the University of California at San Francisco Computer Graphics Laboratory. Robert Langridge had recently (1976) moved there from Princeton, and with his NIH Research Resource grant had set up a molecular modeling system consisting of a Digital PDP 11/70 minicomputer running the UNIX operating system and an Evans & Sutherland Picture System 2. His system manager Tom Ferrin had translated the proprietary Evans and Sutherland graphics library from assembly language into C (Ferrin and Langridge, 1980). Historically, there had been a close relationship between the Evans and Sutherland graphics systems and the Digital (DEC) minicomputers. The Digital operating systems run on the minicomputers in these combos were not as conducive to rapid software development as the platform-independent UNIX operating system developed by AT&T. Also, with the Fortran-callable E&S routines replaced by C routines, it was possible to develop interactive graphics software in the superior C language. From Princeton, Robert Langridge brought with him a graphics programmer, Martin Pensak, and Martin was joined by a local student, Conrad Huang, and also a former graphics programmer from the Yale group, Oliver Jones. Pensak and Huang wrote a molecular modeling program in C called MMS (Molecular Modelling System). This work was initiated in conjunction with Steve Dempsey and Joe Kraut of the U.C.S.D. Chemistry Department, but the two groups soon went their separate ways, with the San Diego group concentrating on crystallographic applications, and the Langridge group, which was located in the School of Pharmacy, concentrating on structure-based drug design. The MMS program was renamed MIDS in order to avoid a name conflict with the MMS-X system of Washington University in St. Louis. 
It was redesigned by Tom Ferrin, recoded by Conrad Huang and renamed MIDAS during the early eighties (Huang, Jarvis, Ferrin and Langridge, 1982; Ferrin and Langridge, 1986; Ferrin, 1987; Ferrin, Huang, Jarvis and Langridge, 1988). The most recent release is called MidasPlus.
In 1978 Ollie Jones coded the Greer & Bush (1978) molecular-surface-net algorithm and showed the output to me (Mike Connolly) using his general-purpose display program BILD (from the German for picture). The next year, Ollie Jones moved to Chicago to do cartographic work, and I lost interest in molecular surfaces until they were again brought to my attention by Howard Schachman, in his Biochemistry 206 (Physical Chemistry of Proteins) course at U.C. Berkeley in the fall of 1979. Schachman handed out photocopies of the surface diagrams in Fred Richards' 1977 article. At that time I was a graduate student of Irwin D. (Tack) Kuntz, working on the protein-protein docking problem. I was trying to dock proteins together based upon complementary surface curvature and needed a good protein surface representation. I tried to apply the surface-net algorithm, but ran into two limitations of the method: (a) it gave a surface for only one side of the protein, and (b) it did not represent the surface under overhangs. I tried turning Ollie's program into a subroutine that would generate many small nets over the protein surface, but could not think of a good way to glue the nets together to form a continuous surface. I kept increasing the number of nets, and decreasing their size, until eventually I was dropping nets for each atom of the molecule, from the six (±x, ±y, ±z) coordinate-axis directions.
At that point (early December, 1979) it occurred to me to simply place the probe tangent to each atom of the protein at six positions: ±x, ±y, ±z, recording a contact point if there were no collisions. Then it occurred to me to place the probe tangent at more positions (independently discovering the Shrake and Rupley (1973) algorithm), and then tangent to two and three atoms simultaneously, in accordance with the definition of reentrant surface (Richards, 1977). When the probe sphere is tangent to three atoms, the inward-facing triangle defined by the three points of contact defines a concave patch of reentrant surface. When the probe sphere rolls around a pair of adjacent atoms, the inward-facing arc connecting the two points of contact traces out a saddle-shaped toroidal patch of reentrant surface. The concave and toroidal patches are represented by points. This method solves the overhang problem, but there is still no information on connecting the small surface patches together. There is only a set of discrete surface points in three-dimensional space. A surface normal vector, pointing towards the probe center, was added to each point, for docking purposes. An area, in square angstroms, was added to each point. These areas were not accurate, and were sensitive to the spacing between the points. The next step, taken in late December, 1979, was to deal with self-intersections in the surface by removing surface points of one probe sphere that lay inside an opposing sphere.
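The three-atom case can be made concrete with a small Python sketch. It is not taken from the MS program: it uses the standard trilateration construction, and all names and the example coordinates are assumptions. The probe center must lie at distance (atom radius + probe radius) from each of the three atom centers, which gives zero or two candidate positions, one on each side of the plane through the centers. The three contact points that span the concave patch then lie on the lines from each atom center to the probe center.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def probe_positions(atoms, probe_radius):
    """Probe centers tangent to three atoms at once (assumes non-collinear centers).

    `atoms` is a list of three ((x, y, z), radius) pairs. Returns [] when the
    atoms are too far apart for the probe to touch all three, else two centers.
    """
    (c1, r1), (c2, r2), (c3, r3) = atoms
    d1, d2, d3 = r1 + probe_radius, r2 + probe_radius, r3 + probe_radius
    # Local orthonormal frame with c1 at the origin and c2 on the x axis.
    d = norm(sub(c2, c1))
    ex = scale(sub(c2, c1), 1.0 / d)
    i = dot(ex, sub(c3, c1))
    ey_raw = sub(sub(c3, c1), scale(ex, i))
    ey = scale(ey_raw, 1.0 / norm(ey_raw))
    ez = cross(ex, ey)
    j = dot(ey, sub(c3, c1))
    # Intersect the three expanded spheres.
    x = (d1 * d1 - d2 * d2 + d * d) / (2.0 * d)
    y = (d1 * d1 - d3 * d3 + i * i + j * j) / (2.0 * j) - (i / j) * x
    z2 = d1 * d1 - x * x - y * y
    if z2 < 0.0:
        return []
    z = math.sqrt(z2)
    base = add(c1, add(scale(ex, x), scale(ey, y)))
    return [add(base, scale(ez, z)), sub(base, scale(ez, z))]


def contact_points(atoms, probe_center):
    """Point on each atom's van der Waals sphere touched by the probe."""
    points = []
    for center, radius in atoms:
        direction = sub(probe_center, center)
        points.append(add(center, scale(direction, radius / norm(direction))))
    return points


def probe_collides(probe_center, other_atoms, probe_radius):
    """True if the probe overlaps any atom in `other_atoms`."""
    return any(norm(sub(probe_center, c)) < r + probe_radius for c, r in other_atoms)


# Three made-up unit-radius atoms at the corners of an equilateral triangle.
triple = [((0.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), 1.0), ((1.0, math.sqrt(3.0), 0.0), 1.0)]
for center in probe_positions(triple, 1.0):
    print(center, contact_points(triple, center))
```

A candidate position would contribute a concave patch only if `probe_collides` is False for all remaining atoms of the molecule.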
The MS program was written in RATFOR (Rational Fortran), a language developed at Bell Labs in the UNIX group (Kernighan & Plauger, 1976). The MS program wrote out an ASCII file of dots that could be read by Ollie Jones's BILD program and displayed on the Evans and Sutherland (E&S) Picture System 2. In 1980 U.C.S.F. acquired the new color calligraphic monitor recently developed by E&S. The MMS/MIDS and BILD programs were modified to handle color, and it became possible to show interfacing molecules in different colors (Langridge, Ferrin, Kuntz and Connolly, 1981). One of the advantages of the molecular surface over the accessible surface or the van der Waals surface is its ability to visualize the shape complementarity at interfaces.
Having read an early protein-protein docking paper by Wodak and Janin (1978a), I studied the BPTI/trypsin interface. Tom Ferrin, Tack Kuntz and I made a 16mm film of this interface, which was widely shown. Kuntz and I also studied protein packing defects or cavities (Connolly, 1981a), which were better visualized by the molecular surface than by the original Lee & Richards accessible surface (Lee & Richards, 1971). Interior cavities were identified by an automatic algorithm that clustered together nearby surface points. Some packing defects were connected to the external surface by narrow tunnels. Not being able to algorithmically define the limits of such invaginations, I decided that it would be necessary to select which points belonged to an invagination by hand. This Handle program (Connolly, 1981a) never got to the point of actually being able to interactively select points, but it continued to be used at U.C.S.F. after I completed my degree and left, because of its ability to display huge numbers of surface points.
At U.C.S.F. Jeff Blaney applied MS to model the binding of thyroxine to prealbumin (Blaney, Jorgensen, Connolly, Ferrin, Langridge, Oatley, Burridge and Blake, 1982), and Paul Weiner computed the electrostatic potential at the probe center associated with each surface point and colored the points accordingly (Weiner, Langridge, Blaney, Schaeffer and Kollman, 1982).
Before leaving U.C.S.F., I rewrote MS in Fortran 77. This was not done by running the original Ratfor code through the Ratfor pre-processor, but rather by hand. The purpose of this rewrite was to make the program more portable, since at the time (1981), the operating system of choice for DEC's VAX-11/780 was VMS, not UNIX. The Fortran 77 version of MS was tested at the National Resource for Computation in Chemistry at Lawrence Berkeley Laboratory, with the help of Art Olson and T.J. O'Donnell, using GRAMPS (O'Donnell and Olson, 1981). It was then mailed to the Quantum Chemistry Program Exchange, where it became program #429 (Connolly, 1981b).
Solvent-accessible and molecular surface areas have been computed by many methods. While the original Lee & Richards method computed areas by multiplying accessible arc lengths by the spacing between the planes, the Shrake and Rupley (1973) method placed 92 points on the expanded atomic sphere and determined which points were accessible to solvent (i.e., not inside any other expanded sphere). The first application of solvent-accessibility to nucleic acids was made by Alden and Kim (1979). The areas computed depend not only upon the method, but also upon the van der Waals radii used. For my own work, I have taken nucleic acid radii from Alden and Kim (1979) and protein atom radii from McCammon, Wolynes and Karplus (1979). The areas computed by means of surface points, even if closely spaced, are not very accurate. In order to attack this problem, the GEPOL program has been developed (Pascual-Ahuir, Silla, Tomasi and Bonaccorsi, 1987; Pascual-Ahuir and Silla, 1990; Silla, Villar, Nilsson, Pascual-Ahuir and Tapia, 1990; Silla, Tuñón and Pascual-Ahuir, 1991; Pascual-Ahuir, Silla and Tuñón, 1994). In order to combat the slowness of numerical surface area methods, an analytical approximation to the accessible surface areas has been developed (Wodak and Janin, 1980). Another approach to increasing computational efficiency has been to vectorize the calculation (Wang and Levinthal, 1991). Le Grand and Merz (1993) have developed a rapid approximation to molecular surface area extending the Shrake & Rupley method and using look-up tables. A recent area method for personal computers has been described in both hardcopy (Pacios, 1994) and online form (Pacios, 1995) in the Journal of Molecular Modeling.
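The point-counting idea behind the Shrake and Rupley method can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reproduction of the published procedure: the 92 points are placed here with a golden-spiral rule (the original point layout is not reproduced), the probe radius of 1.4 angstroms is assumed, and the function names are made up. Each atom's accessible area is estimated as the area of its expanded sphere times the fraction of test points that lie outside every other expanded sphere.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral layout, an assumption)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = golden_angle * k
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def accessible_areas(atoms, probe_radius=1.4, n_points=92):
    """Estimate the solvent-accessible area of each atom by point counting.

    `atoms` is a list of ((x, y, z), vdw_radius) pairs; areas are in the
    squared units of the coordinates.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(atoms):
        big_r = ri + probe_radius  # expanded-atom radius
        exposed = 0
        for u in unit:
            p = (ci[0] + big_r * u[0], ci[1] + big_r * u[1], ci[2] + big_r * u[2])
            # A point is accessible if it is not inside any other expanded sphere.
            if all(
                math.dist(p, cj) >= rj + probe_radius
                for j, (cj, rj) in enumerate(atoms)
                if j != i
            ):
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


# Two made-up overlapping atoms.
print(accessible_areas([((0.0, 0.0, 0.0), 1.7), ((3.0, 0.0, 0.0), 1.7)]))
```

The estimate changes in steps of one point's share of the sphere, which is one way to see why areas obtained from surface points depend on the point spacing.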
The term buried surface area has two related meanings: (a) the surface buried away from solvent when the protein folds, and (b) the surface buried away from solvent when two proteins or subunits associate to form a complex. The thermodynamic importance of buried hydrophobic surface area has been investigated by Chothia (1974, 1976) and the surface buried in interfaces has also been investigated (Chothia and Janin, 1975; Lesk, Janin, Wodak and Chothia, 1985). Recently Pattabiraman, Ward and Fleming (1995) have introduced something they call the occluded surface.
Dot surface algorithms faster than MS have been developed by Pearl and Honegger (1983) and by Moon and Howe (1989). The latter algorithm has been re-coded by Silicon Graphics as an Explorer module and is called ATOMICSURF.
Spline curves passing through the molecular surface points have been computed by Colloc'h and Mornon (1988; 1990). Perrot and Maigret (1990) have developed a program MSEED that rolls the probe sphere continuously over the outer surface of the protein. It gains considerably in speed over algorithms that try to place the sphere tangent to all triples of neighboring spheres, at the expense of missing interior cavities. However, for most applications, the interior cavity surfaces are not needed.
While a numerical method samples the surface at a finite number of discrete points, cubes, triangles or plane curves, an analytical method describes the surface as a collection of pieces of spheres, each defined by the center, radius, and arcs forming the boundary. For the reentrant surface, pieces of tori are also included. The analytical surface may be either slower or faster to compute than a numerical surface, depending on the fineness of the numerical surface. An advantage of an analytical method is that atomic accessible areas and the solvent-excluded volume are represented as formulas that can be differentiated. A number of researchers have computed area and volume derivatives: (Richmond, 1984; Perrot, Cheng, Gibson, Vila, Palmer, Nayeem, Maigret and Scheraga, 1992; Gogonea and Osawa, 1994b; Sridharan, Nicholls and Sharp, 1994).
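As a worked illustration of why an analytical description can be differentiated, consider the simplest possible case, which is an assumption chosen for clarity and is not drawn from any of the cited papers: two expanded atoms with radii $R_i$ and $R_j$ whose centers are a distance $d$ apart, with $|R_i - R_j| < d < R_i + R_j$. The part of sphere $i$ buried inside sphere $j$ is a spherical cap, so the accessible area of atom $i$ and its derivative with respect to $d$ have closed forms.

```latex
% Height of the cap of sphere i that lies inside sphere j
h_i = R_i - \frac{d^2 + R_i^2 - R_j^2}{2d}

% Accessible area of atom i: full sphere minus the buried cap
A_i(d) = 4\pi R_i^2 - 2\pi R_i h_i
       = 2\pi R_i \left( R_i + \frac{d^2 + R_i^2 - R_j^2}{2d} \right)

% Derivative with respect to the interatomic distance
\frac{\partial A_i}{\partial d} = \frac{\pi R_i \left( d^2 - R_i^2 + R_j^2 \right)}{d^2}
```

At $d = R_i + R_j$ the expression reduces to $A_i = 4\pi R_i^2$, the area of an isolated expanded sphere, as expected. With three or more overlapping atoms the buried caps intersect one another, and the bookkeeping of the arcs that bound each exposed piece is what the analytical algorithms have to handle.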
My early analytical molecular surface algorithm suffered from an inability to deal with self-intersecting surfaces (Connolly, 1983a, 1983b), a problem that I was able to solve only partially in later work (Connolly, 1985). Even the rewriting of my molecular surface software in C (Connolly, 1993) has not eliminated these problems. Better methods for dealing with self-intersecting surfaces, cusps and singularities have been developed by Michel Sanner and colleagues (Sanner, 1992; Sanner, Olson and Spehner, 1995; Sanner, Olson and Spehner, 1996) and by Gogonea and Osawa (1994a). Another robustness problem has been dealing with the situation where the probe sphere is simultaneously tangent to four atoms. This situation has been successfully dealt with by Eisenhaber and Argos (1993). Fast and parallel methods for computing molecular surfaces have been developed at UNC (Varshney and Brooks, 1993; Varshney, Brooks and Wright, 1994; Varshney, Brooks, Richardson, Wright and Manocha, 1995). The UNC Computer Science department has a history of parallel computation in their Pixel-Planes project.
Modern interactive graphics systems have the ability to rotate and render polyhedral surfaces in real time. Therefore, the most practical molecular surface representation is the polyhedral molecular surface (Connolly, 1985; Zauhar and Morgan, 1990; Weber, Morgantini, Fluekiger and Roch, 1989; Weber and Morgantini, 1990; Weber, Fluekiger and Field, 1990; Zhexin, Yunyu, Yingwu, 1995). A recent triangulation algorithm, SMART, has curved triangles that lie on the actual spherical-toroidal molecular surface (Zauhar, 1995). Recent polyhedral surface generators have used algorithms based upon a cubical grid (Lorensen and Cline, 1987; Heiden, Goetze and Brickmann, 1993; Eisenhaber, Lijnzaad, Argos, Sander and Scharf, 1995). The Darmstadt group has displayed their surfaces in conjunction with a modeling program called MolCad. You and Bashford (1994) have developed an algorithm for identifying which points on a cubical grid lie inside the protein's solvent-excluded volume.
There are alternative ways to smooth a molecular surface besides rolling a probe sphere over it (Agishtein, 1992). Blinn (1982) used Gaussian densities to blend the atoms together. The results of this work can be seen in the DNA segment on Carl Sagan's Cosmos public television series. Purvis and Culberson (1986) also used Gaussians, but with electrostatic coloring.
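The Gaussian blending idea can be illustrated with a minimal Python sketch. The particular parametrization below (one Gaussian per atom, scaled so that an isolated atom's surface sits at its own radius, with a single "blobbiness" constant and a threshold of 1) is an assumption chosen for illustration and should not be read as Blinn's exact formulation. The surface is the set of points where the summed density equals the threshold.

```python
import math


def blobby_density(point, atoms, blobbiness=2.0):
    """Sum of per-atom Gaussians; each term equals 1 at that atom's own radius.

    `atoms` is a list of ((x, y, z), radius) pairs. `blobbiness` is a
    hypothetical parameter that controls how quickly each Gaussian falls off.
    """
    total = 0.0
    for center, radius in atoms:
        r2 = sum((p - c) ** 2 for p, c in zip(point, center))
        total += math.exp(-blobbiness * (r2 / (radius * radius) - 1.0))
    return total


def inside_blobby_surface(point, atoms, blobbiness=2.0, threshold=1.0):
    """True if the point lies on or inside the blended isosurface."""
    return blobby_density(point, atoms, blobbiness) >= threshold


# Two made-up unit-radius atoms whose spheres do not touch (centers 2.2 apart).
pair = [((0.0, 0.0, 0.0), 1.0), ((2.2, 0.0, 0.0), 1.0)]
print(inside_blobby_surface((1.1, 0.0, 0.0), pair))  # midpoint between the atoms
print(inside_blobby_surface((0.0, 5.0, 0.0), pair))  # point far from both atoms
```

The midpoint is outside both hard spheres, yet the two Gaussian tails add up to more than the threshold there, so the blended surface bridges the gap between the atoms without any probe sphere.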
All material in this article Copyright © 1996 by Michael L. Connolly
NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.