Adding Chemical Information to CoMFA Models with Alternative 3D QSAR Fields

Chris L. Waller

Experimental Toxicology Division
National Health and Environmental Effects Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711

Glen E. Kellogg

Department of Medicinal Chemistry
School of Pharmacy
Virginia Commonwealth University
Richmond, VA 23298-0540



http://www.netsci.org/Science/Compchem/feature10.html

Introduction

Since its introduction several years ago, CoMFA (Comparative Molecular Field Analysis) [1] has become one of the most powerful tools for QSAR and drug design. In fact, CoMFA has pioneered a new paradigm of three-dimensional QSAR [2] in which the shapes, properties, etc. of molecules are related to specific molecular features (substituents, etc.) and their spatial relationships. Thus, molecular modification to improve biological performance based on QSAR can be more firmly rooted in the actual chemistry of the molecules involved. While "standard" CoMFA is extremely powerful in its native form, there are a variety of ways to supplement the information supplied to the model by enhancing the field set. In this review we describe a handful of approaches that have been applied to add new information fields to CoMFA models.

Standard CoMFA Fields

The standard potential energy fields produced by the out-of-the-box CoMFA program are steric (van der Waals) and electrostatic (Coulombic). The standard CoMFA probe is an sp3-hybridized carbon atom with an effective radius of 1.53 Å and a +1.0 charge. The dependence of the potential functions on the probe-atom to ligand-atom distance (i.e., the standard 6-12 form of the Lennard-Jones potential and the r-squared term of the Coulombic potential) results in steep changes as the probe nears the surface of the molecule. By convention, steric values are truncated at some arbitrary level (on the order of 4.0 to 30.0 kcal/mol) to eliminate points both within the van der Waals shells of molecules and at the periphery of the region, so that effectively a shell of points is used. Electrostatic values are also truncated at similar levels and are most commonly ignored at points inside the molecules.
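The evaluation and truncation of the two standard fields at a single lattice point can be sketched as follows. This is a minimal illustration under stated assumptions, not the SYBYL implementation: the generic 6-12 form, the distance-dependent dielectric (which gives the r-squared Coulomb term), the atom tuple layout, the parameter names `epsilon` and `r_min`, and the 30 kcal/mol default cutoff are all chosen here for illustration.

```python
import math

# Converts (charge product / distance) to kcal/mol when distances are in Å.
COULOMB_CONST = 332.06


def steric_energy(r, epsilon, r_min):
    """Generic 6-12 Lennard-Jones energy at probe-atom distance r (Å)."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


def electrostatic_energy(r, q_atom, q_probe=1.0):
    """Coulomb energy with a distance-dependent dielectric (1/r^2 form)."""
    return COULOMB_CONST * q_probe * q_atom / (r * r)


def truncate(value, cutoff=30.0):
    """Clamp an energy to the range [-cutoff, +cutoff] kcal/mol."""
    return max(-cutoff, min(cutoff, value))


def fields_at_point(point, atoms, cutoff=30.0):
    """Sum both terms over all atoms at one lattice point, then truncate.

    atoms: list of (x, y, z, charge, epsilon, r_min) tuples; this layout
    is a hypothetical convenience, not a SYBYL data structure.
    """
    steric = 0.0
    electro = 0.0
    for x, y, z, q, eps, r_min in atoms:
        r = max(math.dist(point, (x, y, z)), 1e-6)  # guard against r = 0
        steric += steric_energy(r, eps, r_min)
        electro += electrostatic_energy(r, q)
    return truncate(steric, cutoff), truncate(electro, cutoff)
```

A lattice point that falls inside the van der Waals shell of an atom produces very large raw values, which is why both sums are clamped before use.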

New CoMFA Fields

The relatively straightforward nature of the CoMFA paradigm makes it potentially very powerful. While steric and electrostatic properties of molecules are major physicochemical properties related to biological activity, they are purely enthalpic. It is desirable in many cases to characterize additional properties on a three-dimensional basis. Efforts to include entropic properties within a CoMFA framework have focused on characterizing the hydrophobic nature of molecules. More recently, reactivity-based fields such as those of molecular orbitals have also been imported into CoMFA studies. The type of field to be generated and included in a CoMFA model is limited only by the creativity of the researcher and the validity of the underlying theory.

It was the intention of the authors that this mini-review of Alternative CoMFA fields be as complete as possible. If we have omitted work in the field, we wish to apologize in advance.

Mechanics

It is, in fact, a rather simple matter to create a field. All that is actually necessary is an atomistic parameter and some mathematical functional form for the distance dependence for that parameter. Then the field is created by summing the effect of all atoms on each grid point in the cage surrounding the molecule. Some form of arbitrary cut-off can be imposed to eliminate or truncate the contribution of grid points that are within van der Waals radii of molecular atoms.
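The recipe above (an atomistic parameter, a distance function, a sum over atoms at each grid point, and an optional cut-off) can be sketched in a few lines. The function names, the atom layout, and the choice to replace points inside the cut-off with a fill value are assumptions made for this illustration only.

```python
import math
from itertools import product


def make_grid(lower, upper, spacing):
    """Return the lattice points of a regular box from lower to upper."""
    axes = []
    for lo, hi in zip(lower, upper):
        n = int(round((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(n)])
    return list(product(*axes))


def compute_field(atoms, grid, decay, min_distance=None, fill=0.0):
    """Sum parameter * decay(r) over all atoms at every grid point.

    atoms: list of ((x, y, z), parameter) pairs.
    decay: function of the distance r giving the distance dependence.
    min_distance: optional cut-off; a point closer than this to any atom
    is assigned `fill` instead of the summed value.
    """
    field = []
    for point in grid:
        total = 0.0
        inside = False
        for xyz, param in atoms:
            r = math.dist(point, xyz)
            if min_distance is not None and r < min_distance:
                inside = True
                break
            total += param * decay(r)
        field.append(fill if inside else total)
    return field
```

Passing `lambda r: math.exp(-r)` gives an exponentially decaying field, while `lambda r: 1.0 / r ** 3` gives an inverse-cube decay; the latter needs a `min_distance` so that no grid point coincides with an atom.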

Inclusion of the Hydrophobic Effect

Steric, electronic, and hydrophobic effects are considered to be among the primary forces in ligand:receptor interactions. In classical (Hansch-type) QSAR studies, these three forces are described using scalar descriptors such as Taft (Es) parameters, Hammett (sigma) parameters, and partition coefficients (logP), respectively. In 3D-QSAR studies, the steric and electronic effects are approximated using molecular shape (Lennard-Jones) and charge distribution (Coulombic) potential energy fields. The missing piece of the puzzle is, of course, a field to represent the hydrophobic "binding" component. Various approaches have been advocated for the description of this effect within 3D-QSAR studies. Among these are the empirically based hydropathic interaction (HINT) field, a large variety of lipophilic potential fields, the molecular mechanics-based H2O probe for the description of hydrogen bonds, and the calculation of desolvation energy fields by the Poisson-Boltzmann finite difference approach.

HINT Fields

One of the earliest efforts is the hydropathic interaction (HINT) technique of Kellogg et al. The HINT formalism is strongly rooted in the C-LogP technique of Hansch and Leo [3]. Beginning with the fragment constants used in the computation of the octanol:water partition coefficient (C-LogP), Abraham and Leo [4] suggested the further deconstruction of these values into atomic contributions to overall molecular hydrophobicity. Wireko, Kellogg and Abraham [5] demonstrated that such values can be calculated and that a hydrophobicity field for a given molecule can be computed. At each grid intersection point, the net sum of the following empirical equation is evaluated over all the atoms for a given molecule:

In 1991 Kellogg, Semus and Abraham [6] used the HINT field in a re-examination of the classic steroid data set with mixed results. While the HINT field contributed significantly to three-field (steric, electrostatic, HINT) models, the additional field did not improve the statistical measures of the model. However, the authors proposed that the additional field adds interpretability to CoMFA models by being easy to understand in chemical (synthetic/drug design) terms. Later studies of ryanodines [7], barbiturates [8], and other systems [9] have confirmed this hypothesis. These fields have also been demonstrated to effectively model experimentally determined logP values in cases where calculated logP (C-LogP) values fail [10], such as positional isomers.

Molecular Lipophilicity Potential (MLP) Fields.

Others, for example Norinder [11] and Altomare et al. [12], have used fields based on Molecular Lipophilicity Potentials (MLP) that were described by Fauchere et al. [13] in 1988. These fields add lipophilic information through the use of atomistic hydrophobic parameters as derived by a variety of researchers [14,15].

H-bonding Fields.

Kim [16] has reported the use of the direction-dependent 6-4 function of the GRID [17] program to generate hydrogen bonding fields as descriptors of the hydrophobic interactions. Specifically, rather than using a raw atom as a molecular probe, Kim uses a neutral H2O "molecule" with an effective radius of 1.7 Å. Two hydrogen bond donating and two hydrogen bond accepting properties are assigned to the probe. The probe is allowed to freely rotate about the grid point in order to optimize the interaction as computed using the GRID function. The hydrogen bonding potential energy is computed at each lattice intersection according to the following function:

$$E_{hb} = \left(\frac{C}{d^{6}} - \frac{D}{d^{4}}\right)\cos^{m}\theta$$

where C and D are taken from tables, $\theta$ is the angle described by the trio of donor, hydrogen and acceptor atoms, and $m$ is the exponent applied to the cosine term.
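As a rough numerical illustration, the 6-4 function can be evaluated as below. This sketch reads the angular factor as the cosine of the donor-hydrogen-acceptor angle raised to an integer power `m`; that reading, the default `m=4`, the zeroing of the term for angles of 90 degrees or less, and the constants used in the example are assumptions made here, and real values of C and D would have to come from the GRID tables.

```python
import math


def hbond_energy(d, theta_deg, C, D, m=4):
    """6-4 hydrogen-bond term: (C/d^6 - D/d^4) * cos(theta)^m.

    d: distance (Å); theta_deg: donor-hydrogen-acceptor angle in degrees.
    C, D: tabulated constants (placeholder values are used in the example).
    """
    if theta_deg <= 90.0:
        # Assumption: no hydrogen-bond contribution for strongly bent geometries.
        return 0.0
    radial = C / d ** 6 - D / d ** 4
    angular = math.cos(math.radians(theta_deg)) ** m
    return radial * angular


# Placeholder constants, chosen only to make the arithmetic easy to follow.
linear = hbond_energy(2.0, 180.0, C=64.0, D=32.0)  # full angular weight
bent = hbond_energy(2.0, 120.0, C=64.0, D=32.0)    # weight cos(120°)^4
```

With these placeholder constants the linear arrangement gives the full radial value, and the bent arrangement is scaled down by the angular factor, which is the direction dependence the text refers to.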

This approach has been successfully applied to model the hydrophobic effect of substituents on the aromatic ring of several series of compounds with respect to alterations in pharmacodynamic as well as chemical equilibrium constants. In the three cases presented by Kim [16], the 3D-QSAR models were consistently more statistically robust than the corresponding classical QSAR equations based on $\pi$ substituent constants.

The GRID-based H2O probe has also been used in conjunction with GRID-based steric (CH3, 1.95 Å radius, 0.0 charge) and electrostatic (H+, 0.0 Å radius, 1.0 charge) probes to model the receptor binding affinities of a series of benzodiazepines [18]. In this particular case, a significant correlation (r > 90%) was found to exist between the steric and hydrophobic fields. The electrostatic and hydrophobic fields were significantly less collinear (r < 70%) and were included in the model which best described the binding data. The resulting GRID-CoMFA model indicated that hydrophobic fields explained 78% of the variance in the binding data. The electrostatic field accounted for 18%. A standard CoMFA study using standard probes and steric and electrostatic potential functions performed on this same data set yielded qualitatively similar results in that the model based on steric fields alone was found to be the most descriptive of the data. The statistical significance of the GRID-CoMFA model suggests that hydrophobicity information is crucial in this particular case and that the SYBYL-based (Tripos, Inc.) sp3 carbon probe is not sufficient to describe these effects.

Desolvation Energy Fields.

An exercise (some might say, in futility) was performed to compute the desolvation free energy fields as a function of hydrophobicity. This was accomplished using the finite difference approximation method as implemented in the Delphi [19] program (Biosym-MSI). In Delphi, the linearized Poisson-Boltzmann equation is numerically solved to compute the electrostatic contribution to solvation on a regularly-spaced field of points constructed around a given molecule. It is this feature which ideally suits the results of Delphi computation for inclusion into a 3D-QSAR model. Desolvation energy fields are computed as the difference between the solvated (grid dielectric = 80) and in vacuo (grid dielectric = 1) field calculations.

In preliminary studies [20] using inhibitors of angiotensin-converting enzyme (ACE) and thermolysin, the desolvation energy fields successfully modeled neither the hydrophobicities nor the reported binding affinities of training set molecules. It was interesting, although not totally surprising, that in both of these studies the desolvation energy fields were found to be highly collinear with the SYBYL-generated Coulombic electrostatic potential fields (r > 90%). The Delphi technique does provide for the generation of mixed desolvated/solvated energy fields. In structure-based 3D-QSAR studies (i.e., where the target is known), it may be possible to compute the energy afforded by partial desolvation of the ligand upon complexation with the target.
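The two operations described in this section, taking the difference between the solvated and in vacuo grids and checking a field for collinearity with another, can be sketched as follows. The text does not say how collinearity was measured; the use of a Pearson correlation over grid-point values, and the function names, are assumptions of this sketch.

```python
import math


def desolvation_field(solvated, vacuum):
    """Point-by-point difference between solvated and in vacuo grid values."""
    if len(solvated) != len(vacuum):
        raise ValueError("fields must be sampled on the same grid")
    return [s - v for s, v in zip(solvated, vacuum)]


def pearson_r(field_a, field_b):
    """Pearson correlation between two fields sampled on the same grid."""
    n = len(field_a)
    mean_a = sum(field_a) / n
    mean_b = sum(field_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(field_a, field_b))
    var_a = sum((a - mean_a) ** 2 for a in field_a)
    var_b = sum((b - mean_b) ** 2 for b in field_b)
    return cov / math.sqrt(var_a * var_b)
```

A correlation close to 1 between a candidate field and one already in the model suggests that the new field carries little independent information.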

Other Potential Functions

Molecular Similarity Indices.

An alternative approach to the computation of molecular potential fields has been described as comparative molecular similarity indices analysis (CoMSIA) by Klebe et al. [21]. The form of the distance functions in the standard Lennard-Jones and Coulomb-type potentials generates unrealistically extreme values as the surface of the molecules under examination is approached. The net result is drastic changes in the shape of the potential functions. Klebe et al. implemented the steric, electrostatic, and hydrophobic similarity indices as utilized in the rigid-body molecular alignment program SEAL [22]. The indices replace the distance functions of the Lennard-Jones and Coulomb-type potentials with Gaussian-type functions (i.e., $\exp(-\alpha r^{2})$). The $\alpha$ term provides for a "local smearing" effect which places more weight on interactions close to the molecular surface with a smooth transition to more distal points. No arbitrary cutoff values need be imposed. The hydrophobic potential used in this case is an atom-based method developed by Viswanadhan et al. [23]. The net result of this field is to lessen the effect of changes in the field descriptors associated with minor variations in molecular superposition or conformation.
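A minimal sketch of a Gaussian-type index at one grid point is given below. The default value of `alpha`, the probe weight `w_probe`, the negative sign convention, and the atom layout are assumptions made for illustration and are not taken from the text.

```python
import math


def similarity_index(point, atoms, alpha=0.3, w_probe=1.0):
    """Gaussian-weighted sum of atomic property values at one grid point.

    atoms: list of ((x, y, z), w) pairs, where w is the atomic property
    (steric, electrostatic, or hydrophobic value; hypothetical layout).
    """
    total = 0.0
    for xyz, w in atoms:
        r2 = sum((p - a) ** 2 for p, a in zip(point, xyz))
        total += w_probe * w * math.exp(-alpha * r2)
    return -total  # sign convention assumed for this sketch
```

Because the Gaussian is finite at r = 0, a grid point that coincides with an atom simply receives that atom's full weight, so no cutoff is needed.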

The validity of the approach was demonstrated initially with the standard steroid set of Cramer et al. [1] and thermolysin inhibitor set of DePriest et al. [24]. In both cases, the CoMSIA approach yielded similar statistical results. The most practical aspect of this technique with respect to drug design is the generation of more readily interpretable CoMFA-type contour maps. In contrast to the fragmentary nature of standard CoMFA maps, CoMSIA derived maps are contiguous and located closer to the molecular skeletons thus providing a more direct representation of the physicochemical features localized in the design space (i.e., occupied by training set molecules) which are required for bioactivity.

Other Electrostatic Fields

Molecular Orbital Fields.

In certain instances, a simple Coulomb-type field may not be adequate to represent the electronic characteristics of molecules. This is illustrated by cases which attempt to model endpoints in which an ionic or charge-transfer reaction is part of the ligand:target interaction. In these cases, the three-dimensional characteristics (i.e., size and localization on/around a molecule) of molecular orbital fields have proven to be useful descriptors. As with the other fields generated external to the SYBYL-CoMFA program, it is possible to import these fields into a CoMFA framework. In the case of molecular orbital fields, molecules with their CoMFA alignment referenced are subjected to semiempirical MOPAC single-point (keyword: 1SCF) calculations. A selected orbital (i.e., HOMO or LUMO) for a given molecule is then imported into the CoMFA-defined region, and the electron density at the lattice intersections in the region is extracted and recorded in the QSAR table as an electrostatic-type field.

HOMO fields have been shown to be beneficial for the refinement of 3D-QSAR models for data sets such as the Angiotensin-Converting Enzyme [25] set in order to more completely describe the interaction between the ionized ligand and the metal in the molecular binding domain. More recently, molecular orbital fields have been used in the construction of 3D-QSAR models for molecular reactivity endpoints (i.e., metabolic rate constants) [26].

Roy Vaz [27] has examined the sigma (induction and resonance) constants of amines (NH2-X) with CoMFA fields comprised of total electron density (calculated with AM1/MOPAC5) and obtained excellent one and two component PLS models. Vaz has also looked at OH radical formation rates for substituted phenols and naphthalenes with the electron density fields and obtained meaningful CoMFA models [28].

Electrotopological State Fields

Kellogg, Kier and Hall [29] have recently created a 3-D field from the atomistic Electrotopological State parameter of Kier and Hall [30]. This parameter, which represents a contraction of free valence (electronegativity) along with topological information, is totally non-empirical. The "distance function" for the field decay of the E-State was chosen through multiple CoMFA runs to be inverse r-cubed. The E-State fields provide remarkably good statistical results in PLS, comparing quite favorably (q2 = 0.803, 3 components) to the "standard" CoMFA steric and electrostatic fields.

Conclusions

It should be apparent that the creation of new fields for 3-D QSAR is a fairly straightforward process. One of the major advantages to treating atomistic QSAR parameters as 3-D fields is that with fields, one does not have to restrict investigations to "common atoms". That is, in conventional QSAR, the most valid comparisons involving atomistic parameters are those where the same atom is conserved by all molecules in the data set. This is quite restrictive, as it disallows many compounds that may have different backbones, etc., but that are clearly related chemically to other molecules in the set. With fields, however, the data parameters are now the values at grid points. This means that any molecule (accounting for superposition issues, of course) can be included in a model. The addition of new fields to CoMFA models does come with a caveat: having a wide range of potential fields to include in a model does not relieve the user of the need to apply chemical and physical sense to development of the model. For example, if the biological property being modeled clearly does not involve hydrophobicity, the addition of a hydrophobic field will likely add useless information. On the other hand, the palette of fields can give new insight into the drug design/ligand binding process by indicating (through statistical means) the types of properties, parameters, etc., that correlate with binding.

References

  1. Cramer, R.D., Patterson, D.E., Bunce, J.D., J. Am. Chem. Soc. 1988, 110, 5959-5967.
  2. Kubinyi, H., ed. 3D QSAR in Drug Design, Theory, Methods, and Applications, ESCOM Science Publishers, B.V., Leiden, 1993.
  3. Hansch, C., Leo, A., Substituent Constants for Correlation Analysis in Chemistry and Biology, Wiley, New York, 1979, pp. 1-379.
  4. Abraham, D.J., Leo, A.J., Proteins: Struct. Funct. Genetics 1987, 2, 130.
  5. Wireko, F.C., Kellogg, G.E., Abraham, D.J., J. Med. Chem. 1991, 34, 758.
  6. Kellogg, G.E., Semus, S.F., Abraham, D.J., J. Comput.-Aided Molec. Des. 1991, 5, 545.
  7. Welch, W., Ahmad, S., Airey, J.A., Gerzon, K., Humerickhouse, R.A., Besch, H.R., Jr., Ruest, L., Deslongchamps, P., Sutko, J.L., Biochem. 1994, 33, 6074.
  8. Nayak, V.R., Kellogg, G.E., Med. Chem. Res. 1994, 3, 491.
  9. Oprea, T.I., Waller, C.L., Marshall, G.R., Drug Design and Discov. 1994, 12, 29.
  10. Waller, C.L., Quant. Struct.-Act. Relat. 1994, 13, 172.
  11. Norinder, U., J. Comput.-Aided Molec. Des. 1991, 5, 419.
  12. Altomare, C., Cellamare, S., Carotti, A., Casini, G., Ferappi, M., Gavuzzo, E., Mazza, F., Carrupt, P.-A., Gaillard, P., Testa, B., J. Med. Chem. 1995, 38, 170.
  13. Fauchere, J.L., Quarendon, P., Kaetterer, L., J. Mol. Graph. 1988, 6, 202.
  14. Ghose, A.K., Crippen, G.M., J. Comp. Chem. 1986, 7, 565.
  15. Broto, P., Moreau, G., Vandycke, C., Eur. J. Med. Chem. 1984, 19, 71.
  16. Kim, K.H., Quant. Struct.-Act. Relat. 1993, 12, 232-238.
  17. Goodford, P.J., J. Med. Chem. 1985, 28, 849-858.
  18. Kim, K.H., Greco, G., Novellino, E., Silipo, C., Vittoria, A., J. Comput.-Aided Molec. Des. 1993, 7, 263-280.
  19. Gilson, M., Honig, B., Nature 1987, 330, 84-86.
  20. Waller, C.L., unpublished results.
  21. Klebe, G., Abraham, U., Mietzner, T., J. Med. Chem. 1994, 37, 4130-4146.
  22. Kearsley, S.K., Smith, G.M., Tetrahedron Comput. Methodol. 1990, 3, 615-633.
  23. Viswanadhan, V.N., Ghose, A.K., Revankar, G.R., Robins, R.K., J. Chem. Inf. Comput. Sci. 1989, 29, 163-172.
  24. DePriest, S.A., Mayer, D., Naylor, C.B., Marshall, G.R., J. Am. Chem. Soc. 1993, 115, 5372-5384.
  25. Waller, C.L., Marshall, G.R., J. Med. Chem. 1993, 36, 2390-2403.
  26. Waller, C.L., Evans, M.V., McKinney, J.D., Drug Metab. Disp. 1995, in press.
  27. Vaz, R., J. Am. Chem. Soc., submitted for publication.
  28. Vaz, R., unpublished results.
  29. Kellogg, G.E., Kier, L.B., Hall, L.H., J. Comput.-Aided Molec. Des., submitted for publication.
  30. Kier, L.B., Hall, L.H., in Advances in Drug Design, Vol. 22, Testa, B., ed., Academic Press, 1992.


NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.