CoMFA: A field of dreams?
Simon F. Semus
Astra Arcus USA
P.O. Box 20890
Rochester, New York 14602
http://www.netsci.org/Science/Compchem/feature11.html
Introduction
March, 1988 and baseball season was almost upon us. Perhaps the greatest culture shock I received after arriving here from the Sceptred Isle was that I experienced any shock at all. I had imagined that, speaking almost the same language and having grown up on Columbo, Starsky and Hutch, Dynasty and Dallas, I knew all there was to know about the United States.
With just my suitcase of clothing, $500 to my name and a position at the Medical College of Virginia, I had assumed that I would have little trouble in settling in. I planned to rent an apartment, obtain a credit card and purchase a car shortly after my arrival. What I hadn't bargained for was that without a social security number and any previous credit history the American dream was slightly more elusive than I had anticipated.
I have always considered it strange that in a country so large, the Confederate capital at Richmond was a mere one hundred miles from Washington D.C. Despite the passage of 130 years, the Confederacy still appears to be alive and well. In a radio poll conducted to find the most popular president of all time the winner was Jefferson Davis!
Had the Civil War gone the other way Richmond would have major league sports and touring artists would stop in the city. The mecca is 100 miles north on route I95 and so the craving is satisfied by following the Redskins and Orioles.
As a newcomer to these shores I was unfamiliar with the difference between Orioles and Oreos. I had often wondered why they named a baseball team after a biscuit. Then I saw the Baltimore Orioles break the longest losing streak in MLB history and I knew the answer.
"Build it and he will come"
A baseball stadium is erected in the middle of an Iowa cornfield and he came. The field that was built for a particular purpose achieves a different goal. The search that began for one result finally arrives at another conclusion. Where the first may be predictable, the actual end point is a surprise and for that reason is much more rewarding.
And so it is with Comparative Molecular Field Analysis (CoMFA)[1]. Sometimes!
The CoMFA methodology is a 3D quantitative structure-activity relationship (QSAR) technique which ultimately allows one to design and predict activities of molecules. The database of molecules with known properties, the training set, is suitably aligned in 3D space according to various methodologies. Superimposition techniques include those that maximize the steric overlap[2], those that are based upon crystallographic data[3], those based upon a pharmacophore theory[4],[5], those employing a steric and electrostatic alignment algorithm[6], those based upon automated field fit methods[7] and those utilizing pharmacophore mapping programs such as DISCO[8]. Once one has arrived at whatever one considers to be the alignment of choice, charges are calculated for each molecule at a level of theory deemed appropriate. Now one can construct a field. Steric and electrostatic fields are calculated for each molecule by interaction with a probe atom at a series of grid points surrounding the aligned database in 3D space. One then attempts to correlate these field energy terms with a property of interest by the use of partial least squares (PLS) with cross-validation, giving a measure of the predictive power of the model. So much for the crude description of the technique. The question is: does it work and do the results mean anything? Perhaps that is a little harsh. Maybe the real question we should be asking is whether this method provides us with information that was not available or apparent by other techniques.
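To make the field construction step concrete, here is a minimal sketch of how probe energies might be tabulated on a grid. It is not the SYBYL implementation. The Lennard-Jones 6-12 and Coulomb forms, the probe parameters, the 30 kcal/mol truncation and the toy two-atom "molecule" are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import itertools
import math

# Illustrative values only; real studies choose these per force field and software.
PROBE_CHARGE = 1.0    # probe charge (e)
PROBE_RADIUS = 1.7    # probe van der Waals radius (angstrom)
PROBE_EPS = 0.1       # probe well depth (kcal/mol)
COULOMB_K = 332.06    # converts e^2/angstrom to kcal/mol
CUTOFF = 30.0         # energies are truncated to +/- CUTOFF (kcal/mol)
MIN_DIST = 1e-3       # avoids division by zero when a grid point sits on an atom


def make_grid(lo, hi, spacing):
    """Regular cubic lattice of (x, y, z) points from lo to hi on each axis."""
    n = int(round((hi - lo) / spacing)) + 1
    axis = [lo + i * spacing for i in range(n)]
    return list(itertools.product(axis, axis, axis))


def clamp(energy):
    """Truncate an energy to the range [-CUTOFF, +CUTOFF]."""
    return max(-CUTOFF, min(CUTOFF, energy))


def probe_energies(point, atoms):
    """Return (steric, electrostatic) probe energies at one grid point.

    atoms: iterable of dicts with keys "xyz", "charge", "radius", "eps".
    Steric term: Lennard-Jones 6-12. Electrostatic term: Coulomb, dielectric 1.
    """
    steric = 0.0
    elec = 0.0
    for atom in atoms:
        r = max(math.dist(point, atom["xyz"]), MIN_DIST)
        r_min = PROBE_RADIUS + atom["radius"]
        eps = math.sqrt(PROBE_EPS * atom["eps"])
        ratio6 = (r_min / r) ** 6
        steric += eps * (ratio6 * ratio6 - 2.0 * ratio6)
        elec += COULOMB_K * PROBE_CHARGE * atom["charge"] / r
    return clamp(steric), clamp(elec)


def field_descriptors(atoms, grid):
    """One row of the descriptor table: all steric values, then all electrostatic."""
    pairs = [probe_energies(p, atoms) for p in grid]
    return [s for s, _ in pairs] + [e for _, e in pairs]


# Toy "molecule": two charged atoms (hypothetical coordinates and parameters).
toy = [
    {"xyz": (0.0, 0.0, 0.0), "charge": 0.4, "radius": 1.7, "eps": 0.1},
    {"xyz": (1.5, 0.0, 0.0), "charge": -0.4, "radius": 1.5, "eps": 0.15},
]
grid = make_grid(-4.0, 4.0, 2.0)
row = field_descriptors(toy, grid)
print(len(grid), len(row))
```

Each aligned molecule contributes one such row, and the resulting table of rows against measured activities is what the PLS step then tries to correlate.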
Whilst I was on the faculty at the Medical College of Virginia, teaching graduate courses in molecular modeling, one of the most frequently debated topics was whether CoMFA told you anything more than one may have gathered by simple examination of the data. In other words, can a medicinal chemist using his knowledge and intuition make "ball park" predictions of activity for designed compounds, based on the structure and biological data of a series of analogues, and will they be similar to those generated by CoMFA? In a congeneric series where steric factors predominate the answer is obviously that one can. The literature is replete with examples where the sole benefit of CoMFA was the confirmation of a previously determined model[4, 9-14]. Can we look beyond this limited information and reveal more significant insights in our model?
By the very nature of the technique, the most crucial step in this 3D approach is the relative orientation of the test molecules in space. That is, the chosen alignment of the compounds in the training set is going to have the most profound impact on the predictive ability of the model. We have already noted the plethora of methods available to us for structural superimposition. From a purely drug design perspective, in those instances where one has no knowledge of the three-dimensional shape of the receptor, a set of alignment rules based upon a pharmacophore hypothesis will probably be the most valuable. Indeed, perhaps the principal value of this methodology is the evaluation of such alignments based upon the predictive power of the derived model. One may conclude that the greater the predictive power of the model the more the alignment reflects the bioactive conformation of the molecules. As a simple example of such a phenomenon, we were interested in a series of oxime analogues as cholinergic agonists at the m1 receptor sub-type for the treatment of Alzheimer's disease. The compounds concerned were active as both their syn and anti isomers, although there was not a strict parallel in their respective SARs. The molecules were originally modeled with their oxime substituted side-chains orientated in opposite positions in space. That is to say that the syn oximes had their side-chains aligned to one side of the molecule and the anti had theirs placed on the other side:

Interestingly, when all the analogues were combined into a single CoMFA model no relationship could be found between the structure and biological activity. The predictive value, or cross-validated r-square, was a negative number for this model, indicating that any predictions made based on this alignment were no better than simply taking the average of all the compounds' biological activities. However, if one performed CoMFA on the isomeric pairs individually, a good predictive model was found for one set of oximes, but not for the other. This led us to the conclusion that instead of the oxime side-chains reaching out into opposite points in 3D space, they were probably aligned to a common site. Re-orientation of the side-chains in such a manner provided a CoMFA model with high predictive power, thus validating our hypothesis.
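The meaning of a negative cross-validated r-square can be shown with a short sketch. It uses the conventional definition, 1 - PRESS/SS, where PRESS is the sum of squared leave-one-out prediction errors and SS is the sum of squared deviations of the activities from their mean. The two stand-in models (a mean predictor and a one-descriptor straight line) and all names are assumptions made for illustration. A real CoMFA run would use PLS on the full field table.

```python
def loo_q2(X, y, fit, predict):
    """Leave-one-out cross-validated r-square: 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(y)):
        # Refit the model with compound i held out, then predict compound i.
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(model, X[i])) ** 2
    mean_y = sum(y) / len(y)
    ss = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss


# Stand-in model 1: ignore the descriptors and predict the training-set mean.
def fit_mean(X, y):
    return sum(y) / len(y)


def predict_mean(model, x):
    return model


# Stand-in model 2: ordinary least squares on a single descriptor.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: one descriptor value and one activity per compound.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [3.0, 5.0, 7.0, 9.0, 11.0]

print(loo_q2(descriptor, activity, fit_mean, predict_mean))
print(loo_q2(descriptor, activity, fit_line, predict_line))
```

The mean predictor always scores below zero under leave-one-out, because each held-out compound is predicted from a mean that excludes it. A model whose cross-validated r-square is zero or negative is therefore doing no better than that baseline.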
The aforementioned example illustrates the utility of CoMFA in prioritizing alternative alignments of the dataset. Can this type of limited model be extended to a non-congeneric series of molecules? A large number of research groups have clearly shown that it can[2, 3, 15-18]. A number of years ago, we demonstrated that CoMFA could be used to predict the biological activity of a series of classical and non-classical cannabinoids[19]. Although their structures are not dramatically different and the derivation of the non-classical series is perhaps obvious, a common pharmacophore hypothesis and an embracing QSAR had not been previously demonstrated.

The principal psychoactive constituent of cannabis, delta-9-tetrahydrocannabinol (delta-9-THC), is representative of the classical cannabinoids. CP55,940, a product of the Pfizer research group, is the prototypical non-classical cannabinoid. The compounds were first described in the mid-1980s and were being developed as analgesics with the hope of avoiding the detrimental side-effects of the opiates. Cannabinoids are currently being employed as anti-emetics in chemotherapy and as appetite stimulants for AIDS patients. Recent interest has focused on their potential as neuroprotective agents and in the treatment of multiple sclerosis. However, despite the beneficial medical effects of these compounds, their use has been impaired by the inability to separate out the hallucinogenic properties of these molecules. We were able to develop a CoMFA model that not only encompassed a wide variety of structural variants, but also demonstrated a strong relationship between structure, binding and intrinsic activity in four in vivo animal assays. This experiment was a very clear demonstration of the utility of the technique in not only confirming a pharmacophore hypothesis, but also correlating an in vitro binding assay with whole animal behavioral models.
Following our earlier successful endeavors with CoMFA, we concentrated much of our effort on the incorporation of a third, hydropathic, field into the model. My colleagues, Don Abraham and Glen Kellogg, at the Medical College of Virginia, had previously developed the program HINT for the study of hydrophobic/hydrophilic interactions[20]. In simple terms, the program takes the clogP fragment constants of Hansch and Leo and breaks them down to atomic values. One is then able to graphically represent the relative hydrophobic and hydrophilic domains of the molecules being studied. Similar approaches have been pursued by Ghose and Crippen[21] and the research group of Brickmann, as implemented in the program MOLCAD[22]. We concentrated our efforts in using these individual hydropathic fields as descriptors of what has euphemistically been described as the "hydrophobic effect" or perhaps more appropriately as an indication of short range hydrophobic interactions that are typical of bimolecular complexation. In our early work we took the steroid database used by Cramer in his original treatise on CoMFA in order to avoid any controversy over our choice of molecules and to provide a direct comparison with literature results[1]. Perhaps, with the benefit of hindsight, one may conclude that this was a poor choice for the methodology under development. Steroids have historically been one of the more difficult series for which to predict or calculate logP values. Consequently, although our results were more chemically interpretable, there was no statistical improvement by the addition of the hydropathic field. These were still the pioneering days of CoMFA with little experience in the addition of fields to the standard analysis. Much work had been done by Ki Kim and co-workers[23-28] at Abbott using fields generated by GRID[29], but there was still an aura of black art about this approach.
Indeed, it was our distinct impression in working with people from Tripos that we were one of a few groups taking the first tentative steps in this direction. It was not obvious to us why the statistical element of the analysis should not simply weight the hydropathic field down to zero if its inclusion actually reduced the predictive power of the model. Many aspects of the first release of CoMFA were hidden from the user, which made our task all the more difficult. However, many of these limitations have now been overcome and with the increasing use of this technique have come many improvements. Since that time, Glen Kellogg and Vrinda Nayak have shown that one can improve the statistical result using the hydrophobic element of the HINT field in calculating binding of barbiturates to cyclodextrins[30] and we have shown in our own work in the cholinergic area that the hydrophilic field calculated by HINT increases the predictive power of our model. It is our continuing belief that the addition of this hydropathic field can provide information in the drug design process above and beyond that available from the standard steric and electrostatic data.
Did the addition of an extra field into CoMFA open "Pandora's box" or a "can of worms"? Well maybe neither. It certainly has added more information to the model, but has it also added more noise? Are the hydropathic fields covariant, to some extent, with the steric and electrostatic fields? Probably. By the very nature of their empirical derivation, the HINT generated hydropathic fields must also encode some steric and electrostatic information. The obvious solution that we and other researchers have taken is to examine the effect of both individual and combined fields on the predictive power of the model. It is here that one of the more disconcerting revelations about CoMFA manifested itself. Experienced users will know that one can define the type of field required in the analysis during the initial set up procedure. A limitation that is less obvious and that we were unaware of prior to this work is that the electrostatic-field-only option is essentially useless. Without the necessary steric information pertaining to the molecules under investigation, the probe atom has no information as to the molecular extent of each compound and therefore will continue to calculate electrostatic values within the volume occupied. Consequently, the results obtained are basically meaningless. The ability to accurately calculate the electrostatic field is particularly important if one chooses to combine different probes in the analysis. This is an approach that we have employed in a number of our models, where the use of an sp3 carbon probe is augmented by additional fields calculated using sp3 oxygen and hydrogen probes. The latter two probes were used to provide information as to possible hydrogen bond acceptor and donor interactions. Obviously, the individual atom probes are going to generate essentially identical steric fields and avoidance of such duplication is paramount in obtaining a meaningful analysis.
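The two points in this paragraph, that electrostatic values inside the molecular volume are meaningless and that several probes should not each contribute a duplicate steric block, can be sketched as follows. This is one possible bookkeeping scheme rather than what CoMFA does internally. The rule that a truncated steric value marks an occupied grid point, the fill value of zero, the probe labels and both function names are assumptions.

```python
def mask_inside_volume(steric, elec, steric_cutoff=30.0, fill=0.0):
    """Blank electrostatic values at grid points the steric field marks as occupied.

    A grid point is treated as inside the molecule when its steric energy has
    reached the truncation value (an assumed convention).
    """
    return [fill if s >= steric_cutoff else e for s, e in zip(steric, elec)]


def combine_probes(fields, steric_probe):
    """Build one descriptor row from several probes on a shared grid.

    fields: {probe_label: {"steric": [...], "elec": [...]}}
    The steric block is taken once, from steric_probe. Every probe then
    contributes only its own masked electrostatic block, so near-identical
    steric columns are not repeated.
    """
    steric = fields[steric_probe]["steric"]
    row = list(steric)
    for label in sorted(fields):
        row += mask_inside_volume(steric, fields[label]["elec"])
    return row


# Hypothetical two-point grid and field values for three probes.
fields = {
    "C.3": {"steric": [30.0, 1.0], "elec": [30.0, 2.0]},
    "H": {"steric": [30.0, 0.8], "elec": [30.0, 1.1]},
    "O.3": {"steric": [30.0, 0.9], "elec": [-30.0, -1.5]},
}
print(combine_probes(fields, "C.3"))
```

Whether masked points are set to zero, set to a column mean or dropped altogether is a modeling choice that would need to be checked against the software actually in use.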
For our final reflections on the uses and abuses of CoMFA let us return to the model that was developed to investigate and design potential ligands for the cholinergic m1 receptor sub-type. I have previously mentioned the method in which CoMFA was employed to distinguish between alternative molecule alignment protocols. As a result we were able to develop a model consisting of 92 divergent molecules encompassing both literature standards and in-house research compounds. The analysis obtained correlated structural information against receptor binding assays for the m1, m2 and m3 muscarinic sub-types. We routinely employ an automated procedure to eliminate outliers from the model by systematically removing each molecule from the dataset and performing the statistical analysis in its absence. In this particular example, having run through the 92 compounds, the one that adversely affects the analysis the most is removed and the procedure repeated. This process may be repeated as many times as required. Here, the procedure was performed to discover the 5 worst compounds. This approach affords the development of a model with optimum predictive power, whilst also giving information as to the structural nature of the outliers. This information can itself be as important as the analysis in the continued refinement of the model.
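The outlier procedure described above amounts to a greedy loop, sketched here under stated assumptions. The scoring function is left as a parameter; in the work described it would be the cross-validated statistic from CoMFA, whereas the toy scorer below is just the squared correlation between one hypothetical descriptor and the activity. All names and data are illustrative.

```python
def eliminate_outliers(compounds, score, n_remove):
    """Greedy outlier elimination.

    In each round, score the dataset with every compound left out in turn and
    permanently drop the compound whose absence gives the best score.
    Returns (kept, removed), with removed listed in order of elimination.
    """
    remaining = list(compounds)
    removed = []
    for _ in range(n_remove):
        if len(remaining) <= 3:
            break  # too few compounds left to score sensibly
        best = max(
            range(len(remaining)),
            key=lambda i: score(remaining[:i] + remaining[i + 1:]),
        )
        removed.append(remaining.pop(best))
    return remaining, removed


def r_squared(xs, ys):
    """Squared Pearson correlation; 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def toy_score(compounds):
    """Stand-in for a cross-validated statistic: descriptor vs activity r-square."""
    return r_squared([c[1] for c in compounds], [c[2] for c in compounds])


# Hypothetical (name, descriptor, activity) records; "E" is off the trend.
data = [
    ("A", 1.0, 1.0), ("B", 2.0, 2.0), ("C", 3.0, 3.0),
    ("D", 4.0, 4.0), ("E", 5.0, 10.0), ("F", 6.0, 6.0),
]
kept, removed = eliminate_outliers(data, toy_score, n_remove=1)
print([c[0] for c in removed])
```

With 92 compounds and five rounds the loop refits the model roughly 450 times, which is why an automated procedure is needed. The list of removed compounds is the part worth inspecting for structural patterns.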
Consequently, we were able to take this model and employ it in the ligand design process. Since the model was based upon a pharmacophore hypothesis it could be readily extended to 3D database searching using suitable constraints. The searches were performed on both commercial and in-house compound databases using the Unity program (Tripos Associates, St. Louis, MO). A number of potential ligands were identified and incorporated into the model. Their binding potencies were predicted within CoMFA and where appropriate the compounds were tested in our biological assays. The CoMFA model was able to accurately predict the activities of the majority of compounds evaluated. However, there was one major failing with this approach and that was the total inability to predict efficacy with this model. Thus, although we were able to identify compounds with high affinity, unlike the cannabinoid case there was no correlation with functional activity. Any model that could be arrived at required the elimination, as outliers, of molecules that formed the basis of the pharmacophore hypothesis. Naturally, omission of these ligands tends to invalidate the model. The fact that we were able to predict potency but not efficacy with CoMFA, combined with models of the trans-membrane domains of the muscarinic receptor sub-types, led us to adopt a two-stage interaction hypothesis. In broad terms, the model embraced two distinct sites, one being the address or recognition site and the other being the message or functional site. So CoMFA provided useful information in terms of ligand design and the prediction of binding potency, which under the best circumstances is the result one would have expected. However, above and beyond this obvious function of the model, it also directed the research in terms of differentiating alternative alignment protocols and supported the notion of both an address and message site on the receptor.
One of the major issues encountered in this and many other drug design programs is that of receptor sub-type selectivity. In this particular case the same molecular alignment could be employed to obtain a predictive model with CoMFA for both the m1 and m2 receptor population. In order to generate a model that might predict sub-type selectivity, the ratios of receptor binding were taken as a measure of biological activity. A good predictive model was obtained with CoMFA when correlating structure to the ratio of m1/m2 activity. Inspection of the (standard deviation * coefficient) steric and electrostatic fields highlighted those areas in space where modification of structure leads to an increase or decrease in selectivity. So far so good. However, if one tries to correlate structure to the ratio of m2/m1 activity no model can be obtained. This raises some interesting questions. If, by taking the ratio of m1/m2 one can obtain a good model showing where structural changes will increase m1 selectivity, why is one unable to invert the model to predict where changes will increase m2 selectivity? From a simple mathematical point of view the relationship between a number and its reciprocal is not linear. From a statistical analysis viewpoint if one model tells you that increasing the steric bulk in a particular region will increase activity at a particular receptor sub-type, shouldn't the inverse of the model tell you where a modification will improve activity at the other receptor sub-type? Well not exactly. They are in essence two separate models. In this particular example we were fortunate enough to obtain a good predictive model for the m1/m2 ratio dataset but not when we inverted the numbers. The situation is something of a paradox, since mathematically one may possibly reason the discrepancy but intuitively the result appears to be incorrect.
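The remark that a number and its reciprocal are not linearly related can be checked with a small numerical sketch. The data are invented: a single hypothetical descriptor that is exactly linear in the m1/m2 ratio. The comparison with log-transformed ratios is an addition for illustration and is not something the work described above reports doing.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def ratio_inversion_demo():
    """Fit quality of one descriptor against a ratio, its reciprocal, and their logs."""
    descriptor = [float(i) for i in range(1, 9)]     # hypothetical field value
    m1_over_m2 = [0.5 + 2.0 * d for d in descriptor]  # exactly linear by construction
    m2_over_m1 = [1.0 / r for r in m1_over_m2]
    return {
        "ratio": r_squared(descriptor, m1_over_m2),
        "inverse": r_squared(descriptor, m2_over_m1),
        "log_ratio": r_squared(descriptor, [math.log10(r) for r in m1_over_m2]),
        "log_inverse": r_squared(descriptor, [math.log10(r) for r in m2_over_m1]),
    }


for name, value in ratio_inversion_demo().items():
    print(f"{name:12s} {value:.3f}")
```

In this toy case the descriptor that fits the m1/m2 ratio perfectly fits the m2/m1 ratio noticeably less well, while the two log-transformed ratios give identical fits because log(m2/m1) is simply -log(m1/m2). A linear method such as PLS can therefore succeed on one raw ratio and fail on the other without any contradiction.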
Lastly, let us take a somewhat cursory glance at another relatively recent addition to the computational chemist's armory. De novo drug design based upon CoMFA results has been implemented in the Leapfrog algorithm within the SYBYL suite of programs. In principle, the program will extrapolate the resulting CoMFA fields and employ them as a "pseudo receptor site" for automated ligand design. In practice, well, in practice who knows exactly what is going on? Even in the case when one has a receptor model available the output from Leapfrog is frankly less than inspiring. In fact it is largely chemical garbage. In the best of circumstances the approach is like trying to find a diamond in a cesspit. When using CoMFA as the basis for the ligand design the resulting compounds apparently ignore the information contained within the fields. In the optimize mode and employing a common nucleus from the original dataset, Leapfrog will append chemical moieties in regions where there is no variance in the CoMFA fields. Consequently, one has to take the additional step of pre-defining the atoms to remain constant through the experiment in order to generate even vaguely interesting structures. Having thus achieved this minimal goal whereby Leapfrog will produce improved ligands based upon calculation of a binding energy with the "receptor site", one can then return to the CoMFA model to predict activities. My naive expectation was that since the ligands were designed using CoMFA fields and the increasing binding energy was calculated based on these same properties, the original CoMFA model from which they were derived should predict the biological potency of the designed molecules with a direct correlation to the binding energy. However, in my experience there is absolutely no correlation between calculated binding energy from Leapfrog and predicted activity from the CoMFA model.
In the rather more informal forum in which this paper is being presented, I would be extremely interested in hearing of other researchers' experiences with CoMFA and Leapfrog. Of particular interest would be any similar encounters and also any successes with this de novo design algorithm. I have touched upon many facets of the use of CoMFA and the information that may be gleaned from such models. The aim of this article was not to be an exhaustive review, but more a personal reflection. It is my hope that a number of issues have been highlighted and that these may generate useful discussion. When we embark on this journey we all have our dreams of the final outcome. Will we "dream a dream" that is doomed to failure like Fantine or like Sting will we be playing in "fields of gold"?
"And so this is CoMFA and what have we done? Another year over and a new one just begun", John Lennon nearly wrote.
References
1. Cramer, R.D., III, D.E. Patterson, and J.D. Bunce, Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J. Am. Chem. Soc., 1988. 110(18): p. 5959-67.
2. Calder, J.A., et al., CoMFA validation of the superposition of six classes of compounds which block GABA receptors non-competitively. J. Comput.-Aided Mol. Des., 1993. 7(1): p. 45-60.
3. DePriest, S.A., et al., 3D-QSAR of angiotensin-converting enzyme and thermolysin inhibitors: a comparison of CoMFA models based on deduced and experimentally determined active site geometries. J. Am. Chem. Soc., 1993. 115(13): p. 5372-84.
4. Greco, G., et al., Comparative molecular field analysis on a set of muscarinic agonists. Quant. Struct.-Act. Relat., 1991. 10(4): p. 289-99.
5. Prendergast, K., et al., Derivation of a 3D pharmacophore model for the angiotensin-II site one receptor. J. Comput.-Aided Mol. Des., 1994. 8(5): p. 491-512.
6. Horwitz, J.P., et al., Comparative molecular field analysis of in vitro growth inhibition of L1210 and HCT-8 cells by some pyrazoloacridines. J. Med. Chem., 1993. 36(23): p. 3511-16.
7. Klebe, G. and U. Abraham, On the prediction of binding properties of drug molecules by comparative molecular field analysis. J. Med. Chem., 1993. 36(1): p. 70-80.
8. Myers, A.M., et al., Conformational Analysis, Pharmacophore Identification, and Comparative Molecular Field Analysis of Ligands for the Neuromodulatory σ3 Receptor. J. Med. Chem., 1994. 37(24): p. 4109-17.
9. Czaplinski, K.-H.A. and G.L. Grunewald, A comparative molecular field analysis derived model of the binding of Taxol analogs to microtubules. Bioorg. Med. Chem. Lett., 1994. 4(18): p. 2211-16.
10. Ablordeppey, S.Y., M.B. El-Ashmawy, and R.A. Glennon, Analysis of the structure-activity relationships of sigma receptor ligands. Med. Chem. Res., 1991. 1(6): p. 425-38.
11. Diana, G.D., et al. The use of the 3-dimensional structures of rhinoviruses in the design of antiviral agents.
12. Harpalani, A.D., et al., Alkylamides as inducers of human leukemia cell differentiation: A quantitative structure-activity relationship study using comparative molecular field analysis. Cancer Res., 1993. 53(4): p. 766-71.
13. Osabe, H., et al., Quantitative structure-activity relationships of light-dependent herbicidal 4-pyridone-3-carboxanilides III. 3-D (comparative molecular field) analysis including light-dependent diphenyl ether herbicides. Pestic. Sci., 1992. 35(2): p. 187-200.
14. Waller, C.L., et al., Conformational analysis, molecular modeling, and quantitative structure-activity relationship studies of agents for the inhibition of astrocytic chloride transport. Pharm. Res., 1994. 11(1): p. 47-53.
15. Agarwal, A., et al., Three-dimensional quantitative structure-activity relationships of 5-HT receptor binding data for tetrahydropyridinylindole derivatives: a comparison of the Hansch and CoMFA methods. J. Med. Chem., 1993. 36(25): p. 4006-14.
16. Akamatsu, M., et al. 3D QSAR of insecticidal dioxatricycloalkene and its related compounds.
17. Neuwels, M., Approach to an adenosine pharmacophore by molecular modeling. J. Pharm. Belg., 1992. 47(4): p. 351-63.
18. Waller, C.L. and J.D. McKinney, Three-Dimensional Quantitative Structure-Activity Relationships of Dioxins and Dioxin-like Compounds: Model Validation and Ah Receptor Characterization. Chem. Res. Toxicol., 1995. 8(6): p. 847-58.
19. Thomas, B.F., et al., Modeling the cannabinoid receptor: a three-dimensional quantitative structure-activity analysis. Molecular Pharmacology, 1991. 40(5): p. 656-65.
20. Kellogg, G.E., S.F. Semus, and D.J. Abraham, HINT: a new method of empirical hydrophobic field calculation for CoMFA. J. Comput.-Aided Mol. Des., 1991. 5(6): p. 545-52.
21. Ghose, A. and G. Crippen, J. Comp. Chem., 1986. 7(4): p. 565-577.
22. Zachmann, C.D., et al., J. Comp. Chem., 1992. 13(1): p. 76-84.
23. Kim, K.H. Use of the hydrogen-bond potential function in comparative molecular field analysis (CoMFA): An extension of CoMFA.
24. Kim, K.H. and Y.C. Martin, Evaluation of electrostatic and steric descriptors for 3D-QSAR: the hydrogen ion and methyl group probes using comparative molecular field analysis (CoMFA) and the modified partial least squares method. Pharmacochem. Libr., 1991. 16 (QSAR: Ration. Approaches Des. Bioact. Compd.): p. 151-4.
25. Kim, K.H., 3D-Quantitative structure-activity relationships: description of electronic effects directly from 3D structures using a GRID-comparative molecular field analysis (CoMFA) approach. Quant. Struct.-Act. Relat., 1992. 11(2): p. 127-34.
26. Kim, K.H., Description of nonlinear dependence directly from 3D structures in 3D-quantitative structure-activity relationships. Med. Chem. Res., 1992. 2(1): p. 22-7.
27. Kim, K.H., 3D-quantitative structure-activity relationships: Describing hydrophobic interactions directly from 3D structures using a comparative molecular field analysis (CoMFA) approach. Quant. Struct.-Act. Relat., 1993. 12(3): p. 232-8.
28. Kim, K.H., et al., Use of the hydrogen bond potential function in a comparative molecular field analysis (CoMFA) on a set of benzodiazepines. J. Comput.-Aided Mol. Des., 1993. 7(3): p. 263-80.
29. Goodford, P.J., A computational procedure for determining energetically favourable binding sites on biologically important macromolecules. J. Med. Chem., 1985. 28: p. 849-857.
30. Nayak, V.R. and G.E. Kellogg, Cyclodextrin-barbiturate inclusion complexes: a CoMFA/HINT 3-D QSAR study. Med. Chem. Res., 1994. 3(8): p. 491-502.
NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.