Using Theoretical Descriptors in Quantitative Structure Activity Relationships and Linear Free Energy Relationships
Mark Surles
George R. Famini
U.S. Army Edgewood Research Development and Engineering Center
Leland Y. Wilson
Department of Chemistry
La Sierra University
http://www.netsci.org/Science/Compchem/feature08.html
Introduction
Inherent to chemistry is the concept that there is a relationship between the bulk properties of compounds and the structure of the molecules of those compounds. This provides a connection between the macroscopic and the microscopic properties of matter, and has been the backbone of chemistry for a long time (for instance, compounds with carboxyl groups are known to be acids; they have a sour taste, turn litmus red and neutralize bases). It is a basic tenet of chemistry to attempt to identify these relationships between molecular structure and activity/property and to quantify them.
There are two primary approaches available today to the computation of physical and chemical properties. The first, and more aesthetically pleasing, is direct computation by quantum mechanical or statistical mechanical means [1]. While the number of properties that can be computed in this manner is growing, it is still fairly limited, as is the size of molecule that can be treated. The second approach, the use of linear free energy relationships (LFER) and quantitative structure activity/property relationships (QSAR/QSPR), is much more empirical in nature but, within the constraint of requiring experimental data for a training set, provides more flexibility for computing physical, chemical and biological properties [2]. Until fairly recently, QSAR and LFER correlations have used primarily empirically based descriptors (this is not entirely true; the point is expanded in a later section), but the literature is now being filled with examples of QSAR/LFER with computationally derived descriptors.
It is not possible within the scope of this article to detail the extensive history of LFER, QSAR or the use of theoretical descriptors in QSAR. Instead, we will attempt to provide a brief (and incomplete, apologies to those excluded) description of the basic tenets of LFER and QSAR, and of our current efforts to merge this area with quantum chemistry. As such, this paper is divided into four sections: a) a discussion of LFER and QSAR, b) a description of the linear solvation energy relationship of Kamlet and Taft, c) a discussion of our attempts at incorporating computationally derived descriptors into the Kamlet-Taft paradigm, and d) a few comparative results that demonstrate the TLSER techniques.
Linear Free Energy Relationships
LFER techniques, as originally developed by Hammett, were intended purely to quantify the effect of substituents and leaving groups on ester hydrolysis [3-5]. In actuality, they provided the pioneering demonstration that parametric procedures are useful for describing an empirical property (equilibrium constant, rate constant) in terms of parameters describing molecular structure (the substituent constant sigma and the slope rho). This relationship provides the thermodynamic basis for all of the recent (and not so recent) work in QSAR through the relations:
$$\Delta G^{\circ} = -RT \ln K = -2.303\,RT \log K$$
By assuming that the equilibrium constant is a function of the structure of a molecule, which in turn affects $\Delta G^{\circ}$, one arrives at most of the implementations of LFERs today. The classical descriptors developed first by Hammett, then by Taft, assume these relationships, and indeed, verify them.
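As a minimal sketch of how these relations are used, the Python below assumes the standard thermodynamic relation between the free energy change and the equilibrium constant (ΔG° = -RT ln K) and the usual Hammett form log(K/K0) = rho × sigma. The function names and the numerical inputs are illustrative assumptions and are not taken from this article.

```python
import math

R_GAS = 8.314462618  # gas constant, J/(mol K)


def delta_g_from_k(k_eq, temperature=298.15):
    """Standard free energy change (J/mol) from an equilibrium constant."""
    return -R_GAS * temperature * math.log(k_eq)


def hammett_log_ratio(rho, sigma):
    """log10(K/K0) for a substituent constant sigma and reaction constant rho."""
    return rho * sigma


def hammett_delta_delta_g(rho, sigma, temperature=298.15):
    """Free energy shift (J/mol) relative to the unsubstituted compound."""
    return -math.log(10.0) * R_GAS * temperature * hammett_log_ratio(rho, sigma)


# Illustrative inputs only: rho = 1.0 and sigma = 0.78 are assumed values.
rho, sigma = 1.0, 0.78
shift = hammett_delta_delta_g(rho, sigma)
# The same shift follows from converting the predicted K/K0 ratio directly.
check = delta_g_from_k(10.0 ** hammett_log_ratio(rho, sigma))
print(f"ddG = {shift:.1f} J/mol, direct = {check:.1f} J/mol")
```

The two routes agree because a linear relation in log K is, by construction, a linear relation in free energy, which is where the name LFER comes from.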
QSAR was a natural extension of the LFER approach, with a biological activity correlated against a series of parameters that describe the structure of a molecule [6]. The best known and most used of these descriptors in QSAR has been the LOG (Octanol/Water) partition coefficient (usually referred to as LOG P or LOG P[oct]). LOG P has been very useful in correlating a wide range of activities due to its excellent modeling of transport across the blood/brain barrier [7]. Unfortunately, many regressions do not work well with LOG P alone, usually because other effects, such as steric and electronic effects, are important. Therefore, many other descriptors have been used in QSAR in addition to LOG P to incorporate these additional effects.
In the past, theoretical chemistry has been used to provide descriptors for QSAR [8-11]. Ford and Livingstone point out some of the advantages of computationally derived descriptors over the extra-thermodynamic descriptors typically used in QSAR [12]. By "theoretical", one can imagine three types of descriptors: a) empirically based parameters that can be computed via estimation programs (such as LOG P); b) graph theoretical and information indices (such as Kier and Hall's molecular connectivity index, chi [13]); and c) molecular orbital based parameters. This paper focuses on the last of these types.
Linear Solvation Energy Relationship
An enormous number of descriptors, both empirical and computational, have been used in LFER and QSAR to develop usable regressions. One of the difficulties with such a large base from which to choose (more on this in a few paragraphs) is deciding which descriptors will provide the best regressions. By "best", one must take into account both goodness of fit (through the t-score, F statistic, standard deviation and variance inflation factors) and the chemical meaning of the regression(s). An excellent correlative regression with difficult to interpret parameters provides one with neither characterization capabilities nor the ability to predict properties of new compounds a priori. Therefore, it is imperative that parameters be chosen that have chemical meaning. Using these arguments, Kamlet and Taft (K-T) developed a new generalized relationship for studying LFERs of solute/solvent interactions [14]. Termed the linear solvation energy relationship (LSER), it has the form:
Property = bulk/cavity + dipolarity/polarizability + hydrogen bonding acidity + hydrogen bonding basicity
Each of these descriptors, as developed by K-T, was empirically derived: the cavity term was usually the molar volume, and the other three terms were derived directly from UV-Vis spectral shifts (hence the descriptors are sometimes referred to as the solvatochromic parameters). New parameter scales (separated for solutes and solvents) have been independently developed by Abraham [15].
There are several advantages to the LSER. Over 250 solute/solvent based properties have been correlated using either the K-T or the Abraham parameter scales, ranging from simple physical properties to very complex biological activities [16]. By having a common parameter set (all regressions use the same parameters, or a subset thereof), one can compare regression equations easily, and infer mechanisms and the like.
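In practice an LSER is a multiple linear regression of the property on the four descriptor terms. The sketch below illustrates this with NumPy on synthetic data; the descriptor values, the coefficients and the function name `fit_linear_relationship` are made up for illustration and do not come from the K-T or Abraham scales. It also reports the R, SD and F statistics of the kind quoted with regressions of this type.

```python
import numpy as np


def fit_linear_relationship(descriptors, prop):
    """Least-squares fit of prop = c0 + sum_i c_i * descriptor_i.

    Returns the coefficients (intercept first), the correlation
    coefficient R, the standard deviation SD of the residuals and the
    F statistic of the regression.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    prop = np.asarray(prop, dtype=float)
    n, k = descriptors.shape
    design = np.column_stack([np.ones(n), descriptors])
    coef, *_ = np.linalg.lstsq(design, prop, rcond=None)
    resid = prop - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((prop - prop.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    sd = np.sqrt(ss_res / (n - k - 1))
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return coef, np.sqrt(r2), sd, f_stat


# Synthetic data, for illustration only: 20 "solutes" with four made-up
# descriptors (cavity, dipolarity/polarizability, HB acidity, HB basicity).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 4))
true_coef = np.array([1.0, 2.0, -0.5, 0.8, 1.5])  # intercept first
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0.0, 0.01, size=20)

coef, r, sd, f_stat = fit_linear_relationship(X, y)
print("coefficients:", np.round(coef, 3))
print(f"R = {r:.4f}, SD = {sd:.4f}, F = {f_stat:.1f}")
```

Because every regression uses the same columns, the fitted coefficients from different properties can be compared directly, which is the comparative advantage described above.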
Theoretical Linear Solvation Energy Relationship
While the above methodology provides an excellent means for characterizing interactions, our group was most interested in the ability to predict properties a priori. A method like the LSER was not appropriate for this purpose, as the compound must be synthesized in order to measure its descriptors. Therefore, we set out to develop parameters that could be computed from molecular orbital methods and inserted one for one into the LSER [17-20]. Termed the theoretical linear solvation energy relationship (TLSER), the relationship attempts to maintain the same characteristics as the LSER, but on a theoretical basis.
As a brief aside, there are two philosophies on the generation of computationally derived descriptors. Our philosophy has been to maintain a small parameter base (six descriptors, to be described shortly), with all regressions developed using these parameters or a subset of them. This provides the same comparative capability as the LSER and, because the TLSER and LSER parameters attempt to describe basically the same phenomena, offers a comparison between the two techniques as well. The other approach has been to derive a plethora of descriptors, then process them through a best subsets regression to develop a final regression. While potentially more comprehensive than the TLSER, this approach suffers from several drawbacks. It will generally not permit comparison across data sets, and sometimes not across related regressions. Different regressions can result depending on which statistical program is used to perform the best subsets regression. Finally, much more care must be taken when analyzing the results because the final descriptors may be significantly cross correlated.
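One common way to check for the cross correlation mentioned above is the variance inflation factor (VIF), obtained by regressing each descriptor on all of the others. The following is a minimal sketch with NumPy on synthetic descriptors; the function name and the data are assumptions for illustration, and the article does not state how its own VIF values were computed.

```python
import numpy as np


def variance_inflation_factors(descriptors):
    """VIF_j = 1 / (1 - R_j**2), where R_j**2 comes from regressing
    descriptor j on the remaining descriptors plus an intercept."""
    X = np.asarray(descriptors, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Synthetic descriptors: x3 is nearly a copy of x1, x2 is independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + 0.1 * rng.normal(size=50)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))
```

A VIF near 1 indicates a descriptor that is essentially independent of the others, while large values flag descriptors whose coefficients should be interpreted with caution.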
Table 1 shows the TLSER descriptors and the fundamental property being modeled.
Table 1: TLSER Descriptors
| Symbol | Name | Definition | Units | Meaning |
|---|---|---|---|---|
| Vmc/100 | Molecular Volume | Molecular Volume | 100 cubic Angstroms | Cavity |
| Pi[I] | Polarizability Index | Polarizability/Vmc | None | Polarizability |
| Epsilon Beta | Covalent Basicity | 0.30 - \|DE(h,lw)\|/100 | hev | Acceptor HBB |
| Epsilon Alpha | Covalent Acidity | 0.30 - \|DE(hw,l)\|/100 | hev | Donor HBA |
| q- | Electrostatic Basicity | maximum \|(-) charge\| | acu | Acceptor HBB |
| q+ | Electrostatic Acidity | maximum H(+) charge | acu | Donor HBA |
hev=hecto-electronvolt; acu=atomic charge unit;
HBB/A= hydrogen bond basicity/acidity; DE(h,lw)=E(h)-E(lw);
E(h)=HOMO energy of substrate; E(lw)= LUMO energy of water;
E(hw)=HOMO energy of water; E(l)=LUMO energy of substrate
The bulk/cavity term for multiple solutes in a single solvent system is taken as the molecular volume, Vmc. We have typically divided it by 100 so that all descriptor magnitudes are roughly the same. Vmc represents the hole that must be created in the solvent matrix to solvate the solute molecule. As such, Vmc is always an endoergic quantity. The polarizability term is arrived at by computing the polarization volume and dividing by Vmc. The resulting unitless quantity, which we call the polarizability index, Pi[I], reflects the ability of the electrons to move throughout the molecule, without any volume dependence. In the LSER, the HBB and HBA are represented by single scales (beta and alpha, respectively), because of the empirical nature of the solvatochromic parameters. In developing the TLSER, we found it necessary to divide the HBB and HBA into two components each, a covalent and an electrostatic scale. This is very similar to the definitions developed by both Pearson in his HSAB theory and Drago in developing his EC equation [21]. The final generalized TLSER equation therefore has six parameters, although a case where all six are significant (based on the t-statistic at the 0.95 level) has not yet been found.
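The definitions in Table 1 can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the class and function names are hypothetical, DE(hw,l) is assumed to be E(hw) - E(l) by analogy with the stated definition of DE(h,lw), and all numerical inputs below (including the water orbital energies) are placeholders rather than computed MNDO values.

```python
from dataclasses import dataclass


@dataclass
class MolecularOrbitalResult:
    """Quantities taken from a molecular orbital calculation on one solute."""
    volume: float               # molecular volume Vmc, cubic angstroms
    polarization_volume: float  # polarization volume, cubic angstroms
    e_homo: float               # HOMO energy of the solute, eV
    e_lumo: float               # LUMO energy of the solute, eV
    charges: list               # (element symbol, atomic charge) pairs


def tlser_descriptors(mol, e_homo_water, e_lumo_water):
    """Return the six descriptors of Table 1 as a dictionary."""
    negative = [-q for _, q in mol.charges if q < 0.0]
    hydrogens = [q for element, q in mol.charges if element == "H" and q > 0.0]
    return {
        "Vmc/100": mol.volume / 100.0,
        "Pi[I]": mol.polarization_volume / mol.volume,
        # Energies in eV are divided by 100 to give hecto-electronvolts.
        "Epsilon Beta": 0.30 - abs(mol.e_homo - e_lumo_water) / 100.0,
        "Epsilon Alpha": 0.30 - abs(e_homo_water - mol.e_lumo) / 100.0,
        "q-": max(negative, default=0.0),
        "q+": max(hydrogens, default=0.0),
    }


# Placeholder numbers for illustration; not values from the article.
solute = MolecularOrbitalResult(
    volume=80.0,
    polarization_volume=8.0,
    e_homo=-11.0,
    e_lumo=1.0,
    charges=[("O", -0.30), ("H", 0.20), ("C", 0.10)],
)
descriptors = tlser_descriptors(solute, e_homo_water=-12.0, e_lumo_water=5.0)
for name, value in descriptors.items():
    print(f"{name:14s} {value:.3f}")
```

Passing the water orbital energies as arguments keeps the sketch independent of the molecular orbital method, since those reference values differ between MNDO, AM1 and PM3.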
Although it is, in theory, possible to use any molecular orbital method that permits the computation of HOMO/LUMO levels, charges and polarizabilities for generating the TLSER descriptors, we have used the MNDO algorithm almost exclusively. The number of molecules that must be computed for each regression is extensive, and when this project began in 1985, we were limited to MicroVAX technology. Therefore, it was deemed appropriate to use semi-empirical molecular orbital methods for obtaining the optimized structure and generating the TLSER descriptors. Because of the need for electronic information, molecular mechanics calculations were not feasible (although they were useful originally, when we were concerned only with the cavity effects). The only program available at that time that met these needs was MNDO as contained within MOPAC [22]. At that time, AM1 and PM3 had not been fully developed, and were not available. By the time both of these methodologies were "useful", we had developed a significant library of TLSER parameters.
Representative TLSER Regressions
The TLSER has been used to date to correlate almost 100 different solute/solvent properties, including many of the same properties treated with the LSER. Two representative correlations are presented here in order to demonstrate the usefulness of the technique and its applicability to both characterization and prediction of properties. In the first, the TLSER methodology is used to examine cytochrome P-450 mediated acute nitrile toxicity [23]. This regression is unusual in that it correlates an LD50 in mice, rather than an IC50 or Ki, which reflect binding at specific receptor sites or inhibition of specific proteins. As such, LD50 regressions generally have much lower correlation coefficients. In addition, this correlation examines a complex reaction. In the second, the TLSER descriptors are correlated with the GC retention index for a series of organosulfur compounds. Because of the accuracy of GC experimental data, one would expect excellent correlations to be found [24]. Each of the descriptions below is necessarily short, although each has been previously published.
Cytochrome P-450 Mediated Acute Nitrile Toxicity
Nitriles represent a large class of compounds which have wide industrial application as solvents and chemical intermediates. Studies have shown that exposure of humans and animals to certain nitriles results in cyanide release and acute cyanide poisoning. The release of cyanide from nitriles is now known to proceed via a cytochrome P-450 mediated mechanism, including H• abstraction, OH• addition and dissociation of the hydroxylated nitrile.

A number of researchers have attempted to use QSAR and LFER methodologies to correlate acute toxicity (LD50, usually in mice). Tanii and Hashimoto observed a parabolic relationship with LOG P for a limited data set. Pearlman and DeVito developed a relationship using LOG P (again parabolic) and a computed rate for alpha-hydrogen abstraction (ln k). Because of the nature of the TLSER, we were interested in whether: a) the TLSER could give a comparable regression; and b) the TLSER could impart any additional fundamental information or inferences. Using the data compiled by Grogan et al., we developed a TLSER regression for LD50, which resulted in:

n=23 R=0.90 SD=0.288 F=18.3
One important consideration in this regression is the ability of ground state parameters to fairly effectively model the ability of the cytochrome P-450 to mediate hydrogen abstraction. Also, by considering the importance (through the t-statistic, not provided in this overview) of each descriptor, it is possible to infer a qualitative mechanism, based on cavity effects and polarization effects (primarily).
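For comparison, the parabolic LOG P relationships reported by the earlier workers mentioned above can be illustrated with a quadratic least-squares fit. The sketch below uses synthetic, noise-free data; the generic form activity = c0 + c1·LOG P + c2·(LOG P)², the function names and the numbers are assumptions for illustration and are not the published regressions.

```python
import numpy as np


def fit_parabola(log_p, activity):
    """Fit activity = c0 + c1*logP + c2*logP**2 and return (c0, c1, c2)."""
    c2, c1, c0 = np.polyfit(log_p, activity, deg=2)
    return c0, c1, c2


def optimum_log_p(c1, c2):
    """LOG P at the stationary point of the fitted parabola."""
    return -c1 / (2.0 * c2)


# Synthetic, noise-free data built around an assumed optimum LOG P of 1.2.
log_p = np.linspace(-1.0, 3.0, 9)
activity = 3.0 - 0.5 * (log_p - 1.2) ** 2

c0, c1, c2 = fit_parabola(log_p, activity)
print(f"c0 = {c0:.3f}, c1 = {c1:.3f}, c2 = {c2:.3f}")
print(f"optimum LOG P = {optimum_log_p(c1, c2):.2f}")
```

A negative quadratic coefficient corresponds to an optimum LOG P, with activity falling off for compounds that are either too hydrophilic or too lipophilic.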
GC Retention Indices of Sulfur Mustards
The Gulf War (Operation Desert Storm) highlighted the critical need for adequate detection and identification techniques for the common chemical warfare agents. Certain organosulfur compounds still pose a significant threat as a physical contact hazard, causing blistering of exposed skin or membranes. The persistency of these materials and their overall danger are largely responsible for the continuing concern surrounding them. The most commonly identified organosulfur agent is mustard, bis(2-chloroethyl) sulfide, which has been the basis of a number of theoretical investigations:
![]()
Sulfur Mustard
A number of analytical methodologies have been used to identify mustard and related derivatives. These include mass spectrometry, ion mobility spectrometry, gas chromatography, thin layer chromatography, and high performance liquid chromatography. Of these, capillary gas chromatography provides perhaps the best separations, even when the sample is heavily contaminated. D'Agostino and Provost determined the gas chromatography retention index for 37 sulfur containing compounds using three polysiloxane based column packings.
Woloszyn and Jurs were very successful at correlating (R=0.998) the RIx against a series of four computationally derived descriptors. Our objective was to see if the TLSER approach (a standard set of descriptors) could be successful in correlating the GC retention data. Further, we used the D'Agostino data to compare MNDO, AM1 and PM3 generated descriptors.
A complete analysis of the regressions is provided in an upcoming paper. However, a brief description is pertinent here. The tables (Tables 2-4) showing the coefficients of the TLSER regressions for each of the descriptors are reproduced here. Table 2 shows the results using MNDO, Table 3 using AM1 and Table 4 using PM3. As is immediately obvious from the F and SD values, MNDO provides a better statistical fit than either AM1 or PM3. One must, though, go one step further and examine the parameters which were determined to be significant. In this case, AM1 reduces to two descriptors, and both the PM3 and AM1 regressions seem to make more chemical "sense" than MNDO, even though MNDO provides a better regression. This is most notable in the reversal of the sign of the polarizability coefficient from MNDO to AM1/PM3 (because of the nature of the columns, one would expect to see the results seen in the AM1/PM3 regressions).
Table 2: Ri[x] Retention Index Correlations (MNDO)

| Column | a | b | d | e | g | N | R | SD | F |
|---|---|---|---|---|---|---|---|---|---|
| DB-1 | 8.834 | -21.747 | n/s | 8.111 | 968.9 | 34 | 0.991 | 64.10 | 587 |
| DB-5 | 9.170 | -23.215 | n/s | 9.004 | 1027.4 | 34 | 0.991 | 68.83 | 548 |
| DB-1701 | 10.423 | -29.215 | n/s | 10.309 | 1422.3 | 34 | 0.988 | 88.69 | 424 |
Table 3: Ri[x] Retention Index Correlations (AM1)

| Column | a | b | d | e | g | N | R | SD | F |
|---|---|---|---|---|---|---|---|---|---|
| DB-1 | 8.323 | 5519 | n/s | n/s | -339.1 | 35 | 0.984 | 85.3 | 505 |
| DB-5 | 8.625 | 5830 | n/s | n/s | -358.2 | 35 | 0.983 | 92.6 | 461 |
| DB-1701 | 9.944 | 5446 | n/s | n/s | -353.9 | 34 | 0.984 | 106.0 | 435 |
Table 4: Ri[x] Retention Index Correlations (PM3)

| Column | a | b | d | e | g | N | R | SD | F |
|---|---|---|---|---|---|---|---|---|---|
| DB-1 | 8.209 | 5763 | 5763 | n/s | -437.0 | 37 | 0.982 | 90.7 | 297 |
| DB-5 | 8.494 | 6130 | 6130 | n/s | -470.5 | 37 | 0.980 | 98.7 | 270 |
| DB-1701 | 9.723 | 7067 | 7270 | n/s | -648.9 | 34 | 0.984 | 105.4 | 298 |
Conclusions
The above examples have attempted to provide a brief overview of the usefulness of computationally derived descriptors in QSAR and LFER. Further, the TLSER has demonstrated the advantage of using these descriptors in a consistent manner. TLSER based regressions have been developed for a very large range of properties. Across these regressions, the correlation coefficients have averaged around 0.94, with the physical/chemical property correlations significantly higher, and the biological property correlations somewhat lower (in vivo correlations are typically the lowest, ranging around 0.85 to 0.90).
References
1. References here are too numerous to list. A good review is Politzer, P. and Murray, J.S., Quantitative Treatments of Solute/Solvent Interactions, Elsevier, 1994.
2. Cramer, C.J., Famini, G.R., Lowrey, A.H., Acc Chem Res, 26, 599 (1993).
3. Burkhardt, Nature (London), 17, 684 (1935).
4. Hammett, L.P., Chem Rev, 17, 125 (1937).
5. Hammett, L.P., J Am Chem Soc, 59, 125 (1937).
6. Hansch, Acc Chem Res, B2, 232 (1969).
7. Gupta, S., Chem Rev, 87, 1183 (1987).
8. Loew, G.H., et al, Environ Health Perspect, 61, 69 (1985).
9. Pederson, Environ Health Perspect, 61, 185 (1985).
10. Chastrette, M., et al, J Am Chem Soc, 107, 1 (1985).
11. Lewis, D.F.V., in Bridges, J.W., Chasseaud (eds), Progress in Drug Metabolism, p205, (1990).
12. Ford, M.G., Livingstone, D.J., QSAR, 9, 107 (1990).
13. Kier, L.B., Hall, L., Molecular Connectivity in Structure Activity Relationships, Research Studies Press, Letchworth, 1986.
14. Kamlet, M.J., Taft, R.W., Abboud, J-L.M., J Am Chem Soc, 91, 825 (1977).
15. Abraham, M.H., et al, J Chem Soc Perkins II, 291 (1990).
16. Kamlet, M.J., Taft, R.W., Famini, G.R., Doherty, R.M., Acta Chem Scand, 41, 589 (1987).
17. Famini, G.R., Wilson, L.Y., in Politzer, P. and Murray J.S. (eds), Quantitative Treatments of Solute/Solvent Interactions, p213, (1994).
18. Lowrey, A.H., Famini, G.R., Structural Chemistry, 6(4/5), 357 (1995).
19. Famini, G.R., Wilson, L.Y., J Phys Org Chem, 6, 539 (1993).
20. Wilson, L.Y., Famini, G.R., J Med Chem, 34, 1668 (1991).
21. Pearson, R.G., Hard and Soft Acids and Bases, Dowden, New York, 1980.
22. Stewart, J.J.P., MOPAC Manual, FJSRL-TR-88-007, Frank J. Seiler Research Laboratory, U.S. Air Force Academy, 1988.
23. Famini, G.R., Wilson, L.Y., DeVito, S.C., in Saleh, M.A., Blancato, J.N., Nauman, C.H. (eds), Biomarkers of Human Exposure to Pesticides, p 22 (1994).
24. Donovan, W.H., Famini, G.R., J Chem Soc Perkins II, in press (1995).
NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.