A New QSAR Algorithm Combining Principal Component
Analysis with a Neural Network: Application to
Calcium Channel Antagonists
Vellarkad N. Viswanadhan*[1], Geoffrey A. Mueller[1],
Subhash C. Basak[2] and John N. Weinstein*[1]
[1] Laboratory of Molecular Pharmacology,
National Cancer Institute,
Developmental Therapeutics Program,
National Institutes of Health,
Bethesda, MD 20892
[2] Center for Water and Environment,
Natural Resources Research Institute,
University of Minnesota,
5013 Miller Trunk Highway,
Duluth, MN 55811
[1] Present address: Gensia, Inc., 9390 Towne Center Drive, San
Diego, CA 92121
* Correspondence may be addressed to either author.
http://www.netsci.org/Science/Compchem/feature07.html
Summary
In this paper, a new QSAR algorithm has been developed and applied to a set of calcium channel blockers, which are of special interest because of their role in cardiac disease and because many of them interact with p170 (MDR-1), a membrane protein associated with multi-drug resistance to anti-cancer agents. A set of 46 calcium (Ca2+) channel blockers of the 1,4-dihydropyridine type with known calcium channel binding affinities was employed for the present analysis. The present QSAR algorithm can be summarized as follows:
- A set of 90 graph-theoretic and information-theoretic
descriptors representing various structural/topological
characteristics of these molecules was calculated;
- Principal component analysis (PCA) was used to compress these
90 into the 8 best orthogonal composite descriptors. These 8
sufficed to explain 96% of the variance in the original descriptor
set;
- Two important empirical descriptors, the Leo-Hansch lipophilic
constant (π) and the Hammett electronic parameter (σ), were added to
the list of 8;
- The 10 resulting descriptors were used as inputs to a
back-propagation neural network whose output was the predicted
binding affinity;
- The predictive ability of the network was assessed by cross-validation. When experimental and predicted binding data were compared, three-layer neural networks with 4-8 hidden-layer processing elements yielded higher correlations than did standard multiple linear regression.
The present approach may prove useful when (i) the initial number of potentially important QSAR descriptors is large, and (ii) the descriptors (and their principal components) are non-linearly related predictors of a structural or functional feature, for example, a binding constant.
Introduction
We recently used neural networks to analyze the relationship
between function and mechanism of action in the large, diverse set
of drug molecules tested in the National Cancer Institute's cancer
drug screening program [1]. The next logical step was to
integrate structure-function and structure-activity relationships
into the analysis as well. As a start in that direction, we have
begun with a test case based on a set of homologous
1,4-dihydropyridine calcium (Ca2+) channel blockers.
These Ca2+ channel blockers have previously been the
subject of several SAR, QSAR, and 3D-QSAR studies [2-6]. The
data set for the present analysis included
2,6-dimethyl-3,5-dicarbomethoxy-4-phenyl-1,4-dihydropyridine and 45
ortho-, meta-, and/or para- substituted derivatives. An earlier
QSAR analysis of this set by Coburn and coworkers [5] used
two well-known empirical descriptors, the Leo-Hansch lipophilicity
term (π) and the Hammett electronic parameter (σ), as well as
three Sterimol steric parameters. Since our aim was to build toward
the analysis of a bigger and more diverse set of compounds, we
employed a much larger set of molecular properties, which included
a set of 90 topological indices from graph theoretic and
information theoretic analyses [7-10].
The most familiar approaches to QSAR [11] are based on multiple linear regression (MLR) and partial least squares (PLS) [12, 13]. As the names imply, these approaches can capture only linear relationships between molecular characteristics and the functional features to be predicted. In contrast, neural networks [14, 15] are capable of recognizing highly non-linear relationships; hence, they provide an interesting new approach to QSAR [see, e.g., 16, 17] and QSPR analysis [18, 19]. To reduce the number of input variables, we first subjected the matrix of theoretical descriptors to principal component analysis (PCA) before adding the empirical descriptors and using the combination as inputs to the network. PCA, it should be noted, uses information only from the internal variations in the input matrix and, unlike a PLS approach, takes no account of the measured binding constants. In the present work, we describe the PCA and neural net analyses as applied to a set of Ca2+ channel blockers and show that this approach can be used to obtain reasonable predictions of binding data.
METHODS
Overview of the QSAR algorithm
The essential steps of the present QSAR algorithm can be summarized as follows:
(1) Theoretical descriptors for each molecule in the database are calculated using the computer program POLLY [9]. In the present version of the algorithm, we use 90 topological descriptors based on information-theoretic and graph-theoretic analyses [7-10].
(2) Principal component analysis is then used to assess the intrinsic dimensionality of the problem and extract the components (linear combinations of the descriptors) that explain most of the variance in the original data.
(3) Properties dependent on conformation, stereochemical
configuration, and charge distribution (such as lipophilicity,
dipole moment and hydrophobic moment) are added (as appropriate) to
the descriptor list from step (2). In the present version, we use
two well-known substituent constants, the Leo-Hansch lipophilic
term (π) [20] and the Hammett electronic parameter (σ) [21].
(4) A feed-forward, back-error propagating neural network is constructed to model the structure-activity relationships. The input vector is the set of descriptors for each molecule in the series, as generated by the previous steps. The network is configured with at least one hidden layer of processing elements. The presence of a hidden layer makes it possible to classify categories (or outputs) that are not linearly separable. The network is trained to reproduce the binding affinities by repetitive presentation of the set of input vectors, randomized each time.
(5) Cross-validation is performed by dividing the input dataset into several distinct training and test subsets such that each training set covers all of the substituent positions and represents the structural diversity in the original dataset.
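To make the pipeline concrete, the following Python sketch strings the five steps together, with scikit-learn standing in for POLLY's descriptor handling and the NeuralWare package; the random matrices exist only so the example runs, and none of the names or settings below are taken from the original software.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Stand-in data so the sketch runs: 46 compounds x 90 topological
# descriptors (step 1 would produce these with POLLY).
X_topo = rng.random((46, 90))
pi, sigma = rng.random(46), rng.random(46)  # Leo-Hansch pi, Hammett sigma
y = 3.0 + 6.0 * rng.random(46)              # observed log(1/EC50)

# Step 2: log-transform, standardize, keep the first 8 components.
Z = np.log(X_topo + 1.0)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
pcs = PCA(n_components=8).fit_transform(Z)

# Step 3: append the two empirical constants -> 10 network inputs.
X = np.column_stack([pcs, pi, sigma])

# Steps 4-5: back-propagation net with a tanh hidden layer and a
# momentum term, assessed by cross-validation.  KFold is a stand-in
# for the paper's hand-built, substituent-balanced splits.
obs, pred = [], []
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    net = MLPRegressor(hidden_layer_sizes=(8,), activation="tanh",
                       solver="sgd", momentum=0.9, max_iter=5000)
    net.fit(X[tr], y[tr])
    obs.extend(y[te])
    pred.extend(net.predict(X[te]))

r, _ = pearsonr(obs, pred)
print(f"cross-validated r = {r:.3f}")
```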
Fig. 1 shows the skeletal
2,6-dimethyl-3,5-dicarbomethoxy-4-phenyl-1,4-dihydropyridine
structure. This compound and the forty-five derivatives listed in
Table I constituted the database for analysis. Table I also lists
values of log(1/EC50), a pharmacological measure of the effect of a
calcium channel antagonist on the tonic contractile response of
longitudinal muscle strips from guinea pig ileum. These data and two
empirical parameters for each compound in the dataset, the lipophilic
substituent constant (π) and the Hammett electronic constant (σ), are
from an earlier publication [5]. In addition, ninety theoretical
descriptors based on information-theoretic and graph-theoretic
analyses were considered for each compound, as described below.
Graph Theoretic and Information Theoretic Indices
The graph theoretic and information theoretic indices used in
our analysis are derived from the adjacency matrix and distance
matrix of a chemical graph [7-10]. These indices are
well-documented in a number of publications [22]. The Wiener
index, the information-theoretic index on graph distance, the mean
information index and the path-length parameters of order h (P_h) are
derived from the distance matrix D(G) of the hydrogen-suppressed
chemical graph. D(G) is a symmetric n x n matrix, where n is the
number of non-hydrogen atoms in the molecule. Each element of D(G)
is the topological distance (smallest number of intervening bonds
in the graph, G) between two vertices. P_h is the number of
paths of length h in the graph. In this study we included indices
covering paths of order h = 0, 1, ..., 6. To calculate the information
indices, a set A of n elements is derived from the molecular graph
G, depending upon specific structural characteristics. This set is
then partitioned into disjoint subsets A(i) of order n(i)
(i = 1, 2, ..., h, with Σ n(i) = n), based on an equivalence relation
defined on A.
Two atoms were considered equivalent if they possessed an identical
r-th order topological neighborhood. A probability distribution is
then assigned to the equivalence classes, giving the probability
p(i) = n(i)/n that a randomly selected element of A will occur
in the i-th subset. Information indices used here included those
derived from the distance matrices, measures of graph complexity,
structural information measures and complementarity information
measures.
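As an illustration of how such indices are obtained from a chemical graph, the sketch below computes the Wiener index and a Shannon-type mean information index from a hydrogen-suppressed adjacency matrix. It is a generic reconstruction rather than the POLLY code, and the distance-row equivalence relation used for the partition is one simple choice among the neighborhood-based relations described above.

```python
import numpy as np
from collections import Counter

def distance_matrix(adj):
    """Topological distance matrix D(G) of a hydrogen-suppressed graph,
    computed from the adjacency matrix by Floyd-Warshall (assumes a
    connected molecular graph)."""
    n = len(adj)
    D = np.where(np.asarray(adj) > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D.astype(int)

def wiener_index(adj):
    """Sum of topological distances over all vertex pairs."""
    return int(distance_matrix(adj).sum() // 2)

def mean_information_index(adj):
    """Partition the vertices into equivalence classes (here: identical
    sorted rows of D(G)), assign p_i = n_i / n, and return the Shannon
    entropy H = -sum p_i log2 p_i."""
    D = distance_matrix(adj)
    classes = Counter(tuple(sorted(row)) for row in D)
    p = np.array(list(classes.values()), dtype=float) / len(D)
    return float(-(p * np.log2(p)).sum())

# n-butane, hydrogen-suppressed (a 4-carbon chain): W = 10, H = 1 bit.
butane = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(wiener_index(butane), mean_information_index(butane))
```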
Principal Component Analysis
The descriptor data for the set of molecules define a 90-dimensional parameter space in which each compound corresponds to a point. The method of principal component analysis (PCA, or the Karhunen-Loève transformation) [23] reduces the dimensionality by embedding all of the points (compounds) in a subspace of dimension less than 90 such that a desired degree of variance in the original data is captured in the subspace. Each principal component is a linear combination of the original variables, with coefficients given by the eigenvectors of the covariance matrix. The first principal component (PC) is the axis that minimizes the sum of squared Euclidean distances from the points to that axis. The second PC is given by projections onto a basis vector orthogonal to the first PC; the third PC is orthogonal to the first two PCs, and so on. For the data in this investigation, it was most appropriate to transform each index by taking the log of the index plus one and then standardizing to a mean of zero and variance of one. To reduce the dimensionality, we chose to retain the PCs with eigenvalues greater than one [24].
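In numpy, this transformation and the eigenvalue-one retention rule amount to a few lines; the following is a generic PCA sketch (assuming no constant descriptor columns), not the original statistical code.

```python
import numpy as np

def reduce_descriptors(X):
    """Log(x + 1) transform, standardize each index to mean 0 and
    variance 1, then keep the principal components whose covariance
    eigenvalues exceed one.  Returns the component scores and the
    fraction of total variance they explain."""
    Z = np.log(X + 1.0)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    evals, evecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(evals)[::-1]          # sort descending
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > 1.0                       # eigenvalue-one rule
    return Z @ evecs[:, keep], evals[keep].sum() / evals.sum()
```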
Neural Network Modeling
For this study we developed a set of feed-forward,
back-propagation-of-error networks using the NeuralWare
Professional II software package [25]. The training
procedure used was similar to that of Rumelhart et al [26].
Network weights W_ji(s), for a processing element (P.E.) j
receiving output from P.E. i in layer s, were initially
assigned random values between -0.1 and +0.1. We chose the
hyperbolic tangent as the transfer function that generates the
output of a neuron from the weighted sum of inputs from the
preceding P.E.'s. Consecutive layers were fully interconnected;
there were no connections within a layer. As usual with
back-propagation networks, a bias unit was employed as part of
every layer except the output layer. This unit has a constant
activation of unity and is connected to all units in the next
higher layer. During one "epoch," all compounds in the training set
were presented, and weights in the network were then adjusted on
the basis of the discrepancy between network output for
log(affinity) and the experimental value. The training set was
presented in a different random order during each epoch to avoid
cyclical behavior and the local minima common to most optimization
algorithms. Weight adjustment proceeded on the basis of a gradient
descent in the root mean square error between output values
(predicted binding affinities) and target values (experimental
affinities). Convergence properties of the network were enhanced by
using a "momentum" term proportional to the previous weight change
[15]. Input to the network consisted of (i) the first eight
principal components derived from principal component analysis of
ninety topological variables as described in the previous section;
and (ii) two other variables, the lipophilic substituent constant
(π) and the Hammett electronic parameter (σ), which appear to be
independently important.
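The training scheme just described (tanh transfer functions, a bias unit of constant activation in each non-output layer, random presentation order each epoch, and gradient descent with a momentum term) can be condensed into a short from-scratch sketch. This is a minimal reconstruction for illustration, not the NeuralWare implementation; in particular, it assumes the targets have been rescaled into the (-1, 1) range of the tanh output.

```python
import numpy as np

class BackpropNet:
    """Minimal feed-forward / back-propagation net: one tanh hidden
    layer, bias units in the input and hidden layers, momentum."""

    def __init__(self, n_in=10, n_hidden=8, lr=0.01, momentum=0.9, seed=0):
        rng = np.random.default_rng(seed)
        # Weights start uniformly in [-0.1, +0.1]; the extra column in
        # each matrix carries the bias unit (constant activation of 1).
        self.W1 = rng.uniform(-0.1, 0.1, (n_hidden, n_in + 1))
        self.W2 = rng.uniform(-0.1, 0.1, (1, n_hidden + 1))
        self.lr, self.mu = lr, momentum
        self.dW1, self.dW2 = np.zeros_like(self.W1), np.zeros_like(self.W2)

    def forward(self, x):
        h = np.tanh(self.W1 @ np.append(x, 1.0))         # hidden layer
        y = np.tanh(self.W2 @ np.append(h, 1.0))[0]      # output P.E.
        return h, y

    def train(self, X, t, epochs=1000, seed=1):
        """X: (n, n_in) input vectors; t: targets scaled into (-1, 1)."""
        rng = np.random.default_rng(seed)
        for _ in range(epochs):
            for i in rng.permutation(len(X)):  # random order each epoch
                h, y = self.forward(X[i])
                d_out = (t[i] - y) * (1.0 - y * y)        # tanh derivative
                d_hid = (1.0 - h * h) * self.W2[0, :-1] * d_out
                # Gradient step plus momentum (previous weight change).
                self.dW2 = self.lr * d_out * np.append(h, 1.0) + self.mu * self.dW2
                self.dW1 = self.lr * np.outer(d_hid, np.append(X[i], 1.0)) + self.mu * self.dW1
                self.W2 += self.dW2
                self.W1 += self.dW1
```

Note that this sketch adjusts the weights after each pattern; the paper's wording could also be read as one batch update per epoch, and either reading is consistent with gradient descent on the rms error.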
Cross-validation
For cross-validation, five different pairs of training and test sets were constructed from the data of Table I. The union of each training set and its corresponding test set equaled the original database, and each molecule was present in exactly one of the test sets (except for the unsubstituted parent compound, entry 13 in Table I, which was present in all training sets). At least one entry corresponding to each substituent position was placed in each training set. The last column of Table I identifies the test set to which each compound belonged. Cross-validation methods have been used previously in QSAR work [12], but the procedure usually involved a random deletion of rows (molecules) from the QSAR table. In contrast, the present procedure is designed to offer similar information to all training networks and is therefore statistically more appropriate. The same sets were used for cross-validation of both the MLR and neural net procedures.
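Our reading of this partitioning scheme can be sketched as follows; the helper name and position labels are hypothetical, since the actual assignments are those given in the last column of Table I.

```python
import random
from collections import defaultdict

def structured_splits(positions, parent_idx, n_sets=5, seed=0):
    """Place each compound in exactly one of n_sets test sets while
    dealing the members of each substituent-position class round-robin
    across the sets, so every complementary training set still covers
    all positions (each position must occur in >= 2 compounds).  The
    parent compound (parent_idx) is excluded from every test set and
    therefore appears in all training sets, as in the paper."""
    rng = random.Random(seed)
    by_pos = defaultdict(list)
    for i, pos in enumerate(positions):
        if i != parent_idx:
            by_pos[pos].append(i)
    test_sets = [[] for _ in range(n_sets)]
    k = 0
    for members in by_pos.values():
        rng.shuffle(members)
        for i in members:                  # round-robin over test sets
            test_sets[k % n_sets].append(i)
            k += 1
    everyone = set(range(len(positions)))
    return [(sorted(everyone - set(te)), sorted(te)) for te in test_sets]

# Illustrative ortho/meta/para labels for a small hypothetical set:
labels = ["m", "o", "o", "m", "H", "o", "m", "p", "p", "o", "m", "p"]
splits = structured_splits(labels, parent_idx=labels.index("H"), n_sets=3)
```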
RESULTS AND DISCUSSION
The compounds represented in this study share the same molecular skeleton, 2,6-dimethyl-3,5-dicarbomethoxy-4-phenyl-1,4-dihydropyridine. The earlier QSAR analysis [5] clearly indicated the need for inclusion of lipophilic, electronic, and steric terms in the QSAR equation. Specifically, the Hammett constant for the substituent at the meta position has been shown to be important. In the present study, we have included this and the lipophilic substituent constant explicitly to model electrostatic and hydrophobic interactions; these aspects are unlikely to be fully represented in the information-theoretic and graph-theoretic descriptors. Though these empirical descriptors have the limitation that conformation dependence is not accounted for, they have the advantage of simplicity and can easily be adapted for treating large datasets. Steric effects, however, are encoded by the topological indices, which represent size and shape; hence our QSAR model does not contain any specific steric terms.
Principal component analysis of the ninety topological indices for the present set of molecules showed that the first eight components explained 96% of the variance in the original data, indicating that almost all of the information in the original ninety-variable set was represented in these eight components. We therefore used these eight components as part of the input to the neural nets. Neural nets were constructed using NeuralWare Professional II on a desktop personal computer; details of the network configurations are described below. Table II lists the final results of the analysis; it reports 'test' correlations obtained after 100,000 epochs of training for each of the five training networks. It shows that good correlations are obtained with a neural network model that uses 8 processing elements in the hidden layer. Increasing the number of PE's beyond 8 did not improve the correlations further. Table II also indicates that the neural net models perform better than the multiple regression method. In the literature, one finds examples of network performance being relatively insensitive to an increase in the number of processing elements in the hidden layer. The study by Andrea and Kalayeh [17], for example, shows that the reduction in test-set variance between 3 and 8 hidden PE's is small, though the minimum in the variance is reached at a size of 8 PE's for the hidden layer. The present work indicates that an appropriate configuration for the network is dependent on the problem at hand.
In Table I, the entries in the third through ninth columns give the observed and the predicted (i.e., cross-validated) values for each compound obtained by MLR and by the neural network methods. Prediction statistics seem to improve with an increase in the size of the network, but the improvement is not monotonic and has to be interpreted with caution. A notable disagreement between observed and calculated values occurs in the case of compound 27, and this is seen in both the MLR and the neural net models. While no specific rationalization for the discrepancy is offered here, it points to the difficulties posed by outliers in neural network modeling. Strong correlations (up to r = 0.95) have been reported in earlier QSAR work [5] for this data set; in contrast to the present work, however, no cross-validation study was performed. As expected, we obtained very strong correlations (r greater than 0.95) for each trained network when only the training compounds were used in computing the correlation, but these are not reported because we felt that only cross-validation can justify the use of the many parameters associated with the weights in the neural networks.
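For reference, the figures of merit reported in Table II can be computed from the pooled cross-validated predictions as follows; this is a generic calculation (scipy's pearsonr supplies the two-tailed p-value for r), not tied to the original software.

```python
import numpy as np
from scipy.stats import pearsonr

def test_statistics(observed, predicted):
    """Pearson correlation, rms prediction error, and the two-tailed
    p-value of r for a set of cross-validated predictions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    r, p = pearsonr(obs, pred)
    rms = float(np.sqrt(np.mean((obs - pred) ** 2)))
    return r, rms, p

# e.g. the first three compounds of Table I against their MLR values:
print(test_statistics([8.89, 8.82, 8.66], [6.80, 6.96, 8.74]))
```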
Topological indices have been shown to possess considerable
discriminatory power and utility in classifying structures and in
predicting chemical and biological properties [27, 28]. An
examination of the components [29] obtained from principal
component analysis of a large database of chemical structures
revealed that the first four components largely represented shape
and size, symmetry, degree of branching, and cyclicity,
respectively. Since the principal components are, by definition,
orthogonal to one another, the structural characteristics encoded by
components five through eight are likely to be novel and to
require attention in QSAR studies. It is obviously impossible to
correlate such attributes of molecular structure as shape and size,
symmetry, degree of branching or cyclicity, unless those attributes
are quantified; chemical graph theory and information theory
provide the necessary tools in the form of topological indices that
can be used in a QSAR framework. The extent of cross-validated
correlation seen in this study points to the role that can be
played by these indices, in combination with parameters such as π
and σ, in the prediction of ligand binding.
Acknowledgements
We thank Mr. G. D. Grunwald of the Natural Resources Research Institute for his help with the statistical analysis.
References
(1) Weinstein, J. N.; Kohn, K. W.; Driscoll, J. S.; Grever, M. R.; Viswanadhan, V. N.; Rubinstein, L. V.; Paull, K. D. Science, 258, 1992, 447.
(2) Mahmoudian, M.; Richards, G. W. J. Pharm. Pharmacol., 38, 1986, 372.
(3) Norrington, F. E.; Hyde, R. M.; Williams, S. G.; Wooton, R. J. Med. Chem., 18, 1975, 604.
(4) Rovnyak, G.; Anderson, N.; Gougoutas, J.; Hedberg, A.; Kimball, S. D.; Malley, M.; Moreland, S.; Porubcan, M.; Pudzianowski, A. J. Med. Chem., 34, 1991, 2521.
(5) Coburn, R. A.; Wierzba, M.; Suto, M. J.; Solo, A. J.; Triggle, A. M.; Triggle, D. J. J. Med. Chem., 31, 1988, 2103.
(6) Belvisi, L.; Brossa, S.; Salimbeni, A.; Scolastico, C.; Todeschini, R. J. Comput. Aided Mol. Design, 5, 1991, 571.
(7) Basak, S. C.; Magnuson, V. R.; Niemi, G. J.; Regal, R. R.; Veith, G. D. Math. Modeling, 8, 1987, 300.
(8) Basak, S. C.; Magnuson, V. R.; Niemi, G. J.; Regal, R. R. Discrete Applied Math., 19, 1988, 17.
(9) Basak, S. C.; Harriss, D. K.; Magnuson, V. R. POLLY; University of Minnesota: Duluth, MN, 1988.
(10) Basak, S. C.; Niemi, G. J.; Veith, G. D. J. Math. Chem., 4, 1990, 185.
(11) Hansch, C. Acc. Chem. Res., 2, 1969, 232.
(12) Cramer, R. D.; Patterson, D. E.; Bunce, J. D. J. Am. Chem. Soc., 110, 1988, 5959.
(13) Geladi, P.; Kowalski, B. R. Analytica Chimica Acta, 185, 1986, 1.
(14) Khanna, T. Foundations of Neural Networks; Addison-Wesley: New York, 1991.
(15) Dayhoff, J. Neural Network Architectures; van Nostrand Reinhold: New York, 1990.
(16) Hirst, J. D.; King, R. D.; Sternberg, M. J. E. J. Comput.-Aided Mol. Design, 8, 1994, 405.
(17) Andrea, T.; Kalayeh, H. J. Med. Chem., 34, 1991, 2824.
(18) Zupan, J.; Gasteiger, J. Anal. Chim. Acta, 248, 1991, 1.
(19) Lohninger, H. J. Chem. Inf. Comput. Sci., 33, 1993, 736.
(20) Hansch, C.; Leo, A. Substituent constants for Correlation Analysis in Chemistry and Biology; Wiley: New York, 1979.
(21) Hammett, L. P. Physical Organic Chemistry; McGraw Hill: New York, 1970.
(22) Basak, S. C.; Grunwald, G. D. J. Chem. Inf. Comput. Sci., 35, 1995, 366.
(23) Kshirsagar, A. M. Multivariate Analysis; Marcel Dekker, Inc.: New York, 1972.
(24) Greenacre, M. J. Theory and Application of Correspondence Analysis; Academic Press: New York, 1984.
(25) NeuralWorks Professional II/PLUS 386/387; NeuralWare, Inc.: Pittsburgh, PA, 1991.
(26) Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. Nature, 323, 1986, 533.
(27) Basak, S. C.; Niemi, G. J.; Veith, G. D. J. Math. Chem., 4, 1990, 185.
(28) Basak, S. C.; Bertelsen, S.; Grunwald, G. D. Toxicology Letters, 79, 1995, 239.
(29) Basak, S. C.; Niemi, G. J.; Veith, G. D. J. Math. Chem., 7, 1991, 243.
Figure 1

The skeletal 1,4-dihydropyridine structure with
substitutions/changes indicated by X on the phenyl ring. These
substitutions are listed in Table I.
Table 1
Results of MLR and neural network prediction models for compounds identified by their phenyl-ring substituent ('X' in Fig. 1). EC50 is the 50% effective concentration for blocking the Ca2+ channel. The predicted values were obtained from multiple linear regression (MLR) and from neural network models with 2, 3, 4, 8 or 10 processing elements in the hidden layer of the back-propagation network. The last column indicates the test set in which each compound fell.
| Cmpd # | Substituent X | Exp. Log(1/EC50) | MLR | NN, 2 PE's | NN, 3 PE's | NN, 4 PE's | NN, 8 PE's | NN, 10 PE's | Test Set |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3-Br | 8.89 | 6.80 | 6.86 | 6.85 | 6.72 | 7.08 | 7.14 | 3 |
| 2 | 2-CF3 | 8.82 | 6.96 | 7.21 | 6.88 | 7.16 | 7.13 | 7.13 | 1 |
| 3 | 2-Cl | 8.66 | 8.74 | 8.40 | 8.37 | 8.29 | 8.32 | 8.39 | 2 |
| 4 | 3-NO2 | 8.40 | 7.14 | 7.39 | 7.51 | 7.46 | 7.66 | 7.60 | 2 |
| 5 | 2-CH=CH2 | 8.35 | 8.48 | 8.05 | 7.91 | 8.00 | 7.62 | 7.65 | 3 |
| 6 | 2-NO2 | 8.29 | 7.71 | 7.97 | 8.60 | 8.24 | 8.71 | 7.81 | 4 |
| 7 | 2-Me | 8.22 | 7.79 | 7.88 | 7.59 | 7.50 | 7.26 | 7.18 | 5 |
| 8 | 2-Et | 8.19 | 7.79 | 7.59 | 7.59 | 7.69 | 7.92 | 7.86 | 1 |
| 9 | 2-Br | 8.12 | 8.59 | 8.57 | 8.52 | 8.42 | 8.45 | 8.51 | 2 |
| 10 | 2-CN | 7.80 | 7.79 | 8.46 | 8.23 | 8.00 | 8.36 | 8.34 | 3 |
| 11 | 3-Cl | 7.80 | 7.80 | 7.62 | 8.10 | 7.73 | 8.28 | 8.24 | 1 |
| 12 | 3-F | 7.68 | 7.44 | 7.44 | 7.09 | 7.54 | 7.10 | 7.69 | 4 |
| 13 | H* | 7.55 | ****** | ****** | ****** | ****** | ****** | ****** | * |
| 14 | 3-CN | 7.46 | 7.65 | 7.62 | 7.58 | 7.76 | 7.54 | 7.61 | 5 |
| 15 | 3-I | 7.38 | 6.38 | 6.71 | 6.60 | 6.38 | 6.71 | 6.81 | 3 |
| 16 | 2-F | 7.37 | 8.15 | 8.43 | 7.89 | 8.51 | 7.90 | 8.29 | 4 |
| 17 | 2-I | 7.33 | 7.45 | 7.59 | 7.34 | 7.92 | 8.01 | 8.01 | 3 |
| 18 | 2-OMe | 7.24 | 7.01 | 6.79 | 6.71 | 7.34 | 6.60 | 6.72 | 5 |
| 19 | 3-CF3 | 7.13 | 8.39 | 8.45 | 8.62 | 8.48 | 8.57 | 8.59 | 2 |
| 20 | 3-Me | 6.96 | 7.52 | 7.43 | 7.58 | 7.55 | 7.56 | 7.54 | 1 |
| 21 | 2-OEt | 6.96 | 7.33 | 7.34 | 7.16 | 7.39 | 7.41 | 7.40 | 1 |
| 22 | 3-OMe | 6.72 | 6.44 | 6.46 | 6.18 | 6.27 | 5.86 | 6.57 | 4 |
| 23 | 3-NMe2 | 6.05 | 4.84 | 5.31 | 4.55 | 5.38 | 4.33 | 5.11 | - |
| 24 | 3-OH | 6.00 | 6.77 | 6.49 | 6.62 | 6.50 | 6.23 | 6.14 | 3 |
| 25 | 3-NH2 | 5.70 | 5.35 | 5.85 | 5.14 | 5.59 | 4.98 | 4.77 | 2 |
| 26 | 3-OAc | 5.22 | 6.26 | 6.52 | 6.71 | 6.05 | 6.29 | 6.43 | 1 |
| 27 | 3-O-COPh | 5.20 | 10.35 | 7.49 | 8.99 | 8.83 | 7.23 | 8.80 | 4 |
| 28 | 2-NH2 | 4.40 | 6.60 | 6.29 | 5.89 | 6.16 | 5.45 | 5.25 | 2 |
| 29 | 3-NMe3 | 4.30 | 6.06 | 4.54 | 4.61 | 4.26 | 4.65 | 4.31 | 5 |
| 30 | 4-F | 6.89 | 6.17 | 5.54 | 4.96 | 5.16 | 6.18 | 5.16 | 4 |
| 31 | 4-Br | 5.40 | 5.80 | 5.31 | 5.37 | 5.66 | 4.93 | 5.68 | 5 |
| 32 | 4-I | 4.64 | 5.95 | 5.41 | 5.70 | 5.60 | 5.66 | 5.67 | 1 |
| 33 | 4-NO2 | 5.50 | 5.63 | 6.03 | 6.19 | 6.14 | 6.34 | 6.24 | 2 |
| 34 | 4-NMe2 | 4.00 | 4.51 | 3.43 | 3.28 | 3.44 | 2.94 | 2.96 | 3 |
| 35 | 4-CN | 5.46 | 5.92 | 4.97 | 5.42 | 5.18 | 5.90 | 4.16 | 4 |
| 36 | 4-Cl | 5.09 | 6.31 | 6.74 | 6.40 | 6.60 | 5.80 | 6.76 | 5 |
| 37 | 2,6-(Cl)2 | 8.72 | 6.68 | 5.09 | 5.70 | 7.43 | 6.05 | 6.98 | 5 |
| 38 | (F)5 | 8.36 | 7.29 | 9.40 | 9.46 | 9.11 | 8.53 | 9.06 | 4 |
| 39 | 2-F,6-Cl | 8.12 | 8.37 | 7.83 | 7.51 | 7.64 | 6.97 | 7.03 | 3 |
| 40 | 2,3-(Cl)2 | 7.72 | 9.74 | 8.50 | 8.49 | 8.40 | 8.60 | 8.57 | 2 |
| 41 | 2-Cl,5-NO2 | 7.52 | 7.51 | 7.72 | 7.07 | 7.72 | 6.86 | 6.93 | 1 |
| 42 | 3,5-(Cl)2 | 7.03 | 5.95 | 4.79 | 6.01 | 6.10 | 6.49 | 4.32 | 5 |
| 43 | 2-OH,5-NO2 | 7.00 | 6.03 | 7.24 | 7.86 | 7.16 | 8.18 | 6.52 | 4 |
| 44 | 2,5-Me2 | 7.00 | 6.97 | 7.05 | 6.92 | 6.80 | 6.67 | 6.81 | 3 |
| 45 | 2,4-(Cl)2 | 6.40 | 8.59 | 8.40 | 8.41 | 8.37 | 8.42 | 8.41 | 2 |
| 46 | 2,4,5-(OMe)3 | 3.00 | 4.68 | 5.22 | 5.45 | 4.47 | 4.53 | 4.45 | 1 |
Table 2
Results of neural net and multiple regression analyses on the 46 calcium channel antagonists. For each model, the corresponding Pearson correlation coefficient, rms error and two-tailed p-value are given.
| Model and Description | Correlation Coefficient | rms error | p |
|---|---|---|---|
| Neural net, 2 P.E.'s in Hidden Layer | 0.653 | 1.150 | < 0.0001 |
| Neural net, 3 P.E's in Hidden Layer | 0.611 | 1.234 | < 0.0001 |
| Neural net, 4 P.E's in Hidden Layer | 0.715 | 1.046 | < 0.0001 |
| Neural net, 8 P.E's in Hidden Layer | 0.733 | 1.019 | < 0.0001 |
| Neural net, 10 P.E's in Hidden Layer | 0.676 | 1.159 | < 0.0001 |
| Multiple Linear Regression | 0.540 | 1.297 | < 0.0001 |