A New QSAR Algorithm Combining Principal Component
Analysis with a Neural Network: Application to
Calcium Channel Antagonists

Vellarkad N. Viswanadhan*[1], Geoffrey A. Mueller[1],
Subhash C. Basak[2] and John N. Weinstein*[1]

[1] Laboratory of Molecular Pharmacology,
National Cancer Institute,
Developmental Therapeutics Program,
National Institutes of Health,
Bethesda, MD 20892

[2] Center for Water and Environment,
Natural Resources Research Institute,
University of Minnesota,
5013 Miller Trunk Highway,
Duluth, MN 55811

[1] Present address: Gensia, Inc., 9390 Towne Center Drive, San Diego, CA 92121
* Correspondence may be addressed to either author.



http://www.netsci.org/Science/Compchem/feature07.html

Summary

In this paper, a new QSAR algorithm has been developed and applied to a set of calcium channel blockers, which are of special interest because of their role in cardiac disease and because many of them interact with p170 (MDR-1), a membrane protein associated with multi-drug resistance to anti-cancer agents. A set of 46 calcium (Ca2+) channel blockers of the 1,4-dihydropyridine type with known calcium channel binding affinities was employed for the present analysis. The present QSAR algorithm can be summarized as follows:

  1. A set of 90 graph-theoretic and information-theoretic descriptors representing various structural/topological characteristics of these molecules was calculated;

  2. Principal component analysis (PCA) was used to compress these 90 descriptors into the 8 best orthogonal composite descriptors. These 8 sufficed to explain 96% of the variance in the original descriptor set;

  3. Two important empirical descriptors, the Leo-Hansch lipophilic constant and the Hammett electronic parameter, were added to the list of 8;

  4. The 10 resulting descriptors were used as inputs to a back-propagation neural network whose output was the predicted binding affinity;

  5. The predictive ability of the network was assessed by cross-validation. When experimental and predicted binding data were compared, three-layer neural networks with 4-8 hidden-layer processing elements yielded higher correlations than did standard multiple linear regression.

The present approach may prove useful when (i) the initial number of potentially important QSAR descriptors is large, and (ii) the descriptors (and their principal components) are non-linearly related predictors of a structural or functional feature, for example, a binding constant.

Introduction

We recently used neural networks to analyze the relationship between function and mechanism of action in the large, diverse set of drug molecules tested in the National Cancer Institute's cancer drug screening program [1]. The next logical step was to integrate structure-function and structure-activity relationships into the analysis as well. As a start in that direction, we have begun with a test case based on a set of homologous 1,4-dihydropyridine calcium (Ca2+) channel blockers. These Ca2+ channel blockers have previously been the subject of several SAR, QSAR, and 3D-QSAR studies [2-6]. The data set for the present analysis included 2,6-dimethyl-3,5-dicarbomethoxy-4-phenyl-1,4-dihydropyridine and 45 ortho-, meta-, and/or para-substituted derivatives. An earlier QSAR analysis of this set by Coburn and coworkers [5] used two well-known empirical descriptors, the Leo-Hansch lipophilicity term ( pi ) and the Hammett electronic parameter ( sigma ), as well as three Sterimol steric parameters. Since our aim was to build toward the analysis of a bigger and more diverse set of compounds, we employed a much larger set of molecular properties, including 90 topological indices from graph-theoretic and information-theoretic analyses [7-10].

The most familiar approaches to QSAR [11] are based on multiple linear regression (MLR) and partial least squares (PLS) [12, 13]. As the names imply, these approaches can capture only linear relationships between molecular characteristics and the functional features to be predicted. In contrast, neural networks [14, 15] are capable of recognizing highly non-linear relationships; hence, they provide an interesting new approach to QSAR [see, e.g., 16, 17] and QSPR analysis [18, 19]. To reduce the number of input variables, we first subjected the matrix of theoretical descriptors to principal component analysis (PCA) before adding the empirical descriptors and using the combination as inputs to the network. PCA, it should be noted, uses information only from the internal variations in the input matrix and, unlike a PLS approach, takes no account of the measured binding constants. In the present work, we describe the PCA and neural net analyses as applied to a set of Ca2+ channel blockers and show that this approach can be used to obtain reasonable predictions of binding data.

METHODS

Overview of the QSAR algorithm

The essential steps of the present QSAR algorithm can be summarized as follows:

(1) Theoretical descriptors for each molecule in the database are calculated using the computer program POLLY [9]. In the present version of the algorithm, we use 90 topological descriptors based on information-theoretic and graph theoretic analysis [7-10].

(2) Principal component analysis is then used to assess the intrinsic dimensionality of the problem and extract the components (linear combinations of the descriptors) that explain most of the variance in the original data.

(3) Properties dependent on conformation, stereochemical configuration, and charge distribution (such as lipophilicity, dipole moment and hydrophobic moment) are added (as appropriate) to the descriptor list from step (2). In the present version, we use two well-known substituent constants, the Leo-Hansch lipophilic term (pi) [20] and the Hammett electronic parameter (sigma) [21].

(4) A feed-forward, back-error propagating neural network is constructed to model the structure-activity relationships. The input vector is the set of descriptors for each molecule in the series, as generated by the previous steps. The network is configured with at least one hidden layer of processing elements. The presence of a hidden layer makes it possible to classify categories (or outputs) that are not linearly separable. The network is trained to reproduce the binding affinities by repetitive presentation of the set of input vectors, randomized each time.

(5) Cross-validation is performed by dividing the input dataset into several distinct training and test subsets such that each training set covers all of the substituent positions and represents the structural diversity in the original dataset.  

Fig. 1 shows the skeletal 2,6-dimethyl-3,5-dicarbomethoxy-4-phenyl-1,4-dihydropyridine structure. This compound and the forty-five derivatives listed in Table I constituted the database for the analysis. Table I also lists values of log(1/EC50), a pharmacological measure of the effect of a calcium channel antagonist on the tonic contractile response of longitudinal muscle strips from guinea pig ileum. These data and two empirical parameters for each compound in the dataset, the lipophilic substituent constant (pi) and the Hammett electronic constant (sigma), are from an earlier publication [5]. In addition, ninety theoretical descriptors based on information-theoretic and graph-theoretic analyses were considered for each compound, as described below.

Graph Theoretic and Information Theoretic Indices

The graph-theoretic and information-theoretic indices used in our analysis are derived from the adjacency matrix and distance matrix of a chemical graph [7-10]. These indices are well documented in a number of publications [22]. The Wiener index, the information-theoretic index on graph distance, the mean information index and the path-length parameters of order h ( P(h) ) are derived from the (topological) distance matrix of the hydrogen-suppressed chemical graph. D(G) is a symmetric n x n matrix, where n is the number of non-hydrogen atoms in the molecule. Each element of D(G) is the topological distance (the smallest number of intervening bonds in the graph G) between two vertices. P(h) is the number of paths of length h in the graph; in this study we included indices covering paths of order h = 0, 1, ..., 6. To calculate the information indices, a set A of n elements is derived from the molecular graph G, depending upon specific structural characteristics. This set is then partitioned into disjoint subsets A(i) of order n(i) (i = 1, 2, ..., h; Sum n(i) = n), based on an equivalence relation defined on A. Two atoms were considered equivalent if they possessed an identical r-th order topological neighborhood. A probability distribution is then assigned to the equivalence classes, giving the probability that a randomly selected element of A will occur in the i-th subset. The information indices used here included those derived from the distance matrices, measures of graph complexity, structural information measures and complementarity information measures.
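
As an illustration of these definitions (a sketch only, not the POLLY program), the distance matrix D(G), the Wiener index, and one common entropy-based index on graph distance can be computed for a small hydrogen-suppressed graph; the n-butane example below is hypothetical:

```python
from collections import deque
from math import log2

def distance_matrix(adj):
    """All-pairs topological distances via BFS on an unweighted graph."""
    n = len(adj)
    D = [[0] * n for _ in range(n)]
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            D[s][v] = d
    return D

def wiener_index(D):
    """Sum of topological distances over all unordered vertex pairs."""
    n = len(D)
    return sum(D[i][j] for i in range(n) for j in range(i + 1, n))

def mean_information_on_distance(D):
    """Shannon entropy of the distribution of distance values in D(G)."""
    n = len(D)
    counts = {}
    for i in range(n):
        for j in range(i + 1, n):
            counts[D[i][j]] = counts.get(D[i][j], 0) + 1
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hydrogen-suppressed graph of n-butane: a path of four carbons
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
D = distance_matrix(adj)
print(wiener_index(D))                             # → 10
print(round(mean_information_on_distance(D), 3))   # → 1.459
```

The entropy variant shown is one common formulation of a distance-based information index; POLLY computes a considerably larger family of such indices.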

Principal Component Analysis

The descriptor data for the set of molecules define a 90-dimensional parameter space in which each compound corresponds to a point. The method of principal component analysis (PCA, or the Karhunen-Loeve transformation) [23] reduces the dimensionality by embedding all of the points (compounds) in a subspace of dimension less than 90 such that a desired degree of the variance in the original data is captured in the subspace. Each principal component is a linear combination of the original variables, with coefficients given by the eigenvectors of the covariance matrix. The first principal component (PC) is the axis that minimizes the sum of squared Euclidean distances from the points to that axis. The second PC is given by projections onto a basis vector orthogonal to the first PC; the third PC is orthogonal to the first two PCs, and so on. For the data in this investigation, it was most appropriate to transform each index by taking the log of the index plus one and then standardizing to a mean of zero and variance of one. To reduce the dimensionality, we have chosen to retain the PCs with eigenvalues greater than one [24].
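
The log(index + 1) transform, standardization, and eigenvalue-greater-than-one retention rule just described can be sketched with numpy; the random matrix below is a stand-in for the actual 46 x 90 descriptor table:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.lognormal(size=(46, 90))             # stand-in for 90 topological indices

Xl = np.log(X + 1.0)                         # log(index + 1) transform
Z = (Xl - Xl.mean(axis=0)) / Xl.std(axis=0)  # standardize each column

# Eigendecomposition of the covariance matrix of the standardized data
C = np.cov(Z, rowvar=False)
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]              # sort eigenvalues descending
evals, evecs = evals[order], evecs[:, order]

keep = evals > 1.0                           # retain PCs with eigenvalue > 1
scores = Z @ evecs[:, keep]                  # principal component scores
explained = evals[keep].sum() / evals.sum()  # fraction of variance explained
print(scores.shape, round(float(explained), 3))
```

On the real descriptor table this procedure yields the eight components retained in the paper; on random data the retained count and explained variance will of course differ.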

Neural Network Modeling

For this study we developed a set of feed-forward, back-propagation-of-error networks using the NeuralWare Professional II software package [25]. The training procedure was similar to that of Rumelhart et al. [26]. Network weights ( W[ji](s) ) for a processing element (P.E.) 'j' receiving output from P.E. 'i' in layer 's' were initially assigned random values between -0.1 and +0.1. We chose the hyperbolic tangent as the transfer function that generates the output of a neuron from the weighted sum of inputs from the preceding P.E.'s. Consecutive layers were fully interconnected; there were no connections within a layer. As usual with back-propagation networks, a bias unit was employed as part of every layer except the output layer. This unit has a constant activation of unity and is connected to all units in the next higher layer. During one "epoch," all compounds in the training set were presented, and the weights in the network were then adjusted on the basis of the discrepancy between the network output for log(affinity) and the experimental value. The training set was presented in a different random order during each epoch to avoid cyclical behavior and entrapment in the local minima common to most optimization algorithms. Weight adjustment proceeded by gradient descent on the root mean square error between the output values (predicted binding affinities) and the target values (experimental affinities). The convergence properties of the network were enhanced by using a "momentum" term proportional to the previous weight change [15]. Input to the network consisted of (i) the first eight principal components derived from principal component analysis of the ninety topological variables, as described in the previous section; and (ii) two other variables, the lipophilic substituent constant (pi) and the Hammett electronic parameter (sigma), which appear to be independently important.
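
A minimal re-implementation of such a network is sketched below. This is not the NeuralWare code: it assumes per-pattern (online) updates, and the random training data are stand-ins for the 10-descriptor input vectors and measured affinities. It does follow the description above: one hidden layer of tanh units, bias units, weights initialized in [-0.1, 0.1], random presentation order each epoch, and a momentum term:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))               # stand-in 10-descriptor inputs
y = np.tanh(X @ rng.normal(size=10) / 3)    # stand-in "binding" targets

n_hid, lr, mom = 8, 0.01, 0.9
W1 = rng.uniform(-0.1, 0.1, size=(11, n_hid))     # +1 row for the bias unit
W2 = rng.uniform(-0.1, 0.1, size=(n_hid + 1, 1))
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)

def predict(X, W1, W2):
    H = np.tanh(np.c_[X, np.ones(len(X))] @ W1)   # hidden activations
    return np.tanh(np.c_[H, np.ones(len(X))] @ W2).ravel()

def rms(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rms_init = rms(predict(X, W1, W2), y)

for epoch in range(500):
    for i in rng.permutation(len(X)):       # random order each epoch
        x = np.append(X[i], 1.0)            # input vector plus bias
        h = np.tanh(W1.T @ x)
        hb = np.append(h, 1.0)
        out = np.tanh(float(W2[:, 0] @ hb))
        err = out - y[i]
        d_out = err * (1.0 - out ** 2)                # output-layer delta
        d_hid = d_out * W2[:-1, 0] * (1.0 - h ** 2)   # hidden-layer deltas
        dW2 = mom * dW2 - lr * d_out * hb[:, None]    # momentum updates
        dW1 = mom * dW1 - lr * np.outer(x, d_hid)
        W2 += dW2
        W1 += dW1

rms_final = rms(predict(X, W1, W2), y)
print(rms_init, "->", rms_final)
```

The learning rate and momentum values here are illustrative assumptions; the paper does not report the NeuralWare settings.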

Cross-validation

For this purpose, five different pairs of training and test sets were constructed from the data of Table I. The union of each training set and corresponding test set equaled the original database and each of the molecules was present once in one of the test sets (except for the unsubstituted original compound (entry 13 in Table I) which was present in all training sets). At least one entry corresponding to each substituent position was placed in each training set. The last column of Table I identifies the test set to which each compound belonged. Cross-validation methods have been used previously in QSAR work [12], but the procedure usually involved a random deletion of rows (molecules) from the QSAR table. In contrast, the present procedure is designed to offer similar information to all training networks and is therefore statistically more appropriate. The same sets were used for cross-validation of both MLR and neural net procedures.
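
The position-stratified splitting described above can be sketched as follows. The compound list is a hypothetical fragment of Table I, and the round-robin assignment is an assumption (the paper does not specify the exact mechanism), but it reproduces the stated constraints: each compound appears in exactly one test set, and every training set retains compounds from every substituent position:

```python
from collections import defaultdict

# (substituent position, compound name) pairs -- a hypothetical fragment
compounds = [("3", "3-Br"), ("2", "2-CF3"), ("2", "2-Cl"), ("3", "3-NO2"),
             ("2", "2-CH=CH2"), ("4", "4-F"), ("4", "4-Br"), ("4", "4-I"),
             ("3", "3-Cl"), ("2", "2-Me"), ("4", "4-NO2"), ("3", "3-F"),
             ("3", "3-I")]

groups = defaultdict(list)
for pos, name in compounds:
    groups[pos].append(name)

# Rotate the members of each substituent-position group through the five
# test sets, so every training set (the complement of a test set) keeps
# at least one compound from every position.
test_sets = defaultdict(list)
for names in groups.values():
    for k, name in enumerate(names):
        test_sets[1 + k % 5].append(name)

for fold in sorted(test_sets):
    print(fold, test_sets[fold])
```

A random row deletion, by contrast, can leave a training set with no example of some substituent position, which is what this scheme is designed to avoid.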

RESULTS AND DISCUSSION

The compounds represented in this study share the same molecular skeleton, 2,6-dimethyl-3,5-dicarbomethoxy-4-phenyl-1,4-dihydropyridine. An earlier QSAR analysis [5] clearly indicated the need to include lipophilic, electronic and steric terms in the QSAR equation. Specifically, the Hammett constant for the substituent at the meta position has been shown to be important. In the present study, we have included this constant and the lipophilic substituent constant explicitly to model electrostatic and hydrophobic interactions; these aspects are unlikely to be fully represented in the information-theoretic and graph-theoretic descriptors. Though these empirical descriptors have the limitation that they do not account for conformation dependence, they have the advantage of simplicity and can easily be adapted for treating large datasets. Steric effects, in contrast, are encoded by the topological indices, which represent size and shape; hence our QSAR model does not contain any explicit steric terms.

Principal component analysis of the ninety topological indices for the present set of molecules showed that the first eight components explained 96% of the variance in the original data, indicating that almost all of the information in the original ninety-variable set was represented in these eight components. We therefore used these eight components as part of the input to the neural nets. Neural nets were constructed using NeuralWare Professional II on a desktop personal computer. Table II lists the final results of the analysis; it gives the 'test' correlations obtained after 100,000 epochs of training for each of the five training networks. It shows that good correlations are obtained with a neural network model that uses 8 processing elements in the hidden layer; increasing the number of PE's beyond 8 did not improve the correlations further. Table II also indicates that the neural net models perform better than the multiple regression method. In the literature, one finds examples of network performance being relatively insensitive to an increase in the number of processing elements in the hidden layer. The study by Andrea and Kalayeh [17], for example, shows that the reduction in test-set variance between 3 and 8 hidden PE's is small, though the minimum in the variance is reached at 8 PE's in the hidden layer. The present work indicates that an appropriate network configuration depends on the problem at hand.

In Table I, entries in the third through ninth columns give the observed and predicted (i.e., cross-validated) values for each compound obtained by MLR and by the neural network methods. Prediction statistics seem to improve with the size of the network, but the improvement is not monotonic and has to be interpreted with caution. A notable disagreement between observed and calculated values occurs for compound 27, in both the MLR and neural net models. While no specific rationalization for the discrepancy is offered here, this study points to the difficulties posed by outliers in neural network modeling. Strong correlations (up to r = 0.95) have been reported in earlier QSAR work [5] for this data set; in contrast to the present work, however, no cross-validation was performed. As expected, we obtained very strong correlations (r greater than 0.95) for each trained network when only the training compounds were used in computing the correlation, but these are not reported because we felt that only cross-validation can justify the use of the many parameters associated with the weights of a neural network.
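
The statistics in Table II are the usual Pearson correlation coefficient and root-mean-square error applied to the observed and cross-validated columns of Table I; a minimal sketch of both formulas on hypothetical observed/predicted values:

```python
import math

obs  = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical observed values
pred = [1.1, 2.1, 3.1, 4.1, 5.1]           # hypothetical predictions

n = len(obs)
mo, mp = sum(obs) / n, sum(pred) / n
# Pearson r: covariance over the product of standard deviations
cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
r = cov / math.sqrt(sum((o - mo) ** 2 for o in obs)
                    * sum((p - mp) ** 2 for p in pred))
# Root-mean-square error between observed and predicted
rms = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
print(round(r, 3), round(rms, 3))          # → 1.0 0.1
```

A constant offset, as in this toy example, leaves r at 1.0 while the rms error is nonzero, which is why Table II reports both quantities.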

Topological indices have been shown to possess considerable discriminatory power and utility in classifying structure and in predicting chemical and biological properties [27, 28]. An examination of the components [29] obtained from principal component analysis of a large database of chemical structures revealed that the first four components largely represented shape and size, symmetry, degree of branching and cyclicity, respectively. Since the principal components are, by definition, orthogonal to one another, the structural characteristics encoded by components five through eight are likely to be novel and to require attention in QSAR studies. It is obviously impossible to correlate such attributes of molecular structure as shape and size, symmetry, degree of branching or cyclicity unless those attributes are quantified; chemical graph theory and information theory provide the necessary tools in the form of topological indices that can be used in a QSAR framework. The extent of the cross-validated correlations seen in this study points to the role these indices can play, in combination with parameters such as sigma and pi, in the prediction of ligand binding.

Acknowledgements

We thank Mr. G. D. Grunwald of the Natural Resources Research Institute for his help with the statistical analysis.

References

(1) Weinstein, J. N.; Kohn, K. W.; Driscoll, J. S.; Grever, M. R.; Viswanadhan, V. N.; Rubinstein, L. V.; Paull, K. D. Science, 258, 1992, 447.

(2) Mahmoudian, M.; Richards, G. W. J. Pharm. Pharmacol., 38, 1986, 372.

(3) Norrington, F. E.; Hyde, R. M.; Williams, S. G.; Wooton, R. J. Med. Chem., 18, 1975, 604.

(4) Rovnyak, G.; Anderson, N.; Gougoutas, J.; Hedberg, A.; Kimball, S. D.; Malley, M.; Moreland, S.; Porubcan, M.; Pudzianowski, A. J. Med. Chem., 34, 1991, 2521.

(5) Coburn, R. A.; Weirzba, M.; Suto, M. J.; Solo, A. J.; Triggle, A. M.; Triggle, D. J. J. Med. Chem., 31, 1988, 2103.

(6) Belvisi, L.; Brossa, S.; Salimbeni, A.; Scolastico, C.; Todeschini, R. J. Comput. Aided Mol. Design, 5, 1991, 571.

(7) Basak, S. C.; Magnuson, V. R.; Niemi, G. J.; Regal, R. R.; Veith, G. D. Math. Modeling, 8, 1987, 300.

(8) Basak, S. C.; Magnuson, V. R.; Niemi, G. J.; Regal, R. R. Discrete Applied Math., 19, 1988, 17.

(9) Basak, S. C.; Hariss, D. K.; Magnuson, V. R. POLLY, University of Minnesota, Duluth, MN, 1988.

(10) Basak, S. C.; Niemi, G. J.; Veith, G. D. J. Math. Chemistry, 4, 1990, 185.

(11) Hansch, C. Acc. Chem. Res., 2, 1969, 232.

(12) Cramer, R. D.; Patterson, D. E.; Bunce, J. D. J. Am. Chem. Soc., 110, 1988, 5959.

(13) Geladi, P.; Kowalski, B. R. Analytica Chimica Acta, 185, 1986, 1.

(14) Khanna, T. Foundations of Neural Networks; Addison-Wesley: New York, 1991.

(15) Dayhoff, J. Neural Network Architectures; van Nostrand Reinhold: New York, 1990.

(16) Hirst, J. D.; King, R. D.; Sternberg, M. J. E. J. of Comput.- Aided Mol. Design, 8, 1994, 405.

(17) Andrea, T.; Kalayeh, H. J. Med. Chem., 34, 1991, 2824.

(18) Zupan, J.; Gasteiger, J. Anal. Chim. Acta, 248, 1991, 1.

(19) Lohninger, H. J. Chem. Inf. Comput. Sci., 33, 1993, 736.

(20) Hansch, C.; Leo, A. Substituent constants for Correlation Analysis in Chemistry and Biology; Wiley: New York, 1979.

(21) Hammett, L. P. Physical Organic Chemistry; McGraw Hill: New York, 1970.

(22) Basak, S. C.; Grunwald, G. D. J. Chem. Inf. Comput. Sci., 35, 1995, 366.

(23) Kshirsagar, A. M. Multivariate Analysis; Marcell Dekker, Inc.: New York, 1972.

(24) Greenacre, M. J. Theory and Application of Correspondence Analysis; Academic Press: New York, 1984.

(25) Neural Works Professional II / PLUS 386/387, NeuralWare, Inc, Pittsburgh, PA, 1991.

(26) Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. Nature, 323, 1986, 533.

(27) Basak, S. C.; Niemi, G. J.; Veith, G. D. J. Math. Chem., 4, 1990, 185.

(28) Basak, S. C.; Bertelsen, S.; Grunwald, G. D. Toxicology Letters, 79, 1995, 239.

(29) Basak, S. C.; Niemi, G. J.; Veith, G. D. J. Math. Chem., 7, 1991, 243.

 

Figure 1



The skeletal 1,4-dihydropyridine structure with substitutions/changes indicated by X on the phenyl ring. These substitutions are listed in Table I.


Table 1

Results of MLR and neural network prediction models for compounds identified by substituent ('X' in Fig. 1). EC50 is the 50% effective concentration for blocking the Ca2+ channel; log(1/EC50) is tabulated. The predicted values were obtained from multiple linear regression (MLR) and from neural network models with 2, 3, 4, 8 or 10 processing elements in the hidden layer of the back-propagation network. The last column indicates the test set in which each compound fell.

Cmpd #  Substituent X  Exp. Log(1/EC50)  MLR  NN (2 PE's)  NN (3 PE's)  NN (4 PE's)  NN (8 PE's)  NN (10 PE's)  Test Set
1 3-Br 8.89 6.80 6.86 6.85 6.72 7.08 7.14 3
2 2-CF3 8.82 6.96 7.21 6.88 7.16 7.13 7.13 1
3 2-Cl 8.66 8.74 8.40 8.37 8.29 8.32 8.39 2
4 3-NO2 8.40 7.14 7.39 7.51 7.46 7.66 7.60 2
5 2-CH=CH2 8.35 8.48 8.05 7.91 8.00 7.62 7.65 3
6 2-NO2 8.29 7.71 7.97 8.60 8.24 8.71 7.81 4
7 2-Me 8.22 7.79 7.88 7.59 7.50 7.26 7.18 5
8 2-Et 8.19 7.79 7.59 7.59 7.69 7.92 7.86 1
9 2-Br 8.12 8.59 8.57 8.52 8.42 8.45 8.51 2
10 2-CN 7.80 7.79 8.46 8.23 8.00 8.36 8.34 3
11 3-Cl 7.80 7.80 7.62 8.10 7.73 8.28 8.24 1
12 3-F 7.68 7.44 7.44 7.09 7.54 7.10 7.69 4
13 H* 7.55 ****** ****** ****** ****** ****** ****** *
14 3-CN 7.46 7.65 7.62 7.58 7.76 7.54 7.61 5
15 3-I 7.38 6.38 6.71 6.60 6.38 6.71 6.81 3
16 2-F 7.37 8.15 8.43 7.89 8.51 7.90 8.29 4
17 2-I 7.33 7.45 7.59 7.34 7.92 8.01 8.01 3
18 2-OMe 7.24 7.01 6.79 6.71 7.34 6.60 6.72 5
19 3-CF3 7.13 8.39 8.45 8.62 8.48 8.57 8.59 2
20 3-Me 6.96 7.52 7.43 7.58 7.55 7.56 7.54 1
21 2-OEt 6.96 7.33 7.34 7.16 7.39 7.41 7.40 1
22 3-OMe 6.72 6.44 6.46 6.18 6.27 5.86 6.57 4
23 3-NMe2 6.05 4.84 5.31 4.55 5.38 4.33 5.11 -
24 3-OH 6.00 6.77 6.49 6.62 6.50 6.23 6.14 3
25 3-NH2 5.70 5.35 5.85 5.14 5.59 4.98 4.77 2
26 3-OAc 5.22 6.26 6.52 6.71 6.05 6.29 6.43 1
27 3-O-COPh 5.20 10.35 7.49 8.99 8.83 7.23 8.80 4
28 2-NH2 4.40 6.60 6.29 5.89 6.16 5.45 5.25 2
29 3-NMe3 4.30 6.06 4.54 4.61 4.26 4.65 4.31 5
30 4-F 6.89 6.17 5.54 4.96 5.16 6.18 5.16 4
31 4-Br 5.40 5.80 5.31 5.37 5.66 4.93 5.68 5
32 4-I 4.64 5.95 5.41 5.70 5.60 5.66 5.67 1
33 4-NO2 5.50 5.63 6.03 6.19 6.14 6.34 6.24 2
34 4-NMe2 4.00 4.51 3.43 3.28 3.44 2.94 2.96 3
35 4-CN 5.46 5.92 4.97 5.42 5.18 5.90 4.16 4
36 4-Cl 5.09 6.31 6.74 6.40 6.60 5.80 6.76 5
37 2,6-(Cl)2 8.72 6.68 5.09 5.70 7.43 6.05 6.98 5
38 (F)5 8.36 7.29 9.40 9.46 9.11 8.53 9.06 4
39 2-F,6-Cl 8.12 8.37 7.83 7.51 7.64 6.97 7.03 3
40 2,3-(Cl)2 7.72 9.74 8.50 8.49 8.40 8.60 8.57 2
41 2-Cl,5-NO2 7.52 7.51 7.72 7.07 7.72 6.86 6.93 1
42 3,5-(Cl)2 7.03 5.95 4.79 6.01 6.10 6.49 4.32 5
43 2-OH,5-NO2 7.00 6.03 7.24 7.86 7.16 8.18 6.52 4
44 2,5-Me2 7.00 6.97 7.05 6.92 6.80 6.67 6.81 3
45 2,4-(Cl)2 6.40 8.59 8.40 8.41 8.37 8.42 8.41 2
46 2,4,5-(OMe)3 3.00 4.68 5.22 5.45 4.47 4.53 4.45 1



Table 2

Results of neural net and multiple regression analyses on 46 calcium channel antagonists. For each model, the corresponding Pearson correlation, rms error and two-tailed p-value are indicated.

Model and Description                 | Correlation Coefficient | rms error | p
Neural net, 2 P.E.'s in Hidden Layer  | 0.653                   | 1.150     | < 0.0001
Neural net, 3 P.E.'s in Hidden Layer  | 0.611                   | 1.234     | < 0.0001
Neural net, 4 P.E.'s in Hidden Layer  | 0.715                   | 1.046     | < 0.0001
Neural net, 8 P.E.'s in Hidden Layer  | 0.733                   | 1.019     | < 0.0001
Neural net, 10 P.E.'s in Hidden Layer | 0.676                   | 1.159     | < 0.0001
Multiple Linear Regression            | 0.540                   | 1.297     | < 0.0001







NetSci, ISSN 1092-7360, is published by Network Science Corporation.