Computer-Assisted Toxicity Assessment: Criteria for Acceptance

Vijay K. Gombar*, Kurt Enslein, and Daniel A. Reid

Health Designs, Inc.
183 E. Main Street
Rochester, NY 14604



http://www.netsci.org/Science/Special/feature05.html


Abstract

The possibility of assessing certain molecular properties before the molecule is even synthesized is perhaps the most beneficial application of computer-aided techniques in chemical design. Since the most desirable chemical may not necessarily be the one with maximum activity but rather the one which has maximum activity-to-toxicity ratio, it is essential to be equipped with reliable computer-assisted toxicity assessment tools. Primarily, two approaches have been followed to develop such tools: the quantitative structure-toxicity relationship (QSTR) model approach and the expert system approach. This paper briefly compares the two approaches, followed by a discussion of the criteria for accepting any computer-assisted toxicity predictions. Finally, with examples, we demonstrate how these criteria are implemented in the latest release of the TOPKAT 3.0 package.

I. Introduction

The challenge of designing molecules with better activity is a thrilling one. The traditional hit-and-miss approach of synthesizing a chemical and then testing it for desired characteristics has, naturally, been laborious and expensive. With the advent of powerful computers and easy access to them, researchers in the field of chemical design have aspired to rationalize and expedite the process. Chemists, for instance, are beginning to employ sophisticated hardware and software tools that generate life-like graphics of the geometry and structure of a protein binding site, design favorably docking ligands de novo, and then select the one with maximum predicted activity. Maximizing activity, however, does not completely address the problem, because the most active molecule may be worthless if it turns out to be toxic. Therefore, rapid and reliable assessment, preferably computer-assisted, of the toxicity associated with any molecular structure is equally important.

Primarily, there are two fundamental approaches to prediction of chemical toxicity from molecular structure. They are:

  1. Expert System Approach, and
  2. Linear Free Energy Relationship (LFER) Approach.

This article first briefly compares the two approaches, highlighting their limitations. It then introduces TOPKAT 3.0, a commercially available software package based on the LFER approach, and explains its underlying principles and procedures in detail. Finally, some examples of assessing toxicity with TOPKAT 3.0 are presented to highlight features for validating the computed toxicity value.

I.1. Expert Systems

An expert system is basically a collection of rules derived from the existing subject knowledge and stored in computer memory. When a chemical is presented to such a system for assessment, the rules associated with the presented chemical are identified from the array of stored rules and a decision is displayed. An expert system is called a human expert system or an artificial expert system depending on whether the rules are framed by human experts or are machine-learned. Packages such as Oncologic (Woo et al., 1995) and DEREK (Sanderson and Earnshaw, 1991) are examples of human expert systems, whereas programs like CASE (Klopman, 1984) can be classified as artificial expert systems. Some inherent problems in these systems are stated below.

I.1.a. Human Expert Systems

The rules in a human expert system may be based on a statistically insignificant frequency of an observation, and thus may not be suitable for predictive purposes, especially in the absence of diagnostic procedures guarding against application to the chemicals governed by such rules. Due to the human bias that the experimental data are cast in stone, the "noise" in the experimental data is not filtered, but instead is embedded in the rules. Further, since the rules are a snapshot in time describing the then-current knowledge, human expert systems eliminate the opportunity of discovering novel associations between structure and toxicity. The most serious drawback of these systems is that the rules, sometimes numbering in the thousands, relate only to "positivity", so that all chemicals not governed by any of the stated rules are assumed to be "negative". This generally leads to a large number of false negative predictions.

I.1.b. Artificial Expert Systems

The artificial expert systems, on the other hand, carry out objective mining of the data to deduce rules. However, the rules from artificial expert systems may sometimes represent false structure-toxicity associations, especially when the rules are built in terms of the mere presence or absence of certain groups of atoms. A pattern expressed in terms of binary digits, i.e., a 1 representing the presence and a 0 indicating the absence, cannot adequately encode the electronic structure information, let alone the influence of structural modification on this information. For example, a machine-learned rule (Rosenkranz and Klopman, 1990) may read: the carcinogenicity of azathioprine, 2-amino-5-nitrothiazole, and 4,4'-thiodianiline is due to the fragment =C-S-C=. Admittedly, the sequence of atoms =C-S-C= is present somewhere in each of these three molecules. But it does not seem possible that their carcinogenicity is due to this common group (Ashby, 1992), because the group =C-S-C= appears to have different chemistry in the three molecules: in azathioprine it is between a purine and an imidazole ring, in 2-amino-5-nitrothiazole it is a part of the thiazole ring, and in 4,4'-thiodianiline it connects two phenyl rings. We computed the E-state value (Hall et al., 1991), a measure of electrotopology, of this group in the three molecules. The values were 22.06, 15.99, and 25.34, respectively, indicating a wide variation in the electronic properties of the group. Since a toxic response is a result of a molecule's ability to transport from the point of exposure across biological membranes to the site of action and to electronically interact with macromolecules, the rules constructed in terms of the mere presence or absence of certain groups may lead to false associations, especially when the rules do not account for any synergistic effects of multiple groups.

I.2. LFER-based Models

In principle, an LFER-based model is a quantitative relationship between a numerical measure of toxicity and structure descriptors, i.e.,

T = f(S)          (1)

where T is a measure of toxicity such as acute median lethal dose (LD50), lowest observed adverse effect level (LOAEL), indicator of carcinogenicity, etc., S is a set of numerical quantities representing different structural attributes, and f is a mathematical function (Purcell et al., 1973). The structure may be quantified at any level of complexity, ranging from a mere count of certain atoms or groups to sophisticated quantum mechanical indices, and a variety of methods, ranging from linear multiple regression analysis to neural networks, are available to determine the explicit form of the function f. These structure-toxicity relationships are generally called quantitative structure-toxicity relationship (QSTR) models or equations, because by knowing f and providing the values of S for any chemical one can estimate its T.
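As an illustration of Eqn. 1, the sketch below assumes the simplest case named above: a linear f determined by multiple regression. The function names (fit_linear_qstr, predict) and the toy descriptor and toxicity values are hypothetical, chosen only so that the result can be checked by hand; nothing here is taken from TOPKAT 3.0.

```python
def fit_linear_qstr(S, T):
    """Least-squares fit of T ~ b0 + b1*s1 + ... + bp*sp via the normal equations."""
    X = [[1.0] + list(row) for row in S]          # prepend an intercept column
    k = len(X[0])
    # Augmented normal-equation system [X'X | X'T].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, T))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]            # [b0, b1, ..., bp]


def predict(coeffs, s):
    """Evaluate the fitted f for one chemical's descriptor values s."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], s))


# Toy training set (hypothetical): two descriptors per chemical, with T
# generated exactly from T = 1 + 2*s1 - 0.5*s2 so the fit can be verified.
S_train = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
T_train = [2.0, 4.5, 4.5, 7.5, 7.0]

coeffs = fit_linear_qstr(S_train, T_train)
estimate = predict(coeffs, (6.0, 4.0))
```

Once f is known, estimating T for a new chemical requires only its descriptor values, which is why the choice and quality of the descriptors matter so much in what follows.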

Like expert systems, QSTR models are derived from experimentally measured T values of a limited set of chemicals. Therefore, all computer-assisted toxicity predictors model closed systems, and the applicability of these models is not universal. Consequently, it is of utmost importance to ascertain whether the model is applicable to the query chemical presented to it or not. Until recently, the available toxicity predictors did not provide multivariate diagnostic procedures to flag those chemical structures to which the models are not applicable. We have now developed algorithms to quantitatively define the optimum prediction space (OPS) associated with a QSTR model. The algorithms are implemented in the commercially available TOPKAT 3.0 program to provide its users with a tool to test whether a submitted structure is within the OPS of the model. Like other toxicity predictors, the QSTR models in the TOPKAT 3.0 program will always produce an estimate of toxicity, but the result of the OPS test determines whether the estimated value of toxicity is reliable or not, i.e., is the prediction supported by the data from which the model was derived?

The following sections describe what precautions are taken and what procedures are followed to develop robust and parsimonious QSTR models before they are installed in the TOPKAT 3.0 program.

II. TOPKAT 3.0 Methodology

II.1. Selection of Training Set

As mentioned above, the building of any computer-assisted toxicity prediction system begins with the collection of results from bioassays. The predictive systems can predict only what the bioassay results represent. For example, imagine a scenario in which one is developing a predictor of carcinogenicity and uses results from all kinds of carcinogenicity bioassays, namely assays conducted on different rodent species, for different exposure durations and routes, with impure chemicals, and interpreted with different standards. When applied to a chemical, the system predicts the submitted chemical to be, let us say, carcinogenic. But what is the meaning of such a prediction? The chemical is carcinogenic, but to which species, to which sex of that species, and after how much exposure? In order to obtain a meaningful prediction it is essential that the bioassay data generated under the most uniform conditions be used. In TOPKAT 3.0, for example, there are four sex/species-specific carcinogenicity predictors, one each for male rat (MR), female rat (FR), male mouse (MM), and female mouse (FM). Each model is based only on the studies conducted using the National Toxicology Program (NTP) protocol. For strict selection of uniform data, the NTP technical reports were carefully examined and any studies not conforming to the standards of purity, exposure duration, route of exposure, dose levels, etc., were not included in the training sets for developing QSTR models of carcinogenicity. For instance, the carcinogenicity assays on 25 chemicals in the FR were not conducted at the maximum tolerated dose, and thus were excluded from the training set.

II.2. Computation of Structure Quantifiers

Information-rich structure descriptors are key to parsimonious and meaningful QSTR models. In models for predicting toxicity solely from molecular structure, therefore, an effective numerical representation of molecular structure is extremely important. From the analysis of the processes leading to a toxic response, it can be rationalized that the structure descriptors should be able to quantify transport, bulk, and electronic attributes of molecular structure (Purcell et al., 1973). A number of theoretically calculated and experimentally measured property values (Hansch and Leo, 1995) have been employed to numerically encode these structural features. However, the QSTR models in the TOPKAT 3.0 program employ only theoretically calculated structure descriptors, because the goal is to be able to assess the toxicity of chemicals before they are even synthesized. Among the theoretically calculated descriptors one has the choice of using 3D molecular geometry or 2D molecular topology as the starting structure representation. It has been demonstrated (Gombar and Enslein, 1990) that structure quantification at the topological level is much faster and yields QSTRs comparable to those obtained from 3D descriptors. Therefore, to be able to rapidly screen for toxicity the large numbers of structures generated by combinatorial chemistry packages and high throughput screening systems, TOPKAT 3.0 uses the following graph theoretical invariants to quantify salient structure attributes.

II.2.a. Electronic Attributes: Electrotopological state values (E-values), as numerical quantifiers of molecular structure, encode information about the electron content (valence, sigma, pi and lone-pair), topology, and environment of an atom, or a group of atoms, in a molecule (Hall et al., 1991). Since an E-value is computed by taking into account the effects of both intrinsic and environmental features, it changes even with remote variations in structure; of course, the magnitude of the variation depends on the severity of the change. The calculation of E-values has been explained in the literature (Hall et al., 1991).
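To make the idea of an atom-level E-value concrete, the following is a minimal sketch of the electrotopological state as it is commonly given in the literature cited above: an intrinsic state I = ((2/N)^2 * dv + 1) / d for each non-hydrogen atom, perturbed by every other atom j through the term (I_i - I_j) / r_ij^2, where r_ij is the number of atoms on the shortest path between i and j. The element table is restricted to C, N, and O for brevity, the function names are hypothetical, and the text does not state that TOPKAT 3.0 computes its E-values in exactly this way.

```python
from collections import deque

# (valence electrons, principal quantum number) for a few second-row elements.
ELEMENTS = {"C": (4, 2), "N": (5, 2), "O": (6, 2)}


def intrinsic_state(symbol, n_hydrogens, n_heavy_neighbors):
    """Intrinsic state I = ((2/N)^2 * delta_v + 1) / delta of a skeleton atom."""
    valence_electrons, n = ELEMENTS[symbol]
    delta_v = valence_electrons - n_hydrogens     # valence electrons not bonded to H
    delta = n_heavy_neighbors                     # sigma bonds to non-hydrogen atoms
    return ((2.0 / n) ** 2 * delta_v + 1.0) / delta


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def e_state_values(atoms, bonds):
    """atoms: list of (symbol, attached hydrogens); bonds: list of (i, j) pairs."""
    adjacency = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    intrinsic = [intrinsic_state(sym, h, len(adjacency[i]))
                 for i, (sym, h) in enumerate(atoms)]
    values = []
    for i in range(len(atoms)):
        dist = bond_distances(adjacency, i)
        # r_ij counts atoms on the path, i.e. the number of bonds plus one.
        perturbation = sum((intrinsic[i] - intrinsic[j]) / (dist[j] + 1) ** 2
                           for j in dist if j != i)
        values.append(intrinsic[i] + perturbation)
    return values


# Hydrogen-suppressed graphs: ethanol CH3-CH2-OH and 1-propanol CH3-CH2-CH2-OH.
ethanol = e_state_values([("C", 3), ("C", 2), ("O", 1)], [(0, 1), (1, 2)])
propanol = e_state_values([("C", 3), ("C", 2), ("C", 2), ("O", 1)],
                          [(0, 1), (1, 2), (2, 3)])
```

Comparing the last entry of ethanol with the last entry of propanol shows the behavior described above: the hydroxyl oxygen receives a slightly different value when a methylene group is added two bonds away, which a binary presence/absence code for -OH could not register.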

II.2.b. Bulk Attributes: Besides molecular weight, TOPKAT 3.0 employs size-corrected E-values for quantification of molecular bulk. The size-corrected E-values are computed from a rescaled count of valence electrons (Hall et al., 1991).

II.2.c. Shape Attributes: Since molecular shape and molecular symmetry also influence molecular transport, TOPKAT 3.0 includes topological shape descriptors, the kappa indices (mκ) of orders m = 1 through 7 (Kier, 1986; Gombar and Jain, 1987), and seven indices of molecular symmetry (Gombar, 1991) for effective quantification of molecular shape.
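As a rough illustration of how a topological shape descriptor is obtained from the molecular graph alone, the sketch below computes only the first- and second-order kappa indices in their basic published form, 1κ = A(A-1)^2 / P1^2 and 2κ = (A-1)(A-2)^2 / P2^2, where A is the number of non-hydrogen atoms and P1 and P2 are the counts of one-bond and two-bond paths. The higher orders used in TOPKAT 3.0 and any heteroatom corrections are not covered, and the function name is hypothetical.

```python
def kappa_1_and_2(n_atoms, bonds):
    """First- and second-order kappa shape indices of a hydrogen-suppressed graph."""
    degree = [0] * n_atoms
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    p1 = len(bonds)                                   # one-bond paths
    p2 = sum(d * (d - 1) // 2 for d in degree)        # pairs of bonds sharing an atom
    kappa1 = n_atoms * (n_atoms - 1) ** 2 / p1 ** 2
    kappa2 = (n_atoms - 1) * (n_atoms - 2) ** 2 / p2 ** 2   # needs at least 3 atoms
    return kappa1, kappa2


# Two C5H12 isomers with the same atom count but very different shapes.
n_pentane = kappa_1_and_2(5, [(0, 1), (1, 2), (2, 3), (3, 4)])      # linear chain
neopentane = kappa_1_and_2(5, [(0, 1), (0, 2), (0, 3), (0, 4)])     # star
```

Both isomers give the same first-order index, while the second-order index falls from 4.0 for the chain to 1.0 for the star, which is the kind of shape information a simple atom count cannot supply.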

II.3. Model Development

A number of procedures are available in standard statistical and other commercial packages to determine the explicit nature of the function f (Eqn. 1). However, it is important to ascertain that the developed QSTR model is statistically significant, robust, and cross-validated so that it could be used for predictive purposes. Before any QSTR model is installed in the TOPKAT 3.0 program the following steps are taken:

II.3.a. Evaluation of Predictor Variables: All the structural descriptors for the training set chemicals, namely shape indices, symmetry indices, counts, E-values, and size-corrected E-values, are subjected to a frequency check. Any variable having non-zero values for fewer than three chemicals is not considered as a predictor variable. This is done to enhance the statistical reliability of the predictor variables.

In order to reduce problems due to possible collinearity of variables, the pairwise correlations of these variables are examined. From a pair of variables with a correlation coefficient of 0.9 or higher, only one variable is retained in the descriptor set. The variable which is easier to compute and comprehend and is more continuous (has more non-zero values) is generally retained.
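The two screening steps just described can be sketched as follows. The thresholds (at least three non-zero values, correlation below 0.9) come from the text; the use of the absolute Pearson correlation, the tie-breaking by "more non-zero values" alone, and all names and toy values are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def nonzero_count(values):
    return sum(1 for v in values if v != 0)


def screen_descriptors(columns, min_nonzero=3, max_corr=0.9):
    """columns maps a descriptor name to its values over the training chemicals."""
    # Step 1, frequency check: drop variables that are non-zero for too few chemicals.
    frequent = {name: vals for name, vals in columns.items()
                if nonzero_count(vals) >= min_nonzero}
    # Step 2, collinearity: visit the more continuous variables first and keep a
    # variable only if it is not highly correlated with one already kept.
    ordered = sorted(frequent, key=lambda name: -nonzero_count(frequent[name]))
    kept = []
    for name in ordered:
        if all(abs(pearson(frequent[name], frequent[k])) < max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table for five training chemicals.
descriptors = {
    "kappa1":     [5.0, 6.0, 7.0, 8.0, 9.0],
    "mol_weight": [72.0, 86.0, 100.0, 114.0, 128.0],   # tracks kappa1 exactly
    "n_nitro":    [0, 0, 0, 1, 0],                     # non-zero for one chemical only
    "e_state_oh": [0.0, 7.5, 0.0, 8.1, 7.9],
}
selected = screen_descriptors(descriptors)
```

In this toy table n_nitro fails the frequency check and mol_weight is dropped as redundant with kappa1, leaving kappa1 and e_state_oh as candidate predictors.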

II.3.b. Obtaining Tentative QSTR: The choice of technique used to generate a tentative QSTR is dictated by the scale on which the T (Eqn. 1) values are expressed. For a categorical expression of T, for instance, discriminant analysis or a related pattern recognition technique may be used, and when T is expressed on a continuous scale, regression analysis or a similar approach may be adopted. The goal at this stage is to select the most potent variables.

II.3.c. Model Diagnostics: It is relatively easy and straightforward to obtain a tentative QSTR by using standard software packages. However, before these QSTR models are employed for predictive purposes, it is essential that these models be subjected to a variety of diagnostics to establish that:

  1. all descriptors in the function are significant,
  2. no compounds with unique compound-variable association are in the training set,
  3. no influential, leverage, or outlier compounds remain in the training set,
  4. the function is robust in a cross-validation test,
  5. the residuals are normally distributed, and
  6. cross-validation performance is not significantly different from the performance on the training set.

Unless these diagnostics are performed, the QSTR models cannot be assumed to be robust and their statistical quality may be questionable.
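Items 4 and 6 of the list above compare performance on the training set with performance under cross-validation. A minimal sketch of that comparison, assuming a leave-one-out (jackknife) scheme and a one-descriptor linear model for brevity, is given below; the data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def average_squared_residuals(x, y):
    """Return (training, leave-one-out) average squared residuals."""
    b0, b1 = fit_line(x, y)
    training = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)
    loo_total = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict the held-out compound.
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        loo_total += (y[i] - (c0 + c1 * x[i])) ** 2
    return training, loo_total / len(x)


# Hypothetical descriptor values and toxicity measurements for six compounds.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9]
training_asr, jackknife_asr = average_squared_residuals(x, y)
```

For a least-squares model the jackknife figure cannot be smaller than the training figure, so the quantity of interest is how far apart the two are: a small gap suggests the model does not depend unduly on any single compound.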

II.4. Defining Optimum Prediction Space (OPS)

A robust and statistically significant model so diligently developed is still representative of a closed system, because it is based on a limited training set. Therefore, it cannot be expected that the model will be applicable to every chemical. Of course, given the values of the descriptor variables the value of T (Eqn. 1) can be estimated, but the computed value may not be meaningful unless it is ascertained that the model is applicable to the chemical. Every model has an associated multivariate space such that at points (chemicals) within and near the periphery of this space, the model is applicable. We call this space the optimum prediction space, OPS. It is important to note that a query chemical being inside or near the periphery of the OPS does not mean that the predicted value of the dependent variable for that chemical will have concordance with the experimental value. All it implies is that the model is applicable to this chemical, and the probability of concordance between the predicted and the actual values is as high as that for the training set chemicals.

The OPS of a model with p descriptor variables is a (p + 1)-dimensional space derived from the descriptor space, i.e., the values of the p independent variables of the n observations in the training set of the model. The OPS is generally smaller than the descriptor space, i.e., the model may not be applicable to some regions in the descriptor space itself. Since the patent application for the algorithms leading to OPS is pending, it will be sufficient to mention that each of the p + 1 dimensions of the OPS has upper and lower bounds quantitatively defined in terms of p + 1 elements composed of double-transformed values of the predictor variables.

III. Using TOPKAT 3.0

The steps and algorithms outlined above are implemented in the commercially available computational toxicology tool TOPKAT 3.0. The latest release, v3.05, of TOPKAT 3.0 can assess:

  1. developmental toxicity potential (DTP),
  2. Ames mutagenicity,
  3. sex/species-specific NTP carcinogenicity call,
  4. acute rat median lethal dose, LD50,
  5. chronic lowest observed adverse effect level (LOAEL) in rat,
  6. fathead minnow LC50,
  7. skin sensitization (guinea pig maximization test),
  8. logP (logarithm of n-octanol/water partition coefficient), and
  9. Daphnia magna EC50.

All that a user has to do in order to use TOPKAT 3.0 is to:

  1. Input the SMILES string representing the query structure, and
  2. Select the property to be estimated.

TOPKAT 3.0 then calculates the values of the model descriptors and computes the chosen toxicity value, either in weight/weight or weight/volume units when T is expressed on a continuous scale, or in the range 0.0 to 1.0, representing the posterior probability of classification, when T is expressed as categorical data. The computed toxicity value, however, may not be meaningful or reliable. The TOPKAT 3.0 program is equipped with algorithms for checking:

  1. Does the query contain atoms/bonds not supported by the program?
  2. Does the query contain structural features not represented in the learning set?
  3. Are the values of descriptors outside the range spanned by the learning set?
  4. Is the query outside the Optimum Prediction Space (OPS) of the chosen model?
  5. Can the predicted toxicity value be validated with support from "similar" compounds in the training set?

These checks help determine when the QSTR-assigned property value may not be reliable. The last two examinations are unique to TOPKAT 3.0 and are very important in any structure-based predictive system. The following sections demonstrate the necessity of checking OPS and "QSTR similarity" in making reliable predictions and their confident acceptance.
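The difference between check 3 (a descriptor-by-descriptor range test) and a multivariate test such as check 4 can be illustrated with a small sketch. The OPS algorithm itself is not disclosed in this article, so the squared Mahalanobis distance is used here purely as a generic, well-known stand-in for a multivariate measure; it is not the OPS algorithm. The data, the two-descriptor restriction, and the function names are hypothetical.

```python
def descriptor_ranges(rows):
    """Per-descriptor (min, max) over the training set."""
    return [(min(col), max(col)) for col in zip(*rows)]


def out_of_range(query, ranges):
    """Indices of descriptors whose query value lies outside the training range."""
    return [i for i, (v, (lo, hi)) in enumerate(zip(query, ranges))
            if v < lo or v > hi]


def mahalanobis_sq_2d(point, rows):
    """Squared Mahalanobis distance of point from the centre of a 2-descriptor set."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical training set in which the two descriptors rise together.
training = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
ranges = descriptor_ranges(training)

query = (1.5, 4.8)                      # each value is individually unremarkable
range_violations = out_of_range(query, ranges)
query_distance = mahalanobis_sq_2d(query, training)
largest_training_distance = max(mahalanobis_sq_2d(r, training) for r in training)
```

The query passes the range test on both descriptors, yet its combination of values lies far from anything in the training set, and only the multivariate measure reveals this. That is the motivation for testing applicability at the multivariate level rather than one descriptor at a time.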

IV. Prediction Validation

IV.1. OPS Examination

Determining that the submitted structure falls inside the OPS of the model assures the user that the model is applicable to the submitted structure. The importance of such an examination is substantiated with results of logP predictions with TOPKAT 3.0.

The logP predictor, VLOGP, of the TOPKAT 3.0 program is based on accurately measured logP values, ranging between -3.56 and 7.73, of 6675 chemicals with diverse structures. The high degree of fit of the 363-variable model is indicated by a small standard error of only 0.201, and the stability of VLOGP is illustrated by a small difference between the model average squared residual (=0.040) and the jackknife average squared residual (=0.043). VLOGP has an explained variance of 98.5% and is significant at p<0.0001 with an F-ratio of 1188. On comparing the values of logP calculated by VLOGP with the experimental values of the 6675 training set compounds, it was found that for over half the chemicals the deviations are smaller than 0.15, and for over 75% of the chemicals they are below 0.25.

When VLOGP was applied to a prediction set of 113 chemicals not included in the original 6675 chemicals, 29 (or 25.7%) were identified as being outside the OPS of the model. For the compounds inside the OPS, the average smallest deviation from the reference logP value was only 0.272, whereas the corresponding number for the compounds outside the OPS was 1.348, indicating that a predicted logP is more likely to be close to the reference value for compounds in the OPS. Therefore, though the application ratio of 74.3% of VLOGP may seem low, the capability of the OPS algorithms to proactively warn about the chemicals to which the model is not applicable is in fact a boon.
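The percentages quoted above follow directly from the counts given in the text, as the short check below shows. The variable names are illustrative; the numbers are those reported for the 113-chemical prediction set.

```python
n_prediction_set = 113
n_outside_ops = 29
n_inside_ops = n_prediction_set - n_outside_ops          # 84 chemicals

pct_outside = round(100.0 * n_outside_ops / n_prediction_set, 1)   # 25.7
pct_inside = round(100.0 * n_inside_ops / n_prediction_set, 1)     # 74.3

# Ratio of the average deviations reported outside and inside the OPS.
deviation_ratio = 1.348 / 0.272
```

The deviation outside the OPS is thus roughly five times that inside it, which is the quantitative basis for treating the OPS flag as a warning.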

In another evaluation, TOPKAT 3.0 was used to assess the developmental toxicity potential (DTP) of a dye, disperse orange 3 (DO3). Like any other predictor, TOPKAT 3.0 produced a probability of DTP. The predicted probability of 0.000 indicates that DO3 does not have DTP. However, a warning is displayed by the program: "WARNING: Assigned Toxicity Value May be Unreliable". Even though a prediction is made, the predicted number has no meaning, because the reason given for the warning, "Query Outside Optimum Prediction Space (OPS)", means that the model used to make this prediction is not applicable to DO3. This ability to inform users that, based on the model alone, the prediction is not acceptable sets the TOPKAT 3.0 program apart from other toxicity prediction packages, which leave users without guidance after making a prediction. In fact, TOPKAT 3.0 also computes the upper bound of the periphery of the OPS. Under special circumstances, this may allow users to take the liberty of accepting some predictions outside but close to the permissible limit. For the DTP predictor, the permissible limit is 0.82, whereas DO3 is at a distance of 7.56. Therefore, the current (v.3.1) DTP model in TOPKAT 3.0 is not suited for investigating the DTP of DO3.

It can thus be inferred that in the absence of an estimate validation procedure like OPS, it is not possible to discriminate between "good" and "bad" predictions.

IV.2. "QSTR Similarity" Examination

As mentioned above, a query chemical being inside or near the periphery of the OPS implies that the model is applicable to this chemical and the probability of concordance between the predicted and the actual values is as high as that for the training set chemicals. The confidence in the computed toxicity value can be further enhanced by:

  1. Calculating the "QSTR similarity" distance of the query structure from the data base compounds.
  2. Establishing that there are correctly predicted training set compounds at a small "QSTR similarity" distance from the query.

"QSTR similarity" is different from "structural similarity". For example, both benzene and chlorobenzene are nonmutagenic in the histidine revertant assay. One might say that the similarity in their behavior in the Ames test is due to their structural similarity. But benzene is carcinogenic to female mice, whereas chlorobenzene is not. Since we are talking about the same chemicals their structural similarity is the same, but their different responses in mutagenicity and carcinogenicity tests indicates a difference in their characteristics which are determinant of mutagenicity and carcinogenicity. These determinants are nothing but the descriptors of the QSTR models. Therefore, the question whether two structures are "similar" should be answered in the context of the property being estimated. For example, out of a red automobile, a blue motorcycle, a banana, and a tomato, a yellow truck is most similar to a banana in color but to a red automobile in maximum speed. As far as we know, TOPKAT 3.0 is the only package which determines QSTR similarity between the query structure and all the compounds in the training set, and allows a visual display of all the training set compounds in the ascending order of the normalized QSTR similarity distance. When a user finds one or more correctly classified training set compounds at small QSTR similarity distance, the confidence and reliability of the prediction made by TOPKAT 3.0 are enhanced. The application of TOPKAT 3.0's ability to evaluate "QSTR similarity" in further validating a prediction is exemplified below.

When TOPKAT 3.0 was used to assess male rat carcinogenicity (MRC) of DO3, a message "All Validation Criteria Satisfied" was posted in the decision screen. This is an indication that (1) all structural features present in DO3 are represented in the MRC model of TOPKAT 3.0, and (2) DO3 is in the OPS of the MRC model. Therefore, the MRC model is applicable to DO3. MRC computes a probability of 1.000 for carcinogenicity of DO3 in male rat, i.e. DO3 is carcinogenic. When requested to list the compound from the MRC data base which has the smallest QSTR similarity distance from DO3, TOPKAT 3.0 identified phenazopyridine. It has a small QSTR similarity distance of 0.161 (on a scale of 0 to 1) from DO3, is a known male rat carcinogen (NCI-CG-TR-99, 1978), and is correctly classified by TOPKAT 3.0. This leads to an inference that DO3 is located in that region close to which TOPKAT 3.0 is known to make correct predictions. This certainly enhances the confidence in the prediction that DO3 is a male rat carcinogen.
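The idea that similarity depends on the property being estimated, and that training set compounds can be listed in ascending order of distance, can be sketched with the everyday analogy used above. The exact definition of the normalized QSTR similarity distance in TOPKAT 3.0 is not given in this article; the range-scaled Euclidean distance below is an assumption, and the numeric "descriptor" values are invented for illustration only.

```python
from math import sqrt


def rank_by_similarity(query, training, descriptors):
    """List (distance, name) pairs in ascending order of distance from the query.

    Only the named descriptors enter the distance, and each one is scaled by
    its range over the training set so that no single descriptor dominates.
    """
    spans = {}
    for d in descriptors:
        values = [item[d] for item in training.values()]
        spans[d] = (max(values) - min(values)) or 1.0    # guard against zero range

    def distance(a, b):
        total = sum(((a[d] - b[d]) / spans[d]) ** 2 for d in descriptors)
        return sqrt(total / len(descriptors))

    return sorted((distance(query, item), name) for name, item in training.items())


# Invented values: hue in degrees on a colour wheel, maximum speed in km/h.
training_set = {
    "red automobile":  {"hue": 0.0,   "max_speed": 160.0},
    "blue motorcycle": {"hue": 240.0, "max_speed": 220.0},
    "banana":          {"hue": 60.0,  "max_speed": 0.0},
    "tomato":          {"hue": 0.0,   "max_speed": 0.0},
}
yellow_truck = {"hue": 60.0, "max_speed": 130.0}

nearest_by_color = rank_by_similarity(yellow_truck, training_set, ["hue"])[0][1]
nearest_by_speed = rank_by_similarity(yellow_truck, training_set, ["max_speed"])[0][1]
```

With the colour descriptor the nearest neighbour is the banana, and with the speed descriptor it is the red automobile. In a QSTR setting the descriptors would be those of the model for the property in question, so the same query can have different nearest neighbours in, say, a mutagenicity model and a carcinogenicity model.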

V. Conclusions

All computer-assisted toxicity predictors are models of closed systems because they are derived from limited sets of chemicals. There are three main components of a reliable toxicity prediction system, namely:

  1. it consists of a robust, cross-validated and predictive model (or set of rules) derived from a uniform training set,
  2. it can analyze, on a multivariate level, whether the predictive model is applicable to the submitted query structure, and
  3. it provides a mechanism to objectively substantiate the predicted toxicity in terms of known chemicals with high "QSTR similarity" to the query structure.

Any system lacking any of these three components may not be suitable for reliable prediction of toxicity from chemical structure. The latest release of a modular computational toxicology tool, TOPKAT 3.0, is designed to incorporate these components of an ideal computer-assisted toxicity prediction system, and can generate, from molecular structure, a variety of mammalian, aquatic, and environmental toxicity measures, thus providing a capability to follow the toxicity profile as molecular structure is modified during the design process.

References

Ashby, J. (1992) Consideration of CASE Predictions of Genotoxic Carcinogenesis for Omeprazole, Methapyrilene and Azathioprine. Mutation Res., 272, 1-7.

Gombar, V.K. and Jain, D.V.S. (1987) Quantification of Molecular Shape and Its Correlation with Physico-chemical Properties. Indian J. Chem., 26A, 554-555.

Gombar, V.K. and Enslein, K. (1990) Quantitative Structure-Activity Relationship (QSAR) Studies Using Electronic Descriptors Calculated from Topological and Molecular Orbital (MO) Methods. Quant. Struct.-Act. Relat., 9, 321-325.

Gombar, V.K. (1991) Toxicology Newsletter No. 13, Health Designs, Inc., Rochester, NY.

Hall, L.H., Mohney, B. and Kier, L.B. (1991) The Electrotopological State: Structure Information at the Atomic Level for Molecular Graphs. J. Chem. Inf. Comput. Sci., 31, 76-82.

Hansch, C. and Leo, A. (1995) Exploring QSAR: ACS Professional Reference Book, American Chemical Society, Washington, DC.

Kier, L.B. (1986) Shape Indices of Orders One and Three from Molecular Graphs. Quant. Struct.-Act. Relat., 5, 1-7.

Klopman, G. (1984) Artificial Intelligence Approach to Structure-Activity Studies: Computer Automated Structure Evaluation of Biological Activity of Organic Molecules. J. Am. Chem. Soc., 106, 7315-7321.

NCI-CG-TR-99 (1978) Bioassay of Phenazopyridine Hydrochloride for Possible Carcinogenicity, National Institutes of Health, Bethesda, MD.

Purcell, W.P., Bass, G.E. and Clayton, J.M. (1973) Strategy of Drug Design: A Guide to Biological Activity, Wiley, New York.

Rosenkranz, H.S. and Klopman, G. (1990) Structural Basis of Carcinogenicity in Rodents of Genotoxicants and Nongenotoxicants. Mutation Res., 228, 105-124.

Sanderson, D.M. and Earnshaw, C.G. (1991) Computer Prediction of Possible Toxic Action from Chemical Structure; The DEREK System. Human & Exptl Toxicol., 10, 261-273.

Woo, Y., Lai, D.Y., Argus, M.F., and Arcos, J.C. (1995) Development of Structure-Activity Relationship Rules for Predicting Carcinogenic Potential of Chemicals, Toxicol. Lett., 79, 219-228.



NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.