Statistical Considerations in High Throughput Screening

Michael W. Lutz*, J. Alan Menius, Rebecca Gooding Laskody, Paul L. Domanico, Aaron S. Goetz, David L. Saussy and Tom Rimele



Glaxo-Wellcome Research Institute
Research Triangle Park
North Carolina, 27709
Tel: (919) 483-6106
Fax: (919) 483-2494



http://www.netsci.org/Science/Screening/feature05.html

Keywords



Assay Development, Assay Optimization, High Throughput Screening, Experimental Design, Outliers, Statistics


Introduction



Recent scientific and technological advances have introduced new paradigms for drug discovery research. The availability of chemical libraries and of robotic systems for bioassay allows synthesis and testing of hundreds or even thousands of compounds in a single day. These developments present great challenges and opportunities for assay automation and data analysis. As advances in molecular biology and bioinformatics identify more potential biochemical targets and combinatorial libraries provide large numbers of compounds for testing, methods for efficiently optimizing biochemical assays become increasingly important. The vast amount of data generated when libraries are tested against an array of molecular targets creates new opportunities for structure-activity relationship analysis and amplifies the need for effective statistical methods to ensure the integrity of the data and to identify trends and relationships within it.

This paper focuses on some of the issues we have found useful to consider, with regard to data analysis, when designing high-throughput biochemical assays. The paper has three parts, corresponding to different phases in the flow of information from a high-throughput screen: experimental design for biochemical assays, data integrity and data analysis. Experimental design is most important when setting up a screen and is used to quickly identify optimal assay conditions and ensure the quality of the resulting data. Data integrity issues involve calculation of assay variability for different sets of compounds, identification (and possibly correction) of systematic effects and verification that the data are appropriate for their intended use. Finally, data analysis involves operations on the data, either data reduction to summarize raw data in terms of a result such as a Ki, or extraction of patterns and information from the data, as in SAR, QSAR or neural network analysis. Taken together, these methods address the challenge of extracting information from large volumes of data.


Experimental Design



Experimental design is a classical statistical method that has found utility in many disciplines that require finding optimal conditions and modeling response surfaces, but it has often been underutilized in biology. The theoretical basis for experimental design is described in detail in numerous statistical texts [1-4]. Application case studies can be found in the literature associated with a given field; examples include industrial process optimization [5,6] and chemistry [7,8]. A recent book by Haaland, Experimental Design in Biotechnology [9], is an excellent introduction to some of the problems and methodologies associated with using experimental design for biological and chemical applications.

In all applications, the underlying question is how to optimize something in a systematic way so as to extract the most information from the fewest runs. In many cases, identification of the optimal conditions is sufficient. In other cases, the relationship between the factors and the response is modeled mathematically; this is termed response-surface modeling. Interactions are particularly important since they are more the rule than the exception in biology and can greatly increase the complexity of a problem or, in the worst case, make interpretation of the data impossible. A special subset of experimental designs is variance-reduction designs, in which the variance of a response is optimized rather than the response itself. These designs are important in biology because minimizing variance makes the assay more robust. In other applications, cost is an important factor and optimization is aimed at minimizing the use of scarce resources such as proteins. In many cases multiple responses must be considered: for example, minimizing variance while keeping receptor concentration low in order to conserve protein. Finally, the advantage of having a model, even a simple one such as the linear model that underlies classical experimental designs, is that it can be used to make predictions.

In classical experimental design, investigators choose a number of factors that they believe may have an effect on the response. Depending on the number of factors and their possible ranges, different types of designs are chosen to sample the space defined by these factors. All of the factors are varied simultaneously in a systematic way that later allows measuring the significance of a specific factor or interaction between factors. Typically the factors are sampled at the extremes of the operating range and a linear or polynomial interpolating function is used to model the region between the design points.

Different experimental designs are chosen for specific objectives. When a robotic assay is first set up, there are often many factors that potentially can influence the robustness of the assay. To find the important factors, a low resolution fractional factorial design might be run. This is known as factor screening. After the important factors are identified, a full factorial design might be set up to find the optimal settings for the factors. Response surface modeling could be done to explore the region around the optimal settings, to assess sensitivity of the factors and to test for higher order interactions.

Example: Optimization of a Scintillation Proximity Assay



This example considers optimization of a scintillation proximity assay (SPA). SPAs are often used as an alternative to traditional radioligand binding assays because they are more amenable to high throughput and automation: it is not necessary to separate bound and free radioligand. For the example case, human alpha 1B adrenoceptors were expressed in mammalian cells. SPA beads coated with wheat germ agglutinin were used to bind the glycosylated proteins of the membranes containing the expressed receptors. A competition binding assay was performed in which concentration-response curves were generated for 5-Methyl-Urapidil displacement of tritiated prazosin.

The objective for this experimental design was to minimize variation in the Ki estimate. A secondary objective was to minimize the amount of receptor needed. Several responses were considered: variability of the Ki, signal to noise, range of counts and adequacy of the nonlinear regression fit to the data. For this example, the multiple response variables were considered separately.

Three factors were considered in the design: receptor concentration (40 pM, 80 pM, 120 pM), SPA bead reagent concentration (0.5 mg/ml, 2.0 mg/ml) and ligand concentration (0.448 nM, 1.792 nM). Since the number of factors and levels was small, it was possible to use a full factorial design of 12 runs (3 x 2 x 2). Since variability in the Ki was the primary response, the design was repeated three times to estimate the variance at each design point.
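
To make the layout concrete, the 36-run design matrix (12 factor combinations, each in triplicate) can be enumerated with a few lines of Python. This is a sketch of the layout only; the paper does not describe how the runs were generated or ordered, and in practice the run order would also be randomized.

from itertools import product

# Factor levels taken from the text: receptor (pM), SPA bead reagent (mg/ml), ligand (nM)
receptor_pM = [40, 80, 120]
bead_mg_ml = [0.5, 2.0]
ligand_nM = [0.448, 1.792]
replicates = 3

# Full factorial: every combination of factor levels, each condition run in triplicate
design = [
    {"run": n, "receptor_pM": r, "bead_mg_ml": b, "ligand_nM": l, "replicate": rep}
    for n, ((r, b, l), rep) in enumerate(
        product(product(receptor_pM, bead_mg_ml, ligand_nM), range(1, replicates + 1)),
        start=1,
    )
]

print(len(design))  # 36 runs: (3 x 2 x 2) conditions x 3 replicates
print(design[0])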

Results for Variability



The results for variability are shown in Figure 1. Each panel is obtained from the statistical analysis/visualization software package JMP. The ordinate in each case is variability of the negative log of the Ki (pKi). The abscissae are the factor levels used in the design. The vertical lines represent the values for particular factor settings that can be changed by the user. When the factors are changed, the plots update to show the effect on response predicted by the linear model, including interactions. The horizontal lines show the response obtained when all factors are set at their middle level. The error bars show the pooled standard deviation estimate for the model.
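
The analysis in Figure 1 was produced with the JMP prediction profiler. For readers without JMP, an equivalent factorial analysis can be sketched with statsmodels; the file name and column names below are hypothetical placeholders for the design results, not data taken from the paper.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical results file: one row per design point, with the replicate standard
# deviation of pKi as the response; column names are placeholders only.
df = pd.read_csv("spa_design_results.csv")  # columns: receptor, ligand, bead, sd_pKi

# Linear model with main effects and two-way interactions, factors treated as categorical
model = smf.ols(
    "sd_pKi ~ C(receptor) * C(ligand) + C(receptor) * C(bead) + C(ligand) * C(bead)",
    data=df,
).fit()

print(anova_lm(model, typ=2))  # which factors (and interactions) affect variability
print(model.params)            # coefficients used to predict sd_pKi at chosen settings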

Figure 1. Experimental design results for a full factorial optimization of a scintillation proximity assay. The ordinates are variability of the pKi (expressed in standard deviation units) calculated from three replicate measurements. The abscissae are factor levels for each factor in the design. A) Receptor concentration, B) ligand concentration and C) SPA reagent bead concentration.

From the plots, it is seen that for the factor levels chosen there is a significant effect of receptor concentration (Figure 1A): the highest receptor concentration minimizes variance. Variability increases with ligand concentration, so the lowest ligand concentration gives the smallest variance (Figure 1B). SPA bead reagent concentration does not have a significant effect on variability in this experiment (Figure 1C). A contour plot (Figure 2) is an alternative way of viewing these results. This plot is helpful in visualizing operating regions, particularly if many factor levels are used, but is limited by its bivariate nature.

Figure 2. Contour plot for experimental design results of the full factorial optimization of a scintillation proximity assay. The ordinate is ligand concentration. The abscissa is receptor concentration. Contours are variability of the pKi (expressed in standard deviation units) calculated from three replicate measurements.

Results for Signal to Noise



The optimal factor settings for signal to noise (S/N) differ slightly from those for variability. The best S/N is obtained at the highest receptor concentration (Figure 3A). Lowering the ligand concentration (Figure 3B) and the SPA bead reagent concentration (Figure 3C) also improves S/N.

Figure 3. Experimental design results for a full factorial optimization of a scintillation proximity assay. The ordinates are signal to noise (B0 - NSB), where B0 is the concentration of radioligand bound with no inhibitor added and NSB is nonspecific binding. The abscissae are factor levels for each factor in the design. A) Receptor concentration, B) ligand concentration and C) SPA reagent bead concentration.

Results for pKi trend



Figure 4 and Table 1 show the trend of pKi values for the different design points. It is seen that pKi is not invariant with respect to receptor and ligand concentrations. Inspection of Table 1 shows that while most combinations of factor levels have a variance that is typical of this type of assay, the two rows with the lowest receptor concentration have unusually high variance.

Figure 4. Consistency of the pKi estimates for a full factorial optimization of a scintillation proximity assay. The ordinates are pKi. The abscissae are factor levels for each factor in the design. A) Receptor concentration, B) ligand concentration and C) SPA reagent bead concentration.

Table 1
pKi Trends for Full Factorial Experimental Design





Receptor (nM) Ligand (nM) Radioligand (nM) N Mean (pKi) Std. Dev. (pKi)
2 0.448 0.1 3 6.99 0.71
2 0.448 0.4 3 6.96 0.08
2 1.792 0.1 3 6.72 0.34
2 1.792 0.4 3 6.38 0.31
4 0.1 0.1 3 7.03 0.13
4 0.1 0.4 3 6.89 0.19
4 0.4 0.1 3 6.71 0.17
4 0.4 0.4 3 6.81 0.12
6 0.1 0.1 3 6.93 0.06
6 0.1 0.4 3 7.07 0.10
6 0.4 0.1 3 6.72 0.12
6 0.4 0.4 3 6.68 0.10


Summary



The best conditions are low ligand and SPA bead reagent concentrations and high receptor concentration; these conditions minimize variance and provide the best S/N. If the amount of receptor had to be minimized, perhaps because supplies were limited, it would be desirable to use the lowest ligand concentration. For the best S/N, the lowest concentration of SPA bead reagent should be used. Although lack-of-fit data are not shown, the higher SPA bead reagent concentrations appeared to give the best fits. These conclusions illustrate the trade-offs between the different factors. After completing the design and constructing a model, it is possible to be more quantitative about making decisions. For example, if lack of fit does not appear to be a problem, it might be possible to lower the SPA bead reagent concentration in order to obtain a better signal.


Data Integrity



Depending on the experimental protocol used, high-throughput screening assays can provide data that are appropriate for many different purposes: identification of lead compounds, SAR, QSAR and library design. Experimental data may be derived from single-concentration screening, testing at a few different concentrations or a complete concentration-response curve. In each case, it is important to determine whether the experimental protocol will provide the type of data needed for the experimental objective. Experiments must be repeated to determine the variability of the assay.

Assay Variability and Replication



It is important to have a good estimate of assay variability, both to characterize the robustness of the assay and to identify significant differences between compounds tested in the assay. It is also important to recognize that the replicates must be independent; for example, observations obtained from adjacent wells on a 96-well microtiter plate are probably not independent and will measure little more than pipetting error.

An important estimate of variability is the standard deviation of metrics calculated for control or standard compounds that are tested every time the assay is run. Use of a control allows a laboratory to compare results across time and check for systematic effects. It is also important to check the variability of a diverse set of compounds across the affinity spectrum to make sure that variability is not a function of affinity. This can be done using a standard analysis of variance (ANOVA) model and graphical methods. The plots shown in Figure 5 were obtained for a radioligand binding assay using the software program JMP. The points on the two graphs are linked by position on the abscissa. No apparent relationship is seen between affinity and variability; however, certain points have unusually high variability (the points labeled with the X symbol). In this case, the high variation was due to test compounds being obtained from different batches. It is also observed that the control has unusually low variability in this assay.
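
A sketch of this kind of check in Python, assuming a table with one row per replicate determination (the file name and column names are illustrative only, not the data behind Figure 5):

import pandas as pd
from scipy.stats import spearmanr

# Hypothetical layout: one row per replicate determination, columns 'compound' and 'pKi'
data = pd.read_csv("binding_replicates.csv")

summary = data.groupby("compound")["pKi"].agg(mean_pKi="mean", sd_pKi="std", n="count")

# If variability depended on affinity, the mean and the standard deviation would be correlated
rho, p = spearmanr(summary["mean_pKi"], summary["sd_pKi"])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# Flag compounds with unusually high variability for follow-up (e.g. different batches)
print(summary[summary["sd_pKi"] > 2 * summary["sd_pKi"].median()])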

Figure 5. Variability and affinity for a radioligand binding assay. The top panel ordinate is the average pKi value from three experiments. The bottom panel ordinate is the standard deviation of the pKi. Both panels are linked by the abscissa, an ordinal value assigned to each of 42 compounds. The square symbol is the average value and standard deviation of the control which was repeated four times.

After assay variability has been estimated, the replication scheme for the assay can be determined. One useful graph shows the size of effect that can be detected for a given number of replicates and a given statistical power. Figure 6 is an example of this type of graph: a size of effect (difference in affinity in log units) and a statistical power are selected, identifying a point on one of the three curves, and the minimum number of replicates can then be read off the abscissa. A table of 95% confidence bounds for Ki and IC50 estimates can be constructed with a spreadsheet program, as demonstrated in Table 2. It is worthwhile to construct such a table before starting SAR analysis to provide a quantitative framework for deciding which effects are statistically significant.

Figure 6. Power plot for assay variability. The abscissa is the number of times an experiment is repeated. The ordinate is the size of the difference in pKi (-log(Ki)) that can be detected. The curves are drawn for statistical powers of 0.70, 0.80 and 0.95.
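
Curves of this kind can be approximated with standard power calculations. The sketch below assumes an unpaired two-sample comparison of mean pKi values and an illustrative assay standard deviation; it is not necessarily the calculation used to draw Figure 6, which the paper does not specify.

from statsmodels.stats.power import TTestIndPower

sigma = 0.15  # assay standard deviation of pKi; illustrative value only
alpha = 0.05
solver = TTestIndPower()

for power in (0.70, 0.80, 0.95):
    for n in (3, 4, 6, 8, 12):
        # Smallest standardized effect (Cohen's d) detectable with n replicates per compound,
        # converted to a pKi difference by multiplying by the assay standard deviation
        d = solver.solve_power(effect_size=None, nobs1=n, alpha=alpha, power=power, ratio=1.0)
        print(f"power={power:.2f}  n={n:2d}  detectable pKi difference ~ {d * sigma:.2f}")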

Table 2
Table of 95% confidence bounds for
Ki and IC50 estimates





Estimate (nM)   Replicates   sigma (log units)   Lower bound (nM)   Upper bound (nM)
500 2 0.12 42 5986
500 3 0.12 252 993
500 4 0.12 322 776
500 6 0.12 374 668
500 2 0.18 12 20711
500 3 0.18 179 1400
500 4 0.18 259 967
500 6 0.18 324 772
500 2 0.2 8 31326
500 3 0.2 159 1570
500 4 0.2 240 1040
500 6 0.2 308 811
385 3 0.12 194 765
15000 3 0.12 7551 29798
385 2 0.12 32 4609
15000 2 0.12 1253 179576
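
The bounds in Table 2 are consistent with a t-based 95% confidence interval computed on the log10 scale, i.e. 10^(log10(IC50) +/- t(0.975, n-1) x sigma / sqrt(n)). A short script along those lines (a sketch, assuming that formula) reproduces the tabulated values:

from math import log10, sqrt
from scipy.stats import t

def ci95(ic50_nM, n, sigma):
    """95% confidence bounds for an IC50 (nM) measured n times with
    between-replicate standard deviation sigma on the log10 scale."""
    half_width = t.ppf(0.975, n - 1) * sigma / sqrt(n)
    centre = log10(ic50_nM)
    return 10 ** (centre - half_width), 10 ** (centre + half_width)

print(ci95(500, 2, 0.12))  # ~ (42, 5986), the first row of Table 2
print(ci95(500, 4, 0.18))  # ~ (259, 967)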


Systematic Effects



For subsequent data analysis, including the regression modeling that is done for QSAR, the assumption is made that the error in the response is random variation; all systematic effects are assumed to be part of the model. Particularly for robotic biological assay systems, the investigator must guard against the inadvertent introduction of systematic effects by examining the experimental protocol for sources such as time effects, lack of homogeneous reaction conditions and differences between reagents and biological materials. In many cases, if a systematic effect is suspected, it is possible to modify the design and add terms to the model to account for the effect.

For some applications, it is possible to correct results for a systematic effect. If a control value is obtained every time an assay is run, results can be reported relative to the control. The difference in potencies between test and standard compounds in log units (-log(IC50 standard) + log (IC50 test)) can be a more robust metric than raw IC50 measurements for some assays. When experiments are done on batches of biological samples this adjustment can correct for systematic effects. However, the adjustment is not a panacea. If the variation between compounds and controls tested in different batches is random rather than systematic, the corrected IC50 may have a higher variance than the raw IC50 since it includes variation in the control as well. The trend of IC50s for replicated controls and several compounds tested in different batches should be examined to determine if the shift in potency is consistent.
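
A minimal sketch of this batch correction, assuming each batch contains the standard compound (the data layout, file name and column names are hypothetical):

import numpy as np
import pandas as pd

# Hypothetical layout: one row per determination, with 'batch', 'compound' and 'IC50_nM' columns
df = pd.read_csv("batch_results.csv")
df["pIC50"] = -np.log10(df["IC50_nM"] * 1e-9)

# Mean pIC50 of the standard compound within each batch
std = (df[df["compound"] == "standard"]
       .groupby("batch")["pIC50"].mean()
       .rename("std_pIC50"))
df = df.join(std, on="batch")

# Relative potency in log units: -log(IC50 standard) + log(IC50 test)
df["delta_pIC50"] = df["std_pIC50"] - df["pIC50"]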


Data Analysis



Good experimental design and attention to the details presented in the data integrity section are prerequisites to effective data analysis. Two components of the data analysis process are data reduction, the transformation of raw data into summary-level data, and the subsequent use of the summary-level data to draw conclusions or inferences from the results. This section discusses two aspects of data analysis relevant to high-throughput screening: the use of rule-based automated data reduction systems and outlier identification.

Rule-based Automated Data Reduction



Data reduction is often the rate-limiting step in a biochemical or pharmacological assay. Automated, rule-based software systems can greatly decrease the amount of time necessary for data reduction but should not replace the important step of examination of the data for trends and quality control. The goals of an automated data analysis system should be to increase the consistency of data analysis, reduce repetitive steps and quickly identify data which requires more intensive manual analysis.

Figure 7 shows a flow diagram for part of the decision tree that is used for analysis of radioligand binding data in an automated system [10-16]. This part of the decision tree addresses the question of whether or not data are appropriate for curve fitting. The decision tree includes empirical criteria based on the biochemical conditions and assumptions of the assay (e.g., are bound concentrations less than 10% of the radioligand added?) and on statistical measures (does the curve have clearly identifiable asymptotes?). The remainder of the decision tree determines which of several mathematical models or equations offers the best fit to the data. Use of this system improved the consistency of data analysis done by different investigators by applying a uniform set of criteria to the data. It also decreased the number of repetitive steps by quickly separating the data into groups that were appropriate for curve fitting and groups that did not produce an identifiable curve. For a screening assay, the statistical tests should be considered exploratory data analysis rather than hypothesis testing; if the decision tree suggests something unexpected, experiments should be designed to test the observation.
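
A minimal sketch of pre-fit rules of this kind is given below. The specific checks and thresholds are illustrative only; they are not the criteria used in the authors' production system, which is summarized in Figure 7.

import numpy as np

def fit_worthy(conc, bound, total_added, max_bound_fraction=0.10):
    """Decide whether a displacement curve should be sent to nonlinear regression.

    conc, bound : arrays of inhibitor concentrations and bound radioligand counts
    total_added : total radioligand added, in the same units as bound
    """
    reasons = []
    conc = np.asarray(conc, dtype=float)
    bound = np.asarray(bound, dtype=float)[np.argsort(conc)]  # sort by increasing inhibitor

    # Biochemical rule: bound radioligand should be a small fraction of the total added
    if bound.max() / total_added > max_bound_fraction:
        reasons.append("more than 10% of added radioligand bound (ligand depletion)")

    # Statistical rule (crude): the curve should show clear displacement between the
    # low-concentration and high-concentration plateaus, i.e. identifiable asymptotes
    window = max(1, len(bound) // 4)
    top, bottom = bound[:window].mean(), bound[-window:].mean()
    if bottom > 0.5 * top:
        reasons.append("no clearly identifiable lower asymptote / insufficient displacement")

    return len(reasons) == 0, reasons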

Figure 7. A portion of the flow diagram used in an automated data analysis system for radioligand binding experiments. This part of the decision tree addresses the question of whether or not data are appropriate for curve fitting.

Outlier Identification



Testing of combinatorial libraries requires some special consideration in data analysis. In many cases, multivariate visualization may be the best tool to quickly identify patterns in the data [17]. Outliers present an interesting problem for library data: often the outliers are the most interesting points, or hits; in other cases, they are outliers in the more traditional sense. Outliers are an old and difficult problem in statistics, and multiple outliers and outliers in multivariate space are particularly difficult. Theoretical treatments of outliers can be found in the statistics literature and in several texts [18-20], but it is often difficult to apply a method in an automated setting.

The Bradu/Hawkins method [21] is an outlier detection algorithm that we have found useful for large data sets and that is amenable to automation. It is based on a standard cell-means model in which it is assumed that there are no interactions between the rows and columns:

Yij ~ N(µij, σ²)
µij = µ + αi + βj + γij

In this model, αi are row effects, βj are column effects and γij are interaction terms; cells with γij = 0 are table inliers and cells with γij ≠ 0 are outliers. Tetrads are calculated as:

Tijkl = Yij - Ykj - Yil + Ykl

The algorithm is to save the median tetrad for each cell Yij. The median tetrads can then be plotted on a half-normal plot as shown in Figure 8.

Figure 8. Half-normal plot for tetrad outlier identification algorithm. The abscissa is the median tetrad values. The ordinate is the associated probability score for the median tetrad value. Outliers are suspect when the curve breaks from linearity.

As published, the algorithm required inspection of a half-normal plot to identify outliers. To automate the algorithm, it is possible to standardize the median tetrads, rank the resulting values and compare them to the expected values obtained from the distribution of the order statistics [22]. The modified algorithm is easily implemented in a statistical spreadsheet. We have found the algorithm to be a useful tool for outlier identification during data reduction of plate-oriented assays, where samples are replicated on the order of three to four times, and for fast identification of outliers in tables of data from high-throughput screening. In the latter application, outliers stand out from the remainder of the data as a signal in a matrix of noise; in the former, the data are organized so that columns represent different concentrations of compound, rows are replicates and a significant interaction between the two indicates an outlier.
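
A sketch of the core median-tetrad computation for a two-way table, following the description above. The example data and the simple argmax flag are illustrative; a full implementation would standardize the medians and compare them to half-normal order statistics as described.

import numpy as np

def median_tetrads(Y):
    """Median tetrad for each cell of a two-way table Y (rows x columns).

    Tijkl = Y[i,j] - Y[k,j] - Y[i,l] + Y[k,l]; for each cell (i, j) the median
    is taken over all k != i and l != j.  Large absolute values flag outliers."""
    Y = np.asarray(Y, dtype=float)
    rows, cols = Y.shape
    med = np.zeros_like(Y)
    for i in range(rows):
        for j in range(cols):
            tetrads = [Y[i, j] - Y[k, j] - Y[i, l] + Y[k, l]
                       for k in range(rows) if k != i
                       for l in range(cols) if l != j]
            med[i, j] = np.median(tetrads)
    return med

# Example: an additive 4 x 3 table with one planted outlier in cell (1, 2)
rng = np.random.default_rng(0)
table = rng.normal(scale=0.5, size=(4, 3)) + np.arange(4)[:, None] + np.arange(3)[None, :]
table[1, 2] += 10.0

m = np.abs(median_tetrads(table))
print(np.unravel_index(np.argmax(m), m.shape))  # expected to point at the planted outlier (1, 2)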

Directions for Future Work



Drug discovery in the 90s involves a combinatorial explosion of both compounds to test and potential biological targets, with a concomitant increase in the quantity of data to analyze. The methods described in this paper are powerful tools for the development and optimization of biochemical assays and for effectively handling some of the problems associated with data analysis.

Current work in the area of mathematical and statistical methods for analysis of high-throughput screening data includes improvements in these techniques in addition to some areas not discussed here, such as pooling and mixture decoding strategies. One promising area of research is space-filling experimental designs [23-26], which fill a design space optimally rather than sampling the extrema; these designs appear to be best suited for problems where the response surface is nonlinear or has numerous spikes and valleys. Detection of multivariate outliers continues to be an area of considerable theoretical research [27-29]. Finally, novel methods for the visualization and statistical analysis of combinatorial library data continue to be developed in order to facilitate structure-activity relationship analysis [17].


References



  1. Cochran, W.G. and Cox, G.M., Experimental Designs, John Wiley and Sons: New York, 1957.
  2. Hicks, C.R., Fundamental Concepts of Design of Experiments, Holt, Rhinehart and Winston, 1974.
  3. Box, G.E., Hunter, W.G. and Hunter, S., Statistics for Experimenters: An Introduction to Design, Data Analysis and Model Building, John Wiley and Sons: New York, 1978.
  4. Montgomery, D.C., Design and Analysis of Experiments, John Wiley and Sons: New York, 1984.
  5. Daniel, C., Applications of Statistics to Industrial Experimentation, John Wiley and Sons: New York, 1976.
  6. Murphy, T.D., Design and analysis of industrial experiments, Chemical Engineering, June 6, 1977, 168-182.
  7. Deming, S.N. and Morgan, S.L., Experimental Design: a chemometric approach; Elsevier, 1987.
  8. Austel, V., Design of Test Series by Combined Application of 2^n Factorial Schemes and Pattern Recognition Techniques. In Quantitative Approaches to Drug Discovery; Dearden, J.C., Ed.; Elsevier, 1983.
  9. Haaland, P.D., Experimental Design in Biotechnology; Marcel Dekker, 1989.
  10. Munson, P.J. and Rodbard, D., LIGAND: A versatile computerized approach for characterization of ligand-binding systems, Anal. Biochem., 1980, 107, 220-239.
  11. Lutz, M.W., Goetz, A.S., Morgan, P.H. and Rimele, T.J., Statistical and graphical methods for analysis of radioligand binding experiments, Proc. of the BBN worldwide user meeting, Boston M.A., 1991.
  12. Abramson, S.N., McGonigle, P. and Molinoff, P.B., Evaluation of models for analysis of radioligand binding data, Mol. Pharmacol., 1987, 31, 103-111.
  13. Bylund, D.B. and Yamamura, H.I., Methods for receptor binding. In Methods in neurotransmitter receptor analysis; Yamamura, H.I., Ed.; Raven Press: New York, 1990; pp. 1-35.
  14. Kenakin, T.P., Pharmacological Analysis of Drug-Receptor Interaction, Raven Press: New York, 1993, pp. 385-410.
  15. Lai, T.L. and Zhang, L., Statistical analysis of ligand binding experiments, Biometrics, 1994, 50, 782-797.
  16. Unnerstall, J.R., Computer-assisted analysis of binding data. In Methods in neurotransmitter receptor analysis; Yamamura, H.I., Ed.; Raven Press: New York, 1990; pp. 37-68.
  17. Young, S.S. and Hawkins, D.M., Analysis of a 2^9 full factorial chemical library, J. Med. Chem., 1995, 38, 2784-2788.
  18. Hawkins, D.M., Identification of Outliers, Chapman and Hall: London, 1980.
  19. Barnett, V. and Lewis, T., Outliers in statistical data, John Wiley: England, 1978.
  20. Beckman, R.J. and Cook, R.D., Outliers, Technometrics, 1983, 25, 119-149.
  21. Bradu, D. and Hawkins, D.M., Location of multiple outliers in two-way tables using tetrads, Technometrics, 1982, 24, 103-108.
  22. Young, S.S., personal communication.
  23. Haaland, P., McMillan, N., Nychka, D., Welch, W., Analysis of space-filling designs, Computing Science and Statistics, 1994, 26, 111-120.
  24. Menius, J.A., Rocque, W., Emptage, M.R. and Young, S.S., Space-filling experimental designs for determining protein storage conditions, Computing Science and Statistics, 1994, 26, 106-110.
  25. Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P., Design and analysis of computer experiments, Statistical Science, 1989, 4, 409-435.
  26. Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., Mitchell, T.J. and Morris, M.D., Screening, predicting and computer experiments, Technometrics, 1992, 34, 15-25.
  27. Atkinson, A.C., Fast, very robust methods for detection of multiple outliers, J. Am. Stat. Assoc., 1994, 89, 1329-1339.
  28. Hawkins, D.M., Bradu, D. and Kass G.V., Location of several outliers in multiple-regression data using elemental sets, Technometrics, 1984, 26, 197-208.
  29. Rocke, D.M. and Woodruff, D.L., Multivariate outlier detection, Computing Science and Statistics, 1994, 26, 392-400.

