Random Versus Rational
Which Is Better for General Compound Screening?

S. Stanley Young, Mark Farmen, Andrew Rusinko III

Glaxo Wellcome Inc.
Five Moore Drive
Research Triangle Park, NC 27709



http://www.netsci.org/Science/Screening/feature09.html

ABSTRACT



The design of general chemical libraries and the selection of large general screening sets of compounds pose common questions: how many compounds should be sampled, and how should they be selected? As the cost of testing and synthesis decreases, it is now possible to make and test more compounds. Chemical library methods are theoretically capable of making millions of compounds, but how many compounds should be made: 1,000, 10,000 or 100,000? Currently, computational chemistry tools are employed to numerically characterize compounds, and statistical methods are then used to select subsets of them. It is the selection of compounds that is problematic and leads to a rather perplexing question: how would random selection of compounds compare to rational selection? We use some simple examples and arguments to show when rational methods have the advantage over random methods. Interestingly, though, random methods appear equivalent to rational methods in many cases.

Key Words: compound screening, chemical library design, near-neighbor


Introduction



Once a biological target has been identified, the next step in the drug design process is to find new lead molecules. New leads are compounds that are modestly potent and are generally chemically novel, i.e. patentable. These compounds are the starting point for molecular optimization. Suitable formulations of optimized compounds are developed and these are then tested in clinical trials. Because there is no guarantee that a potent compound will become a marketable drug, a large number of new leads are needed to feed into the drug development process. Most new leads have been discovered through random screening: large numbers of compounds are tested in biological systems and the active compounds are then selected for optimization. Rational screening approaches are now being employed, Romero et al. (1991), to sample a large collection of compounds thoroughly and efficiently. It was shown that it is possible to stratify a large database of structures using computed (or experimentally derived) chemical descriptors so that a representative set of structures that covers the "space" of the large collection can be selected. However, two questions were still left unanswered. How large does a screening set really need to be to effectively sample chemical space, and should it be selected rationally?

Combinatorial chemistry is the simultaneous synthesis of large numbers of compounds. There are over 10,000 carboxylic acids and over 4,000 primary and secondary amines in the ACD (1995). Using a simple reaction, A + B -> C, that couples amines with carboxylic acids (assuming that all reactions go to completion), there are over 40 million products! Would we be better off selecting some carboxylic acids and amines at random and reacting them to build a library, or should we use some rational selection process, either in the selection of monomers or in the selection of whole molecules?
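A back-of-the-envelope check of that product count, using the monomer numbers quoted above:

```python
n_acids, n_amines = 10_000, 4_000   # counts quoted from the ACD (1995)
print(f"{n_acids * n_amines:,}")    # 40,000,000 possible coupling products
```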

To get some idea of how to proceed, we will discuss similarity and coverage of space rather abstractly. We will then develop a probability argument and a simulation to show how well space can be covered. Finally, we will relate the probability and simulation results to the chemistry problem of how many compounds to select and whether random or rational methods should be used.


Assumptions



We start with the assumption that the activity of a compound can be well predicted by a compound that is very close to it in chemical space, near-neighbor proximity. This is another way of stating the basic tenet of structure-activity relationships: similar molecules behave similarly. We extend this concept to say that any compound within a specified distance of a tested compound can be predicted by the tested compound. So there is a region of predictability around a tested compound. If two compounds are closer together than the diameter of coverage, then their regions of coverage overlap. But if the region of coverage of a compound is small relative to all of chemical space, and if relatively few compounds are being used to cover the space, then we expect little overlap among randomly selected compounds. As the number of compounds becomes large, we expect more duplicate coverage. Clearly, if we can only afford to screen 100 compounds, and if nothing is known about what features are important, then it makes little difference how they are selected.


Continuous Case Theory



During World War II, useful mathematical theory was developed that relates to our problem (Walter Smith of the University of North Carolina, private communication). Saturation bombing of a specified target required knowledge of the extent of coverage that could be expected. Each bomb destroys a certain area, and determining the number of bombs necessary to adequately cover a target was a pressing problem. To get the desired bombing density (and also for self-defense), systematic formations of planes were necessary. Systematic plane formations were expected to give more systematic bomb coverage on the ground. But if general coverage was desired, and the relative area covered by a bomb was not great, then random coverage might be just as (in)effective as systematic coverage.

It is instructive to study the volume that is covered in theory when all points in a known space are accessible, i.e. when we can put a point anywhere we want. Assuming the absence of certain boundary effects, the amount of space covered by n randomly chosen molecules can be expressed in terms of the ratio rho = V_S / V_T of the volume of the region of similarity (V_S) to the total volume of the space of interest (V_T). More explicitly, the amount of space covered on average by n points is c, defined by

    c = V_T [ 1 - (1 - rho)^n ]

(Robbins [1944, 1945]). This relationship is true provided that the chance of a point in the space being covered by a randomly chosen compound is always the same no matter where the point is in the space. If the space of interest is a square with side dimension X and the radius of similarity is r, then

    rho = pi r^2 / X^2,

and hence

    c = X^2 [ 1 - (1 - pi r^2 / X^2)^n ].

In general, rho will depend on the ratio of the radius of the region of similarity to the side dimension of the space, raised to the power of the dimension of the space.

If a systematic process of selection is used, then the amount of space covered will be the maximal amount n V_S, assuming that the density of coverage is below that where overlap will occur. In a random design, this maximal coverage can be divided into expected coverage c and expected overlap. The proportion of the maximal amount of coverage that can be accounted for as overlap is

    (n V_S - c) / (n V_S) = 1 - c / (n V_S).

In order to see when it is a disadvantage to use a random design, the value of n that results in a given proportion of overlap, P, can be found. This value of n will be denoted by N_P, and it can be found by solving the fixed point equation

    n = [ 1 - (1 - rho)^n ] / [ rho (1 - P) ],

which can always be solved by iteration (see appendix for discussion). In order to see how N_P is affected by dimension and rho, several values of the log base 10 of N_0.1 are displayed in Figure 1. Each dimension was scaled to be 100 units long, so a 2D space has 100^2 = 10,000 cells, a 3D space has 100^3 = 1,000,000 cells, etc. The ratio rho = V_S / V_T is varied by changing the radius of similarity by factors of 1/2. We will discuss later the chemical reasonableness of this range of rho. From Figure 1, it can be seen that changes in dimension cause large multiplicative changes in the size of N_0.1. Multiplicative changes in radius also cause multiplicative changes in N_0.1. These theoretical results can be useful for producing rules of thumb for deciding when a systematic approach is warranted. Indeed, if the value of N_0.1 is known for one space, then knowledge of the relative difference in size between the known space and a space of interest can be used to compute the theoretical multiplicative changes needed to produce a ballpark figure of N_0.1 for the space of interest.
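For illustration, the iteration is easy to carry out in a few lines of Python. The sketch below simply implements the coverage and fixed point formulas given above; the function names are ours, and the example plugs in a similarity radius of 1.0 inside a 100 x 100 x 100 unit cube.

```python
import math

def expected_coverage_fraction(n, rho):
    """Expected fraction of the space covered by n randomly placed regions,
    each occupying a fraction rho of the total volume: c / V_T = 1 - (1 - rho)^n."""
    return 1.0 - (1.0 - rho) ** n

def overlap_proportion(n, rho):
    """Proportion of the maximal coverage n * V_S expected to be lost to overlap."""
    return 1.0 - expected_coverage_fraction(n, rho) / (n * rho)

def n_for_overlap(p, rho, tol=1e-6, max_iter=10_000):
    """Solve n = [1 - (1 - rho)^n] / (rho * (1 - p)) by simple iteration,
    giving N_P, the sample size whose expected overlap proportion is p."""
    n = 1.0
    for _ in range(max_iter):
        n_next = (1.0 - (1.0 - rho) ** n) / (rho * (1.0 - p))
        if abs(n_next - n) < tol:
            return n_next
        n = n_next
    return n

if __name__ == "__main__":
    rho = (4.0 / 3.0) * math.pi * 1.0 ** 3 / 100.0 ** 3   # sphere of radius 1 in the cube
    n_01 = n_for_overlap(0.10, rho)
    print(round(n_01), round(overlap_proportion(n_01, rho), 3))   # N_0.1 and a check (~0.10)
```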

In addition, if large, randomly selected sets of compounds appear to be as effective in screening as those selected in a systematic way, then this is telling us something about the size of the chemical space -- it is big. For very large chemical spaces, it is clear that the amount of overlap is expected to be quite negligible.

Figure 1: Sample size N_0.1 for different values of the radius of similarity and for dimensions 1, 2, and 3.


Discrete Case: Discussion and Theory



Obviously, we do not have the luxury of placing a compound any place we desire in chemical space. Some points are inaccessible since it is not possible to make a compound reside at a particular location. Also, compounds are made of a combination of discrete atoms so there is an inherent granularity of structures in chemical space. Furthermore, it is much more expensive to make particular compounds than to draw compounds from inventory or use combinatorial synthesis to make large general collections of compounds. Unfortunately, it is possible that a large collection of compounds or a chemical library might cover chemical space very unevenly. There may be very dense regions where there is considerable overlap and there may be diffuse regions where chemistry is difficult or where no one has turned their attention.

Analogous coverage results follow when selection is made from a discrete collection of compounds. It must be assumed that each of the M_T compounds in a collection has the same number of similar near neighbors, M_S. If n compounds are chosen at random with replacement, then the average number of compounds covered, c, will be given by

    c = M_T [ 1 - (1 - M_S / M_T)^n ].

The maximal number of compounds that can be covered by a sample of size n is n M_S, and the average number of compounds attributable to overlap is

    n M_S - c.

Therefore, the discussion for the continuous case has relevance for the discrete case, with the understanding that rho is now defined by

    rho = M_S / M_T.

In order to apply this formula in practice, the average number of compounds being covered by a randomly selected compound must be assessed by similarity searching. Each compound in the collection could be run as a target against the rest of the collection and a similarity coefficient computed. The number of compounds with similarity greater than some tolerance could be found and averaged over all compounds in the collection. This average could be used for M_S.
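A sketch of that bookkeeping, assuming only that some pairwise similarity function is available; the similarity argument below is a placeholder, not a reference to any particular program.

```python
def mean_near_neighbors(compounds, similarity, tolerance=0.5):
    """Estimate M_S: for each compound, count the other compounds in the
    collection whose similarity to it is at least the tolerance, then
    average the counts. The cost grows as the square of the collection size."""
    counts = []
    for i, target in enumerate(compounds):
        counts.append(sum(
            1
            for j, other in enumerate(compounds)
            if j != i and similarity(target, other) >= tolerance
        ))
    return sum(counts) / len(counts)
```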

Clearly, in practical situations, there are going to be regions of the chemical space where the molecules are very dense, such as with steroids or penicillins. In such cases, it might be useful to assess M_S for the dense groups. Knowing the proportion of the collection that lies in these dense regions could be used to compute the number of compounds that will likely be chosen from these regions when choosing at random from the collection. The average amount of overlap can then be assessed for each dense region, and cost considerations could be used to decide whether or not it is worth using a systematic selection of compounds in dense regions to avoid this overlap. The assumption that each compound in a design is chosen at random from the entire collection will not be completely satisfied, since after a compound is chosen from the collection it will not be selected again (referred to as sampling without replacement). This departure from the assumptions is expected to be of less concern than the variability in M_S.


Chemical Nearness



Just how near are structures to one another? We need some sense of how numerically close similar structures are in a chemical space. In particular, we need to know the radius of a hypersphere such that similar structures are within that radius. It is popular to use graph theory and topological indices to describe structures, as was done by Basak (1995). Several programs are available to compute on the order of a hundred features of a compound, such as the numbers of rings of various sizes, the numbers of paths of various lengths and branching patterns, etc. These features are highly intercorrelated, so to reduce the dimensionality of the space one can use principal components analysis (PCA). We took the structures in DDR (1995) and used Molconn-X (Hall, 1993) to compute topological indices. That was followed by PCA to place the structures into an 11-dimensional space. Experience indicates that structures need to be quite close together in this 11D space to be judged similar. As an example, we provide a small application so that the reader can get a sense of how similar structures appear as the radius of the hypersphere around a target is varied.

[Interactive figure: Near Neighbors of a Target Compound. In the original web article, clicking the white center of a bull's-eye displayed the target structure, and clicking successive rings displayed the structures lying within hyperspheres of increasing radius (0.5 and beyond) around the target. The circles represent radii from the target, and the descriptor space was generated from principal components of topological indices.]

The MDDR number and the distance to the target compound are given with each structure. There were no structures within a hypersphere radius of 0.25 of the target. As the radius is increased, more structures are found, but these structures become more dissimilar.
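A rough sketch of this kind of near-neighbor lookup in a principal-component space is given below, assuming a numeric matrix of topological-index descriptors is already in hand; the scikit-learn calls are our illustration, not the software used in the original work.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def neighbors_within_radius(descriptors, target, radius=0.5, n_components=11):
    """Project descriptor rows into an n_components-dimensional principal
    component space and return (indices, distances) of the compounds that
    lie within the given radius of the compound in row `target`."""
    scores = PCA(n_components=n_components).fit_transform(
        StandardScaler().fit_transform(descriptors)
    )
    distances = np.linalg.norm(scores - scores[target], axis=1)
    hits = np.flatnonzero((distances <= radius) & (np.arange(len(scores)) != target))
    return hits, distances[hits]
```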


Simulations



Boundary Effects

In order to assess the influence of boundary effects, the average volume covered by n randomly placed spheres in 3D has been computed by simulation. The space covered was a cube with a side dimension of 100 units. The centers of the spheres were chosen randomly within the space, and the volume covered outside the space by spheres landing on the boundary was included as volume covered. For each of four sample sizes, n, and five radii, the average volume covered was computed based on one hundred simulations (randomly placing the n spheres in the cube). The observed average volume covered, together with the volume covered predicted in theory, appears in Table 1. The observed averages match those predicted in theory quite well. Even more interesting is the fact that samples as large as 10,000 can be selected at random with nearly zero overlap when the radius is 0.25 or smaller.

Radius = 0.1

n Observed Predicted Maximal
10 0.04 0.04 0.04
100 0.42 0.42 0.42
1,000 4.19 4.19 4.19
10,000 41.89 41.89 41.89


Radius = 0.25



n Observed Predicted Maximal
10 0.65 0.65 0.65
100 6.55 6.55 6.55
1,000 65.45 65.45 65.45
10,000 654.29 654.29 654.29


Radius = 0.5



n Observed Predicted Maximal
10 5.24 5.24 5.24
100 52.36 52.36 52.36
1,000 523.46 523.46 523.46
10,000 5,221.75 5,221.75 5,221.75


Radius = 1.0



n Observed Predicted Maximal
10 41.89 41.89 41.89
100 418.75 418.79 418.88
1,000 4,179.61 4,180.04 4,188.79
10,000 40,981.30 41,022.81 41,887.90


Radius = 2.0



n Observed Predicted Maximal
10 335.10 335.05 335.10
100 3,346.69 3,345.48 3,351.03
1,000 32,906.80 32,955.61 33,510.32
10,000 280,517.00 284,739.76 335,103.22


Table 1: Average volume covered by spheres
in a 100 x 100 x 100 cube, based on 100 simulations.
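A rough Monte Carlo sketch of the same kind of check is given below. It estimates coverage by testing random probe points inside the cube, so, unlike the tabulated results, volume falling outside the cube is not credited to spheres on the boundary; the function and parameter names are ours.

```python
import numpy as np

def estimate_covered_volume(n_spheres, radius, side=100.0, n_probes=100_000, seed=None):
    """Estimate the volume of a cube of the given side that is covered by
    n_spheres spheres of the given radius with randomly placed centers.
    Probe points inside the cube are tested against every sphere center;
    volume outside the cube is ignored, so boundary spheres contribute
    slightly less here than in the results reported in Table 1."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0.0, side, size=(n_spheres, 3))
    probes = rng.uniform(0.0, side, size=(n_probes, 3))
    covered = np.zeros(n_probes, dtype=bool)
    for center in centers:  # one pass per sphere keeps memory modest
        covered |= np.sum((probes - center) ** 2, axis=1) <= radius ** 2
    return covered.mean() * side ** 3

if __name__ == "__main__":
    n, r, side = 1_000, 2.0, 100.0
    rho = (4.0 / 3.0) * np.pi * r ** 3 / side ** 3
    theory = side ** 3 * (1.0 - (1.0 - rho) ** n)   # expected coverage, c
    print(round(estimate_covered_volume(n, r, side)), round(theory))
```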



Chemical Library Calculations



A lot has been said in theory, but the question of when random designs are not adequate for a specific collection must be addressed. The DDR (1995) collection was chosen for this purpose. The discrete case theory can be applied regardless of the shape of the chemical space, provided that the number of near neighbors can be found. In order to assess the number of near neighbors, Atom Pairs similarity, Carhart et al. (1985), with the Tanimoto coefficient was used. An Atom Pair is composed of two non-hydrogen atoms and the topological distance between the atoms. Some information about the types of bonds that the atoms have is also included in the Atom Pair. For two molecules, the ratio of the number of Atom Pairs that are common to both molecules to the total number of Atom Pairs occurring in either molecule is roughly the Tanimoto coefficient. Atom Pair similarity coefficients relate directly to the underlying chemical structure, so similarity can be judged in an absolute sense. Descriptors such as those generated by topological indices are useful for judging similarity in a relative sense, but the distances between compounds by themselves are hard to interpret in terms of actual similarity of the structures.
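The Tanimoto computation itself is short once each molecule has been reduced to its Atom Pair descriptors. In the sketch below each molecule is represented as a multiset of hashable Atom Pair codes, which is an assumption about the data layout rather than part of the Carhart et al. (1985) definition.

```python
from collections import Counter

def tanimoto(atom_pairs_a, atom_pairs_b):
    """Tanimoto coefficient between two molecules, each given as an iterable
    of Atom Pair descriptors: pairs shared by both molecules divided by
    pairs occurring in either molecule (duplicates handled as multisets)."""
    a, b = Counter(atom_pairs_a), Counter(atom_pairs_b)
    shared = sum((a & b).values())   # per-descriptor minimum counts
    either = sum((a | b).values())   # per-descriptor maximum counts
    return shared / either if either else 0.0
```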

An Atom Pair similarity of 0.5 or greater was interpreted to mean that compounds are similar, and hence they are near neighbors of one another. The number of near neighbors around compounds was far from constant, which was not surprising. The histogram in Figure 2 shows that the distribution of the near neighbors has considerable spread. Compounds with more than 50 near neighbors were not considered, in order to make the computations tractable. This excluded approximately 5,000 compounds from consideration.

Figure 2: Histogram of the number of near neighbors for the DDR collection.



It is not completely clear how to apply the coverage formula when the number of near neighbors is not constant. If the magnitude of n rho is sufficiently less than 1, then

    c ≈ n M_S,

which can be shown by applying standard Taylor series techniques to the coverage formula. For small n, there is going to be very little overlap in the near neighbors of compounds in a random design. Hence, it is reasonable to expect that the number of compounds covered by n randomly chosen compounds is the sum of the numbers of near neighbors of the design compounds. On average, this is just n times the mean number of near neighbors around an arbitrary compound in the collection. Therefore, choosing the average number of near neighbors for M_S should work, at least for small n. The calculations below were made on DDR.



Number of compounds (with fewer than 50 near neighbors): 56,723
Mean number of near neighbors: 11.11


In order to assess the total number of compounds that can be covered (maximal coverage) by systematically choosing molecules uniformly over the space, the total number of near neighbors of n randomly chosen target compounds was computed. This number included duplicate compounds that were near neighbors of more than one design point. The total number of unique compounds among the near neighbors is the actual design coverage. For each of several sample sizes, n, the maximal design coverage and actual design coverage were computed for 100 separate designs of size n. The average maximal coverage and average actual coverage were computed based on these 100 samples. The average maximal coverage, together with the maximal coverage that is predicted in theory (n * 11.11), appears in Table 2 for several sample sizes. In addition, the actual coverage, along with that predicted by theory, appears in Table 3. Even though the number of near neighbors is far from constant, the theory appears to be useful for predicting coverage. To decide exactly how small the sample size must be in order to expect random selection to work as well as systematic selection, it is useful to look at the proportion of the maximal coverage possible that is attributable to overlap. The observed and theoretical overlap appear in Table 4. The theoretical overlap is apparently off by about a factor of two. Since overlap depends on the difference of large values that are being approximated, it is not entirely surprising that such a discrepancy exists. This error is expected to become worse as the skewness of the distribution of near neighbors increases. A plot of the overlap percentages appears in Figure 3. Clearly, the performance of the predictions gets worse as the sample size increases. If 10% expected overlap is the maximum tolerable, then using the actual numbers and interpolating produces a cutoff sample size of 610 for using a random design. More specifically, sample sizes of 610 or less can safely be chosen at random. In theory, this cutoff would be 1196, which is roughly twice the actual value. Since the theoretical cutoff sample size is larger than the actual cutoff sample size, this might be an indication that the candidate set is not as diverse as it could be.



Sample Size Observed Predicted
300 3,348 3,333
700 7,778 7,777
1,000 11,112 11,110
3,000 33,327 33,330


Table 2: Maximal coverage possible, where the observed value was computed from 100 samples and the predicted value is the theoretical value (n * 11.11).





Sample Size Observed Predicted
300 0.05 0.03
700 0.12 0.07
1,000 0.16 0.09
3,000 0.37 0.24


Table 3: Actual coverage



Sample Size Observed Predicted
300 0.05 0.03
700 0.12 0.07
1,000 0.16 0.09
3,000 0.37 0.24


Table 4: Proportion of maximal coverage that is attributable to overlap



Figure 3: Overlap as a function of sample size



The above information might prove useful for determining the cutoff sample size for an arbitrary collection. For example, suppose that there is a collection of compounds of the same size as DDR that is more diverse, in the sense that the average number of near neighbors is roughly half that of DDR. In theory, the cutoff sample size for 10% overlap or less will be about 2360, and half this value, 1180, can be used as an estimate of the true cutoff.
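Plugging the DDR numbers from above into the discrete-case formulas gives the kind of calculation being described. The sketch below reproduces the predicted column of Table 4; the 10% cutoff it prints will differ somewhat from the 1196 quoted earlier, since that figure depends on rounding and on exactly which compound count is used for M_T.

```python
def predicted_overlap(n, m_t, m_s):
    """Expected proportion of the maximal coverage n * M_S lost to overlap when
    n compounds are drawn at random from a collection of M_T compounds that
    average M_S near neighbors each (discrete-case coverage formula)."""
    rho = m_s / m_t
    coverage = m_t * (1.0 - (1.0 - rho) ** n)   # expected number of compounds covered
    return 1.0 - coverage / (n * m_s)

if __name__ == "__main__":
    m_t, m_s = 56_723, 11.11
    for n in (300, 700, 1_000, 3_000):          # compare the "Predicted" column of Table 4
        print(n, round(predicted_overlap(n, m_t, m_s), 2))
    # Largest sample size whose expected overlap stays at or below 10%:
    print(max(n for n in range(1, 5_000) if predicted_overlap(n, m_t, m_s) <= 0.10))
```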


Discussion



Again, let us examine the tough questions posed. What is the inherent dimensionality of chemical space pertinent to drug design? How many features need to be relatively correct before a compound exhibits some activity? Certainly several binding points need to be present and correctly spaced; they would account for six to ten dimensions. The structure has to be roughly the correct size, adding at least three more dimensions. Proper alignment with the binding points will easily contribute three more dimensions. Various physical properties, e.g. lipophilicity, need to be relatively correct, two to five dimensions. Accordingly, if each dimension is scaled to be one unit long, and to be relatively correct a structure has to be within 0.05 units of the optimum to be active, then we are dealing with a minimum of 10 compounds to span one dimension. If there are 11 to 18 dimensions, and if each needs to be saturated with 10 levels, then to fully span the space would take 10^11 to 10^18 compounds! Obviously, screening is relatively successful with much smaller screening set sizes. These numbers are probably more indicative of the number of compounds that would need to be screened to find a relatively fully optimized compound. Fortunately, to get the six or so "critical" dimensions relatively correct, much smaller screening data sets would be required. It is left to each computational chemistry group to determine which dimensions are important and what the width of the window is for each dimension that we have to hit.

It bears repeating that the volume of space increases geometrically with the number of dimensions; a six-dimensional space is 1000 times as large as a three-dimensional space. It would be very useful to have some small number of variables that are pertinent to biological activity. Computational chemists can search for the magic few dimensions, or analysis of data might indicate the important few dimensions for a particular type of biological activity. The fact that high dimensional spaces are so large is often termed "the curse of high dimensions." In the drug design situation it means that it is unlikely that a compound good enough to be considered an optimized drug will be found by screening 10,000 or even 100,000 compounds. Experience supports this assertion. Screening can uncover a good neighborhood, but molecular optimization is necessary to develop a drug. However, there should be a break-even point where the cost of additional general screening just equals the cost of specific synthetic modification. This break-even point depends upon the compound and assay cost versus the cost of synthetic modification.

How many randomly selected points are needed to cover as well as a systematically selected set of points? A simple first answer is that if the coverage of each point is small, and if we are using a relatively small number of points, then the randomly selected and systematically selected sets of points cover almost exactly the same volume of space. Our second answer is that very dense areas should be identified and treated in a special way. For example, we could identify and remove all the steroids and then sample as many of them as deemed reasonable. Next, we would cover the volume of space in which they exist with the same density of points as is used in other regions. We would thus sample rationally with the same density over the whole space.

Obviously our theoretical analysis of diversity sampling has been simplified to make the theory and simulations tractable. Clearly, drug-like molecules are complex entities. The many facets of a molecular structure that are pertinent to drug action can only be described in a high-dimensional space. Reduction of such a space using multidimensional scaling, Martin et al. (1995), or principal components, Basak et al. (1995), has been used to simplify the problem of selecting a "diverse" set of compounds. These methods project points from a high dimensional space into a lower dimensional space, but they are most likely an over-simplification. It is probable that there are many important features, so the inherent dimensionality of the problem is likely to be much larger than the reduced dimensionality given by Martin et al. (1995) and Basak et al. (1995). We have assumed a constant radius around each point in our theory and simulations. It is quite possible that, in reality, there is not a single constant radius about each molecule that includes all "active" neighbors. In some parts of space, effective radii should be expanded, whilst for points in other regions the radii may be almost negligible.

Another assumption made in the course of trying to adequately sample chemical diversity is that the descriptors computed actually matter to the targeted biological receptor. Rarely do we know the important features necessary for tight binding to a particular receptor a priori. Thus, the design of the chemical library or the selection of the most diverse set of compounds from a collection for a particular problem should use accumulated biological knowledge for variable selection when it is available. For example, active CNS agents generally have a basic nitrogen atom and an aromatic ring. This information could be used to restrict chemistry space to expected active regions.

There are two algorithmic methods to construct space-filling designs, Tobias (1995). A "coverage" design seeks a set of points such that any point not in the design is not very far away from a design point. A "spread" design seeks a set of design points that are mutually as far from each other as possible; see Kennard and Stone (1969) for a spread algorithm. Coverage designs are more computationally intensive than spread designs and thus should only be used when thorough coverage of space is warranted.
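A greedy max-min selection in the spirit of a spread design is easy to sketch; the code below illustrates the idea and is not an exact reproduction of the Kennard and Stone (1969) procedure.

```python
import numpy as np

def spread_design(points, n_select, start=0):
    """Greedy max-min ("spread") selection: repeatedly add the candidate whose
    distance to its nearest already-selected point is largest, so that the
    chosen points end up mutually far apart."""
    points = np.asarray(points, dtype=float)
    selected = [start]
    # Distance from every candidate to its nearest selected point so far.
    nearest = np.linalg.norm(points - points[start], axis=1)
    for _ in range(n_select - 1):
        next_index = int(np.argmax(nearest))
        selected.append(next_index)
        nearest = np.minimum(nearest, np.linalg.norm(points - points[next_index], axis=1))
    return selected
```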

There is a cost in computer time and project delay for the construction of a high quality rational design relative to a simple random selection. Timeliness is important for successful drug discovery. If it takes several days to a week to design using rational methods, research might advance more rapidly and cheaply using a simple random design. If a rational design is to be used, then other factors could be incorporated into the design as well. Synthetic ease, monomer availability, and elimination of toxic functionalities are some examples.


Conclusions



First, chemical space is very large. Unless a very large number of compounds are used to fill space, randomly selected compounds will cover as much space as carefully selected compounds.

Second, if the important dimensions for a particular problem are identified, and if a focused set of compounds is desired, then rational selection should be more effective than random designs.

Third, if diverse compounds are desired, realizing that coverage will not be improved by rational coverage methods, then fast spread methods can be used to select diverse compounds. This will ensure that selections are as far apart as possible while eliminating the possibility of overlapping selections.


References



ACD (Available Chemicals Directory), (1995) MDL Inc., San Leandro, CA.

Basak, S.C. and Grunwald, G.D. (1995) Predicting mutagenicity of chemicals using topological and quantum chemical parameters: A similarity based study. Chemosphere, 31: 2529-2546.

Bronowski, J. and Neyman, J. (1945) On the variance of a random set. Annals of Mathematical Statistics, 16: 330-341.

Carhart, R.E., Smith, D.H. and Venkataraghavan, R. (1985) Atom pairs as molecular features in structure-activity studies: Definitions and applications. J. Chem. Inf. Comput. Sci., 25: 64-73.

DDR (Drug Data Report), (1995), MDL Inc., San Leandro, CA.

Doehlert, D. H. (1970). Uniform shell designs. Journal of the Royal Statistical Society, Series C 19, 231-239.

Hall, L.H. (1993) Molconn-X, Version 2.0, A program for molecular topology analysis. Quincy, MA

Johnson, M.E., Moore, L.M., and Ylvisaker, D. (1990). Minimax and maximin distance designs. Journal of Statistical Planning and Inference, 26: 131-148.

Kennard, R.W. and Stone, L.A. (1969). Computer aided design of experiments. Technometrics, 11: 137-148.

Martin, E.J., Blaney, J.M., Siani, M.A., Spellmeyer, D.C., Wong, A.K. and Moos, W.H. (1995) Measuring diversity: Experimental design of combinatorial libraries for drug discovery. J. Med. Chem., 38: 1431-1436.

Robbins, H. E. (1944). On the measure of a random set. Annals of Mathematical Statistics, 15: 70-74.

Robbins, H. E. (1945). On the measure of a random set II. Annals of Mathematical Statistics, 16: 342-347.

Romero, D.L., Busso, M., Tan, C.K., Reusser, F., Palmer, J.R., Poppe, S.M., Aristoff, P.A., Downey, K.M., So, A.G., Resnick, L., and Tarpley, W.G. (1991) Nonnucleoside reverse transcriptase inhibitors that potently and specifically block human immunodeficiency virus type 1 replication. Proc. Natl. Acad. Sci. USA, 88: 8806-8810.

Stoer, J. and Bulirsch, R. (1980) Introduction to Numerical Analysis. Springer-Verlag.

Tobias, R. (1995). SAS QC Software. Volume 1: Usage and Reference. SAS Institute Inc. Cary, NC.


Appendix



Explanations of two theoretical results will be given. Formal proofs of these results can be found in the references. These explanations provide an intuitive grasp of the derivations.

Expected Space Covered

The question of how much volume is covered by n hyperspheres with centers randomly placed in a hypercube has been addressed by Robbins (1944, 1945) and Bronowski and Neyman (1945). They derive expressions for the mean of the covered space and the variance of the covered space. The problem is addressed in the context of measures of random sets. The measure theoretic abstraction is not necessary for obtaining an intuitive understanding of why the formula for average volume covered is true. The use of Fubini's Theorem can be replaced by a change of focus from the space covered by the hyperspheres to the probability that a small element of the hypercube is covered. For simplicity, the problem will be considered in two dimensions. In this case, the process can be viewed as dropping n disks randomly onto a square table. In order to derive the average area covered, the surface of the table can be divided into infinitesimal parts, as illustrated in Figure 4. An arbitrary element will be denoted by delta_{i,j}.

Figure 4: Square subdivided into small parts.

Since the area of delta_{i,j} is small, it will be assumed that it is either covered by a disk or not. Therefore, the area covered is given by

    V_c = sum over i,j of delta_A * 1(delta_{i,j}),

where delta_A is the area of an element and the indicator 1(delta_{i,j}) equals 1 if the element is covered and 0 otherwise. The expected value of V_c is the sum of the expected values of the terms delta_A * 1(delta_{i,j}). The expected value of 1(delta_{i,j}) is the probability that at least one of the n disks falls on delta_{i,j}. This is one minus the probability that none of the n disks falls on delta_{i,j}. The probability of a disk falling on a given area element is

    rho = V_S / V_T,

where V_S is the area of the disk and V_T is the area of the tabletop. This follows because all area elements have an equal chance of being covered and V_S / V_T is the proportion of the total area that will be covered by a disk. The probability that no disks fall on delta_{i,j} is just the probability that in n Bernoulli trials there are no successes, where the probability of success is rho. This probability is

    (1 - rho)^n.

Therefore, the expected value of 1(delta_{i,j}) is

    1 - (1 - rho)^n.

Summing the expected values over the entire region produces

    E[V_c] = V_T [ 1 - (1 - rho)^n ].

At the boundaries of the table, the chances of being covered by a disk will not be the same as at the interior. If the disk centers are allowed to reach the edge of the table and if the area over the edge of the table is included in the area covered, then this situation will fit the assumptions of equal probability of coverage better. At least the expected value will be closer to what is given by the theory.

Solving the Fixed Point Equation

The equation

    n = [ 1 - (1 - rho)^n ] / [ rho (1 - P) ]

can be solved by first starting with an arbitrary positive value n_0; then, given the i-th iterate n_i, the (i+1)-th iterate is defined by

    n_{i+1} = [ 1 - (1 - rho)^(n_i) ] / [ rho (1 - P) ].

Note that a solution exists since for n_0 = 1, n_1 = 1/(1 - P), which is larger than 1. Since 1 - rho < 1, the function

    g(n) = [ 1 - (1 - rho)^n ] / [ rho (1 - P) ]

is increasing in n and it is concave (i.e. the second derivative is negative). Arguments from Stoer and Bulirsch (1980), chapter 5, can be used to show that the iteration converges to the solution of the equation. Figure 5 shows how the iterates converge to the fixed point. The (i+1)-th iterate can be found by drawing a vertical line from n_i to the curve

    y = g(n),

followed by drawing a horizontal line from the curve to the line y = n, and finally drawing a vertical line back to the n axis.

Figure 5: Iteration process for an increasing concave function.


