Affinity Fingerprints: Applications and Implications

Hugo O. Villar

Department of Chemistry
Terrapin Technologies, Inc.
750-H Gateway Blvd
South San Francisco, CA 94080-7020 USA

http://www.netsci.org/Science/Cheminform/feature07.html

Introduction

High throughput screening (HTS) has become a dominant tool in the pharmaceutical industry for the discovery of lead compounds that can be modified into candidates for drug development. Several independent events have contributed to its present importance. At the end of the 80s there was a growing perception that computer assisted molecular modeling had had limited impact in areas of key strategic importance for the industry, while screening of chemical files was deemed more successful in some critical areas [1]. In parallel, substantial advances in instrumentation, molecular biology and protein chemistry led to the facile adaptation of biochemical activity screens into microplate formats, making possible the screening of large sets of chemicals. Manufacturers of liquid handling equipment and detection instruments quickly responded to the needs of the industry, making the HTS paradigm possible. In all, the technological advances, together with the perceived failure of more rational approaches, provided random screening with a novel facade. While a few years earlier the words random and screening implied "ignorance about the system", and "an inefficient, inelegant and irrational method of research", random screening had now come back as a reputable technology [2].

HTS techniques allow hundreds, and even thousands, of compounds to be screened daily, which generates a need for massive chemical libraries. Large chemical libraries have become an essential part of the technology, without which HTS would come to a halt. This need propelled the areas of combinatorial chemistry and molecular diversity [3]. Initially, the components of naturally occurring biopolymers were used to generate the large sets of chemicals. Two factors caused the expansion of the set of building blocks beyond the naturally occurring monomers. The first was the notion that peptides and oligonucleotides present bioavailability and pharmacokinetic problems that may hold back their ultimate use as pharmacotherapeutic agents. The second was the realization that working with biopolymers provided limited diversity compared to what might be possible using other building blocks, scaffolds or linkers [4].

The search for the samples required has expanded beyond the generation of compounds using combinatorial chemistry. Nowadays, most of the large chemical libraries that have been generated over the past several decades worldwide are being tapped. Commercial suppliers of rare chemicals have sprung up and update their collections periodically with additions from academic and research institutions. For the most part, these collections constitute a finite resource, whose regeneration is quite impractical. Nonetheless, this rich inheritance is withering in a process in which all but a handful of compounds are being scored as negative in each particular assay. Unless the current paradigms for the discovery of leads change significantly, the process is only likely to accelerate as the human genome project progresses, offering an ever wider variety of potential targets.

Terrapin has adopted an approach to lead discovery that was enabled by technological developments for HTS, but which yields results that are quite complementary in use to standard HTS. This lead discovery technology, called TRAP(TM), uses HTS methodology to generate a chemical classification system that provides information about how organic compounds bind to proteins in general. The resulting database can then be utilized efficiently to retrieve a lead from the chemical library while minimizing the quantities of the macromolecular target that are required, reducing the number of physical assays necessary and conserving compound stocks.

Affinity Fingerprinting

Traditionally, the drug design process has relied on physicochemical properties as a way to organize the knowledge that accumulates during the discovery process. Properties such as octanol/water partition coefficients, molecular refractivity, topological indices, charges, dipole moments or structural properties are commonly cited to explain the bioactivity observed in a compound by those involved in drug design. The approach contrasts with the more conventional description of a molecule adopted in biology, where a compound is described in terms of its ability to affect biological systems, for instance its ability to bind to a series of receptors or other measurable effects. Biological descriptors had not been utilized in the drug design process and were thus expected to provide unique insights.

The biological descriptor used in the TRAP system is defined as the pattern of affinities of a particular molecule for a set of proteins. Each binding affinity determined constitutes a molecular descriptor. If an invariable set of proteins is used for all the compounds in a library, the collection of affinities for each compound provides a uniform molecular descriptor that permits establishing similarities and differences among chemicals. In this way, the affinity patterns constitute an identifier of the molecule. With an appropriate panel, essentially all small organic compounds show unique patterns of binding, justifying calling these patterns affinity fingerprints [5].

The affinity fingerprints can be utilized in the same manner as other more traditional descriptors based on physicochemical properties. Maximally diverse sets of compounds can be established by selecting compounds with significant differences in their affinity fingerprints. Such maximally diverse sets can be used in a screening process [6]. Alternatively, compounds with similar affinity fingerprints can be expected to display similar biological properties. In this way, if only a maximally diverse subset is initially screened, and a hit is found in this process, other compounds in the library with similar affinity fingerprints can be retrieved, maximizing the chances of finding additional compounds similar to the initial lead. Both uses have been validated in recent research at Terrapin [5-7].

Identification of a Reference Panel of Proteins

The definition of the affinity fingerprint requires the selection of a set of proteins to constitute the reference panel. Since the number of members in the reference panel will determine the magnitude of the experimental task required to generate the fingerprints, we attempt to minimize the number of proteins in it. For a small number of proteins to suffice for fingerprinting very large numbers of molecules, each must recognize a wide variety of compounds, but the variety should be quite different for each panel member. In other terms, we want to admit as members of the panel only proteins with sufficient information content, but that at the same time do not repeat the information provided by other proteins already in the panel.

The problem of finding proteins able to recognize a wide range of chemicals (high information content) can be overcome by recording the binding information up to the limits of detection of the technique used to generate the binding affinity information. In this way, high rates of detectable binding are easy to achieve. The more difficult criterion is to find proteins whose affinity patterns are uncorrelated with each other or with mathematical combinations of several others in the panel. While it is simple to find pairs of proteins whose affinities for the same set of compounds are completely uncorrelated, we have found that it is harder to find proteins whose affinities for a set of compounds are uncorrelated with a combination of the affinities of the same compounds for panels of other proteins that have more than 10 members.
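The pairwise part of this criterion can be illustrated with a short sketch. It assumes the affinities are stored as log10(IC50) values and uses the Pearson coefficient as the measure of correlation; the article does not state which statistic was used, and the three "proteins" and their values below are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log10(IC50) values for the same five compounds (illustration only).
protein_a = [-6.0, -5.0, -4.0, -5.5, -4.5]
protein_b = [-5.9, -5.1, -4.2, -5.4, -4.4]  # tracks protein_a: adds little information
protein_c = [-4.0, -7.0, -4.0, -5.0, -5.0]  # unrelated pattern: a useful panel candidate

r_ab = pearson(protein_a, protein_b)
r_ac = pearson(protein_a, protein_c)
print(f"r(a, b) = {r_ab:.3f}, r(a, c) = {r_ac:.3f}")
```

A low pairwise correlation is only the easy half of the test; a candidate must also be checked against combinations of the existing panel members.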

In other words, we have found that the logarithm of the IC50 of a series of compounds can be expressed as a combination of logarithms of the IC50 values of the same compounds against a panel of structurally and functionally unrelated proteins [7],

$$\log(\mathrm{IC}_{50})_{C,T} = \sum_{R} A_R \, \log(\mathrm{IC}_{50})_{C,R} \qquad (1)$$

where the sum is carried out over all the proteins in the reference set (R), C denotes a compound and T the candidate protein. If a set of coefficients A_R can be found that adequately represents the binding properties of protein T, then there is no point in including the candidate protein T in the panel, because the information it contains can readily be generated by other members of the panel. The task of finding new members for the panel becomes progressively more difficult the larger the number of proteins that are already part of the panel.
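A minimal sketch of this redundancy test follows. It assumes ordinary least squares without an intercept (matching the form of the relation above) and uses R squared as the measure of how well the panel reproduces the candidate; the fitting method actually used is described in [7], not here. The function names and the two-protein toy panel are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: reference columns are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_coefficients(panel, target):
    """Least-squares A_R with target[c] ~ sum over R of A_R * panel[c][R]."""
    n_ref = len(panel[0])
    xtx = [[sum(row[i] * row[j] for row in panel) for j in range(n_ref)]
           for i in range(n_ref)]
    xty = [sum(row[i] * t for row, t in zip(panel, target)) for i in range(n_ref)]
    return solve_linear(xtx, xty)

def r_squared(panel, target, coeffs):
    """Fraction of the variance in target explained by the fitted combination."""
    preds = [sum(a * x for a, x in zip(coeffs, row)) for row in panel]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - p) ** 2 for t, p in zip(target, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot

# Hypothetical log10(IC50) data: five compounds against two reference proteins.
panel = [[-6.0, -4.0], [-5.0, -5.5], [-4.0, -4.5], [-5.5, -6.0], [-4.5, -5.0]]
# A candidate protein constructed to be fully redundant with the panel.
candidate = [0.7 * r1 + 0.3 * r2 for r1, r2 in panel]

coeffs = fit_coefficients(panel, candidate)
fit_quality = r_squared(panel, candidate, coeffs)
print(coeffs, fit_quality)
```

A fit quality close to 1 would argue against admitting the candidate to the panel; the cutoff to apply is a judgment call that this sketch does not settle.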

Based on these two criteria, several hundred proteins were screened to come up with a set of 18 proteins that are useful as a reference panel for the determination of the affinity fingerprints. When small organic molecules are tested against this panel they generate unique patterns of binding, yielding each compound's affinity fingerprint.

Generation of the Affinity Fingerprints

For the generation of the affinity fingerprints we have developed a strategy for the rapid profiling of compound libraries, measuring the binding of each compound to the reference panel of proteins [8]. To obtain the binding data, an IC50 for each compound's interaction should be determined. For a set of 10,000 compounds this amounts to 180,000 IC50s or well over a million data points, which clearly could not be obtained without the tools of HTS.

In the TRAP system, the required data sets are generated using the technique of fluorescent polarization. Fluorescent polarization is a nearly ideal technique for this task, and for automation in general. The technique has only recently become available in microplate format, but has a long-standing proven record in clinical immunoassays.

At our company, the Fluorolite FPM-2 Fluorescent Polarization Microtiter Reader (Jolley Consulting and Dynatech Laboratories, Inc.) was interfaced to a Zymark Zymate II robotic system. The in-house system has an assay cycle time of 3.25 min, with a throughput (each compound titrated at 4 dilutions) of 380 compounds per hour. It can operate unattended for up to 28 hours at a time. In a period of six weeks we recently completed the determination of affinity fingerprints for 10,000 compounds.

The speed at which the data is generated by the robotic system represents a real challenge for the timely generation of the IC50 values. A four parameter logistic function was selected to converge on best estimates. All of the data is generated on a personal computer. The SCREEN package (MDL, Inc., San Leandro, CA) is used for compound entry and interfacing with an ORACLE (Oracle Corp., Redwood City, CA) database.
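The four-parameter logistic function itself can be written down directly. The sketch below assumes a competition assay in which the polarization signal falls from an upper plateau to a lower plateau as the compound concentration rises, and it fixes both plateaus from controls so that only the IC50 and the slope are estimated from four dilutions. The coarse grid search stands in for the iterative fitting routine, which the article does not describe; all numbers are hypothetical.

```python
import math

def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic: signal falls from top to bottom as conc rises."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)

def fit_ic50(concs, signals, bottom, top):
    """Grid-search IC50 and Hill slope, with the plateaus fixed from controls."""
    best = None
    for i in range(-90, -29):       # log10(IC50) from -9.0 to -3.0 in 0.1 steps
        ic50 = 10 ** (i / 10)
        for h in range(5, 31):      # Hill slope from 0.5 to 3.0 in 0.1 steps
            hill = h / 10
            err = sum((four_pl(c, bottom, top, ic50, hill) - s) ** 2
                      for c, s in zip(concs, signals))
            if best is None or err < best[0]:
                best = (err, ic50, hill)
    return best[1], best[2]

# Hypothetical titration: four dilutions (molar), signal in mP units.
concs = [1e-7, 1e-6, 1e-5, 1e-4]
true_ic50 = 10 ** (-55 / 10)
signals = [four_pl(c, 50.0, 250.0, true_ic50, 1.0) for c in concs]

ic50_fit, hill_fit = fit_ic50(concs, signals, bottom=50.0, top=250.0)
print(f"log10(IC50) = {math.log10(ic50_fit):.2f}, Hill slope = {hill_fit:.1f}")
```

With four points per compound, fixing the plateaus keeps the problem overdetermined; fitting all four parameters freely from four readings would leave no degrees of freedom to judge the fit.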

Many of the advantages and potential challenges of the application of Fluorescent Polarization to HTS have been recently described in detail [8]. Our conclusion is that the technique is ripe for significant expansion. A combination of improvements in instrumental sensitivity, reagent design, availability of cloned targets and dissemination of methods for calculation will make it a major option for groups in HTS.

Applications of the Affinity Fingerprints: the TRAP technology [5-7]

The real power of affinity fingerprinting becomes evident once the database of fingerprints has been compiled. The determination of the fingerprints is an effort that is required only once for any given compound. Once the affinity fingerprints have been generated, they can be used in the same way as other sets of molecular descriptors. Indeed, the affinity fingerprint is a vector in which each element is an IC50 value determined for a given protein. Each vector represents a point in the affinity fingerprint space, and distances between points can then be measured. Compounds with similar affinity fingerprints will have similar vectors, and therefore the distances between them will be small. Conversely, compounds with significantly different affinity fingerprints will be separated by larger distances. The larger the separation, the more dissimilar the compounds are in terms of their affinity fingerprints.
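As an illustration of distances in fingerprint space, the sketch below stores each fingerprint as a vector of log10(IC50) values and uses the Euclidean metric. Both choices are assumptions, since the article does not specify the scale or the metric. The compound identifiers and the three-protein panel are hypothetical (the actual panel has 18 members).

```python
import math

def fingerprint_distance(fp_a, fp_b):
    """Euclidean distance between two fingerprints (vectors of log10 IC50)."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must use the same reference panel")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))

def nearest_neighbors(hit_id, fingerprints, k):
    """Return the k compound ids closest to the hit in fingerprint space."""
    hit = fingerprints[hit_id]
    ranked = sorted(
        (fingerprint_distance(hit, fp), cid)
        for cid, fp in fingerprints.items()
        if cid != hit_id
    )
    return [cid for _, cid in ranked[:k]]

# Hypothetical fingerprints against a three-protein toy panel.
fingerprints = {
    "cpd-1": [-6.0, -4.0, -5.0],
    "cpd-2": [-5.9, -4.1, -5.0],
    "cpd-3": [-4.0, -6.0, -4.5],
    "cpd-4": [-5.5, -4.5, -5.2],
}

similar = nearest_neighbors("cpd-1", fingerprints, k=2)
print(similar)
```

The same retrieval step is what allows a hit from a small screen to be expanded into a set of candidates with similar fingerprints.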

The analysis of the distances between the vectors associated with a given set of compounds affords a way to select subsets of compounds with diverse fingerprints, which in turn provides a core library for screening. The use of a small core library presents an advantage when cost is an issue, when the assay does not lend itself to automation, when the target is not available in the quantities required, or when the consumption of compound stocks must be contained.
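One common way to pick a diverse subset from pairwise distances is a greedy max-min selection, sketched below. This is a generic technique offered as an illustration, not the selection procedure used in TRAP, which the article does not detail. The seeding rule (the compound farthest from the centroid) and the two-dimensional toy fingerprints are assumptions.

```python
import math

def distance(fp_a, fp_b):
    """Euclidean distance between two fingerprint vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))

def select_diverse(fingerprints, size):
    """Greedy max-min selection of a core library of up to `size` compounds."""
    ids = sorted(fingerprints)
    if not ids or size <= 0:
        return []
    dims = len(fingerprints[ids[0]])
    centroid = [sum(fingerprints[c][d] for c in ids) / len(ids) for d in range(dims)]
    # Seed with the compound farthest from the centroid (an arbitrary convention).
    seed = max(ids, key=lambda c: distance(fingerprints[c], centroid))
    selected = [seed]
    # For each remaining compound, track its distance to the nearest selected one.
    min_dist = {c: distance(fingerprints[c], fingerprints[seed])
                for c in ids if c != seed}
    while len(selected) < size and min_dist:
        nxt = max(min_dist, key=min_dist.get)  # farthest from everything chosen so far
        selected.append(nxt)
        del min_dist[nxt]
        for c in min_dist:
            min_dist[c] = min(min_dist[c],
                              distance(fingerprints[c], fingerprints[nxt]))
    return selected

# Hypothetical fingerprints forming two tight pairs and one outlier.
library = {
    "cpd-a": [0.0, 0.0],
    "cpd-b": [0.1, 0.0],
    "cpd-c": [5.0, 0.0],
    "cpd-d": [5.0, 0.1],
    "cpd-e": [0.0, 5.0],
}

core = select_diverse(library, size=3)
print(core)
```

In this toy case the selection takes one compound from each cluster and skips the near-duplicates, which is the behavior wanted from a core library.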

The separation between affinity fingerprints provides an empirical way to assess the chemical variety of a chemical library. If large numbers of duplications in fingerprints occurred, then the variety of the compounds in that library would not be large. To estimate the overall redundancy in the fingerprints, two parameters can be considered. The first is the maximal separation between points observed for the library. The larger the maximal separation observed, the larger the portion of the affinity fingerprint space that the compounds occupy and consequently the more dissimilar the most different compounds in the library are. The second parameter is the most frequently observed distance between points. The larger this distance, the less likely it is that two compounds are close and therefore the less likely it is that any pair of compounds can be classified as similar.
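Both parameters can be computed from the list of pairwise distances. The sketch below takes the most frequently observed distance to be the center of the fullest histogram bin. The bin width is an assumed tuning parameter and the Euclidean metric is again an assumption, so the result is only as meaningful as those choices. The toy fingerprints are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pairwise_distances(fingerprints):
    """Euclidean distances between all pairs of fingerprint vectors."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))
        for fp_a, fp_b in combinations(fingerprints, 2)
    ]

def diversity_summary(fingerprints, bin_width=0.5):
    """Return (maximal separation, center of the most populated distance bin)."""
    dists = pairwise_distances(fingerprints)
    if not dists:
        raise ValueError("need at least two fingerprints")
    max_separation = max(dists)
    bins = Counter(int(d // bin_width) for d in dists)
    # Ties between equally full bins are resolved toward the smaller distance.
    modal_bin = max(bins, key=lambda b: (bins[b], -b))
    modal_distance = (modal_bin + 0.5) * bin_width
    return max_separation, modal_distance

# Hypothetical two-dimensional fingerprints (illustration only).
toy_library = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.0, 0.1], [0.0, 5.0]]

max_sep, modal = diversity_summary(toy_library)
print(f"maximal separation = {max_sep:.2f}, modal distance = {modal:.2f}")
```

Comparing these two numbers across libraries gives a rough, empirical sense of which collection covers more of the fingerprint space.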

As mentioned above, it is relatively hard to find new members for the panel, because a combination of the compounds' affinities for the proteins in the panel reproduces their affinities for other proteins. This finding can be turned to our advantage as a system for virtual screening. We can use the information generated when the core library was screened to determine the coefficients A_R in equation 1. If such an equation can be found, then the database of affinity fingerprints can be used to rank the remaining compounds in our chemical library according to the affinities predicted for the target. The compounds with the highest predicted affinities can be selected for direct screening against the target. If the potencies correspond to the predictions, then we have, in effect, screened the entire library. If there are significant differences between the predictions and the actual values, these values can be utilized to refine the coefficients A_R and attempt the procedure one more time. The relation has been found to work even when there is no previously recognizable similarity between the target protein and any of the reference proteins. Moreover, the core library (which in this case is named a training set, borrowing from chemometrics terminology) can be very small, containing fewer than 100 compounds. Such a small number of compounds can be physically screened even in the most cumbersome assays, or with small amounts of the target at our disposal. We have been successful using this approach to generate primary hits of better than 10 micromolar potency for 14 out of 18 targets tested so far.
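The ranking step can be sketched as follows, assuming the coefficients have already been fitted on the training set and that fingerprints are stored as log10(IC50) values, so that a lower predicted value means a more potent compound. The coefficient values, compound identifiers and two-protein toy panel are hypothetical.

```python
def predict_log_ic50(fingerprint, coeffs):
    """Predicted log10(IC50) for the target as a weighted sum over the panel."""
    if len(fingerprint) != len(coeffs):
        raise ValueError("one coefficient is needed per reference protein")
    return sum(a * x for a, x in zip(coeffs, fingerprint))

def rank_library(fingerprints, coeffs, top_n):
    """Return the top_n (compound id, prediction) pairs, most potent first."""
    scored = sorted(
        (predict_log_ic50(fp, coeffs), cid) for cid, fp in fingerprints.items()
    )
    return [(cid, pred) for pred, cid in scored[:top_n]]

# Hypothetical coefficients fitted on a small training set.
coeffs = [0.7, 0.3]

# Hypothetical fingerprints of compounds not yet tested against the target.
untested = {
    "cpd-10": [-6.0, -4.0],
    "cpd-11": [-4.0, -4.5],
    "cpd-12": [-6.5, -6.0],
    "cpd-13": [-5.0, -5.0],
}

shortlist = rank_library(untested, coeffs, top_n=2)
for cid, pred in shortlist:
    print(f"{cid}: predicted log10(IC50) = {pred:.2f}")
```

Only the shortlisted compounds would then go to the physical assay, and any disagreement between measured and predicted values would feed back into a refit of the coefficients.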

Implications of the Affinity Fingerprints

The mere existence of a set of reference proteins that can be used for lead discovery poses a series of very interesting scientific questions. All point to the fact that it is possible to transfer binding information among binding sites. The existence of affinity fingerprints is a generalization of the common observation that a compound can interact with more than one macromolecule. Such lack of specificity points to the idea that, at a certain level, unrelated binding sites share features that permit them to recognize the same compounds. The TRAP technology uses the small molecules as probes to characterize those similarities.

Those similarities are more widespread than might be expected in principle. Even drugs that act at specific targets display a lack of specificity [9]. Within concentrations that are one- or two-fold that of their ED50 for their target, they normally show significant affinity for other unintended targets, an effect associated with the drug's side effects. The reason for this lack of specificity may be intrinsic to the nature of biological systems. The use of twenty amino acids and the same linker may limit the total number of possibilities at the binding sites. Moreover, in a recent study the relative abundance of the 20 natural amino acids was calculated for a large series of proteins whose crystal structures have been determined [10]. The analysis revealed that certain amino acids tended to be overrepresented at binding sites, in the same way that hydrophobic residues are found at the core of proteins. Although individual protein binding sites are certainly unique, the results point to the existence of underlying similarities in the way small molecules and proteins interact.

References

1. R.M. Snider, J.W. Constantine, J.A. Lowe III, K.P. Longo, W.S. Lebel, H.A. Woody, S.E. Drozda, M.C. Desai, F.J. Vinick, R.W. Spencer, H.J. Hess. "A Potent Non Peptide Antagonist of the Substance P (NK1) receptor", Science, 251: 435-439 (1991).

2. A. Plückthun and L. Ge. "The Rationality of Random Screening: Efficient Methods of Selection of Peptides and Oligonucleotide Ligands", Angew. Chemie Int. Ed. Engl., 30: 296-298 (1991).

3. M. H. Lyttle. "Combinatorial Chemistry: A Conservative Perspective", Drug Dev. Res., 35: 230-236 (1995).

4. E.J. Martin, J.M. Blaney, M.A. Siani, D.C. Spellmeyer, A.K. Wong and W.H. Moos, "Measuring Diversity: Experimental Design of Combinatorial Libraries for Drug Discovery", J. Med. Chem., 38: 1431-1436 (1995).

5. L.M. Kauvar. "Affinity Fingerprinting: A novel approach to quantitative chemical classification proves useful in drug discovery", Bio/Technology, 13: 965-966 (1995).

6. L.M. Kauvar. "Affinity Fingerprinting: an index to chemical variety", Pharm. Mfr. Intl., 8: 25-28 (1995).

7. L.M. Kauvar, D.L. Higgins, H.O. Villar, J.R. Sportsman, A.E. Engqvist-Goldstein, R. Bukar, K.E. Bauer, H. Dilley and D.M. Rocke. "Predicting Ligand binding to proteins by affinity fingerprinting", Chem. & Biol., 2: 107-118 (1995).

8. J.R. Sportsman, S.K. Lee, H. Dilley and R. Bukar, in High Throughput Screening: the discovery of bioactive substances, A. Kolb and J. Devlin (Eds.), Marcel Dekker, in press.

9. F.S. La Bella. "Molecular Basis for binding promiscuity of antagonist drugs", Biochem. Pharmacol., 42: 51-58 (1991).

10. H.O. Villar and L.M. Kauvar. "Aminoacid Preferences at Protein Binding Sites", FEBS Lett., 349: 125-130 (1994).

Additional information on fluorescent polarization as a high throughput screening method can be obtained from Dr. Richard Sportsman (rich_sportsman@trpntech.com).



NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.