The Chemical Generation of Molecular Diversity

Michael R. Pavia

Sphinx Pharmaceuticals
A Division of Eli Lilly & Co.
840 Memorial Drive
Cambridge, MA 02139


http://www.netsci.org/Science/Combichem/feature01.html


The average cost of introducing a new drug entity to the marketplace is estimated at more than $300 million. Of this figure, nearly one-third has been estimated to go to the discovery and optimization of a lead chemical structure. Furthermore, it has been estimated that the cost of preparing each novel molecule in the traditional pharmaceutical industry paradigm, in which molecules are synthesized individually and in serial fashion, is between $5,000 and $10,000. Clearly, an opportunity exists to reduce costs at this earliest stage of the drug discovery process: the identification of a novel lead chemical structure.

Chemical leads for the pharmaceutical industry are currently identified through rational design and/or mass screening. Traditional mass screening has been quite successful in identifying new leads. With the recent introduction of automated, high-throughput screening technologies, the evaluation of hundreds of thousands of individual test molecules per year against a large number of targets is now possible. The source of large chemical libraries, however, remains a limitation for most investigators. Compound libraries commonly used in mass screening consist of either historical collections of synthesized compounds owned by pharmaceutical companies or natural product collections. Each of these libraries has limitations. Historical collections contain a limited number of diverse structures (e.g., thousands of steroids, beta-lactams, etc.) and, while useful, represent only a small fraction of the possible diversity. Natural products are limited by the structural complexity of the leads identified and the difficulty of reducing them to useful pharmaceutical agents (e.g., taxol).

During the past decade, a new source of compounds has arisen: those obtained through the rapid chemical (and sometimes biological) generation of compound libraries [1,2,3,4]. This wealth of new compounds, coupled with the ability to rapidly carry out their biological evaluation, represents an important shift in the traditional paradigm for generating and optimizing new lead structures, not only for the pharmaceutical industry but potentially also for the agricultural, materials, and chemical industries.

From its earliest days over a decade ago, to the present, the field of chemical generation of molecular diversity has changed dramatically. Early attempts focused exclusively on the rapid generation of very large numbers of peptides. The majority of workers in the field today focus on the generation of non-peptide, low molecular weight compounds. A brief historical perspective follows.

Available methods for the generation of compound libraries differ considerably in the types and numbers of compounds prepared (tens through tens of millions), and in whether the compounds are obtained as single structurally defined entities or as large mixtures. In addition, irrespective of the type and number of molecules one chooses to prepare, one must consider whether the products are screened attached to a solid support or in solution. Advantages and disadvantages of each method are also discussed below. It is, however, safe to conclude that it is still too early to know with certainty which is the most useful approach for effective discovery of new pharmaceutical agents.


DIVERSITY GENERATION - AN HISTORICAL PERSPECTIVE

The following perspective gives representative examples of the methods reported to date and is not meant to be all inclusive.

The initial work in the field focused on peptide library methods (primarily due to the ready availability of the natural and unnatural amino acids and a well established coupling methodology). These methods can rapidly generate hundreds to millions of small to medium size peptides, either to identify novel leads or to help elucidate the chemical basis of known ligand/ligate interactions by preparing and evaluating a large number of peptide analogues.

Key issues that differentiate the approaches described below are the size of the library and whether the compounds are generated singly (the active compound is identified by its physical location) or as mixtures (the active compound is identified by its tag, for encoded libraries, or through deconvolution, in which an active compound is identified by iterative synthesis and screening of mixtures). One must also consider whether the compounds are to be screened on a solid support or in solution.

Pin technology: Geysen [5] demonstrated that peptides can be synthesized in numbers several orders of magnitude greater than by conventional one-at-a-time methods. The peptides are synthesized on polyethylene rods arranged in a microtiter plate format allowing ninety-six separate peptides to be simultaneously synthesized at the tips of the rods.

The pin technology is representative of techniques that generate libraries of single compounds in a spatially-differentiated manner. An alternative approach, to rapidly prepare large mixtures of compounds, was made possible by the introduction of the split-pool approach.

Split-pool method: An important consideration in synthesizing large mixtures of peptides (or any other class of molecules) is to ensure that each component of the final mixture is present in approximately equimolar concentration. This issue has been effectively addressed in the split-pool approach [6,7] (Figure 1), in which the solid support material is physically segregated into equal portions for coupling to each of the individual initial reactants. This affords uniform coupling since competition between reactants is eliminated. The individual portions are then combined in a single vessel for washing and deprotection and divided again into individual portions for the next coupling. The resulting synthetic products exhibit a statistical distribution of sequences. Using this approach, a complete set of possible molecular combinations is rapidly prepared in approximately equimolar amounts.

FIGURE 1
The Split-Pool Method
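
The split-pool cycle described above can be illustrated with a short simulation. The following Python sketch is not taken from the cited work; the function name split_pool, the three-letter monomer set, and the bead count are assumptions chosen only for illustration. Each bead passes through exactly one coupling vessel per round, so it ends up carrying a single sequence, while the pooled beads together cover all combinations in roughly equal numbers.

```python
import random
from collections import Counter

def split_pool(n_beads, monomers, rounds, seed=0):
    """Simulate split-pool synthesis and return the sequence grown on each bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads              # every bead starts with no residues attached
    for _ in range(rounds):
        rng.shuffle(beads)              # pool: mix all beads in a single vessel
        k = len(monomers)
        # split: deal the mixed beads into k (near-)equal portions
        portions = [beads[i::k] for i in range(k)]
        # couple: each portion reacts with exactly one monomer, so there is
        # no competition between reactants within a vessel
        beads = [seq + m for m, portion in zip(monomers, portions) for seq in portion]
    return beads

# Illustrative numbers only: 3 monomers, 3 rounds -> 3**3 = 27 possible sequences
beads = split_pool(n_beads=2700, monomers="AGV", rounds=3)
counts = Counter(beads)
print(len(counts), "distinct sequences;",
      min(counts.values()), "to", max(counts.values()), "beads each")
```

The spread between the least and most common sequence reflects the statistical distribution mentioned in the text: the portions are equal at every coupling, but which beads land in which portion is random.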

Iterative Deconvolution Using the "Tea-Bag" Method [8]: Small amounts of resins representing individual peptides are enclosed in porous polypropylene containers. The bags are immersed in individual solutions of the appropriate activated amino acids, while deprotections and washings are carried out by mixing all the bags together. The bags are then reseparated for subsequent coupling steps (the split-pool method). Removal of the peptides from the resins affords peptides in soluble form. It is possible to rapidly prepare a collection of libraries which represents, for example, all 64 million hexapeptides that can be built from the 20 naturally occurring amino acids, and to identify an optimal peptide ligand for any ligate of interest. This method has the advantage that it affords fully characterizable, non-modified, solution phase peptides, which may afford more realistic interaction results than solid-support bound peptides. The method also affords variable quantities of soluble peptides for testing against virtually any target, and can be used with unnatural amino acids.
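
The figure of 64 million follows from 20 amino acids at each of six positions (20^6). The Python sketch below checks that arithmetic and then illustrates the idea of iterative deconvolution on a deliberately tiny library. It is a simplified illustration under stated assumptions, not the published protocol: the scoring function activity, the arbitrary "best" sequence, the four-letter monomer set, and the choice to define one position per round are all hypothetical.

```python
from itertools import product

NATURAL = "ACDEFGHIKLMNPQRSTVWY"        # one-letter codes of the 20 natural amino acids
print(len(NATURAL) ** 6)                # 64000000 possible hexapeptides

def activity(peptide, best="WKYF"):
    """Hypothetical assay: number of positions matching an arbitrary 'best' binder."""
    return sum(a == b for a, b in zip(peptide, best))

def pool_activity(prefix, length, monomers):
    """Mean activity of the mixture whose leading positions are fixed to `prefix`."""
    tails = product(monomers, repeat=length - len(prefix))
    scores = [activity(prefix + "".join(t)) for t in tails]
    return sum(scores) / len(scores)

def deconvolute(length, monomers):
    """Define one more position per round, keeping the most active pool each time."""
    prefix, pools_screened = "", 0
    while len(prefix) < length:
        candidates = [prefix + m for m in monomers]
        pools_screened += len(candidates)
        prefix = max(candidates, key=lambda p: pool_activity(p, length, monomers))
    return prefix, pools_screened

hit, pools_screened = deconvolute(length=4, monomers="FKWY")
print(hit, pools_screened)              # WKYF 16
```

In this toy case 16 pools are synthesized and screened instead of 256 individual tetrapeptides. The sketch assumes that each position contributes independently to activity; real assays need not behave this way, which is one reason identification of a single active from a large mixture is not guaranteed.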

Physical isolation of active beads using the Selectide method: The library is prepared on polymeric beads by the split-pool method and incubated with a tagged ligate. Beads bearing peptides that bind the ligate are identified by visual inspection, physically removed, and microsequenced [9]. The approach can also incorporate cleavable linkers on each bead so that, after exposure to cleaving reagent, the beads release a portion of their peptides into solution for biological assay while retaining sufficient peptide on the bead for structure determination.

Encoded libraries: In this method an "identifier" tag is attached to the solid support material coincident with each monomer, again using a split-pool synthesis procedure. The structure of the molecule on any bead identified through screening is obtained by decoding the identifier tags. Numerous methods of tagging the beads have now been reported. These include the use of single-stranded oligonucleotides, which serve as identifying tags and also allow enrichment through PCR amplification [e.g., Affymax; 10,11,12]. The use of halocarbon derivatives, which are released from the active beads through photolysis and analyzed by electron capture capillary gas chromatography, has also been described [e.g., Pharmacopeia; 13]. It is noteworthy that various groups are now synthesizing small organic molecules using the encoded approach.
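
The bookkeeping behind an encoded library can be sketched in a few lines of Python. This is a generic illustration, not the oligonucleotide or halocarbon chemistry of the cited groups: the tag representation (a step number paired with a monomer index), the monomer set, and the function names are assumptions made for the example.

```python
import random

MONOMERS = ["Gly", "Ala", "Phe", "Lys"]     # illustrative monomer set

def make_tag(step, monomer):
    """Hypothetical identifier tag recording which monomer was added at which step."""
    return (step, MONOMERS.index(monomer))

def synthesize_encoded(n_beads, steps, seed=1):
    """Split-pool synthesis in which every coupling also attaches an identifier tag."""
    rng = random.Random(seed)
    beads = [{"product": [], "tags": set()} for _ in range(n_beads)]
    for step in range(steps):
        rng.shuffle(beads)                              # pool and mix
        for i, bead in enumerate(beads):                # split by position in the mix
            monomer = MONOMERS[i % len(MONOMERS)]
            bead["product"].append(monomer)             # couple the monomer ...
            bead["tags"].add(make_tag(step, monomer))   # ... and its tag
    return beads

def decode(tags):
    """Recover the synthesis history of a bead from its (unordered) tags."""
    return [MONOMERS[index] for _, index in sorted(tags)]

beads = synthesize_encoded(n_beads=64, steps=3)
picked = beads[0]                           # stand-in for a bead selected in a screen
print(decode(picked["tags"]), picked["product"])
```

Because each tag records its own step, the tags can be read in any order and still reconstruct the sequence of couplings, so the product itself never has to be analyzed directly.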

Phage libraries [14-17]: These libraries contain tens of millions of filamentous phage clones, each displaying a unique peptide sequence on the bacteriophage surface. The phage genome contains the DNA sequence encoding the peptide. The ligate of interest is used to affinity purify phage that display binding peptides, the phage are propagated in E. coli, and the amino acid sequences of the displayed peptides are identified by sequencing the corresponding coding region of the viral DNA. Tens of millions of peptides can be rapidly surveyed for binding. Initial libraries of short peptides generally afford relatively weak ligands. Longer epitope regions and/or constrained epitopes have recently been prepared. A limitation of this method is that only naturally occurring amino acids can be used and little is known about the effect of the phage environment. Phage technology has also been applied effectively to proteins and antibodies, demonstrating that protein domains can fold properly on the surface of phage.

While each of the technologies described above affords a large number of compounds, the usefulness of these systems for the effective rapid discovery of drug candidates is limited, since all of them result, at least as initially described, in the identification of peptide ligands. In most cases, small peptides are not suited as drugs due to in vivo instability and lack of oral absorption. Furthermore, the conversion of a peptide chemical lead into a pharmaceutically useful, orally active, non-peptide drug candidate is more difficult than identifying the original peptide lead, since no general solution yet exists for designing effective peptide mimics.

For these reasons the field of library generation has turned to the production of peptide-like and small organic molecules.

Peptoid libraries: Scientists at Chiron have reported the preparation of peptoid libraries [18 and references therein]. These are collections of compounds built from N-substituted glycines as peptoid monomers, which are assembled in an automated modular fashion. The structures of the resulting compounds are unique and consequently are likely to display unique binding properties, while incorporating the important functionalities of peptides in a novel backbone. Furthermore, initial studies suggest that this class of compounds possesses intriguing biological activities and is resistant to enzymatic breakdown.

Low molecular weight organic molecules: The first examples of non-polymeric low molecular weight diversity generation focused on benzodiazepines [19,20] and hydantoins [20] on a solid support. Both methods generated single, structurally well-defined molecules in a solution format after cleavage from a solid support. Hobbs-DeWitt et al. [20] presented a method of cleaving the final molecules from the resin by an intramolecular reaction that ejected the compound from the solid support while leaving no remnant of the site of attachment. Both groups are continuing to expand the repertoire of chemistries which can be carried out by these methods [21].

Other groups, such as ArQule [22] (Medford, MA) and ComGenex Ltd. (Budapest), are preparing libraries using automated solution chemistry. The ComGenex Matrix technology is used for pre-evaluation of the class of molecules planned for synthesis, project planning including cost and time analyses, and project implementation.

High speed synthesis techniques are also being used at Pfizer for the preparation of both individual compounds and compound mixtures. The molecules possess drug-like characteristics, being non-peptide in nature with low molecular weights and with structures that incorporate known drug-like pharmacophores. Mixtures of compounds are tested in solution and no tagging procedure is needed; the actives are identified after a single iterative deconvolution step.

A host of other academic and industrial laboratories are vigorously pursuing unique organic molecule libraries, including Kurth [23] (UC Davis; polymer supported synthesis of β-mercaptoketones) and Ontogen (Carlsbad, CA).

While it is still too early in the game to know with certainty which is the most useful approach for effective discovery of new pharmaceutical agents, there are certain observations which seem intuitive to us.

  • Most pharmaceutical agents are low molecular weight organic compounds, not peptides, which possess the many disadvantages detailed above.

  • Screening of single, structurally defined molecules has a proven track record in the industry. In contrast, screening large mixtures of compounds raises a number of issues, all related to the question: can a single active component be consistently identified from large mixtures?

  • Screening of compounds attached to a solid support raises the issue of whether the solid support interferes with the assay. Synthesis on a solid support greatly simplifies the problem of product isolation from reaction mixtures. The solid phase methodology also facilitates the division of products into multiple aliquots for multiple simultaneous reactions. As medicinal chemists, however, we well know the often large effect of replacing a hydrogen with a methyl group, for example. What will happen when we replace the hydrogen atom with a polymer bead or other solid support?

The Sphinx Approach - For the reasons discussed above we have chosen to prepare a large number of individual molecules of known structural identity using a multiple simultaneous synthesis approach. The individual chemical reactions are carried out by a multi-step organic synthesis procedure in a spatially addressable and parallel format so that one single and well defined compound is prepared at each site of the solid support and multiple syntheses are carried out simultaneously. After completion of the synthesis the compound is removed from the solid support to afford soluble compounds for testing. The libraries consist of non-peptide compounds, are of low molecular weight to maximize the chance for oral bioavailability, and are generated in sufficient quantities for use in multiple screens.

A simple, useful apparatus for multiple simultaneous synthesis has been constructed, and its successful utilization for rapid preparation of organic molecules via multi-step resin-based procedures was initially demonstrated by preparing a series of hundreds of bis-amide phenols in excellent yields and purity levels (Figure 2) [24].

FIGURE 2
Bis-Amide Phenol Library

We chose to utilize solid phase synthesis methods since they avoid tedious workups and purifications typically performed after each synthetic step in solution reactions. These advantages are amplified when constructing even relatively small libraries of hundreds of compounds.

A 96-well format was chosen, the format used in numerous biological applications, including automated high throughput screens. Plate synthesis was carried out on resin placed in sterile polypropylene deepwell plates which were modified for filtration by drilling a small hole in the bottom of each well, then placing a porous polyethylene frit into the bottom of each well. An aluminum plate clamp was made as a two-piece assembly, consisting of a solid base clamp fitted with four removable corner stainless steel studs, and a frame clamp which fits atop the plate and is secured with wing nuts. A gasket was utilized on the base clamp to prevent leakage of well contents. A vacuum plenum made from Delrin facilitated filtration of resin washings and isolation of solution libraries. The deepwell plates fit in a recessed seat at the top of the plenum, which has sufficient height to accommodate a rack of microdilution tubes. A hole with a Teflon tube connector resides on the plenum floor, and an aspirator or high vacuum can be attached here. The inexpensive equipment and limited human resources required for this effort illustrate the economic and productivity advantages of this technology.
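
In a spatially addressable format, the identity of each compound is known from its well position. As a minimal sketch of that bookkeeping, the Python below assigns one reagent per row and one per column of an 8 x 12 plate. The reagent labels and the row-by-column assignment scheme are hypothetical; the text does not specify how building blocks were distributed across the plate.

```python
from itertools import product
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]          # plate rows A-H
COLUMNS = range(1, 13)              # plate columns 1-12

def plate_layout(row_blocks, column_blocks):
    """Assign one (row reagent, column reagent) pair to each well of a 96-well plate."""
    if len(row_blocks) != len(ROWS) or len(column_blocks) != len(COLUMNS):
        raise ValueError("expected 8 row reagents and 12 column reagents")
    return {
        f"{row}{col}": (row_block, col_block)
        for (row, row_block), (col, col_block)
        in product(zip(ROWS, row_blocks), zip(COLUMNS, column_blocks))
    }

# Hypothetical reagent labels; the actual building blocks are not given in the text
acids = [f"acid_{i}" for i in range(1, 9)]
amines = [f"amine_{j}" for j in range(1, 13)]
layout = plate_layout(acids, amines)
print(len(layout), layout["A1"], layout["H12"])
```

With such a map, a hit in a given well of a screening plate can be traced directly to the pair of reagents used at that position, with no tagging or deconvolution step.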

The Universal Library Concept - Any biological macromolecule (receptor, enzyme, antibody, etc.) recognizes binding substrates through a number of precise physicochemical interactions. On a fundamental level these interactions can be divided into a number of different parameters or dimensions such as size, hydrogen bonding ability, hydrophobic interactions, etc. We are exploring a representative sampling of multi-parameter space by designing libraries which orient groups responsible for these binding interactions at unique locations in space through a scaffolding approach. A large number of compounds prepared around each scaffold will explore unique sizes, shapes, and volumes. Subsequent chemically-related scaffolds will explore different sizes, shapes, and volumes by simple chemical modifications. A collection of such sub-libraries will represent a universal library designed to rapidly explore multi-parameter space and consequently effectively identify a chemical lead for any biological target of interest (ion channel, receptor, antibody, enzyme, etc.).

The universal library is being generated using a double combinatorial approach, a novel technology developed at Sphinx (Figure 3). In this scheme, functional groups, representing various physicochemical interacting properties, are introduced onto the first solid support-bound scaffold building block. Then the second scaffold building block is added followed by an additional round of functional group introduction. The final target molecule is then cleaved from the solid support to afford the desired product in solution and ready for screening. By applying the double combinatorial approach a very large number of highly functionalized low molecular weight target molecules can be rapidly produced from a small collection of building blocks.

FIGURE 3
The Double Combinatorial Approach
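
The leverage of the double combinatorial approach comes from multiplying the choices made at each stage. The Python sketch below enumerates every combination of a first scaffold block, its functional group, a second scaffold block, and its functional group. The building-block labels and counts are hypothetical and chosen only to show the arithmetic; they do not describe the actual Sphinx libraries.

```python
from itertools import product

def double_combinatorial(scaffolds_1, groups_1, scaffolds_2, groups_2):
    """Enumerate products as (first scaffold block, its functional group,
    second scaffold block, its functional group)."""
    return list(product(scaffolds_1, groups_1, scaffolds_2, groups_2))

# Hypothetical building-block labels and counts
scaffolds_1 = [f"S1_{i}" for i in range(5)]
groups_1 = [f"R1_{i}" for i in range(10)]
scaffolds_2 = [f"S2_{i}" for i in range(5)]
groups_2 = [f"R2_{i}" for i in range(10)]

library = double_combinatorial(scaffolds_1, groups_1, scaffolds_2, groups_2)
n_blocks = len(scaffolds_1) + len(groups_1) + len(scaffolds_2) + len(groups_2)
print(n_blocks, "building blocks ->", len(library), "products")
# 30 building blocks -> 2500 products
```

The number of building blocks grows as a sum while the number of products grows as a product, which is why a small collection of reagents can yield a very large number of target molecules.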

A major challenge was to select a structural class of target molecules of sufficient generality to allow a wide variation in substitution patterns, together with a method that readily allows changes in overall size, shape, and physical properties. The biphenyl scaffold has been selected as our initial class of target molecules; it allows for facile introduction of three or four functional groups in a large number of spatial arrangements (Figure 4). Furthermore, we have built into our design a simple method of changing the size, shape, and physical properties of the final products. While the products are prepared via a multi-step procedure involving a wide range of solid support based chemistries, the final products are cleaved from the resin for biological evaluation. It is important to note that the final cleavage reaction results in a pendant methyl group. This is significant because we did not want an invariant OH or CO2H group in our final products: we wish to display important functional groups in space without being limited by a strongly interacting invariant functional group. The required chemistries for library generation are now operational and we are in the process of preparing the desired libraries. Details will be published elsewhere.

FIGURE 4
Generic Structure of the Biphenyl Library

In conclusion, we have devised a versatile class of molecules which allows for the rapid display of multiple functional groups in large numbers of spatial arrangements and also allows simple modifications to significantly change the size, shape, and physical properties of the target molecules. We believe that related collections of many such compounds/libraries can be designed to effectively explore a large segment of multi-parameter space. These universal libraries will be rapidly screened against biological targets of interest (receptors, enzymes, ion channels, antibodies, etc.) to quickly afford novel chemical leads.


The field of chemical diversity generation is experiencing enormous growth within the pharmaceutical industry. In the past several years almost every major pharmaceutical and biopharmaceutical company has initiated or acquired an effort in this area. These groups are continuing to develop improved methods of generating and evaluating structural diversity, particularly in the realm of small organic molecule synthesis. It is, however, important to keep in mind that an integrated approach must be developed which includes not only compound generation but also computational analysis of the diversity of the compounds prepared, automation, information handling, and patent considerations.

Genomics advances are likely to result in an enormous number of new and relevant biological targets for screening. Not only will combinatorial libraries be useful in identifying new drug candidates, but they will also be useful in evaluating and validating the physiological relevance of these new targets in a much shorter timeframe than using the traditional discovery protocols.

With the exponential growth in the chemical generation of molecular diversity we can reasonably expect that these libraries will significantly decrease the time and cost of discovering novel therapeutic agents for multiple disease states. The chemical generation of molecular diversity represents one of the most significant paradigm shifts in the industry in decades. By greatly expanding the range of compounds available for biological screening, combinatorial chemistry has become an essential component in the discovery of pharmaceutical agents.


REFERENCES

  1. Moos, W.H., Green, G.D., Pavia, M.R. "Recent advances in the generation of molecular diversity", Ann. Rep. Med. Chem., 1993, 28: 315-324.

  2. Pavia, M.R., Sawyer, T.K., Moos, W.H., "The generation of molecular diversity", Bioorganic Medicinal Chem. Lett., 1993, 3: 387-96.

  3. Gallop, M.A., Barrett, R.W., Dower, W.J., Fodor, S.P.A., Gordon, E.M. "Applications of combinatorial technologies to drug discovery. 1. Background and peptide combinatorial libraries", Journal of Medicinal Chemistry, 1994, 37: 1233-1251.

  4. Gordon, E.M., Barrett, R.W., Dower, W.J., Fodor, S.P.A., Gallop, M.A., "Applications of combinatorial technologies to drug discovery. 2. Combinatorial organic synthesis, library screening strategies, and future directions", Journal of Medicinal Chemistry, 1994, 37: 1385-1401.

  5. Geysen, H.M., Meloen, R.H., Barteling, S.J., "Use of peptide synthesis to probe viral antigens for epitopes to a resolution of a single amino acid", Proc. Natl. Acad. Sci. U.S.A., 1984, 81: 3998-4002.

  6. Furka, A., Sebestyen, F., Asgedom, M., Dibo, G. 1988, Abstr. 14th Int. Congr. Biochem., Prague, Czechoslovakia, Vol 5, pg 47. Abstr. 10th Intl. Symp. Med. Chem., Budapest, Hungary, pg 288.

  7. Houghten, R.A., "General method for the rapid solid-phase synthesis of large numbers of peptides: Specificity of antigen-antibody interaction at the level of individual amino acids", Proc. Natl. Acad. Sci. U.S.A., 1985, 82: 5131-35.

  8. Houghten, R.A., Pinilla, C., Blondelle, S.E., Appel, J.R., Dooley, C.T., Cuervo, J.H., "Generation and use of synthetic peptide combinatorial libraries for basic research and drug discovery", Nature, 1991, 354: 84-6.

  9. Lam, K.S., Salmon, S.E., Hersh, E.M., Hruby, V.J., Kazmierski, W.M., Knapp, R.J., "A new type of synthetic peptide library for identifying ligand-binding activity", Nature, 1991, 354: 82-84.

  10. Brenner, S., Lerner, R.A., "Encoded combinatorial chemistry", Proc. Natl. Acad. Sci. U.S.A., 1992, 89: 5381-5383.

  11. Nielsen, J., Brenner, S., Janda, K.D., "Synthetic methods for the implementation of encoded combinatorial chemistry", J. Am. Chem. Soc., 1993, 115: 9812-9813.

  12. Needels, M.C., Jones, D.G., Tate, E.H., Heinkel, G.L., Kochersperger, L.M., Dower, W.J., Barrett, R.W., Gallop, M.A., "Generation and screening of an oligonucleotide-encoded synthetic peptide library", Proc. Natl. Acad. Sci. USA, 1993, 90: 10700-10704.

  13. Ohlmeyer, M.H., Swanson, R.N., Dillard, L.W., Reader, J.C., Asouline, G., Kobayashi, R., Wigler, M., Still, W.C., "Complex synthetic chemical libraries indexed with molecular tags", Proc. Natl. Acad. Sci. USA, 1993, 90: 10922-10926.

  14. Smith, G.P., "Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface", Science, 1985, 228: 1315-17.

  15. Cwirla, S., Peters, E.A., Barrett, R.W., Dower, W.J., "Peptides on phage: a vast library of peptides for identifying ligands", Proc. Natl. Acad. Sci. USA, 1990, 87: 6378-82.

  16. Scott, J.K., Smith, G.P., "Searching for peptide ligands with an epitope library", Science, 1990, 249: 386-90.

  17. Devlin, J.J., Panganiban, L.C., Devlin, P.E., "Random peptide libraries: a source of specific protein binding molecules", Science, 1990, 249: 404-6.

  18. Zuckermann, R.N., Martin, E.J., Spellmeyer, D.C., et al., "Discovery of nanomolar ligands for 7-transmembrane G-protein-coupled receptors from a diverse N-(substituted)glycine peptoid library", J. Med. Chem., 1994, 37: 2678-85.

  19. Bunin, B.A., Ellman, J.A., "General and expedient method for the solid-phase synthesis of 1,4-benzodiazepine derivatives", J. Am. Chem. Soc., 1992, 114: 10997-8.

  20. DeWitt, S.H., Kiely, J.S., Stankovic, C.J., Schroeder, M.C., Reynolds Cody, D.M., Pavia, M.R., "'Diversomers': an approach to nonpeptide, nonoligomeric chemical diversity", Proc. Natl. Acad. Sci. U.S.A., 1993, 90: 6909-6913.

  21. Bunin, B.A., Plunkett, M.J., Ellman, J.A., "The combinatorial synthesis and chemical and biological evaluation of a 1,4-benzodiazepine library", Proc. Natl. Acad. Sci. USA, 1994, 91: 4708-12.

  22. Hogan, J.C., "New Aminimide compounds useful as peptide, protein, nucleotide, carbohydrate lipid and polymer mimetics", WO 9401102 A, 1994.

  23. Chen, C., Ahlberg Randall, L.A., Miller, R.B., Jones, A.D., Kurth, M.J., "'Analogous' organic synthesis of small-compound libraries: validation of combinatorial chemistry in small-molecule synthesis", J. Am. Chem. Soc., 1994, 116: 2661-2662.

  24. Meyers, H.V., Dilley, G.J., Durgin, T.L., Powers, T.S., Winssinger, N.A., Zhu, H., Pavia, M.R., "Multiple Simultaneous Synthesis of Phenolic Libraries", Molecular Diversity (in press).


NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.