Combinatorial Chemistry: A Strategy for the Future

NOTICE: This article contains material which originally appeared in the March 1995 issue of the Molecular Connection.

For over a year, "combinatorial chemistry" has been discussed throughout the pharmaceutical and biotechnology industries. At MDL, the anticipated release of Project Library highlights the company's commitment to this field. But what exactly is combinatorial chemistry?

Combinatorial chemistry is one of the important new methodologies developed by academics and researchers in the pharmaceutical, agrochemical, and biotechnology industries to reduce the time and costs associated with producing effective, marketable, and competitive new drugs. Simply put, scientists use combinatorial chemistry to create large populations of molecules, or libraries, that can be screened efficiently en masse. By producing larger, more diverse compound libraries, companies increase the probability that they will find novel compounds of significant therapeutic and commercial value. The field represents a convergence of chemistry and biology, made possible by fundamental advances in miniaturization, robotics, and receptor development. And not surprisingly, it has also captured the attention of every major player in the pharmaceutical, biotechnology, and agrochemical arena.

While combinatorial chemistry can be explained simply, its application can take a variety of forms, each requiring a complex interplay of classical organic synthesis techniques, rational drug design strategies, robotics, and scientific information management. This article will provide a basic overview of existing approaches to combinatorial chemistry, and will outline some of the unique information management problems that it generates.

Approaches to Combinatorial Chemistry

As with traditional drug design, combinatorial chemistry relies on organic synthesis methodologies. The difference is the scope--instead of synthesizing a single compound, combinatorial chemistry exploits automation and miniaturization to synthesize large libraries of compounds. But because a large library does not, by itself, reveal which of its members are active, scientists also need a straightforward way to find the active components within these enormous populations. Thus, combinatorial organic synthesis (COS) is not random, but systematic and repetitive, using sets of chemical "building blocks" to form a diverse set of molecular entities. Scientists have developed several different COS strategies, each with the same basic philosophy--stop shooting in the dark and, instead, find ways to identify active compounds within populations, either spatially, through chemical encoding, or by systematic, successive synthesis and biological evaluation (deconvolution).

There are three common approaches to COS. During arrayed, spatially addressable synthesis, building blocks are reacted systematically in individual reaction wells or positions to form separated "discrete molecules." Active compounds are identified by their location on the grid. This method has been applied at larger scale (as in the Parke-Davis Pharmaceutical DIVERSOMER technique), as well as in miniature (as in the Affymax VLSIPS technique). The second technique, known as encoded mixture synthesis, uses nucleotide, peptide, or other types of more inert chemical tags to identify each compound.
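The idea behind spatial addressing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the building-block names, the row-by-column plate layout, and the function `build_plate_map` are hypothetical and do not describe the DIVERSOMER or VLSIPS techniques themselves.

```python
from string import ascii_uppercase


def build_plate_map(row_blocks, column_blocks):
    """Map well labels such as 'B3' to the pair of building blocks reacted there."""
    plate = {}
    for r, row_block in enumerate(row_blocks):
        for c, col_block in enumerate(column_blocks, start=1):
            plate[f"{ascii_uppercase[r]}{c}"] = (row_block, col_block)
    return plate


# Hypothetical building-block names, for illustration only.
amines = ["amine_1", "amine_2", "amine_3"]
acids = ["acid_1", "acid_2", "acid_3", "acid_4"]

plate = build_plate_map(amines, acids)

# Wells that scored as hits in a screen (assumed values).
active_wells = ["B3", "C1"]
hits = {well: plate[well] for well in active_wells}
print(hits)
```

Because every well holds one known combination, a hit at a given position identifies the compound without any further analysis.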

During deconvolution, the third approach, a series of compound mixtures is synthesized combinatorially, each time fixing some specific structural feature. Each mixture is assayed as a mixture and the most active combination is pursued. Further rounds systematically fix other structural features until a manageable number of discrete structures can be synthesized and screened. Scientists working with peptides, for example, can use deconvolution to optimize, or locate, the most active peptide sequence from millions of possibilities. You could say that combinatorial chemistry is a technologically advanced way of finding a needle in a haystack. The whole idea is to remove the guesswork and instead, to create and test as many compounds or mixtures as possible--logically and systematically--to obtain a viable set of active leads.
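The iterative logic of deconvolution can be sketched as follows. This is a minimal illustration, not a description of any published protocol: the function `deconvolute`, the toy scoring function, and the hidden "target" sequence are all assumptions made for the example, and a real assay on real mixtures will not always point to the best single compound so cleanly.

```python
from itertools import product


def deconvolute(positions, assay_mixture):
    """Fix one position per round, keeping the block whose mixture scores best.

    positions: one list of candidate building blocks per position.
    assay_mixture: callable that takes a list of compounds (tuples of
        building blocks) and returns a single activity score.
    """
    fixed = []
    for i, candidates in enumerate(positions):
        remaining = positions[i + 1:]
        best_block, best_score = None, float("-inf")
        for block in candidates:
            # The mixture contains every compound that shares the fixed
            # prefix and this block; later positions remain fully mixed.
            mixture = [tuple(fixed) + (block,) + tail
                       for tail in product(*remaining)]
            score = assay_mixture(mixture)
            if score > best_score:
                best_block, best_score = block, score
        fixed.append(best_block)
    return tuple(fixed)


# Toy assay (assumption): activity is the average number of positions
# that match a hidden target sequence.
TARGET = ("B", "C", "A")


def toy_assay(mixture):
    scores = [sum(a == b for a, b in zip(compound, TARGET))
              for compound in mixture]
    return sum(scores) / len(scores)


positions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "C"]]
best = deconvolute(positions, toy_assay)
print(best)
```

Each round screens only as many mixtures as there are candidates at one position, rather than every compound in the library.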

Managing Combinatorial Chemistry Libraries

As with traditional drug design, the ability to integrate different types of chemical, biological, and corporate information is crucial to combinatorial chemistry techniques. But combinatorial chemistry also generates an enormous amount of information, which present-day information systems have a hard time managing. Combinatorial chemists also ask different questions in different ways, and their information systems need to adapt to find these answers quickly.

For example, chemists planning a traditional synthesis typically conduct a retrosynthetic analysis to determine the best, and perhaps cheapest, way to obtain the target. And while combinatorial chemists also look at retrosynthetic trees to build combinatorial libraries, their priorities differ. "By which modes of forward synthesis are the most building blocks available or obtainable?" they might ask. "And if I allow the synthesis to proceed by this course, what is the scope and reliability of the necessary reactions?" Combinatorial chemists need a way to access this type of reaction information efficiently. In addition, one of the largest bottlenecks in the construction of combinatorial libraries is in obtaining the basic building blocks necessary to run each reaction. Chemical information systems that can quickly access updated databases of inventory and commercially available reagents are invaluable tools in reagent acquisition (see Figure A).


An Archival Revolution

Once built, combinatorial libraries produce unprecedented amounts of information. Reaction histories for each compound must be archived. Robots and other laboratory instruments need to be controlled, and the data they acquire archived for future reference. Scientists need to integrate screening results and biological data with structural information. As in single-molecule archival systems, the archival of combinatorial libraries and their corresponding data is essential to cost-effective research and development. Basic archival and reporting capabilities can provide managers with the facts they need to justify the costs associated with combinatorial chemistry. And researchers can use scientific information management systems to avoid past mistakes, learn from previous successes, and answer critical questions, such as, "How much of this library overlaps with other corporate libraries?" or "Which building blocks have proven most successful?" or "What is the difference between the biological performance of a molecule produced in a mixture rather than in a discrete format?"

Combinatorial chemistry is a promising new field that stands to revolutionize the chemical industry, and demands completely new scientific information management solutions. Combinatorial chemists will be able to meet their goals if they can find ways to plan libraries quickly, produce libraries that better interrogate biological assays, and learn from past screening results. Using software that can orchestrate the planning, building, screening, and interpretation of synthesized libraries, combinatorial chemistry programs will begin to realize their promise of minimizing the time and cost associated with bringing new molecular entities to market.


Project Library: A New Tool for a New Paradigm in Drug Design

Proper management of combinatorial chemistry libraries requires an application that understands the science behind combinatorial chemistry while managing the chemical and biological data generated by combinatorial chemistry programs. To meet these ends, MDL plans to release Project Library, a complete, ready-to-use desktop software application that supports the multiple combinatorial chemistry research methods in use today, including functions such as:

  • Storage of both oligomeric and non-oligomeric structures.
  • Tracking of mixture and discrete compound libraries.
  • Elucidation of mixtures or discrete compounds from a library derived from any of the active identification strategies in use today.

In addition, information concerning the components or building blocks of the library must be processed, stored, and tracked. And because very large numbers of novel structures have to be considered when designing new combinatorial libraries, researchers must be able to enumerate experimental or virtual libraries (groups of either subgeneric or fully specified structures derived from a single generic structure) for investigation and planning.

MDL's Project Library is an application that not only helps researchers to manage combinatorial libraries, their building blocks, and their associated data at the project level, it also allows them to plan and refine combinatorial libraries by making it easy to build, store, and export "virtual" libraries.


Combinatorial Chemistry Project Management

In order to track information for both mixture and discrete compound libraries, a combinatorial chemistry project management tool must allow the researcher to associate data with:

  • the library itself (represented as a generic structure, or parent library) [see Figure 1];
  • mixtures of compounds within the library (represented as subgenerics, or child libraries) [see Figure 2];
  • individual compounds in the library [see Figure 2].

Project Library supports each of these activities by allowing combinatorial researchers to organize libraries into project databases, where information on parent and child libraries and discrete compounds can be stored and evaluated and associations between them automatically managed. A quick-loading feature makes it easy to load tables of administrative, biological, physical, and encoding data to the appropriate parent library, child library, or specific structure, all of which are fully searchable by structure or associated data.

Working with Project Library's tools, you can quickly build generic structures--representing hundreds of thousands of compounds--and assign names and encoding information to the components. Then, using substructural, encoding, or other data constraints, you can search for specific structures within the library [see Figures 1 and 2].

Intelligent Enumeration

Enumeration is the process of automatically generating either subgeneric or fully specified structures (individual compounds) from a generic structure. Enumeration of a parent library enables the researcher to produce structural representations of child libraries, or discrete compounds within the library. Project Library not only generates the appropriate structures on demand, but also automatically maintains the relationships between parent, child, and specific structures. (Data inherited by the child library or specific structure may include encoding, component names, and parent library information.)
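Enumeration as described above can be sketched in Python. This is not Project Library's implementation: the function `enumerate_library` is hypothetical, and although the component names (BZA, MeInd, Bzl, Ph, Chx, and so on) are borrowed from the figure captions, their assignment to particular R-groups here is an assumption made for illustration.

```python
from itertools import product


def enumerate_library(core, r_groups, fixed=None):
    """Yield (identifier, assignment) pairs for members of a generic structure.

    r_groups: dict mapping each R-group label to its list of component names.
    fixed: optional dict pinning some R-groups to a single component, which
        selects the members of one child library (a subgeneric structure).
    """
    fixed = fixed or {}
    labels = list(r_groups)
    choices = [[fixed[label]] if label in fixed else r_groups[label]
               for label in labels]
    for combo in product(*choices):
        # The identifier is built from the component names, so each member
        # carries a record of the parent components it came from.
        yield "-".join((core,) + combo), dict(zip(labels, combo))


# Hypothetical R-group assignments, for illustration only.
r_groups = {
    "R1": ["Me", "MeInd", "Bzl"],
    "R2": ["Ph", "Chx"],
    "R3": ["H", "Me"],
}

full_library = list(enumerate_library("BZA", r_groups))
child_library = list(
    enumerate_library("BZA", r_groups, fixed={"R1": "MeInd", "R2": "Ph"})
)

print(len(full_library))
print([identifier for identifier, _ in child_library])
```

The size of the full library is the product of the number of components at each R-group, which is why a compact generic structure can stand in for a very large number of compounds.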

This ability to do "intelligent" enumeration also enables Project Library to support the three methods commonly used today to identify the structure of active components within a combinatorial library:

  • Arrayed, spatially addressable synthesis (active compound identified by molecular weight, components, or location) [see Figure 3];
  • Encoded mixture synthesis using nucleotides, peptides, or other chemical tags (active compound identified by its tag) [see Figure 4];
  • Deconvolution (active compound identified by iterative synthesis of mixtures and subsequent screening) [see Figure 5].

Project Library's special navigation tools allow the researcher to move between enumerated parent libraries, child libraries, and specific compounds with ease.


Reagents and Building Blocks

In combinatorial chemistry, the ability to manage the building blocks that make up the libraries is just as important as the ability to manage libraries themselves [see Figure 6]. So, Project Library allows researchers to assign names and codes to specific building blocks incorporated into a library, and to store the blocks and other associated data for future use.

Special processing tools in Project Library let researchers manipulate lists of reagents and quickly turn them into building blocks [see Figure 7]. Also, being able to search for building block structures makes it possible for researchers to gain new insight into the biological effectiveness of individual building blocks by tracking their success across libraries and screens.

Communication Management

Because tracking data means everything in combinatorial chemistry, researchers must be able to enter and access data themselves. They also must be able to generate reports when necessary, and make the data readily available to all other members of the research team.

Using the guided, graphical user interface, researchers can enter data into Project Library, generate standard reports at the click of a button, or easily export data into word processor programs for custom reports. They can also perform data analysis by building a spreadsheet, complete with structures and data for SAR work, or export the data into other software programs for analysis. Project Library runs on Microsoft Windows and Apple Macintosh computer systems.

Cost Management

The process of finding novel, active compounds through combinatorial chemistry is akin to finding the proverbial needle in a haystack, using an array of technologically sophisticated tools. Project Library can help research management contain R&D costs in several ways.

First, Project Library can help you manage the information generated from your automated systems. Capital investment in robots for combinatorial synthesis and high-throughput screening runs high, and laboratories cannot afford the bottlenecks in information processing caused by inadequate data handling. Project Library allows researchers to drive the enumeration of specific structures from a library based on information generated from an ASCII robot file. Users can specify, for example, the components used or location of samples on plates. Project Library also makes it possible to write an ASCII file that can be used to program robots to synthesize compounds elucidated from virtual libraries.
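The round trip between a robot file and structure identifiers might look like the sketch below. The file layout is an assumption: the article does not specify the ASCII format, so the comma-separated columns (`plate`, `well`, `R1`, `R2`), the core name, and both function names are hypothetical.

```python
import csv
import io

# Hypothetical robot output: one line per sample, naming the plate, the
# well, and the component used at each R-group.
ROBOT_FILE = """\
plate,well,R1,R2
P1,A1,Me,Ph
P1,A2,Me,Chx
P1,B1,Bzl,Ph
"""


def read_robot_file(text, core="BZA"):
    """Return {(plate, well): structure identifier} from a robot file."""
    records = {}
    for row in csv.DictReader(io.StringIO(text)):
        identifier = "-".join([core, row["R1"], row["R2"]])
        records[(row["plate"], row["well"])] = identifier
    return records


def write_robot_file(records):
    """Write one 'plate,well,structure_id' line per sample."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["plate", "well", "structure_id"])
    for (plate, well), structure_id in sorted(records.items()):
        writer.writerow([plate, well, structure_id])
    return out.getvalue()


records = read_robot_file(ROBOT_FILE)
print(records[("P1", "B1")])
print(write_robot_file(records))
```

Keeping the plate location and the component names together in one record is what lets a screening hit be traced back to a specific structure.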

Project Library can also help management plan and manage the entire combinatorial process. Rather than creating an enormous paper trail, Project Library can be used to track reagent costs, planning duration, building duration, and screening effectiveness. Project Library also keeps the combinatorial chemistry work flow going smoothly, complementing MDL's current range of scientific information management products and databases to help combinatorial researchers at every stage of the process.

MDL has done more than create a new software solution with Project Library--it has introduced a new way of developing software. MDL chose to develop Project Library in two phases. Through extensive industry research and a software pilot program, MDL gained an understanding of the information needs presented by combinatorial chemistry programs and ascertained the type of contribution that was needed by this growing field. During the industry research phase, MDL technically assessed the industry requirements and the science involved in combinatorial chemistry by visiting 50 companies in the pharmaceutical, biotechnology, and agrochemical industries. There, MDL interviewed the researchers involved in combinatorial chemistry programs. After discussing possible software solutions on paper, MDL developed prototype software to encourage information exchange and refine software requirements.

In the second phase, MDL placed the prototype software in research groups at pilot sites. Combinatorial chemists were trained on the prototype and asked to use the software in their daily routine. Feedback from this portion of the process helped shape Project Library--at the request of pilot researchers, MDL ensured that the software contains a guided user interface, supports multiple research methods seamlessly, and is complemented by other MDL software and scientific applications.

According to Dr. Sheila DeWitt, a senior research associate in the Bio-Organic chemistry group at Parke-Davis Pharmaceutical Research, the pilot process forged a unique partnership between Parke-Davis and MDL.

"The process was excellent. MDL was always willing to work with us and talk to us about what we needed the software to do. The opportunity to play with a prototype gave our researchers a better idea of how the software would fit into their daily routine. When we encountered resource issues, MDL even provided a computer for us to use."
-- Dr. Sheila DeWitt, Parke-Davis Pharmaceutical Research

Pilot sites are currently working with MDL to refine the beta version of Project Library. The success of this approach to project development has led MDL to plan additional pilot programs in the development of future MDL solutions.


Figure A

Information sources useful throughout the planning of a combinatorial library. Databases of reaction methodology, reagent availability, and relevant prior art are extremely useful in combinatorial synthesis. Also portrayed is data capture, discussed in the article on Project Library.



Figure 1

Project Library allows researchers to archive an entire library as a single, searchable generic structure. Here, a single generic structure represents a complete library of benzodiazepine compounds.

Note that in addition to structural information, component ID information is stored (e.g. BZA, MeInd, MeOPh, etc.). (Benzodiazepine library from: DeWitt, et al., Proc. Natl. Acad. Sci. USA, Vol. 90, pp. 6909-6913, Aug. 1993.)



Figure 2

Using selective enumeration capabilities available in Project Library, the researcher can automatically create subgeneric or specific compounds from a generic structure.

Here, the subgeneric structure is generated from the benzodiazepine library in Figure 1 by asking Project Library to enumerate R1 = Me and R2 = Ph. The specific structures shown below are four of the 90 created when the library is fully enumerated.

Note that in addition to automatically creating the structures, Project Library automatically creates the appropriate structure identification string from the component names in the original generic structure (e.g. BZA-MeInd-Ph-R3-R4, BZA-Bzl-Chx-H-H, etc.)



Figure 3

"Intelligent" enumeration capabilities in Project Library allow the researcher to elucidate quickly the exact structure of active compounds within libraries synthesized using arrayed, spatially addressable synthesis.

With this technique, active compounds are identified by molecular weight, components, or location. Here, Project Library automatically generates the structures for two benzodiazepines from the library above that have a molecular weight between 360 and 370.



Figure 4

Because Project Library allows the researcher to associate names and encoding information with individual components of a library, the active structure(s) from an encoded mixture synthesis can be automatically enumerated using a component or encoding string.

Shown here is the specific structure corresponding to a binary synthesis code (110-101-011-100-010-001) obtained from a gas chromatogram of the tags from an active bead from the library.

Encoded peptide library from: Ohlmeyer, M.H.J., et al., Proc. Natl. Acad. Sci. USA, Vol. 90, pp. 10922-10926, Dec. 1993.
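Reading a binary synthesis code of this kind can be sketched as a simple lookup. The mapping below is an assumption: the building-block names, the use of the same seven blocks at every step, and the reading of each three-bit field as a plain binary number are hypothetical and are not taken from the cited paper.

```python
def decode_binary_code(code, blocks_by_step):
    """Translate a code such as '110-101-011' into one block per step."""
    fields = code.split("-")
    if len(fields) != len(blocks_by_step):
        raise ValueError("code length does not match the number of steps")
    sequence = []
    for step, bits in enumerate(fields):
        index = int(bits, 2)  # for example, '110' is read as 6
        sequence.append(blocks_by_step[step][index])
    return sequence


# Hypothetical lookup table: three bits can label up to seven blocks
# per step (indices 1 through 7).
BLOCKS = {i: f"block_{i}" for i in range(1, 8)}
blocks_by_step = [BLOCKS] * 6

decoded = decode_binary_code("110-101-011-100-010-001", blocks_by_step)
print(decoded)
# → ['block_6', 'block_5', 'block_3', 'block_4', 'block_2', 'block_1']
```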



Figure 5

Project Library supports deconvolution experiments through selective enumeration of parent and child libraries which represent the mixtures synthesized and screened.

As structural features are fixed, the mixtures are assayed, and the most active combination is pursued until a manageable number of discrete compounds can be synthesized and screened. Here, Project Library tracks the history of a typical deconvolution experiment where the parent library is the original generic structure and child libraries (mixtures) are represented as subgeneric structures (where successive R-groups have been enumerated to produce each set of subgeneric structures).

Note that the associations between parent and child libraries are automatically tracked and data can be associated with parent libraries, child libraries, or specific structures. (Non-encoded peptide library from: Houghten, R.A., et al., Nature, Vol. 354, 7 Nov. 1991, pp. 84-86.)



Figure 6

In addition to managing libraries, Project Library allows the researcher to manage sets of building blocks used in library creation.

Reagents for combinatorial synthesis can be processed to form building blocks. Once the building blocks are saved in the database, they can easily be incorporated into a generic structure representing a library. The building blocks can be organized by compound class, and data such as name or encoding information can be associated with individual building blocks and is automatically incorporated into the generic structure.



Figure 7

Researchers can automatically process sets of reagents to produce building blocks based on rules they create.

Here, a saved rule (to clip acid halide leaving groups and add the appropriate attachment point) is applied to a list of acid chloride reagents automatically producing the corresponding building blocks. The building blocks can be saved in the database and incorporated into a generic structure representing a real or virtual library.
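A rule of this kind can be illustrated with a toy, string-level sketch. Several assumptions apply: reagents are written here as SMILES strings with the acyl halide group at the end of the string, the attachment point is shown as `[*]`, and the function `clip_acid_halide` is hypothetical. A practical implementation would match the leaving group by substructure with a cheminformatics toolkit rather than by string suffix.

```python
def clip_acid_halide(smiles, halides=("Cl", "Br")):
    """Replace a terminal acyl halide '...C(=O)X' with '...C(=O)[*]'.

    Returns None when the rule does not apply to the reagent.
    """
    for halide in halides:
        if smiles.endswith(f"C(=O){halide}"):
            # Drop the halogen and mark the open valence as an attachment point.
            return smiles[: -len(halide)] + "[*]"
    return None


# Illustrative reagent list; ethanol is included to show a non-matching case.
reagents = {
    "acetyl chloride": "CC(=O)Cl",
    "benzoyl chloride": "c1ccccc1C(=O)Cl",
    "ethanol": "CCO",
}

building_blocks = {
    name: block
    for name, smiles in reagents.items()
    if (block := clip_acid_halide(smiles)) is not None
}
print(building_blocks)
```

Reagents that do not match the rule are simply left out, so one saved rule can be applied to a mixed reagent list.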


NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.