An Information Management Architecture
for Combinatorial Chemistry
Keith Davies and Catherine White
Chemical Design Ltd.
http://www.netsci.org/Science/Combichem/feature07.html
Introduction
Combinatorial Chemistry synthetic methods have the potential to generate a very large number of molecules to be tested, either in mixtures or as individual molecules. High Throughput Screening robotic technology allows up to 1000 samples to be tested per day using a single robot. Many installed database systems cannot cope with the amount of biological data generated or with the number of associated structures. In addition, the functionality needed for searching and reviewing results differs from that of older single-molecule systems. This paper describes the use of a truly relational database system to store the reaction, reagent, biological and protocol data, linked to a Chem-X database which stores just the components or R-groups in non-enumerated form. The Pharmacophore Diversity technology for Library Design developed by Chemical Design continues to require enumerated databases [1]. Because the link is made through SQL, Chem-X is sufficiently flexible to be used with almost any set of relational database tables.
The data management centres on the plate and well identifiers, to which the biological results and chemical structures are linked. The components or R-groups of the chemical structures are linked to the reactants used in the reactions. Suitable reactants may be selected using the reagent searching capabilities of Chem-X, and a reaction database can be used to record the results of optimising reaction conditions.
Chem-X is supported on a range of platforms including PC clients running MS-Windows and UNIX or VMS servers. Silicon Graphics workstations are recommended for Library Design.
Reaction Databases
Prior to making a library it is necessary to select a series of reactions and appropriate R-groups. It can be useful to search and review reactions in a database, although at present no reaction databases for Combinatorial Chemistry are commercially available. Using Chem-X, reactions can be registered together with sets of reaction conditions and yields. The optimum sets of conditions are included in the synthetic protocol. It is also possible to store the generic reactants as reactions, although the additional effort of drawing intermediates with correct atom mapping may be difficult to justify. The generic reactants are useful for reagent searches as described below and can be stored in the component database, where they can be conveniently viewed with the reactants.
Library Registration
A library is registered before it is made so that the
information system can help generate and manage data for the
synthesis. The Chem-X component database contains relational tables
recording the use of each R-group in a library in terms of the
R-group number and position in the list of alternatives for each
library. The components may be drawn by the user, cut and pasted
from ChemDraw [2] or ISIS/Draw [3], or generated using reagent
searching (as described below). The list of R-groups for each
position may be defined using database searching, selecting from
the screen or using the keyboard. R-groups may also be read from
files such as MACCS SD files [3]. During library registration a row
of data is inserted for each component. Optionally, a database of
enumerated structures may also be generated and exported as a MACCS
SD file.
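
The registration bookkeeping described above can be sketched with a small relational table. This is a minimal illustration in Python using the standard sqlite3 module, not the Chem-X implementation: the table name library_component, its columns, the function names and the component identifiers (A1, B1, ...) are all assumptions made for the sketch.

```python
import itertools
import sqlite3

# Hypothetical schema: one row per use of a component in a library.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE library_component (
           library_id   TEXT    NOT NULL,
           rgroup_no    INTEGER NOT NULL,  -- which R-group (R1, R2, ...)
           list_pos     INTEGER NOT NULL,  -- position in the list of alternatives
           component_id TEXT    NOT NULL,  -- identifier of the stored component
           PRIMARY KEY (library_id, rgroup_no, list_pos)
       )"""
)


def register_library(conn, library_id, rgroup_lists):
    """Insert one row per component.

    rgroup_lists maps an R-group number to its ordered list of component ids.
    Returns the number of rows inserted.
    """
    rows = [
        (library_id, rgroup_no, pos, component_id)
        for rgroup_no, components in sorted(rgroup_lists.items())
        for pos, component_id in enumerate(components, start=1)
    ]
    conn.executemany("INSERT INTO library_component VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def enumerate_library(conn, library_id):
    """Return every combination of one component per R-group, in list order."""
    cursor = conn.execute(
        "SELECT rgroup_no, component_id FROM library_component "
        "WHERE library_id = ? ORDER BY rgroup_no, list_pos",
        (library_id,),
    )
    by_rgroup = {}
    for rgroup_no, component_id in cursor:
        by_rgroup.setdefault(rgroup_no, []).append(component_id)
    return list(itertools.product(*(by_rgroup[k] for k in sorted(by_rgroup))))


n_rows = register_library(conn, "LIB1", {1: ["A1", "A2", "A3"], 2: ["B1", "B2"]})
products = enumerate_library(conn, "LIB1")
```

In this toy case five rows describe six products. The saving grows quickly with library size: three R-groups with 100 alternatives each need 300 rows to describe 1,000,000 structures.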

Figure 1
All or some of the components in a library may be viewed in a
special layout where the components are arranged in columns. This
approach is well suited to very large libraries which cannot be
displayed on a single page.

Figure 2
Reagent Management
Reagent searching of databases of available starting materials automatically writes the corresponding R-groups to a component database with the same structure identifiers. Using the same structure identifiers for the R-groups as for the parent reagents greatly simplifies reagent management. Storing the search queries for the reagents allows the reactions to be viewed in terms of the generic reagents and the resulting Markush representation of the final product. These may be viewed in an analogous way to the components, with the generic reagents in columns.
It is necessary to maintain in a database a table of reagent
names and physical locations so that when robotics are used for
synthesis, a file mapping from reagent to location can be generated
simply. Chem-X has the capability to generate the list of reagents
which were used to make any given structure so it is not necessary
to store the reagents that were used for each well of each plate
(although this can be done).
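
As an illustration of the reagent-to-location mapping, the sketch below keeps a location table and writes a tab-separated mapping file for a list of reagents. It is only a sketch under stated assumptions: the table reagent_location, its columns, the identifiers (RG001, ...), the location codes and the tab-separated layout are all hypothetical, and a real robot will expect its own file format.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reagent_location ("
    " reagent_id TEXT PRIMARY KEY,"
    " name       TEXT NOT NULL,"
    " location   TEXT NOT NULL)"
)
# Made-up example rows; names and location codes are placeholders.
conn.executemany(
    "INSERT INTO reagent_location VALUES (?, ?, ?)",
    [
        ("RG001", "reagent A", "C1-01"),
        ("RG002", "reagent B", "C1-02"),
        ("RG003", "reagent C", "C2-07"),
    ],
)


def write_location_map(conn, reagent_ids, out):
    """Write 'reagent_id<TAB>location' lines in the order requested.

    Returns the reagent ids that have no recorded location, so that the
    caller can stop before the robot is started with an incomplete file.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    missing = []
    for reagent_id in reagent_ids:
        row = conn.execute(
            "SELECT location FROM reagent_location WHERE reagent_id = ?",
            (reagent_id,),
        ).fetchone()
        if row is None:
            missing.append(reagent_id)
        else:
            writer.writerow([reagent_id, row[0]])
    return missing


buffer = io.StringIO()
missing = write_location_map(conn, ["RG002", "RG001", "RG999"], buffer)
mapping_text = buffer.getvalue()
```

Because the R-groups carry the same identifiers as their parent reagents, the list passed to write_location_map could be taken directly from the components of a registered library.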

Figure 3
Synthetic Protocols
Each synthetic protocol has a separate identifier which is
associated with a description and, for automated synthesis, a set
of robotic instructions. These instructions refer to the reactants
or reagents by R-group number as well as any protecting group
chemistry. The conditions for each step in the protocol will have
been previously optimised and form part of the robotic
instructions. Although it is conceptually possible to have
different protocols for each plate, the synthetic protocol is
usually the same for an entire library. The robot will require data
files with the list of reactants and their locations in a carousel.
It is assumed that it is more practical to store the reference to
the robotic instructions rather than the instructions themselves.
If the synthetic procedure generates single compounds then the
synthetic protocol may include instructions to combine samples if
mixtures are to be tested.

Figure 4
Defining Mixtures
Mixtures are usually used to reduce the number of samples to be
tested. They can be generated inherently in the synthetic procedure
or deliberately combined, for instance to create an orthogonal set
(in which each molecule occurs in several samples, so that the pattern
of activity identifies individual molecules within the mixtures). Chem-X
assumes that the relational database contains a table which has
columns of data for the plate identifier, well identifier, library
and structure identifier. The structure identifier is only unique
within a library. For mixtures there are multiple rows with the
same plate and well identifiers. The structure identifiers are
generated by Chem-X when a library is registered and, for the
purposes of integration with robotics, an ASCII file containing
structure identifier and the list of reagents can be exported from
Chem-X. This file is processed according to the synthetic protocol
by a customised program to give a file containing plate, well,
library and structure identifiers for loading into a relational
database.
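
The customised program mentioned above depends on the synthetic protocol, so the following is only a minimal sketch of one possible case: structures are assigned to wells in order, a fixed number of structures is pooled per well, and the resulting rows are loaded into a table. The table name sample, its columns, the function assign_wells and the fixed pool size are assumptions made for the example.

```python
import sqlite3

WELLS_PER_PLATE = 96  # assumption: 96-well plates throughout


def assign_wells(library_id, structure_ids, pool_size=1, first_plate=1):
    """Yield (plate_id, well_id, library_id, structure_id) rows.

    Consecutive groups of pool_size structures share a well, so a mixture
    appears as several rows with the same plate and well identifiers.
    """
    for index, structure_id in enumerate(structure_ids):
        sample = index // pool_size               # running sample counter
        plate_id = first_plate + sample // WELLS_PER_PLATE
        well_id = sample % WELLS_PER_PLATE + 1    # wells numbered 1 to 96
        yield (plate_id, well_id, library_id, structure_id)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample ("
    " plate_id INTEGER, well_id INTEGER,"
    " library_id TEXT, structure_id INTEGER,"
    " PRIMARY KEY (plate_id, well_id, library_id, structure_id))"
)

# 400 structures pooled four to a well give 100 samples on two plates.
rows = list(assign_wells("LIB1", range(1, 401), pool_size=4))
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", rows)

# Looking up one well of a pooled plate returns several structures.
mixture = [
    row[0]
    for row in conn.execute(
        "SELECT structure_id FROM sample "
        "WHERE plate_id = ? AND well_id = ? ORDER BY structure_id",
        (1, 2),
    )
]
```

The primary key includes the library identifier because the structure identifier is only unique within a library.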

Figure 5
Plate Management
Most existing robotic high throughput screening systems use 96
well plates. Each plate is labelled with a barcode of the unique
plate identifier and the wells are numbered systematically. The
barcodes are allocated and the plates labelled on completion of the
synthesis and transfer to the plate (the management of the
synthetic data is described above). Given the restriction that
most robots do not read barcodes, it is important to have operating
procedures which ensure that the assay results can be associated
with the correct plate. A separate ASCII file for each plate
containing the assay results is convenient to load into a
relational database.
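
A loader for such per-plate files might look like the sketch below. The file layout is an assumption (a header line giving the plate and assay identifiers, then one line per well), as are the table assay_result and the function name. The checks for repeated or missing wells are one simple way of catching a file that does not correspond to a complete plate.

```python
import sqlite3

WELLS_PER_PLATE = 96  # the plate format named in the text


def parse_plate_file(text):
    """Parse one plate's results into (plate, well, assay, value) rows.

    Assumed layout: a header line 'plate_id assay_id', then one
    'well_number value' line for each well.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    plate_id, assay_id = lines[0].split()
    rows, seen = [], set()
    for line in lines[1:]:
        well_text, value_text = line.split()
        well = int(well_text)
        if not 1 <= well <= WELLS_PER_PLATE or well in seen:
            raise ValueError(f"bad or repeated well {well} on plate {plate_id}")
        seen.add(well)
        rows.append((plate_id, well, assay_id, float(value_text)))
    if len(rows) != WELLS_PER_PLATE:
        raise ValueError(
            f"plate {plate_id}: expected {WELLS_PER_PLATE} wells, got {len(rows)}"
        )
    return rows


# Made-up example file for one plate.
example = "P000123 ASSAY7\n" + "\n".join(f"{w} {w * 0.5}" for w in range(1, 97))
rows = parse_plate_file(example)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assay_result ("
    " plate_id TEXT, well_id INTEGER, assay_id TEXT, value REAL,"
    " PRIMARY KEY (plate_id, well_id, assay_id))"
)
conn.executemany("INSERT INTO assay_result VALUES (?, ?, ?, ?)", rows)
loaded = conn.execute("SELECT COUNT(*) FROM assay_result").fetchone()[0]
```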

Figure 6
It may be desirable to store additional information such as the date and operator although this could be associated with the Assay ID if database size is a concern.
Assay Data
The assay data usually require some scaling before being made available for searches. For this purpose a range of control samples is normally used. For instance, if the actual binding constants are known for the controls, the data for these samples can be used to set up a linear or quadratic interpolation that generates binding constants for the other wells. This scaling can be performed before or after loading the data into a relational database, but may require a program customised for the specific assay.
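The linear case can be sketched as an ordinary least-squares line fitted through the control wells and then applied to the remaining wells. This is an illustrative sketch only: the function names and the control values are invented, and a real assay may need a quadratic fit or an assay-specific transformation instead.

```python
def fit_linear(raw_controls, known_values):
    """Least-squares line known = slope * raw + intercept through the controls."""
    n = len(raw_controls)
    if n < 2 or n != len(known_values):
        raise ValueError("need at least two controls with matching known values")
    mean_x = sum(raw_controls) / n
    mean_y = sum(known_values) / n
    sxx = sum((x - mean_x) ** 2 for x in raw_controls)
    if sxx == 0:
        raise ValueError("control readings must not all be identical")
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(raw_controls, known_values)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def scale_wells(raw_by_well, slope, intercept):
    """Apply the calibration to every well; returns {well: scaled value}."""
    return {well: slope * raw + intercept for well, raw in raw_by_well.items()}


# Invented controls: raw readings and the values known for those samples.
slope, intercept = fit_linear([10.0, 20.0, 30.0], [1.0, 2.0, 3.0])
scaled = scale_wells({5: 15.0, 6: 25.0}, slope, intercept)
```

Keeping the fitted coefficients alongside the plate and assay identifiers would allow the scaling to be repeated or revised after the raw data have been loaded.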
Assay Protocols
Each revision of an assay procedure is allocated a unique
identifier, and a separate relational table contains information
about the assay including class of assay, description, date last
revised and, when appropriate, robotic data to perform the assay.
In practice, it may only be necessary to store the reference to the
robotic instructions. For space efficiency it is preferable to
store the assay identifier per plate (since all the samples on a
plate will be processed for the same assays) but this complication
is ignored here.

Figure 7
Links to Structures
The Chem-X component database records which components are used for the central group and the R-groups, together with the position of each component in the list of alternatives for its group, so that structures can be enumerated quickly. The individual structures may be enumerated for an entire library or just a subset of the library. The enumerated library will contain the list of reagents for each structure. The Chem-X ORACLE interface allows ad hoc searches of data stored in ORACLE to be performed and the corresponding structures viewed. For example, a search on biological activity returns the list of components used for each structure and an enumerated database of these structures. Any ORACLE column containing data which needs to be searched from Chem-X has a corresponding field in Chem-X. For each linked field, a template SQL statement is stored to search the ORACLE database and return the list of libraries and structures. This template is specified once by the database administrator. To perform searches efficiently, several fields such as plate, well, library and biological activity may be searched at once; in that case a single SQL statement is generated automatically by combining the templates. Substructure searches can be combined with such data searches.
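The idea of combining per-field templates into one statement can be sketched as follows. The format of the real Chem-X templates is not given here, so everything in the example is an assumption: the field names, the template fragments, the result table and its columns, and the use of sqlite3 in place of ORACLE.

```python
import sqlite3

# Hypothetical templates: each linked field maps to a condition on a
# column of an assumed 'result' table, with '?' for the search value.
FIELD_TEMPLATES = {
    "plate": "plate_id = ?",
    "well": "well_id = ?",
    "library": "library_id = ?",
    "min_activity": "activity >= ?",
}
BASE_QUERY = "SELECT DISTINCT library_id, structure_id FROM result"


def build_query(criteria):
    """Combine the templates for the requested fields into one statement.

    Returns (sql, params). Values are passed as bound parameters rather
    than pasted into the SQL text.
    """
    clauses, params = [], []
    for field, value in criteria.items():
        clauses.append(FIELD_TEMPLATES[field])  # KeyError for an unlinked field
        params.append(value)
    sql = BASE_QUERY
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY library_id, structure_id", params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result (plate_id INTEGER, well_id INTEGER,"
    " library_id TEXT, structure_id INTEGER, activity REAL)"
)
# Made-up rows; plate 1, well 2 holds a two-component mixture.
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "LIB1", 1, 0.20),
        (1, 2, "LIB1", 2, 0.90),
        (1, 2, "LIB1", 3, 0.90),
        (2, 1, "LIB2", 1, 0.95),
    ],
)

sql, params = build_query({"library": "LIB1", "min_activity": 0.5})
hits = conn.execute(sql, params).fetchall()
```

The list of (library, structure) pairs returned is the kind of result that could then be used to pick out components for enumeration and viewing.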
Conclusions
The Chem-X software, when used with a relational database such as ORACLE, provides a comprehensive data management system for Combinatorial Chemistry synthesis data and assay data linked to chemical reactions and structures. The innovative use of a relational architecture means that an R-group is stored only once and can be referred to in several libraries. The open and flexible architecture means that synthetic, biological and structural data can be readily imported and exported. The use of SQL templates which are modified by Chem-X means that searching is very efficient and that the organisation of the ORACLE tables can be adapted to best meet the requirements of the users.
References
[1] Keith Davies and Clive Briant, "Combinatorial Chemistry Library Design using Pharmacophore Diversity", presented at the MGMS meeting, Leeds, April 1995.
[2] ChemDraw is a product of CambridgeSoft Corporation, Cambridge, MA, USA.
[3] ISIS/Draw and MACCS are products of MDL Information Systems, Inc., San Leandro, CA, USA.
NetSci, ISSN 1092-7360, is published by Network Science Corporation.