Combinatorial Chemistry Library Design
Using Pharmacophore Diversity
Keith Davies and Clive Briant
Chemical Design Ltd.
Presented at the MGMS meeting, Leeds, April, 1995.
http://www.netsci.org/Science/Combichem/feature05.html
Introduction
The use of Combinatorial Chemistry synthetic methods allows a very large number of molecules to be synthesized much more rapidly and at lower cost than traditional synthetic chemistry. The aim of Library Design is to reduce the number of molecules which need to be made without decreasing the diversity of the library. This has the potential to find leads more rapidly, because avoiding molecules which are very similar means that fewer molecules need to be tested. This paper describes a new way of analyzing diversity by considering the type and geometry of the pharmacophores which are exhibited by the molecules.
Representing Combinatorial Libraries
Combinatorial Chemistry libraries are usually represented by one or more Markush structures with a small number of R-group positions. For each R-group position there are lists of alternative groups. The capabilities of Chem-X for storing and searching Combinatorial Libraries in databases are described separately [1]. In order to evaluate the available pharmacophores, the library is expanded using Chem-X into individual 3D structures which are stored in a database.
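The expansion of a Markush representation into individual members can be pictured with a small Python sketch. This is only an illustration of the combinatorics, not the Chem-X implementation: the `core` template string, the R-group labels and the function name `enumerate_library` are hypothetical, and no 3D structures are built.

```python
from itertools import product

def enumerate_library(core, rgroups):
    """Yield (structure_label, assignment) for every R-group combination.

    core    -- hypothetical template string with {R1}, {R2}, ... placeholders
    rgroups -- dict mapping each R-group position to its list of alternatives
    """
    positions = sorted(rgroups)
    for choice in product(*(rgroups[p] for p in positions)):
        assignment = dict(zip(positions, choice))
        yield core.format(**assignment), assignment

# Illustrative data only: 3 alternatives at R1 and 2 at R2 give 6 members.
rgroups = {"R1": ["Me", "Et", "Ph"], "R2": ["OH", "NH2"]}
members = list(enumerate_library("core({R1})({R2})", rgroups))
print(len(members))
```

The library size is the product of the list lengths, which is why even a few R-group positions with modest lists produce very large libraries.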
Definition of a Pharmacophore
The atoms or centres in a small molecule which have critical interactions with the receptor constitute the pharmacophore. In Chem-X, following previous work [2], we use 5 centre types:
- Hydrogen Bond donors
- Hydrogen Bond acceptors
- Positively Charged centres
- Aromatic Ring centres
- Hydrophobic centres
These interactions typically have precise geometric requirements which are readily described in terms of the distances between the atoms in the pharmacophore.
Quantifying Diversity
It is useful to be able to quantify diversity in order to determine the diversity of a given library or set of molecules and to compare the diversity of libraries. In biological systems, there are obviously upper limits on the geometry of pharmacophores which can be accommodated in receptor sites. The sensitivity of biological activity to the distances between centres in the pharmacophore will depend upon the receptor and the values of the distances.
In the ChemDiverse module of Chem-X, 3-centre pharmacophores are considered, with each centre selected from 4 possible types (hydrophobic centres are not used in the current software). This gives 20 ways of choosing the types of the 3 centres. The distance bin model used in ChemDBS-3D [3] uses 31 bins spanning up to 15 Angstroms. The number of potential pharmacophores can therefore be deduced as 20 x 31 x 31 x 31 = 595820. We have calculated that approximately 20% of these geometries do not form triangles because the sum of the lengths of two sides is less than the length of the third side. These bins constitute a 3-centre key known as the pharmacophore key, whereas previously only 2-centre distance keys have been used. The pharmacophore key is too large to be routinely stored for each molecule, and in this work it is calculated when required for molecules, mixtures or entire libraries.
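The size of the key can be checked with a short calculation. The Python sketch below is an illustration only, not ChemDiverse code: the type labels are placeholder names, and the triangle test assumes equal-width bins represented by their midpoints. The real ChemDBS-3D bin boundaries are not given here, so the invalid fraction it computes need not match the figure of approximately 20% quoted above.

```python
from itertools import combinations_with_replacement, product

# Placeholder labels for the 4 centre types used (hydrophobic excluded).
CENTRE_TYPES = ("donor", "acceptor", "positive", "aromatic")
N_BINS = 31

# Unordered choices of 3 centre types from 4, repetition allowed.
type_triplets = list(combinations_with_replacement(CENTRE_TYPES, 3))
key_size = len(type_triplets) * N_BINS ** 3

def forms_triangle(d1, d2, d3):
    """True if the three distances satisfy the triangle inequality."""
    a, b, c = sorted((d1, d2, d3))
    return a + b >= c

# ASSUMPTION: equal-width bins over 0-15 Angstroms, taken at midpoints.
width = 15.0 / N_BINS
midpoints = [(i + 0.5) * width for i in range(N_BINS)]
invalid = sum(1 for d in product(midpoints, repeat=3)
              if not forms_triangle(*d))
invalid_fraction = invalid / N_BINS ** 3

print(len(type_triplets), key_size)  # prints: 20 595820
```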
Conformational Analysis
Most molecules have some rotatable bonds and consequently many low energy conformers. When generating keys for 3D database searching it is usual to consider conformational freedom as efficiently as possible [3]. The use of empirical energy calculations to identify populated low energy conformations is prohibitively time-consuming when considering over 10,000 molecules. Instead, the rule-based conformational analysis capabilities of Chem-X are used [4] to identify and eliminate conformers which have a high energy because atoms are too close. The pharmacophores exhibited by all accepted conformers are then deduced and the appropriate bits set in the pharmacophore key, as illustrated in Figure 1.
Figure 1
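The way accepted conformers contribute to the key can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Chem-X algorithm: bins are taken to be equal-width over 15 Angstroms, a Python set of tuples stands in for the bit string, and the function names are hypothetical. Conformer generation and the rule-based rejection of high energy conformers are assumed to have happened already.

```python
from itertools import combinations, permutations
from math import dist

N_BINS = 31
MAX_DIST = 15.0  # Angstroms

def distance_bin(d):
    """Map a distance to a bin index, or None if beyond MAX_DIST.

    ASSUMPTION: equal-width bins; the real bin boundaries may differ.
    """
    if d > MAX_DIST:
        return None
    return min(int(d / MAX_DIST * N_BINS), N_BINS - 1)

def canonical_pharmacophore(trio):
    """Return an order-independent (types, bins) tuple for 3 centres."""
    best = None
    for a, b, c in permutations(trio):
        bins = (distance_bin(dist(a[1], b[1])),
                distance_bin(dist(b[1], c[1])),
                distance_bin(dist(a[1], c[1])))
        if None in bins:
            return None  # a distance falls outside the range of the key
        candidate = ((a[0], b[0], c[0]), bins)
        if best is None or candidate < best:
            best = candidate
    return best

def pharmacophore_key(conformers):
    """Union of the pharmacophores exhibited by all accepted conformers.

    Each conformer is a list of (centre_type, (x, y, z)) tuples.
    """
    key = set()
    for centres in conformers:
        for trio in combinations(centres, 3):
            pharmacophore = canonical_pharmacophore(trio)
            if pharmacophore is not None:
                key.add(pharmacophore)
    return key

conformer = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
             ("aromatic", (0, 4, 0))]
print(pharmacophore_key([conformer]))
```

Taking the smallest tuple over all orderings of the three centres is one simple way to make the same pharmacophore map to the same entry regardless of atom order; it does not distinguish mirror images.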
Visualizing the Pharmacophore Key
The pharmacophore key can be conveniently visualized as a 3-dimensional plot known as a pharmacophore plot, in which the X, Y and Z axes are used for the distances between the 3 centres. In ChemDiverse, each pharmacophore is represented as a symbol coded to show the pharmacophore's type using 20 letters. Colours are used to distinguish pharmacophores with 3 identical centres, 2 identical centres and 3 different centres. For large libraries, the plot may be too crowded, but pharmacophore plots for individual molecules, active molecules, etc. are readily interpreted visually.
When symbols are picked from the plot, the pharmacophore is drawn, the geometry is reported to a message bar and, optionally, the pharmacophore is saved for use as a search query. This is important because the lists of molecules which exhibit each pharmacophore are not constructed, as they would be extremely large. Instead, the pharmacophore of interest is used as a query to search the 3D database in order to identify the molecules which exhibit that pharmacophore. This search regenerates conformations on the fly using either a rule-based systematic search [3] or a flexifit approach which automatically considers a small number of random conformers of each structure and attempts to fit them to the pharmacophore while allowing the torsion angles about rotatable bonds to change [4].
Figure 2
Selecting Molecules for Subsets
Our approach reduces the Library Design problem to selecting a subset of molecules which exhibit most of the pharmacophores exhibited by the entire set of molecules. The method computes the pharmacophore key for the set of molecules as described previously. Molecules are eliminated on the following criteria:
(a) Flexibility: Molecules which have too few rotatable bonds are considered too rigid and are eliminated because too many such molecules would be required to cover all the pharmacophores. Molecules which are too flexible are also eliminated because these molecules exhibit so many pharmacophores that only at high concentrations would the pharmacophores be sufficiently populated. In addition, excessive flexibility increases the computational time taken to analyze the results. These criteria are applied prior to performing the conformational analysis and are therefore a computationally efficient means of eliminating molecules.
(b) Number of Pharmacophores: Molecules which exhibit too few or too many pharmacophores are eliminated for the same reasons as discussed for (a).
(c) Overlap of Pharmacophores: Molecules which exhibit pharmacophores which largely overlap with those exhibited by molecules already included in the subset are eliminated. The current software uses the percentage overlap of the pharmacophores exhibited by a molecule as the criterion for elimination.
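Criteria (a) to (c) can be sketched as a single pass over an ordered list of molecules. The Python below is an illustration only: the function name, the tuple layout and every threshold value are hypothetical, and keys are represented as sets. For brevity the keys are supplied up front, whereas in the procedure described above criterion (a) is applied before any conformational analysis is performed.

```python
def select_subset(molecules, min_rot=1, max_rot=8,
                  min_pharm=1, max_pharm=5000, max_overlap=0.8):
    """Return (selected_names, covered_pharmacophores).

    molecules -- ordered list of (name, rotatable_bonds, key) tuples,
                 where key is the set of pharmacophores exhibited.
    All thresholds are illustrative assumptions, not published values.
    """
    selected, covered = [], set()
    for name, rotatable_bonds, key in molecules:
        # (a) Flexibility: reject molecules that are too rigid or too flexible.
        if not min_rot <= rotatable_bonds <= max_rot:
            continue
        # (b) Number of pharmacophores: reject too few or too many.
        if not key or not min_pharm <= len(key) <= max_pharm:
            continue
        # (c) Overlap: fraction of this key already covered by the subset.
        overlap = len(key & covered) / len(key)
        if overlap > max_overlap:
            continue
        selected.append(name)
        covered |= key
    return selected, covered

molecules = [("A", 3, {1, 2, 3}), ("rigid", 0, {7, 8}),
             ("B", 3, {1, 2, 3, 4}), ("C", 2, {5, 6})]
print(select_subset(molecules, max_overlap=0.5))
```

In this toy input, "rigid" fails (a) and "B" fails (c) because three of its four pharmacophores are already covered by "A".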
This algorithm is obviously order-dependent. To perform the analysis most efficiently, it is beneficial to use a diverse ordering derived from the similarity of the 2-centre distance keys [5]. This approach is especially suitable because the distance keys are dependent on the pharmacophores exhibited. The ordering algorithm starts with the molecule that is most similar to the mean distance key and the molecule that is most dissimilar to it. Subsequent molecules are selected to be most dissimilar from the mean of the keys for the previously selected molecules. The resulting order places dissimilar molecules at the start of the list and ensures that those at the end of the list are similar to molecules earlier in the list. When selecting subsets, this can allow the search for a diverse subset to be terminated early when the desired degree of diversity, in terms of the number of exhibited pharmacophores, has been reached.
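One reading of this ordering scheme is sketched below in Python. It is an interpretation under stated assumptions rather than the method of [5]: distance keys are modelled as equal-length lists of 0/1 values, the mean key is the per-bit average, and Euclidean distance is used as the dissimilarity measure. The function names are hypothetical.

```python
from math import dist

def mean_key(keys):
    """Per-bit average of a list of equal-length 0/1 keys."""
    n = len(keys)
    return [sum(column) / n for column in zip(*keys)]

def diverse_order(keys):
    """Return the indices of `keys` in a diverse order.

    ASSUMPTION: Euclidean distance to the mean key measures dissimilarity.
    """
    remaining = list(range(len(keys)))
    if len(remaining) <= 2:
        return remaining
    overall = mean_key(keys)
    # Start with the most similar and the most dissimilar to the mean.
    first = min(remaining, key=lambda i: dist(keys[i], overall))
    remaining.remove(first)
    second = max(remaining, key=lambda i: dist(keys[i], overall))
    remaining.remove(second)
    order = [first, second]
    # Then repeatedly take the molecule furthest from the mean of those chosen.
    while remaining:
        centre = mean_key([keys[i] for i in order])
        nxt = max(remaining, key=lambda i: dist(keys[i], centre))
        remaining.remove(nxt)
        order.append(nxt)
    return order

keys = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
print(diverse_order(keys))
```

This version recomputes the mean at every step and so scales quadratically with the number of molecules; a production implementation would presumably update the mean incrementally.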
The current experimental technology can limit the extent to which library design can be used. For instance, it is always possible to decide to include or exclude a given reagent which forms an R-group, but there may be limited or no control over the combinations of R-groups that are used. In such cases, the above procedure may be enhanced to identify which R-groups to include based on the frequency of occurrence in the subset.
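Ranking R-groups by how often they occur in the selected subset can be sketched as below. The data layout (one dictionary of position to reagent per selected molecule), the function names and the `top_n` cut-off are all assumptions made for illustration.

```python
from collections import Counter

def rgroup_frequencies(subset):
    """Count how often each reagent occurs at each R-group position.

    subset -- list of dicts mapping R-group position -> reagent name,
              one dict per molecule in the selected subset.
    """
    counts = {}
    for molecule in subset:
        for position, reagent in molecule.items():
            counts.setdefault(position, Counter())[reagent] += 1
    return counts

def choose_reagents(subset, top_n):
    """Keep the top_n most frequent reagents at each position."""
    return {position: [reagent for reagent, _ in counter.most_common(top_n)]
            for position, counter in rgroup_frequencies(subset).items()}

# Illustrative subset of three selected molecules.
subset = [{"R1": "a", "R2": "x"},
          {"R1": "a", "R2": "y"},
          {"R1": "b", "R2": "y"}]
print(choose_reagents(subset, top_n=1))
```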
Determining Overlap of Libraries
The number and percentage of set bits in the pharmacophore key may be reported to users of the software, giving a quantitative measure of diversity. By performing logical operations such as AND, NOT, OR and EOR (exclusive OR), the overlap and differences between the pharmacophores exhibited by libraries may be readily determined. A tolerance is used to compensate for rounding errors for distances at or close to the bin boundaries and to accommodate the variation in distance that may be acceptable to the receptor. This approach may determine whether it is useful to make or purchase a library for testing. A library which exhibits pharmacophores largely similar to those exhibited by libraries previously tested is unlikely to generate new leads.
The overlap between pharmacophore keys is also important for identifying pharmacophores common to active molecules or mixtures. In this case, the pharmacophore key for each active mixture is determined and these keys are then ANDed. When the activity of a mixture arises from a large number of weakly active molecules, the optimum pharmacophore for activity may not be exhibited by the mixture. Inclusion of the pharmacophore key for such mixtures in the AND operation is unlikely to give any common pharmacophores. It is necessary to identify and eliminate such mixtures from the AND operation by trial and error. In some cases, there may be many pharmacophores in common; those which also occur in inactive mixtures may then need to be identified and eliminated by 3D searches on the constituent compounds. Once the pharmacophore responsible for activity is known, the subsets of libraries which exhibit that pharmacophore are determined using 3D searching as previously described.
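The logical operations on keys map directly onto integer bit operations. The sketch below uses Python integers as bit strings; the helper names and the bit positions are illustrative assumptions, and the distance tolerance mentioned above is not modelled.

```python
KEY_BITS = 20 * 31 ** 3  # 595820 potential pharmacophores

def make_key(bit_positions):
    """Build a key (as a Python int) with the given bits set."""
    key = 0
    for position in bit_positions:
        key |= 1 << position
    return key

def count_set(key):
    """Number of set bits in the key."""
    return bin(key).count("1")

def percent_set(key):
    """Percentage of the potential pharmacophores that are exhibited."""
    return 100.0 * count_set(key) / KEY_BITS

# Illustrative keys for two libraries.
library_a = make_key([10, 11, 12, 500])
library_b = make_key([12, 500, 7000])

common = library_a & library_b     # AND: exhibited by both libraries
either = library_a | library_b     # OR: exhibited by at least one
differing = library_a ^ library_b  # EOR: exhibited by exactly one
new_in_b = library_b & ~library_a  # AND NOT: what library B would add

print(count_set(common), count_set(either),
      count_set(differing), count_set(new_in_b))  # prints: 2 5 3 1
```

A small value of `count_set(new_in_b)` relative to `count_set(library_b)` corresponds to the case where a new library largely repeats pharmacophores already tested.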
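The AND over active mixtures, and a simple form of the trial-and-error step, can be sketched as follows. This is an assumption about how such a search might be organised, not the procedure used in the software: keys are modelled as sets, mixtures are left out one at a time, and the function names are hypothetical.

```python
def common_pharmacophores(keys):
    """AND together a collection of keys (each a set of pharmacophores)."""
    keys = list(keys)
    if not keys:
        return set()
    return set.intersection(*(set(key) for key in keys))

def leave_one_out(mixture_keys):
    """For each mixture, the pharmacophores common to all the others.

    mixture_keys -- dict mapping mixture name -> set of pharmacophores.
    A mixture whose removal yields a non-empty result, when the full AND
    is empty, is a candidate for exclusion.
    """
    return {
        name: common_pharmacophores(
            key for other, key in mixture_keys.items() if other != name)
        for name in mixture_keys
    }

# Illustrative keys for three active mixtures.
mixture_keys = {"m1": {1, 2, 3}, "m2": {2, 3, 4}, "m3": {9}}
print(common_pharmacophores(mixture_keys.values()))
print(leave_one_out(mixture_keys))
```

Here the full AND is empty, but leaving out "m3" recovers two common pharmacophores, which flags "m3" as the mixture to examine.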
References
[1] To be published.
[2] Y. C. Martin et al., J. Comput. Aided Mol. Des., 1988, 2, 15-29.
[3] N. W. Murrall and E. K. Davies, J. Chem. Inf. Comput. Sci., 1990, 30, 315.
[4] Chem-X Reference Manuals, Chemical Design.
[5] Fast Clustering Applications Note, Chemical Design.