3D Pharmacophore Searching
Keith Davies and Roger Upton
Chemical Design Ltd.
Roundway House
Cromwell Park, Chipping Norton
Oxon OX7 5SR, UK
Tel: + 44 0 1608 644000
http://www.netsci.org/Science/Cheminform/feature02.html
Introduction
Software for storing 3D coordinates of molecules has been available for many years. Probably the earliest example of a 3D database system was that developed by Dr Olga Kennard and co-workers at the Cambridge Crystallographic Data Centre which was first published in 1972 [1]. This system has evolved over the years and remains the definitive source of small organic molecule crystal structures. The software includes capabilities to search by geometric constraints as well as connection tables but (of course) does not consider conformational freedom.
3D database technology has emerged as the most important tool for exploiting databases of molecules available for testing. During the 1980s, most pharmaceutical companies established 2D databases of compounds they had made and those available for purchase. The availability of software such as CONCORD [2], which automatically creates 3D coordinates from 2D connection tables, prompted the development of 3D database software capable of storing and searching a large number of molecules. The 2D databases have continued largely as archival systems, whereas companies with 3D databases have been able to use pharmacophore-based queries to select molecules to test [3].
The Pharmacophore Concept
The traditional medicinal chemistry definition of a pharmacophore is the minimum functionality a molecule has to contain in order to exhibit activity. For a series of derivatives, the molecules usually have much in common, so structurally diverse active molecules are required to derive the true minimum pharmacophore. Only molecules which interact at the same receptor site in the same way will share a pharmacophore. Consequently, experiments which confirm that competitive binding is occurring can be valuable. In the early stages of a project, when only a few leads are known, it may be worthwhile to consider several pharmacophores and to select molecules for testing that reduce the number of possible alternatives.
In the Chem-X software, the pharmacophore is defined in terms of atoms or centers which can interact with the receptor following the work of Dr Yvonne Martin and co-workers [4]. The types of interaction centers are categorized as:
- Hydrogen bond donors
- Hydrogen bond acceptors
- Positive charge centers
- Aromatic ring centers
- Hydrophobic centers
This approach allows the pharmacophore to be defined as 3 or 4 centers thus forming a triangle or tetrahedron.
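As a minimal sketch of this representation, a pharmacophore can be held as a short list of typed centers, from which the 3 distances of a triangle or the 6 distances of a tetrahedron follow. The class name, type labels, and coordinates below are illustrative assumptions and are not Chem-X data structures.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist

# Center types from the list above; the short labels are hypothetical.
CENTER_TYPES = {"donor", "acceptor", "positive", "aromatic", "hydrophobe"}


@dataclass(frozen=True)
class Center:
    kind: str   # one of CENTER_TYPES
    xyz: tuple  # (x, y, z) position in angstroms, illustrative values only


def pairwise_distances(centers):
    """Return {(i, j): distance} for every pair of centers."""
    for c in centers:
        if c.kind not in CENTER_TYPES:
            raise ValueError(f"unknown center type: {c.kind}")
    return {
        (i, j): dist(a.xyz, b.xyz)
        for (i, a), (j, b) in combinations(enumerate(centers), 2)
    }


# A 3-center pharmacophore is a triangle described by 3 distances.
triangle = [
    Center("donor", (0.0, 0.0, 0.0)),
    Center("acceptor", (3.0, 0.0, 0.0)),
    Center("aromatic", (0.0, 4.0, 0.0)),
]
triangle_distances = pairwise_distances(triangle)

# Adding a fourth center gives a tetrahedron described by 6 distances.
tetrahedron = triangle + [Center("hydrophobe", (0.0, 0.0, 2.0))]
tetrahedron_distances = pairwise_distances(tetrahedron)
```

Describing the geometry by distances alone keeps the query independent of the orientation of the molecule.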
Database Architecture
The Chem-X database is designed to store vast numbers of molecules. For small project-oriented databases it is desirable to store a lot of data per atom, while for large databases it is important to reduce the storage to a minimum to avoid excessive disk usage. The configuration of a Chem-X database is defined when it is created and may store the following:
| Property | Size (bytes) | Comment |
|---|---|---|
| Atom Type | 2 | Required |
| Atom Name | 4 | Optional |
| Serial Number | 2 | Required for Reaction Databases |
| Group Number | 2 | Required for Reaction Databases |
| Print Status | 4 | Optional |
| Display Status | 2 | Optional |
| Colour | 2 | Optional |
| Radius | 4 | Optional |
| Charge | 4 | Optional |
| 3D Coordinates | 12 | Required for 3D Databases |
| 2D Coordinates | 12 | Optional |
For peptides and proteins there is the option to store residue name and sequence numbers and for crystal structures the symmetry operators may be stored. The disk usage for 3D databases of small molecules is typically 2 Kbytes per molecule. This can be reduced to 1 Kbyte by not storing hydrogens explicitly but generating them on reading the structure from disk.
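The effect of the database configuration on storage can be illustrated by summing the per-atom field sizes from the table above. This is only a sketch: the function name is hypothetical, and the per-molecule figures quoted in the text also include data (such as bonds, keys, and data fields) that this per-atom sum does not cover.

```python
# Per-atom field sizes in bytes, copied from the table above.
FIELD_BYTES = {
    "Atom Type": 2,
    "Atom Name": 4,
    "Serial Number": 2,
    "Group Number": 2,
    "Print Status": 4,
    "Display Status": 2,
    "Colour": 2,
    "Radius": 4,
    "Charge": 4,
    "3D Coordinates": 12,
    "2D Coordinates": 12,
}


def bytes_per_atom(fields):
    """Sum the storage needed per atom for the chosen set of fields."""
    return sum(FIELD_BYTES[name] for name in fields)


# Smallest configuration that still supports 3D searching.
minimal_3d = bytes_per_atom(["Atom Type", "3D Coordinates"])

# Every field in the table switched on.
all_fields = bytes_per_atom(FIELD_BYTES)
```

With these table values the minimal 3D configuration needs 14 bytes per atom, against 50 bytes per atom when every field is stored.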
The original version of ChemDBS-3D [5] used 3D keys which described the possible distances between centers. In the current software, 2D keys are recommended which use estimates of the lower and upper bounds between centers. In addition, the keys have been extended to include bond keys storing information about connected chains of atoms.
The database architecture also supports up to 2,048 numeric and text data fields per molecule. The numeric fields are often created for QSAR, where they store the results of calculations. The text fields are of variable length and allow multiple values to be stored per molecule. In addition, the fields can be grouped into relational tables in which values from one row and column index another table. These capabilities are important for coping with multiple batches of molecules in 2D registration software and in Combinatorial Chemistry, as published elsewhere [6].
Coping with Conformational Freedom
Most small molecules have many low energy conformers which may be populated in solution and are therefore candidates for interacting with a receptor. Two approaches have been used in 3D database systems:
- Storing a single conformer and regenerating alternatives during the search
- Storing multiple representative conformations
The released ChemDBS-3D software [5] adopts the former approach because it dramatically reduces the disk space required for storing databases. It also means that once molecules have been loaded into a database, they are immediately available for searching. Using modern workstations, a database of over 500,000 structures can be built in a day. The algorithms used by Chem-X predict whether a conformer will have a low energy prior to generating coordinates using a Rule-Based search. Once a conformer is generated and matches the query, it is stored in a results database. Search times vary depending on the query, but with typical vague pharmacophore queries, searches on databases this large usually take 1-3 hours. Simple, well-defined queries complete in a few minutes. Storing multiple conformers usually has the advantage of faster search times, but generating a representative set of conformations for a database of 100,000 molecules may take months, which is an unreasonable amount of time [7].
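The build-time comparison can be checked with simple arithmetic. The sketch below assumes a constant rate on a single workstation, using the 2,000 structures per day reported in [7] for multi-conformer generation and the roughly 500,000 structures per day stated above for single-conformer loading. The function name is illustrative.

```python
from math import ceil


def build_days(n_molecules, molecules_per_day):
    """Whole days needed to process a database at a constant daily rate."""
    return ceil(n_molecules / molecules_per_day)


# Multi-conformer storage: 100,000 molecules at the rate quoted in [7].
multi_conformer_days = build_days(100_000, 2_000)

# Single-conformer storage: 500,000 molecules at about 500,000 per day.
single_conformer_days = build_days(500_000, 500_000)
```

Under these assumptions the multi-conformer build takes 50 days for a database one fifth of the size that the single-conformer approach loads in one day.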
Generating Pharmacophore Search Queries
In order to formulate a hypothesis for the pharmacophore, it is normally necessary to identify several active molecules. If these molecules are very flexible, they usually exhibit many different pharmacophores and it can be difficult to identify any common pharmacophores without the aid of computer algorithms. Chem-X includes capabilities to generate all possible common pharmacophores for a series of molecules. The approach constructs the 3D distance keys for each molecule as previously described [5]. In the current software, these keys consist of 15 words of 32 bits, each word storing the available distances between a pair of centers. The following table gives the number of the word used for each pair of center types.
| | H Donor | H Acceptor | Positive | Aromatic | Hydrophobe |
|---|---|---|---|---|---|
| H Donor | 1 | 2 | 3 | 4 | 5 |
| H Acceptor | ... | 6 | 7 | 8 | 9 |
| Positive | ... | ... | 10 | 11 | 12 |
| Aromatic | ... | ... | ... | 13 | 14 |
| Hydrophobe | ... | ... | ... | ... | 15 |
The keys are shared between different conformations of the same molecule with the consequence that no information about which conformer sets a bit, and thus exhibits a distance, is retained.
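The key layout can be sketched as follows. The word numbering follows the table above, but the width of each distance bin (`BIN_WIDTH`) and all function names are assumptions made for illustration, since the text does not state how distances map to bits.

```python
CENTER_ORDER = ["donor", "acceptor", "positive", "aromatic", "hydrophobe"]
N_WORDS = 15
BITS_PER_WORD = 32
BIN_WIDTH = 0.5  # angstroms per bit: an assumption, not stated in the text


def word_number(type_a, type_b):
    """1-based word number for a pair of center types, as in the table."""
    i, j = sorted((CENTER_ORDER.index(type_a), CENTER_ORDER.index(type_b)))
    # Rows above row i hold 5, 4, ... entries of the upper triangle.
    return i * 5 - i * (i - 1) // 2 + (j - i) + 1


def distance_bit(distance):
    """Bit index for a distance; long distances share the top bit."""
    return min(int(distance / BIN_WIDTH), BITS_PER_WORD - 1)


def conformer_key(pair_distances):
    """Build a key from (type_a, type_b, distance) triples of one conformer."""
    key = [0] * N_WORDS
    for type_a, type_b, distance in pair_distances:
        key[word_number(type_a, type_b) - 1] |= 1 << distance_bit(distance)
    return key


def molecule_key(conformer_keys):
    """OR the conformer keys together; which conformer set a bit is lost."""
    key = [0] * N_WORDS
    for conformer in conformer_keys:
        key = [word | other for word, other in zip(key, conformer)]
    return key


# Two conformers of one molecule with different donor-acceptor distances.
example_key = molecule_key([
    conformer_key([("donor", "acceptor", 3.2)]),
    conformer_key([("acceptor", "donor", 5.1)]),
])
```

After the OR step, word 2 of `example_key` has two bits set, and nothing in the key records that they came from different conformers.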
The pharmacophores are identified by extracting the common bits in the keys which represent the inter-center distances shared by the molecules. Next, by generating all possible 3 or 4 center geometries the pharmacophores are created and stored in a database. The consequence of sharing the keys between conformations of the same molecule is that a pharmacophore may attempt to use distances from different conformations which is obviously not possible. Consequently, a validation step consisting of a 3D search with the pharmacophore as the search query is required to confirm that all molecules can exhibit each pharmacophore.
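The two steps described above, extracting common bits and enumerating 3-center geometries, can be sketched as below. For brevity the keys are held as dictionaries from a sorted pair of center types to a bitmask rather than as 15 fixed words. That layout, the function names, and the example bit patterns are all assumptions. The candidates produced this way still need the 3D search validation step, because the shared bits may come from different conformers.

```python
from functools import reduce
from itertools import combinations_with_replacement, product


def common_key(keys):
    """Bitwise AND of per-molecule keys given as {pair: bitmask} dicts."""
    pairs = set.intersection(*(set(k) for k in keys))
    return {p: reduce(lambda x, y: x & y, (k[p] for k in keys)) for p in pairs}


def set_bits(word):
    """Indices of the bits that are set in an integer bitmask."""
    return [i for i in range(word.bit_length()) if (word >> i) & 1]


def candidate_triangles(key, types):
    """Yield ((a, b, c), (bin_ab, bin_ac, bin_bc)) from shared distance bits."""
    for a, b, c in combinations_with_replacement(sorted(types), 3):
        words = [key.get((a, b), 0), key.get((a, c), 0), key.get((b, c), 0)]
        for bins in product(*(set_bits(w) for w in words)):
            yield (a, b, c), bins


# Hypothetical keys for two active molecules (pairs are sorted tuples).
mol1 = {("acceptor", "donor"): 0b0110,
        ("acceptor", "aromatic"): 0b0100,
        ("aromatic", "donor"): 0b1000}
mol2 = {("acceptor", "donor"): 0b0100,
        ("acceptor", "aromatic"): 0b0101,
        ("aromatic", "donor"): 0b1000}

shared = common_key([mol1, mol2])
candidates = list(candidate_triangles(shared, {"donor", "acceptor", "aromatic"}))
```

Extending the enumeration to 4-center geometries would combine six words per candidate instead of three.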
When there is a published pharmacophore or active molecule, this can often be used as the basis for a search query. Sometimes, the pharmacophore can be expressed more conveniently by changing angle constraints into distances and generalized by only including the minimum number of atoms. Before an active molecule can be used as the basis for a search query, the essential conformation which is associated with activity needs to be identified. A query can be generated by deleting non-essential atoms and used immediately to identify molecules to test by searching databases of available molecules. The results can be used to validate or invalidate the hypothesis in terms of the conformation and essential atoms.
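Changing an angle constraint into a distance follows from the law of cosines: for centers A, B, and C with the angle measured at B, the A-C distance is fixed by the A-B and B-C distances and the angle. A minimal sketch, with illustrative function names that are not part of Chem-X:

```python
from math import cos, radians, sqrt


def angle_to_distance(d_ab, d_bc, angle_abc_deg):
    """A-C distance implied by distances A-B and B-C and the angle at B."""
    return sqrt(
        d_ab ** 2 + d_bc ** 2
        - 2.0 * d_ab * d_bc * cos(radians(angle_abc_deg))
    )


def angle_range_to_distance_range(d_ab, d_bc, angle_min_deg, angle_max_deg):
    """Convert an angle tolerance into lower and upper A-C distance bounds.

    The A-C distance increases monotonically with the angle between 0 and
    180 degrees, so the bounds are the distances at the two angle limits.
    """
    return (
        angle_to_distance(d_ab, d_bc, angle_min_deg),
        angle_to_distance(d_ab, d_bc, angle_max_deg),
    )


# Example: sides of 3 and 4 angstroms with a right angle at B.
d_ac = angle_to_distance(3.0, 4.0, 90.0)
lower, upper = angle_range_to_distance_range(3.0, 4.0, 80.0, 100.0)
```

Expressing the constraint as a distance range lets it be treated like any other inter-center distance in the query.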
Advanced Searches
It can be tempting to include functional groups in the search query, especially when the active molecules found to date contain a common functional group. Almost invariably this improves the search speed, but the search can then fail to find molecules with novel functional groups or frameworks. Consequently, in the absence of evidence to use a functional group restraint, queries should be as generic as possible.
Chem-X includes a number of generic search constraints including:
| Keyword | Type of Restraint |
|---|---|
| CHAIN | Atom in any chain |
| RING | Atom in any size or type of ring |
| HDON | Atom is hydrogen bond donor |
| HACC | Atom is hydrogen bond acceptor |
| NONE | Atom not in a center |
| ACIDIC | Atom in acidic group |
| BASIC | Atom in basic group |
| NEUTRAL | Atom in group which is neither acidic nor basic |
| AROMATIC | Atom in carbocyclic aromatic ring |
| HET-AROM | Atom in heteroaromatic ring |
| n-CYCLIC | Atom in n-membered ring (n between 2 and 8) |
| MEDIUM | Atom in 8-12 membered ring |
| LARGE | Atom in 12+ membered ring |
Chem-X assigns atom types to the query and to every molecule in a database, and these are used as the basis for advanced searches. The atom type of an atom in a query may be constrained to a specific type or a list of types. In addition, any of the above constraints may be applied (with the complication that some are mutually exclusive). By default, an atom may match any atom of the same element, with the added restraint that if the original atom is a hydrogen bond donor with an attached hydrogen, then all matched atoms must also be hydrogen bond donors. Center types are stored for each atom type and the generic queries HDON, HACC, etc. are matched against this data.
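The matching rules can be sketched as a single predicate. The classes and field names below are hypothetical. The sketch also assumes that the same-element default applies only when the query atom carries neither an atom-type list nor generic constraints, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    atom_type: str = ""
    is_donor: bool = False
    has_attached_h: bool = False
    # Generic keywords that hold for this atom, e.g. {"RING", "HACC"}.
    properties: frozenset = frozenset()


@dataclass
class QueryAtom(Atom):
    allowed_types: frozenset = frozenset()  # empty: no atom-type restriction
    constraints: frozenset = frozenset()    # generic keywords that must hold


def matches(query, candidate):
    """Return True if the candidate atom satisfies the query atom."""
    if query.allowed_types and candidate.atom_type not in query.allowed_types:
        return False
    # Default rule (assumed to apply only to unconstrained query atoms).
    if not query.allowed_types and not query.constraints:
        if candidate.element != query.element:
            return False
    # A donor with an attached hydrogen may only match other donors.
    if query.is_donor and query.has_attached_h and not candidate.is_donor:
        return False
    # Every generic keyword on the query must hold for the candidate.
    return query.constraints <= candidate.properties


amine_query = QueryAtom("N", is_donor=True, has_attached_h=True)
generic_query = QueryAtom("N", constraints=frozenset({"HACC", "RING"}))
```

With these definitions `amine_query` matches only nitrogen donors, while `generic_query` matches any ring atom that is a hydrogen bond acceptor, whatever its element.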
The assignments of acidic, basic and neutral types are made algorithmically. This is because too many substructures and corresponding atom types would otherwise need to be defined. In general terms, a BASIC atom is an uncharged oxygen or nitrogen hydrogen bond acceptor in which the lone pair is not conjugated and an ACIDIC atom is the hydroxyl oxygen of a carboxylic acid or nitrogen with an attached hydrogen in a tetrazole. O- is considered basic except when in a nitro group and a protonated quaternary nitrogen is considered acidic. All other donors and acceptors are considered neutral.
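The general rules above can be written as a small decision function. This is a sketch only: the flags on `AtomFlags` are hypothetical inputs that a real system would have to perceive from the structure, and the order in which the rules are tested is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AtomFlags:
    element: str
    formal_charge: int = 0
    is_acceptor: bool = False
    lone_pair_conjugated: bool = False
    in_nitro_group: bool = False
    is_carboxylic_hydroxyl_o: bool = False
    is_tetrazole_nh: bool = False
    is_protonated_quaternary_n: bool = False


def classify(atom):
    """Return 'ACIDIC', 'BASIC' or 'NEUTRAL' following the rules in the text."""
    # Acidic: carboxylic acid hydroxyl oxygen, tetrazole N-H, protonated N.
    if atom.is_carboxylic_hydroxyl_o or atom.is_tetrazole_nh:
        return "ACIDIC"
    if atom.is_protonated_quaternary_n:
        return "ACIDIC"
    # O- is basic except when it belongs to a nitro group.
    if atom.element == "O" and atom.formal_charge == -1:
        return "NEUTRAL" if atom.in_nitro_group else "BASIC"
    # Uncharged O or N acceptor whose lone pair is not conjugated.
    if (atom.element in ("O", "N") and atom.formal_charge == 0
            and atom.is_acceptor and not atom.lone_pair_conjugated):
        return "BASIC"
    # Everything else is treated as neutral.
    return "NEUTRAL"


amine_n = AtomFlags("N", is_acceptor=True)
amide_n = AtomFlags("N", is_acceptor=True, lone_pair_conjugated=True)
carboxylate_o = AtomFlags("O", formal_charge=-1, is_acceptor=True)
nitro_o = AtomFlags("O", formal_charge=-1, is_acceptor=True, in_nitro_group=True)
acid_oh = AtomFlags("O", is_acceptor=True, is_carboxylic_hydroxyl_o=True)
```

Testing the acidic cases first keeps the carboxylic acid hydroxyl oxygen from falling through to the acceptor rule.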
Integration with QSAR
In Chem-X, the conformations which result from a 3D search are stored in a database with conformations of each molecule oriented to fit the query. In addition, the atoms which match the query inherit the names of the atoms in the query and when Markush queries are used the atoms which match an R-group are assigned the same group number. Chem-X includes capabilities to automatically calculate a range of properties for entire molecules and for groups within the molecule which can then be used in QSAR studies.
| Property | Molecule | Group |
|---|---|---|
| Inter-heteroatomic distances | Y | N |
| Atomic charge | Y | N |
| Total charge | Y | Y |
| Dipole moment | Y | Y |
| Dipole components | Y | Y |
| Dipole component angles | Y | Y |
| Weight | Y | Y |
| Number of atoms | N | Y |
| Number of heteroatoms | N | Y |
| Number of centers | N | Y |
| VdW volume | Y | Y |
| VdW surface area | Y | Y |
| Coded surface areas | Y | N |
| Electrostatic potential volumes | Y | Y |
The QSAR techniques available are:
- Data Reduction
- Linear Regression
- Principal Component Analysis (PCA)
- Partial Least Squares (PLS)
- Clustering on property values
- Non-linear mapping using weighted least-squares (WLS)
Conclusions
3D pharmacophore searching has two main applications in drug discovery:
- Generating new leads by searching for molecules to test
- Aligning and naming molecules for QSAR studies for lead optimization
The Chem-X software allows a variety of 3D pharmacophore search queries to be expressed and includes capabilities to generate pharmacophores. The results of 3D searches are stored in a database suitable for use in QSAR studies, and a comprehensive set of QSAR techniques is provided. The software is used at approximately 130 sites worldwide and has been extensively used to generate new leads in pharmaceutical research laboratories [8].
References
1. O Kennard, D G Watson and W G Town, J. Chem. Doc., 12, (1972) 14
2. CONCORD written by R Pearlman et al. and distributed by TRIPOS Associates, St Louis, MO, USA
3. J S Mason "Drug design using conformationally flexible molecules in 3D database", in "Trends in Drug Research" Ed V Classen, Elsevier Science Publishers BV, Amsterdam, The Netherlands (1993)
4. J van Drie, D Weininger and Y Martin, J. Comp. Aided Mol. Des., 3, (1989) 225
5. N W Murrall and E K Davies, J. Chem. Info. Comput. Sci., 30, (1990) 31
6. E K Davies and C J White, "An Information Management Architecture for Combinatorial Chemistry", NetSci, 1, (July, 1995).
7. Personal communications from Catalyst users and former users indicate that 2,000 structures per day on a Silicon Graphics workstation is typical.
8. G W A Milne et al., J. Chem. Info. Comput. Sci., 34, (1994) 1219
NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.