Bioinformatics: An Editorial Perspective
Michael Liebman
http://www.netsci.org/Science/Bioinform/feature01.html
Bioinformatics is a term of the 1990s that has evolved out of its application in molecular and structural biology rather than from a formal scientific discipline. The word appears in conference names, scientific papers, classified advertisements, job descriptions, financial analyses, company prospectuses, newspaper articles and both real and virtual academic institutes. Much of what has been described as bioinformatics has involved generating, maintaining and analyzing databases of nucleic acid and amino acid sequences. The term has not so much been misapplied as it has persisted through a continuum of ad hoc definitions and applications. The focus of this edition of NetSci is to propose and explore a functional definition for bioinformatics that parallels that of biochemistry, which is accepted as the "chemistry of biological systems and processes". Within this framework, we propose a definition of bioinformatics as the study of the information content and information flow in biological systems and processes. This in turn requires us to define information and, especially, to evaluate and value the difference between information and knowledge.
Information is defined in the dictionary, interestingly, as "knowledge derived from study, experience, or instruction; facts" (American Heritage Dictionary, 1983). Such a definition is noteworthy because it states that information is knowledge, while elsewhere in the dictionary, data is defined as "information". The implication is that data is knowledge. In science, this presents a conflict: the pursuit of data or information is undertaken to provide the basis for developing knowledge about how a system is structured and behaves, but the observations, or data, represent only some of the critical pieces of the puzzle. Bioinformatics is evolving to serve as a bridge between the observations in diverse bio-based disciplines and the derivation of the understanding or knowledge about how the system or process functions or, in the case of disease, dysfunctions.
Perhaps some of the conventional use of the term stems from the treatment of biological molecules as information carriers, e.g. DNA codes for proteins; amino acid sequences code for protein structure; protein structure codes for protein function. Many studies have attempted to evaluate the information content available to DNA, constructed of four unique bases, and to proteins, constructed of 20 common amino acids. More recently, x-ray crystallography and NMR have yielded three-dimensional protein structures and led to studies of the classification of structural elements, e.g. secondary and super-secondary structures, structural domains and motifs, as well as tertiary folds, for use in inferring the function of newly found protein sequences. The inherent bias has been towards sequence-based information analysis, and the effort to develop the genome projects for humans and other organisms naturally continues this emphasis. We have become more aware of the information flow in biological pathways, receptor-mediated functions, gene regulation, intercellular communication and the impact that these have on normal and disease states. The study of information content and flow in biological systems can provide a much broader view, particularly in the observation and interpretation of intra- and inter-molecular communication, and expand the basis by which we address daily research issues.
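As a rough illustration of what "information content" means for sequences, the sketch below computes the upper bound on bits per position for a four-letter DNA alphabet and a 20-letter amino acid alphabet, along with the Shannon entropy of an observed sequence. It is not taken from the studies mentioned above. The function names are hypothetical, and the calculation assumes that positions are independent, which real sequences generally are not.

```python
import math
from collections import Counter


def max_bits_per_symbol(alphabet_size: int) -> float:
    """Upper bound on information per position, reached only when
    every symbol in the alphabet is equally likely."""
    return math.log2(alphabet_size)


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy (bits per position) of the symbol frequencies
    observed in `sequence`, treating positions as independent."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(max_bits_per_symbol(4))    # DNA: four bases
    print(max_bits_per_symbol(20))   # proteins: 20 common amino acids
    print(shannon_entropy("ACGTACGT"))  # uniform base usage
    print(shannon_entropy("AAAAAAAT"))  # heavily biased base usage
```

Under these assumptions a DNA position carries at most 2 bits and a protein position at most about 4.32 bits. Any bias in composition lowers the observed entropy below that ceiling.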
What does this expanded view look like? As an example, we can look at issues related to drug design, specifically the design of an inhibitor of a particular enzyme for therapeutic use. Conventional approaches focus on identifying, isolating and purifying a target enzyme; obtaining its sequence and deriving its three-dimensional structure; applying rational drug design and molecular modeling methods to dock an inhibitor computationally into an active site; expanding the library of chemical leads using combinatorial chemistry; synthesizing, screening and evaluating suitable lead compounds and their derivatives; designing and implementing clinical trials; and submitting for FDA approval. The perspective that bioinformatics can present raises additional questions along this path, some of which may include:
- If the enzyme target functions in a biological pathway, have we
optimized the selection of the target on the basis of maximal
control over pathway behavior? On the basis of specificity and
selectivity of the target enzyme vs. related, non-targeted enzymes
from other pathways? What about the interaction of this pathway
with associated pathways, e.g. does targeting this enzyme produce
undesirable effects in other pathways which communicate above or
below the target? Can these secondary effects be controlled or
evaluated to assess total clinical efficacy? What are the most
likely secondary effects arising from inter-pathway interactions?
What is known about the genetic variations among individuals in
this biochemical pathway and its components which might affect
individual response? How does this become part of risk assessment?
- In the target enzyme, are there non-active site targets which
might yield greater specificity and reduce the "side-effects" of
interacting with structurally or evolutionarily related targets?
Can we modulate the function of the enzyme at the level of
specificity vs. reactivity to be more effective? Is specificity
determined in the active site alone? What is known about natural
mutations of the target enzyme? Do individuals exhibiting natural
mutations in the target exhibit other critical physiological
differences?
- In the small molecule developed to target the enzyme, does the
molecule adequately elicit a full response from the target in terms
of specificity, selectivity and efficacy? What other potential
targets might the molecule interact with to produce unwanted
effects?
- Additional considerations should involve the analysis of the
disease process itself. If we consider the disease as a set of
initiating events which yield a structure of secondary, tertiary
and further events of varying time course, then we should ask
whether we are performing the right clinical analyses to detect and
stage the disease in a patient who will be presenting
simultaneously from multiple disease levels. Should a therapeutic
or diagnostic be considered appropriate for development based on
available clinical observations? Are we observing the critical path
for short-term remediation? How do we evaluate the potential for
long-term effects which we may not observe or control? Are there
common mechanistic paths which may be more suitable for multiple
disease targeting? How does this become part of correctly managed
care?
We view these additional questions as the key to the development of our bioinformatics "business" and have focused our efforts on developing novel technology and databases to address them specifically and successfully.
These questions serve only as examples of the expanded view which bioinformatics and its inherent interdisciplinary approach can bring to the problems we face daily in both the research and commercial worlds. Where does the information reside and how can it be accessed to enhance productivity and add value to current practices? Much of the data has become available through the development of the World Wide Web, as a result of particular efforts by research groups to provide access and tools for database use. The number of databases, tools and web sites is increasing daily. The problem which faces this virtual global research community is not gaining access to information but converting it into knowledge. This is analogous to having access to the Library of Congress, i.e. access to the information does not impart knowledge of how it functions, but does provide the base for developing new approaches to evaluating and understanding it. An important component of these approaches involves the ability to transfer technology from disciplines where the problem has been solved, but perhaps in a non-obvious application. These are the challenges and the opportunities which confront researchers in biology, biochemistry, biophysics and clinical areas, and we believe that the development and application of bioinformatics will help integrate the solutions with the problems.
"Where is the knowledge we have lost in information?" (T. S. Eliot, Choruses from "The Rock")
The papers which appear in NetSci will attempt to explore the answer to this question by presenting the breadth of bioinformatics applications and providing a small sample of their variety.
NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.