Bioinformatics: An Editorial Perspective

Michael Liebman

http://www.netsci.org/Science/Bioinform/feature01.html

Bioinformatics is an expression of the 1990s that has evolved out of its application in molecular and structural biology rather than from a formal scientific discipline. The word appears in conference names, scientific papers, classified advertisements, job descriptions, financial analyses, company prospectuses, newspaper articles and both real and virtual academic institutes. Much of what has been described as bioinformatics has involved database generation, maintenance and analysis for nucleic acid and amino acid sequences. Rather than suggesting that the term has been misapplied, we observe that it has persisted through a continuum of ad hoc definitions and applications. The focus of this edition of NetSci is to propose and explore a functional definition for bioinformatics which parallels that of biochemistry, whose definition is accepted as the "chemistry of biological systems and processes". Within this framework, we propose a definition for bioinformatics as the study of the information content and information flow in biological systems and processes. Now we must deal with how to define information, and especially, how to evaluate and value the difference between information and knowledge.

Information is interestingly defined in the dictionary as "knowledge derived from study, experience, or instruction; facts" (American Heritage Dictionary, 1983). Such a definition is noteworthy because it states that information is knowledge, while elsewhere in the dictionary, data is defined as "information". The implication is that data is knowledge. In science, this presents a conflict, because the pursuit of data or information is undertaken to provide the basis for developing the knowledge about how a system is structured and behaves, but the observations, or data, represent only part of the critical puzzle pieces. Bioinformatics is evolving to serve as a bridge between the observations in diverse bio-based disciplines, and the derivation of the understanding or knowledge about how the system or process functions, or in the case of disease, dysfunctions.

Perhaps some of the conventional use of the term stems from the treatment of biological molecules as information carriers, e.g. DNA codes for proteins; amino acid sequences code for protein structure; protein structure codes for protein function. Many studies have attempted to evaluate the information content available to DNA constructed of four unique bases, as well as to proteins and their 20 common amino acids. More recently, x-ray crystallography and NMR have yielded three-dimensional protein structures and led to studies into the classification of structural elements, e.g. secondary and super-secondary structures, structural domains and motifs, as well as tertiary folds, for use in inferring the function of newly found protein sequences. The inherent bias has been towards sequence-based information analysis, and the effort to develop the genome projects for humans and other organisms naturally continues this emphasis. We have become more aware of the information flow in biological pathways, receptor-mediated functions, gene regulation, intercellular communication and the impact that these have on normal and disease states. The study of information content and flow in biological systems can provide a much broader view, particularly in the observation and interpretation of intra- and inter-molecular communication, and expand the basis by which we address daily research issues.
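As a rough illustration of what "information content" can mean for sequences built from four bases or 20 amino acids, the sketch below uses Shannon entropy. This is only one possible measure and is an assumption on our part, not a method prescribed by this editorial; the function names and the example sequences are hypothetical.

```python
import math
from collections import Counter


def max_bits_per_symbol(alphabet_size: int) -> float:
    """Upper bound on information per symbol for an alphabet of this size.

    Assumes every symbol is equally likely and independent of its neighbors.
    """
    return math.log2(alphabet_size)


def shannon_entropy(sequence: str) -> float:
    """Observed Shannon entropy (bits per symbol) of a sequence.

    Uses only single-symbol frequencies; it ignores any dependence
    between positions, so it is a crude estimate at best.
    """
    if not sequence:
        return 0.0
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical example sequences, chosen only for illustration.
    print(max_bits_per_symbol(4))    # DNA: four bases
    print(max_bits_per_symbol(20))   # proteins: 20 common amino acids
    print(shannon_entropy("ACGTACGT"))
    print(shannon_entropy("AAAAAAAA"))
```

Under these assumptions a DNA position carries at most 2 bits and a protein position at most about 4.32 bits. Real sequences typically fall below these bounds, and such per-symbol measures say nothing about the flow of information through pathways discussed here.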

What does this expanded view look like? As an example, we can look at issues related to drug design, or specifically the design of an inhibitor for a specific enzyme for therapeutic use. Conventional approaches will focus on identifying, isolating and purifying a target enzyme; obtaining its sequence and deriving its three-dimensional structure; applying rational drug design and molecular modeling methods for computationally docking an inhibitor into an active site; expanding the library of chemical leads using combinatorial chemistry; synthesizing, screening and evaluating suitable lead compounds and their derivatives; designing and implementing clinical trials; and submitting for FDA approval. The perspective that bioinformatics can present raises additional questions along this path, some of which may include:

  • if the enzyme target functions in a biological pathway, have we optimized the selection of the target on the basis of maximal control over pathway behavior? on the basis of specificity and selectivity of the target enzyme vs related, non-targeted enzymes from other pathways? what about the interaction of this pathway with associated pathways, e.g. does targeting this enzyme produce undesirable effects in other pathways which communicate above or below the target? can these secondary effects be controlled or evaluated to assess total clinical efficacy? what are the most likely secondary effects because of inter-pathway effects? what is known about the genetic variations in individuals in this biochemical pathway and its components which might affect individual response? how does this become part of risk assessment?

  • in the target enzyme, are there non-active site targets which might yield greater specificity and reduce "side-effects" of interacting with structurally/evolutionarily related targets? can we modulate function of the enzyme at the specificity vs reactivity level to be more effective? is specificity determined in the active site alone? what is known about natural mutations of the target enzyme? do individuals exhibiting natural mutations in the target exhibit other critical physiological differences?

  • in the small molecule developed to target the enzyme, does this adequately elicit full response by the target in terms of specificity, selectivity and efficacy? what are other potential targets that the molecule may interact with to produce unwanted effects?

  • additional considerations should involve the analysis of the disease process itself. If we consider the disease as a set of initiating events which yield a structure of secondary, tertiary and further events of varying time course, then we should ask whether we perform the right clinical analyses to detect and stage the disease in a patient who will be presenting simultaneously from multiple disease levels. should a therapeutic or diagnostic be considered appropriate for development based on available clinical observations? are we observing the critical path for short-term remediation? how do we evaluate the potential for long-term effects which we may not observe or control? are there common mechanistic paths which may be more suitable for multiple disease targeting? how does this become part of correctly managed care?

We view these additional questions as the key to the development of our bioinformatics "business" and have focused our efforts on developing novel technology and databases to address them specifically and successfully.

These questions serve only as examples of the expanded view which bioinformatics and its inherent interdisciplinary approach can bring to the relevant problems which we face daily in both the research and commercial worlds. Where does the information reside and how can it be accessed to enhance productivity and add value to current practices? Much of the data has become available through the development of the World Wide Web as a result of particular efforts of research groups to provide access and tools for database use. The number of databases, tools and web sites is increasing daily. The problem which faces this virtual global research community is not gaining access to information; it is converting that information into knowledge. This is analogous to having access to the Library of Congress, i.e. access to the information does not impart knowledge of how it functions, but does provide the base for developing new approaches to evaluating and understanding it. An important component of these approaches involves the ability to transfer technology from disciplines where the problem has been solved, but perhaps in a non-obvious application. These are the challenges and the opportunities which confront researchers in biology, biochemistry, biophysics and clinical areas, and we believe that the development and application of bioinformatics will help integrate the solutions with the problems.

"Where is the knowledge we have lost in information?" T. S. Eliot, Choruses from "The Rock"



The papers which appear in NetSci will attempt to explore the answer to this question by presenting the breadth of bioinformatics applications and providing a small sample of their variety.



NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.