A Short History of Bioinformatics
Allen B. Richon
One of the fundamental principles of biology is that within each cell, the DNA that makes up the genes encodes RNA, which in turn produces the proteins that regulate all of the biological processes within the organism. The human body is made up of an estimated 10^12 cells, each of which contains 23 pairs of chromosomes that carry approximately 30,000 genes, which in turn contain some 3 billion pairs of DNA bases. While we have a basic understanding of how gene sequences code for specific proteins, we lack the information necessary to completely understand the role of DNA in specific diseases or the functions of the thousands of proteins that are produced. The methods that are used to collect, store, retrieve, analyze, and correlate this mountain of complex information are grouped into a discipline called bioinformatics. The goal of bioinformatics is thus to provide scientists with a means to explain:
- Normal biological processes
- Malfunctions in these processes that lead to diseases
- Approaches to improving drug discovery.
The use of these techniques has grown explosively in the past five years and shows no sign of slowing down. As a result, the number of sources of products, services, and information has increased to the point that keeping track of (or locating) the numerous providers has become extremely time-consuming.
The intent of this article, and others that will appear in the near future, is to compile as comprehensive a list as possible of resources used by bioinformatics scientists. While the author has endeavored to provide a complete listing, it is impossible to obtain profiles for all of the companies involved in this dynamic field.
Bioinformatics encompasses the use of tools and techniques from three separate disciplines: molecular biology (the source of the data to be analyzed), computer science (which supplies the hardware for running analyses and the networks to communicate the results), and the data analysis algorithms that strictly define bioinformatics. For this reason, the editors have decided to incorporate events from these areas into a brief history of the field.
A new technique, electrophoresis, is introduced by Tiselius for separating proteins in solution.
Pauling and Corey propose the structure for the alpha-helix and beta-sheet (Proc. Natl. Acad. Sci. USA, 37: 205-211, 1951; Proc. Natl. Acad. Sci. USA, 37: 729-740, 1951).
Watson and Crick propose the double helix model for DNA based on X-ray data obtained by Franklin and Wilkins (Nature, 171: 737-738, 1953).
Perutz's group develops heavy atom methods to solve the phase problem in protein crystallography.
The sequence of the first protein to be analyzed, bovine insulin, is announced by F. Sanger.
The first integrated circuit is constructed by Jack Kilby at Texas Instruments.
The Advanced Research Projects Agency (ARPA) is formed in the US.
Packet-switching network protocols are presented to ARPA.
The ARPANET is created by linking computers at Stanford, UCSB, The University of Utah and UCLA.
The details of the Needleman-Wunsch algorithm for sequence comparison are published.
The maiden voyage of the Boeing 747 is made on January 12.
Ray Tomlinson (BBN) invents the email program.
The first recombinant DNA molecule is created by Paul Berg and his group.
The Brookhaven Protein Data Bank is announced (Acta Cryst. B, 1973, 29: 1746).
Robert Metcalfe receives his Ph.D. from Harvard University. His thesis describes Ethernet.
Vint Cerf and Robert Kahn develop the concept of connecting networks of computers into an "internet" and develop the Transmission Control Protocol (TCP).
Charles Goldfarb invents SGML (Standard Generalized Markup Language).
Microsoft Corporation is founded by Bill Gates and Paul Allen.
Two-dimensional electrophoresis, where separation of proteins on SDS polyacrylamide gel is combined with separation according to isoelectric points, is announced by P. H. O'Farrell (J. Biol. Chem., 250: 4007-4021, 1975).
E. M. Southern publishes the experimental details for the Southern blot technique for detecting specific sequences of DNA (J. Mol. Biol., 98: 503-517, 1975).
The Unix-To-Unix Copy Protocol (UUCP) is developed at Bell Labs.
The full description of the Brookhaven PDB (http://www.pdb.bnl.gov) is published (Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyer, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M.; J. Mol. Biol., 1977, 112: 535).
Allan Maxam and Walter Gilbert (Harvard) and Frederick Sanger (U.K. Medical Research Council) report methods for sequencing DNA.
The first Usenet connection is established between Duke and the University of North Carolina at Chapel Hill by Tom Truscott, Jim Ellis and Steve Bellovin.
The first complete genome sequence for an organism (bacteriophage ΦX174) is published. The genome consists of 5,386 base pairs which code for nine proteins.
Wüthrich et al. publish a paper detailing the use of multi-dimensional NMR for protein structure determination (Kumar, A.; Ernst, R.R.; Wüthrich, K.; Biochem. Biophys. Res. Comm., 1980, 95: 1).
IntelliGenetics, Inc. is founded in California. Its primary product is the IntelliGenetics Suite of programs for DNA and protein sequence analysis.
The Smith-Waterman algorithm for sequence alignment is published.
IBM introduces its Personal Computer to the market.
Genetics Computer Group (GCG) is created as a part of the University of Wisconsin Biotechnology Center. The company's primary product is The Wisconsin Suite of molecular biology tools.
The Compact Disc (CD) is launched.
Name servers are developed at the University of Wisconsin.
Jon Postel's Domain Name System (DNS) is placed on-line.
The Macintosh is announced by Apple Computer.
The FASTP algorithm is published.
The polymerase chain reaction (PCR) is described by Kary Mullis and co-workers.
The term "Genomics" appears for the first time to describe the scientific discipline of mapping, sequencing, and analyzing genes. The term was coined by Thomas Roderick as the name for a new journal.
Amoco Technology Corporation acquires IntelliGenetics.
The SWISS-PROT database is created by the Department of Medical Biochemistry of the University of Geneva and the European Molecular Biology Laboratory (EMBL).
The use of yeast artificial chromosomes (YACs) is described (David T. Burke, et al., Science, 236: 806-812).
The physical map of E. coli is published (Y. Kohara, et al., Cell 51: 319-337).
Perl (Practical Extraction and Report Language) is released by Larry Wall.
The National Center for Biotechnology Information (NCBI) is established at the National Library of Medicine.
The Human Genome Initiative is started (Commission on Life Sciences, National Research Council. Mapping and Sequencing the Human Genome, National Academy Press: Washington, D.C., 1988).
The FASTA algorithm for sequence comparison is published by Pearson and Lipman.
Des Higgins and Paul Sharp announce the development of CLUSTAL (Higgins, D.G.; Sharp, P.M. Fast and sensitive multiple sequence alignments on a microcomputer. Comput. Appl. Biosci. 1989, 5, 151-153; Higgins, D.G.; Sharp, P.M. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 1988, 73, 237-244).
A new program, an Internet computer virus designed by a student, infects 6,000 military computers in the US.
The Genetics Computer Group (GCG) becomes a private company.
Oxford Molecular Group, Ltd. (OMG) is founded in Oxford, UK by Anthony Marchington, David Ricketts, James Hiddleston, Anthony Rees, and W. Graham Richards. Primary products: Anaconda, Asp, Cameleon and others (molecular modeling, drug design, protein design).
The BLAST program (Altschul, et al.) is implemented.
Molecular Applications Group is founded in California by Michael Levitt and Chris Lee. Their primary products are Look and SegMod, which are used for molecular modeling and protein design.
InforMax is founded in Bethesda, MD. The company's products address sequence analysis, database and data management, searching, publication graphics, clone construction, mapping and primer design.
The HTTP 1.0 specification is published. Tim Berners-Lee publishes the first HTML document.
The research institute in Geneva (CERN) announces the creation of the protocols which make up the World Wide Web.
Linus Torvalds announces a Unix-like operating system which later becomes Linux.
The creation and use of expressed sequence tags (ESTs) are described (J. Craig Venter, et al., Science, 252: 1651-1656).
Incyte Pharmaceuticals, a genomics company headquartered in Palo Alto, California, is formed.
Myriad Genetics, Inc. is founded in Utah. The company's goal is to lead in the discovery of major common human disease genes and their related pathways. The company has discovered and sequenced, with its academic collaborators, the following major genes: BRCA1, BRCA2, CHD1, MMAC1, MMSC1, MMSC2, CtIP, p16, p19, and MTS2.
Human Genome Sciences, Gaithersburg, Maryland, is formed by William Haseltine.
The Institute for Genomic Research (TIGR) is established by Craig Venter.
Genome Therapeutics announces its incorporation.
Mel Simon and coworkers announce the use of BACs for cloning.
CuraGen Corporation is formed in New Haven, CT.
Affymetrix begins independent operations in Santa Clara, California.
Compugen begins operations in Israel.
InterNIC is created by the National Science Foundation.
Netscape Communications Corporation is founded and releases Navigator, the commercial successor to NCSA's Mosaic.
Gene Logic is formed in Maryland.
The PRINTS database of protein motifs is published by Attwood and Beck.
Oxford Molecular Group acquires IntelliGenetics.
Microsoft releases version 1.0 of Internet Explorer.
Version 1.0 of Apache is released.
The Haemophilus influenzae genome (1.8 Mb) is sequenced.
The Mycoplasma genitalium genome is sequenced.
The working draft for XML is released by W3C.
Oxford Molecular Group acquires the MacVector product from Eastman Kodak.
The genome for Saccharomyces cerevisiae (baker's yeast, 12.1 Mb) is sequenced.
The Prosite database is reported by Bairoch, et al.
Affymetrix produces the first commercial DNA chips.
Structural Bioinformatics, Inc. is founded in San Diego, CA.
The genome for E. coli (4.7 Mbp) is published.
Oxford Molecular Group acquires the Genetics Computer Group.
LION bioscience AG is founded as an integrated genomics company with a strong focus on bioinformatics. The company is built from IP out of the European Molecular Biology Laboratory (EMBL), the European Bioinformatics Institute (EBI), the German Cancer Research Center (DKFZ), and the University of Heidelberg.
Paradigm Genetics Inc., a company focussed on the application of genomic technologies to enhance worldwide food and fiber production, is founded in Research Triangle Park, NC.
deCode genetics publishes a paper that describes the location of the FET1 gene, which is responsible for familial essential tremor, on chromosome 13 (Nature Genetics).
The genomes for Caenorhabditis elegans and baker's yeast are published.
The Swiss Institute of Bioinformatics is established as a non-profit foundation.
Craig Venter forms Celera in Rockville, Maryland.
PE Informatics was formed as a Center of Excellence within PE Biosystems. This center brings together and leverages the complementary expertise of PE Nelson and Molecular Informatics, to further complement the genetic instrumentation expertise of Applied Biosystems.
Inpharmatica, a new Genomics and Bioinformatics company, is established by University College London, the Wolfson Institute for Biomedical Research, five leading scientists from major British academic centers and Unibio Limited.
GeneFormatics, a company dedicated to the analysis and prediction of protein structure and function, is formed in San Diego.
Molecular Simulations Inc. is acquired by Pharmacopeia.
deCode genetics maps the gene linked to pre-eclampsia as a locus on chromosome 2p13.
The genome for Pseudomonas aeruginosa (6.3 Mbp) is published.
The A. thaliana genome (100 Mb) is sequenced.
The D. melanogaster genome (180 Mb) is sequenced.
Pharmacopeia acquires Oxford Molecular Group.
The human genome (3,000 Mbp) is published.
Structural Bioinformatics and GeneFormatics merge.
An international sequencing consortium published the full genome sequence of the common house mouse (2.5 Gb). Whitehead Institute researcher Kerstin Lindblad-Toh is the lead author on the paper; her institution led the project and contributed about half of the sequence. Washington University School of Medicine delivered about 30 percent of the sequence, and created the mouse BAC-based physical map. The Wellcome Trust Sanger Institute in the UK was the third major partner. Other institutes in the International Mouse Genome Sequencing Consortium included the University of California at Santa Cruz, the Institute for Systems Biology, and the University of Geneva.
The draft genome sequence of the Brown Norway laboratory rat, Rattus norvegicus, was completed by the Rat Genome Sequencing Project Consortium. The paper appears in the April 1 edition of Nature.
NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation.