Publishing Chemistry on the Internet

Steven M. Bachrach
Department of Chemistry, Northern Illinois University, IL, 60115 USA

Peter Murray-Rust
Biomolecular Structure, Glaxo Wellcome, Greenford, Middlesex, UB6 0HE, UK

Henry S. Rzepa
Department of Chemistry, Imperial College, London, SW7 2AY

Benjamin J. Whitaker
School of Chemistry, University of Leeds, UK



http://www.netsci.org/Science/Special/feature07.html

[ Introduction | Section 1: MIME | Section 2: VRML | Section 3: Java | Section 4: CML | Section 5: Standards in Chemical Publishing | Section 6: E-conferences and Journals | References ]

 

Imagine the reaction of a practising chemist of the turn of the century if he or she were suddenly transported into a modern chemical research facility. Outside of the odd pieces of glassware and the occasional Bunsen burner, the modern laboratory would be barely recognizable. Modern instrumentation has dramatically transformed the laboratory and the practices of chemists. IR, UV, and NMR spectrometers allow chemists to identify structures without destroying the sample. X-ray crystallography affords a detailed structure with bond distances and angles. Chromatography in its many variants allows for the separation of complex mixtures. While it is perhaps a simplification to call these techniques routine, they are everyday workhorses in the modern research environment. Our visitor from the past would no doubt be amazed!

Now imagine him or her stepping into the research library. Whilst our time traveller might be in awe of the number of journals, the sheer volume of material published every year, the computer access to databases and the search tools, he or she would be perfectly comfortable with the primary means for disseminating chemical information - the printed journal! In fact, the chemical journal has changed very little this century. Except for the occasional inclusion of a color graphic, an issue of any chemistry journal looks very much the same as it did in 1900. Considering the remarkable innovations in communication sciences over the century, e.g. film, radio, television, computer graphics, digital audio and digital video, the static nature of chemical communication is striking.

A staggering amount of chemical information is not included in traditional communications. For example, NMR spectra are often reported as simply the chemical shift and splitting pattern. IR spectra are reduced to absorption maxima. At best, an article might include a reproduction of the spectrum, but certainly not in digitized form. X-ray structural information is available only if the user has access to the Cambridge Crystallographic Database via a custom designed computer interface. Dynamic data, best represented as an animation, is obviously not included in a publication.

With the advent, proliferation, and commercialization of the Internet, we have an ideal opportunity to alter this scenario. The Internet offers a tremendous opportunity for the publication of chemical information. In this article we will address a number of developments of Internet technology that directly affect the way that chemists can communicate now, and hopefully, long into the future. The next sections introduce four technologies that have direct appeal to chemists, presented more or less in the order that they have evolved. These are (1) MIME (Multipurpose Internet Mail Extensions), (2) VRML (Virtual Reality Modelling Language), (3) Java, and (4) Chemical Markup Language. In section (5), we discuss the use of chemical Internet standards. Finally, in section (6), Electronic Conferences and Journals, we examine how these technologies might be put to use in specific applications. Our intention throughout is to provide an introduction to these topics, and to explore how they might be utilized by chemists.

We will assume that the reader has some knowledge of the World-Wide Web. This includes a basic understanding of HTML (Hypertext Markup Language), the concept of a client-server architecture, and how to use a web browser. A number of excellent resources for learning these basics are available on the web in addition to many books, including a guide to which we have contributed [1]. Our best advice is to play with the web, follow intriguing links and learn as you go!


1. Chemical MIME

A huge variety of file formats are currently utilized across the Internet. For example, graphic images can be stored in, among other choices, the GIF, JPEG, PostScript, TIFF, and PNG formats. This same freedom of choice is available for video, word-processors, page layouts and all other categories. To bring some order to this morass, Borenstein and Freed [2] developed a mechanism designed for electronic mail called Multipurpose Internet Mail Extensions (MIME). This places a content header at the top of each datafile which provides information for a mail handler on how to process the contents. The header is hierarchical, with a primary designation designed to provide some measure of sensible "default" handling of the content, and a sub-type which is more specific. So for example, a JPEG file will have the MIME image/jpeg, a Microsoft Word file is tagged as application/msword, a PostScript file has the MIME application/postscript, an HTML file is indicated by text/html, and a QuickTime movie has the MIME video/quicktime.
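The two-level structure of a media type can be illustrated with a short sketch. It uses Python's standard `email` package to split a Content-Type value into its primary type and sub-type; the helper name `split_media_type` and the `charset` parameter in the last example are illustrative assumptions, not part of the original proposal.

```python
from email.message import Message


def split_media_type(header_value):
    """Return (primary type, sub-type) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # The standard library lower-cases both parts and ignores parameters.
    return msg.get_content_maintype(), msg.get_content_subtype()


# Media types quoted in the text, plus one with a parameter (an assumption).
examples = ["image/jpeg", "application/msword", "text/html; charset=us-ascii"]
parsed = {value: split_media_type(value) for value in examples}

for value, (primary, subtype) in parsed.items():
    print(f"{value!r}: primary={primary}, sub-type={subtype}")
```

The primary type alone is enough for a sensible default (for instance, treating any `image/*` content as a picture), while the sub-type selects the exact decoder.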

The real beauty of MIME types is how Internet applications utilize them. Originally developed for electronic mail, the mechanism has been adopted for use with web and gopher clients. When a user selects a hyperlink within an HTML document, the client browser posts the request to the designated web server. The server accepts the request, locates the appropriate file(s) and then sends them to the client, with the correct MIME type attached. When the browser receives the file, it reads the MIME type to determine what to do with it. If the MIME type indicates that the browser can do so, it displays the file within its own window. This is the case for files such as HTML documents or GIF images. If, however, the MIME type cannot be handled locally, a local preference file is inspected to determine what helper application can handle the file. The browser then launches the appropriate application program and pipes the file as input to it. The user then sees a newly opened window in which the requested file is displayed. So, for example, if a user requests a Microsoft Word file, the server delivers the file with the attached MIME type of application/msword. The browser identifies Microsoft Word as the appropriate helper application, launches Word and opens up the requested file. The user can then move to the Word window and proceed to edit the document. Keep in mind that all the user has done to initiate these events is to click on a hyperlink!
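The decision a browser makes on receiving a file can be sketched as a simple lookup. This is only a minimal illustration: the contents of `INLINE_TYPES` and `HELPER_APPS` stand in for a browser's built-in capabilities and its local preference file, and the "save to disk" fallback is an assumption rather than something described above.

```python
# Types the (hypothetical) browser can render in its own window.
INLINE_TYPES = {"text/html", "image/gif"}

# Stand-in for the local preference file: media type -> helper application.
HELPER_APPS = {
    "application/msword": "Microsoft Word",
    "application/postscript": "a PostScript viewer",
}


def dispatch(mime_type):
    """Decide what to do with a file, given the MIME type sent by the server."""
    if mime_type in INLINE_TYPES:
        return "display inline"
    helper = HELPER_APPS.get(mime_type)
    if helper is not None:
        return f"launch {helper}"
    # Assumed fallback when no helper is configured.
    return "save to disk"


for received in ("text/html", "application/msword", "chemical/x-pdb"):
    print(received, "->", dispatch(received))
```

Adding one entry to the preference table is all that is needed to teach the browser about a new kind of document.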

So how does any of this relate to chemistry? Chemists use specialized drawing programs to create chemical structures, such as ChemDraw or Isis/Draw. Molecules are viewed in three dimensions from data stored as output from molecular modelling programs or from X-ray crystallography. Spectra can be digitized and stored as x-y pairs for further manipulation. In early 1994, we recognised [3] that it would be useful to define a coherent set of standard chemical media types which the community could agree to use. Web or gopher servers with such documents could then be configured to adopt the same MIME type for files of any particular type. The advantages should be obvious. A crystal structure (stored in pdb format) can be placed on the Internet with the MIME type chemical/x-pdb, thus ensuring that it will be sent to all users with the same header. The user must then decide which program will process this file once it has been received via a Web client, e-mail program or whatever. Using a local configuration file, they can associate any particular MIME type with a local program of their choice. For pdb files, the most frequently used program is RasMol, written by Roger Sayle and freely distributable without licensing restrictions. A Netscape "plug-in" called Chemscape Chime from MDLI effectively implements RasMol directly within a Netscape browser window. The user can then manipulate the entire structure at will - rotate it, change the colors or zoom in on a particular feature (Figure 1). Contrast this with a single static and very probably monochrome image published in your favorite journal. The use of chemical MIME allows chemists for the first time to publish the actual data - structures, spectra, etc. - that can then be used and manipulated by others.


Figure 1. Crystal Structure of Halofantrine, an antimalarial drug marketed by SmithKline Beecham. If you have installed Chemscape Chime as a "plug-in" to Netscape 2.0, this molecule should appear as a rotating image. If you are using other WWW clients, and have configured the chemical MIME type as chemical/x-pdb, clicking on the thumbnail image above will activate the molecule in e.g. a RasMol Window.

A number of different chemical MIME types were proposed originally, falling into several categories: (1) molecular coordinate, connectivity and sequence formats, (2) molecular modelling formats, (3) spectroscopic formats, and (4) generic "self-defining" modern formats (Table).

Table. Typical Examples of Chemical MIME Media Types

Media type                    Description
chemical/x-pdb                Protein Data Bank file in the Brookhaven format, which has been adopted as the de facto standard by the NIH and other organizations.
chemical/x-mdl-molfile        The database format defined by MDLI.
chemical/x-daylight-smiles    Daylight SMILES format for indicating chemical connectivity.
chemical/x-jcamp-dx           Format for defining IR and NMR spectra.
chemical/x-cif                Crystallographic Information File (CIF) format.
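As a rough sketch of how a server or client might be taught the media types in the table, the following uses Python's standard `mimetypes` module. Only the `.pdb` extension appears in this article (as `halo.pdb`); the other file extensions in `CHEMICAL_TYPES` are assumptions chosen for illustration.

```python
import mimetypes

# Extension -> chemical media type. Apart from ".pdb", the extensions
# are illustrative assumptions, not part of the proposal quoted above.
CHEMICAL_TYPES = {
    ".pdb": "chemical/x-pdb",
    ".mol": "chemical/x-mdl-molfile",
    ".smi": "chemical/x-daylight-smiles",
    ".jdx": "chemical/x-jcamp-dx",
    ".cif": "chemical/x-cif",
}


def build_registry():
    """Return a private MIME registry that knows the chemical types."""
    registry = mimetypes.MimeTypes()
    for extension, media_type in CHEMICAL_TYPES.items():
        registry.add_type(media_type, extension)
    return registry


registry = build_registry()
media_type, _encoding = registry.guess_type("halo.pdb")
print(media_type)
```

Using a private `MimeTypes` instance keeps the experiment from altering the process-wide defaults; a real server would normally carry the same mapping in its own configuration file.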

Here the "x" in, say, x-pdb is included because this proposal has not yet completed the full Internet Engineering Task Force ratification process. During the first half of 1996, the original specification [4] will be revised and submitted to IUPAC for further consideration. The open architecture also allows new MIME types to be proposed and adopted without having to create new servers or clients. We envisage a small core of fundamental chemical MIME types passing through a formal ratification process, whilst new types of perhaps a more proprietary nature of interest to chemical software houses remain as "x" types, although nevertheless registered via some central mechanism for coordinating and documenting these types. Currently, these issues are debated via an e-mail list [5] and a set of guidelines for future progress has been published.

The proposal for chemical MIME types was the first watershed event in the development of chemistry publication on the Internet. This proposal opens up the Internet for the distribution of all chemical data in a fashion that allows colleagues to directly manipulate and analyse it. This is a tremendous advantage over the traditional publication route, where data would have to be hand entered back into the appropriate manipulation software. Applications of chemical MIME types are numerous. A few excellent examples are the Molecules R Us facility at the NIH, the Electronic Computational Chemistry Conferences (ECCC), the Electronic Conferences in Trends in Organic Chemistry (ECTOC), the Klotho Project, Project CORINA at Erlangen University, the ChemFinder Project by CambridgeSoft and the KISS project by Daylight Software. For further details and examples, see Chapter 12 of ref [1].


2. Virtual Reality Modelling Language

The MIME mechanism outlined above represents an essentially irreversible flow of data from an information server to a suitable display program on the user's computer. Early on, we were struck by the need for more subtle forms of communication between a document encoded in, say, HTML and displayed in a Web client window, and a molecule encoded in, say, pdb format and displayed in, say, a RasMol window. Unlike an HTML browser, programs such as RasMol cannot themselves resolve hyperlinks, and hence the chemically specific document becomes something of a cul-de-sac into which further "threads" of information cannot easily be inserted. An example of this might be the desire to associate peaks in a 2D NMR spectrum referenced within an HTML document with the individual protons responsible, highlighted in a RasMol window (Figure 2). The user can navigate around the spectrum, identifying individual proton pairs associated with individual cross-peaks in the spectrum. We implemented a mechanism for doing this under Unix using a script we termed "Chemical Structure Markup Language" or CSML [3b].
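The idea of linking regions of a spectrum image to pairs of protons can be sketched as a lookup over rectangular hot spots, in the manner of an image map. This is not the CSML script itself, whose details are not given here; the pixel coordinates and proton labels below are invented placeholders.

```python
from typing import NamedTuple, Optional, Tuple


class CrossPeak(NamedTuple):
    """A rectangular hot spot on the spectrum image and its proton pair."""
    left: int
    top: int
    right: int
    bottom: int
    protons: Tuple[str, str]


# Hypothetical hot spots; a real map would come from the assigned spectrum.
CROSS_PEAKS = [
    CrossPeak(10, 10, 30, 30, ("proton A", "proton B")),
    CrossPeak(50, 20, 70, 40, ("proton C", "proton D")),
]


def protons_at(x: int, y: int) -> Optional[Tuple[str, str]]:
    """Return the proton pair for a click at pixel (x, y), if any."""
    for peak in CROSS_PEAKS:
        if peak.left <= x <= peak.right and peak.top <= y <= peak.bottom:
            return peak.protons
    return None


print(protons_at(20, 20))
print(protons_at(0, 0))
```

The returned pair would then be passed to the structure viewer as the atoms to highlight.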

 

Figure 2. The Partial 2D NOESY Proton NMR Spectrum of the oligonucleotide CGCGTTTTCGCG, hyperlinked via an Imagemap to individual pairs of protons on a 3D display produced using RasMol and CSML markup. This "image map" is resolvable both locally (using Netscape 2.0 or Microsoft Internet Explorer) and from a server (using other browsers).

An excellent example of the use of CSML can be found in the work of Stuart Green at Leeds. He has used CSML to write an interactive undergraduate lecture course on Computer Aided Drug Design. This includes a number of interactive workshops in which students can explore, for example, the structural properties of inhibitors specific to HIV protease as potential therapeutic agents for blocking the course of HIV infection. Another example of CSML implemented at Imperial College by Chris Leach is a "guided tour" through the photosystem reaction centre, highlighting the course of electron transport in the molecule.

CSML was only fully implemented on Unix systems. Although demonstrating "proof of concept" the mechanism is rather clumsy, and furthermore is still asymmetric, in that it is not possible to select an atom in the RasMol display and have it associated with other sources of information.

Two recent developments offer attractive solutions to this problem. In 1995, a three dimensional object description language called VRML or Virtual Reality Modelling Language was introduced [6]. If HTML is thought of as a language used to choreograph the two dimensional ASCII character set, then VRML would correspond to a similar description of a set of three dimensional objects such as spheres, cylinders and other primitive graphical objects. VRML can be used to display these objects in 3D space, in which the user can move around. Unlike a custom display program such as RasMol, the VRML encoded document fully supports the hyperlink concept via URLs (Uniform Resource Locators). Thus a molecule described using VRML can have hyperlinks associated with various atoms, or larger groups, and thus a bidirectional information flow between, say, an HTML and a VRML encoded document can now be achieved, with each invoking the other as necessary [7]. VRML is supported both via separate browser programs, such as Webspace on Unix workstations and Whurlwind on Macintosh computers, and via "plug-ins" such as WebFX for the popular Netscape Web client. The latter also allows "in-lined" images to be displayed within the original HTML document window. We have used this mechanism, for example, to display annotated molecular wavefunctions [8a] and to associate an NMR spectrum with individual atoms in a molecule [8b]. The mechanism is equally good at highlighting reactive centers in a molecule (Figure 3). Many other excellent examples of this mode can be viewed.
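To make the idea of hyperlinked atoms concrete, here is a minimal sketch that writes a VRML 1.0 scene in which each atom is a sphere wrapped in a `WWWAnchor` node. The coordinates, radii and `example.org` URLs are placeholders, and the output is a bare illustration rather than the markup used for the figures in this article.

```python
def atom_node(position, radius, url):
    """Return VRML 1.0 text for one sphere that links to `url` when selected."""
    x, y, z = position
    return "\n".join([
        "  Separator {",            # isolates this atom's translation
        "    WWWAnchor {",
        f'      name "{url}"',
        f"      Translation {{ translation {x} {y} {z} }}",
        f"      Sphere {{ radius {radius} }}",
        "    }",
        "  }",
    ])


def molecule_to_vrml(atoms):
    """Assemble a complete scene from (position, radius, url) tuples."""
    lines = ["#VRML V1.0 ascii", "Separator {"]
    lines.extend(atom_node(*atom) for atom in atoms)
    lines.append("}")
    return "\n".join(lines) + "\n"


# Placeholder atoms: positions, radii and link targets are all assumptions.
ATOMS = [
    ((0.0, 0.0, 0.0), 1.0, "http://example.org/sulfur.html"),
    ((1.5, 0.0, 0.0), 0.8, "http://example.org/oxygen.html"),
]

scene = molecule_to_vrml(ATOMS)
print(scene)
```

Each anchor can point back to an HTML page, which is what allows information to flow in both directions between the two kinds of document.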


Figure 3. Hydrolysis of dimethyl sulfate, encoded in VRML, containing embedded hyperlinks associated with individual atoms and bonds. These point to HTML documents, which are in turn viewed using a suitable browser. If you have WebFX as a Netscape "plugin", the molecule will be rotatable within this area. If you are using another browser, ensure your MIME type is configured to x-world/x-vrml and that you have a VRML browser resident on your computer system.


3. Java

Whilst the use of VRML allows complex interactions between 3D molecular objects and other documents to be set up, the VRML objects themselves have no molecular semantics, i.e. it is not possible to perform molecularly sensible actions upon them. With the present version of VRML (1.0), it is not possible to switch from say a wireframe display of a molecule to a space-filling representation, since the bond vectors and the atomic spheres are unrelated objects in the VRML description. To do this, one needs a mechanism whereby the rendered object can be generated dynamically. In one sense, we are back to using a specific code such as RasMol. In order to retain the association of data, action and hyperlink in a single mechanism, however, one has to move to a new programming environment introduced recently called Java.
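The point about molecular semantics can be illustrated with a small sketch, written in Python purely for illustration. When atoms and bonds are kept as related data, either representation can be generated on demand; the two-atom fragment, its coordinates and the function name `render` are assumptions, and the radii are approximate van der Waals values in angstroms.

```python
# A toy molecular model: atoms keyed by serial number, bonds as serial pairs.
ATOMS = {
    1: ("S", (0.0, 0.0, 0.0)),
    2: ("O", (1.4, 0.0, 0.0)),
}
BONDS = [(1, 2)]

# Approximate van der Waals radii in angstroms.
VDW_RADII = {"S": 1.80, "O": 1.52}


def render(style):
    """Generate drawing primitives for the requested display style."""
    if style == "wireframe":
        # One line segment per bond, joining the two atomic positions.
        return [("line", ATOMS[a][1], ATOMS[b][1]) for a, b in BONDS]
    if style == "spacefill":
        # One sphere per atom, sized by element.
        return [("sphere", position, VDW_RADII[element])
                for element, position in ATOMS.values()]
    raise ValueError(f"unknown style: {style}")


print(render("wireframe"))
print(render("spacefill"))
```

A static scene description holds only the output of one such call, which is why it cannot be switched to the other style after the fact.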

Java is an object-oriented language based on C++ developed by Sun Microsystems [9]. The novelty is that Java applications can open and access objects across the net via URLs. This is the key which allows the display code, the display data (e.g. a pdb file) and the hyperlink communication to be built into a single self-contained application, or "applet", which the user downloads to their computer when an appropriate hyperlink is activated. The applet itself is integrated into the main body of the HTML document so that molecular data can be displayed seamlessly within the page. Because the object library is distributed, the applet is able to pull in from the network any components needed to render the data that are not already on the local machine. In this way the need for locally resident application programs (e.g. RasMol) to read and display specific data types is obviated. This makes life much easier for the user. Furthermore, two or more Java applets can be set up to establish mutual communication. In this way, for example, a 2D NMR spectrum can be associated with a 3D rotatable model of the corresponding molecule, with appropriate atoms again highlighted. The rotatable image can either be generated from a small subset of the type of code found in a custom application such as RasMol [10], or the Java applet can be used to encode the relevant information into VRML descriptors on the fly. This latter route has been used very effectively by Horst Vollhardt at Darmstadt, whereby individual frames from a molecular dynamics simulation can be reformulated into VRML files for further inspection.

Java was designed to be used over the network. The fact that object library components can be distributed means that executable code, potentially vulnerable to infection by viruses, is downloaded when an applet is invoked. This is potentially a serious security problem; however, the language is designed to make it impossible for applications to forge access to data structures that do not belong to them. Another important consideration is that the Internet is not homogeneous but consists of a variety of hardware platforms running a plethora of operating systems. Java code is interpreted "on-the-fly" by the browser, which makes it platform independent but can unfortunately make rendering some large datasets, e.g. a protein structure, rather slow. However, the benefits outweigh the disadvantages, and with faster networks and faster machines the future prospects for distributed and interconnected representations of molecular data are clear to see.


Figure 4. A Window created by a Java Applet, downloaded to the user's computer. If your WWW client supports Java, you should see the image of a molecule in this window, and you should be able to rotate it with the mouse. If your browser is not "Java-enabled" only a static image will appear.


4. Chemical Markup Language

The technologies covered thus far relate to the visualisation and interpretation of a relatively small range of molecular data. However, the variety of disciplines and techniques that chemistry covers is enormous, so it's not surprising that information exchange between different types of molecular datafile is difficult. Moreover, everyone has their own, slightly different, view of what words like bond, valency and formula mean. The traditional approach has been either to try to standardise on a single format (e.g. chemical/x-pdb for proteins, and increasingly for small molecules also) or to write conversion programs (such as Babel). Unfortunately the latter process always implies information loss: for example, MDL Molfiles don't hold bibliographic information; PDB files don't hold full connection tables.

An even more serious problem is that legacy information decays. The formats used today may be (literally) undecipherable in 5 years' time; many do not even have formally published standards but rely on word-of-mouth and guesswork. Even when manuals are available, it's often difficult to know whether two developers apply the same semantics to a given term.

It is generally accepted that the best way to tackle these problems is through the use of markup languages and public discussion. You are reading a markup language (HTML) at the moment! Markup languages add meta-information to a document to tell the recipient more about it. For example, although humans 'know' that a title comes at the start of a document, computers don't and need something like:


<TITLE>The Nature of the Chemical Bond</TITLE>
<AUTHOR>Linus Pauling </AUTHOR>.


Having done that, however, enormous possibilities open up since this can be searched and indexed rigorously without guesswork.
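A short sketch shows how such tagged text becomes machine-searchable. It uses Python's standard `html.parser` module to collect the text inside each element; the class name `FieldCollector` and the helper `extract_fields` are illustrative, and a production SGML system would validate against a DTD rather than parse this loosely.

```python
from html.parser import HTMLParser


class FieldCollector(HTMLParser):
    """Collect the text found inside each tagged element."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case.
        self._current = tag
        self.fields.setdefault(tag, "")

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        # Text outside any element (e.g. line breaks) is ignored.
        if self._current is not None:
            self.fields[self._current] += data


def extract_fields(markup):
    collector = FieldCollector()
    collector.feed(markup)
    collector.close()
    return {tag: text.strip() for tag, text in collector.fields.items()}


record = extract_fields(
    "<TITLE>The Nature of the Chemical Bond</TITLE>\n"
    "<AUTHOR>Linus Pauling </AUTHOR>"
)
print(record)
```

With the title and author held as named fields, an index can answer "which documents did Pauling write?" without guessing where in the text an author's name might appear.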

The most widely used approach is SGML (Standard Generalised Markup Language). SGML is a standard for writing languages such as HTML. (In technical terms HTML is an application, or Document Type Definition (DTD), of SGML.) Two major initiatives are the marking up of the corpus of the world's literature (the Text Encoding Initiative - TEI) and the definition of music through SMDL (Standard Music Description Language). Chemical Markup Language (CML) follows in this tradition.

Chemistry covers interstellar space, classification of minerals, organic molecules, proteins, nanotechnology and much else. We can agree with Democritos:



"Nothing exists except atoms and empty space; all else is opinion".

Unfortunately the 'opinion' is the hard bit and if we tackle it wrongly, severe flame-wars erupt in cyberspace! So in developing CML we have had to mix flexibility with rigour. It's being developed as a collaborative process and it's intended that discussion take place in public, in similar fashion to many other informatics tools. We are supporting this by setting up the Open Molecule Foundation, a non-profit, vendor-neutral organisation to support the development of CML and other standards and hope that many publishers will wish to be involved.

SGML gives many benefits and we'll highlight what CML will provide:

  • It separates syntax (the actual characters and their interpretation, the abstract structure of the document) from semantics (the value put on words, context, attributes, etc.)
  • It gives documents a much richer structure (e.g. for indexing, searching, etc.)
  • Documents can be linked to other documents rigorously. One important result is that terms in CML documents can be linked to publicly available glossaries.

There is a current dichotomy on the 'Net between content (what something is) and form (how it is to be rendered). Both are important, but much of the current 'hype' is over form, which means that some tools take little account of precise content. This is unacceptable for chemical applications where the participants need to know that precise information has been delivered. CML will adhere strictly to the SGML standard since without this it is impossible to pass content accurately.

CML consists of three parts (in ascending hierarchy):

These are quite general, so that markup might appear as



<X.VAR TITLE="Heat of Formation" REL="glossary" HREF="/chem/theor?deltahform" UNITS="kilocalorie/mole">12.34</X.VAR>
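As a minimal sketch of what such markup makes possible, the following reads the X.VAR element shown above with regular expressions and converts the value to kilojoules per mole. The parsing approach and the function names are assumptions for illustration (a real CML tool would use an SGML parser); the conversion factor 1 kcal = 4.184 kJ is the standard thermochemical value.

```python
import re

# Loose patterns for a single X.VAR element with double-quoted attributes.
VAR_PATTERN = re.compile(
    r"<X\.VAR\s+(?P<attrs>[^>]*)>(?P<value>[^<]*)</X\.VAR>"
)
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')

KJ_PER_KCAL = 4.184  # thermochemical calorie


def parse_var(markup):
    """Return (attribute dict, numeric value) for the first X.VAR element."""
    match = VAR_PATTERN.search(markup)
    if match is None:
        raise ValueError("no X.VAR element found")
    attrs = dict(ATTR_PATTERN.findall(match.group("attrs")))
    return attrs, float(match.group("value"))


def to_kj_per_mol(attrs, value):
    """Convert a value whose UNITS attribute is kilocalorie/mole."""
    if attrs.get("UNITS") != "kilocalorie/mole":
        raise ValueError(f"unsupported units: {attrs.get('UNITS')}")
    return value * KJ_PER_KCAL


MARKUP = (
    '<X.VAR TITLE="Heat of Formation" REL="glossary" '
    'HREF="/chem/theor?deltahform" UNITS="kilocalorie/mole">12.34</X.VAR>'
)
attrs, value = parse_var(MARKUP)
print(attrs["TITLE"], value, attrs["UNITS"])
print(round(to_kj_per_mol(attrs, value), 3), "kilojoule/mole")
```

Because the units travel with the number, the receiving program can refuse a value it does not know how to convert instead of silently misreading it.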

The most important result of this is that a very large body of current chemical information can be encoded with CML. CML documents can have a very flexible structure and have already been used to describe precisely:

  • Instrument output (e.g. spectra and crystallography).
  • Program output (e.g. molecular orbital calculations).
  • Database entries.
  • Publications (management of whole papers is already tractable).

Extensibility comes through creating new document structures and adding terms to the glossaries, both of which can be done without redefining the format.

New horizons

Even 'simple' documents (such as a 'short communication') have a rich structure which can be described by groves in the DSSSL approach. A simple example is a fine grained Table of Contents (which we use for rendering and navigating CML documents). Tools such as CoST from Joe English allow the full structure of a document to be manipulated in many ways:

  • Rendering (e.g. as an interactive hypertext TOC).
  • Translation (e.g. into LaTeX, PostScript, HTML)
  • Searching. Finding specific nodes within the document tree by context, attributes, content (or combinations).
  • Merging/Updating/Editing, etc.
  • Association of properties (e.g. numeric quantities can have units which undergo dimensional analysis)

These can revolutionise the way that chemists view information. CML provides the richness for a continuously updated Laboratory Notebook that is future-proof and searchable. Very rich information can be extracted from documents which at present are only presented as text. In many cases data may be better organised as marked-up CML rather than trying to force them into (say) Relational form, since CML naturally supports the way that scientists think about information.

CML documents can cover a very wide variety of disciplines, but with tools like CoST it becomes possible to automatically search one document for components that are required by another application. For example, it's possible to request 'run semi-empirical calculations on all molecules in this document with fewer than 30 atoms'. For the first time, data deposited with publications is an integral part and starts to live.
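A query of the kind just described can be sketched as follows. The element names `DOC`, `MOL` and `ATOM` are hypothetical and do not come from the CML definition, and Python's standard XML parser is used here only as a convenient stand-in for an SGML tool such as CoST.

```python
import xml.etree.ElementTree as ET

# A made-up document holding one small and one large molecule.
DOCUMENT = (
    "<DOC>"
    '<MOL TITLE="small molecule">' + "<ATOM/>" * 3 + "</MOL>"
    '<MOL TITLE="large molecule">' + "<ATOM/>" * 40 + "</MOL>"
    "</DOC>"
)


def small_molecules(markup, max_atoms=30):
    """Return the titles of molecules with fewer than `max_atoms` atoms."""
    root = ET.fromstring(markup)
    return [
        mol.get("TITLE")
        for mol in root.iter("MOL")
        if len(mol.findall("ATOM")) < max_atoms
    ]


selected = small_molecules(DOCUMENT)
print(selected)
```

Each selected molecule could then be handed to a calculation program, which is the sense in which deposited data "starts to live".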

A collaborative approach

CML is offered as a communal standard and will flourish on the basis that lots of people make small affordable contributions. For example, if program and instrument developers commit to their output being available in CML, it is then automatically available to a huge range of applications (rendering, translation, calculation, editing) with no additional effort. There are already tools available (see the OMF site) for postprocessing, and a powerful browser/rendering/search engine (UNIX, and hopefully other platforms shortly).


5. Standards in Chemical Publishing

The preceding discussion highlights one pre-eminent theme that has emerged in a relatively focused way only in the last year or two: if chemists are to use the Internet to its full potential, we will need to develop and widely adopt simple and flexible standards for the purpose. Many of the older de facto standards for information exchange, which originated with the nomenclature studies in the first half of the century, and then evolved into simple formats for database use, will need to be supplemented with newer standards in which the content is structured (i.e. capable of unambiguous parsing) in a robust and future-proof manner. Experience suggests this is a non-trivial objective, and one which will need significant resources applied to it.

Perhaps one very explicit example of the problems we face is to be found in the HTML which defines this very paper. Early versions (i.e. pre-1993) of HTML allowed only text to be marked up. Images, with their very weakly defined semantic content (i.e. only a human can really parse a bit-mapped image), were not a feature of this HTML definition. The introduction of the <IMG> tag (i.e. a tag which does not "containerise" information by bounding it with a </IMG> tag) added little to the semantic content of the document, but did contribute in no small measure to the explosive growth in popularity of the Web via its support in the Mosaic browser. Subsequently, the <IMG> tag was incorporated into the standards process for HTML (V 2.0), a process which has thus far taken around two years. Inevitably perhaps, developers who analysed the success of Mosaic decided that they would offer the audience what they thought it wanted in new generations of Web clients, in the expectation that the standards process would eventually catch up. One such tag which has already been used in the HTML present in this document is the <EMBED></EMBED> container. Like <IMG>, this offers the user "form" rather than "content", in defining how the information will be presented on the screen. The precise form used is:
<embed border=0 src="halo.pdb" align=abscenter width=150 height=150 spiny=360 startspin=true display3D=ball&stick pluginspage="http://www.ch.ic.ac.uk/cgi-bin/plugin.cgi"></embed> <noembed> <A HREF="halo.pdb"><IMG width=150 height=150 SRC="halo.gif" alt="chemical MIME"></A></noembed>

The content is defined purely by the halo.pdb file, whilst the remainder of the attributes define how the content will be presented within Netscape 2. The <NOEMBED></NOEMBED> container identifies information that is displayed only with earlier versions of Netscape or with other browsers. These containers can be viewed as deriving from fairly lenient implementations of SGML, and have not yet been accepted as part of the standard HTML language. In a sense, HTML is showing signs of developing in a relatively ad hoc manner, and of no longer being driven by strict implementation of SGML guidelines. Thus it is increasingly easy to construct examples of 'HTML' which display contradictory information on different browsers. For example, in the above sequence, the requirement that the contents of the <EMBED> and <NOEMBED> containers should not contradict each other (another example is the <FRAME> and <NOFRAMES> tags) can be enforced only by a human author, and not by the structure of the language. Such a situation would be totally unacceptable for chemistry, and the only solution is for the community to agree to a single interpretation of each 'language'. We hope that by highlighting such problems in this article, the chemistry community can benefit and learn from experiences of the HTML development process.
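Since the language itself cannot guarantee that the two containers agree, any check has to be bolted on from outside. The sketch below is one hypothetical way an author might test that the file named in EMBED is also the target of a link in the NOEMBED fallback; the regular expressions handle only simple double-quoted attributes and are not a general HTML parser.

```python
import re

EMBED_SRC = re.compile(r'<embed\b[^>]*\bsrc="([^"]*)"', re.IGNORECASE)
NOEMBED_BLOCK = re.compile(r"<noembed>(.*?)</noembed>",
                           re.IGNORECASE | re.DOTALL)
HREF = re.compile(r'\bhref="([^"]*)"', re.IGNORECASE)


def fallback_is_consistent(html):
    """True if the embedded file is also linked from the NOEMBED fallback."""
    embed = EMBED_SRC.search(html)
    block = NOEMBED_BLOCK.search(html)
    if embed is None or block is None:
        return False
    return embed.group(1) in HREF.findall(block.group(1))


# Simplified versions of the markup quoted above; "other.pdb" is invented.
GOOD = ('<embed src="halo.pdb" width=150 height=150></embed>'
        '<noembed><A HREF="halo.pdb"><IMG SRC="halo.gif"></A></noembed>')
BAD = ('<embed src="halo.pdb" width=150 height=150></embed>'
       '<noembed><A HREF="other.pdb"><IMG SRC="halo.gif"></A></noembed>')

print(fallback_is_consistent(GOOD))
print(fallback_is_consistent(BAD))
```

A check like this catches only one kind of contradiction, which rather underlines the argument that consistency is better built into the markup language than policed afterwards.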


6. Electronic Conferences and Journals

In this section, we illustrate two specific applications of the techniques discussed above. For the past two years, we have been actively developing technology for holding chemistry conferences on the Internet [11]. A detailed description of the first of these, the First Electronic Computational Chemistry Conference, has been published [11a], and we do not wish to repeat the details here. Instead, we will summarize the advantages and disadvantages of electronic conferences and then present a brief overview of our latest conferencing tools.

Advantages of electronic conferencing

  • Costs - Travel to traditional meetings can be very expensive - air fare, hotel, food. Electronic conferences require no travel. Participants enter the conference from their office or even from home.

  • Time - Traditional conferences are held on specific dates, at specific times and at specific sites. This frequently requires a substantial time commitment: first to travel to the meeting, then to attend (where there might be multiple concurrent sessions that force one to select which talks to hear and which to miss), and then to travel back. Electronic conferences have not yet been held in real time, but rather as long-term poster sessions that allow participants to visit at their leisure, read the papers, contemplate them, and craft a well-thought-out comment or question. Electronic conferences have typically been held for one month at a time, providing ample opportunity to visit every paper, if so desired.

  • Discussions - Frequently, the discussion portions of lectures are the most interesting and stimulating. How often have we attended a lecture where the speaker has gone over the allotted time, thereby minimizing the question and answer period? How often do we come up with a question an hour after the presentation? The discussion period of an electronic conference lasts for the entire length of the conference. Discussions are typically held by electronic mail that is distributed to all participants. Without the time constraint, "e-conferees" can take time to understand the papers and prepare questions, and the presenters can then comment in detail.

  • Presentations - Electronic conferences have been held on the WWW and can thereby take advantage of Internet technology. Papers can be very detailed. Many have included structures, molecular coordinates, color images, and video, making use of the MIME types discussed in Section 1. Real-time manipulation of data is not standard practice at traditional conferences, while it is routine at an e-conference.

  • Enfranchisement - Because of costs and travel restrictions, many scientists cannot attend meetings. For example, budget cuts at many universities and political circumstances in many emerging countries keep people from meetings. Graduate students are often unable to attend. E-conferences, lacking the travel requirement and generally free of charge, have been truly international and open to all.

Disadvantages

  • Personal contacts - Perhaps the most important missing component of electronic conferences is the personal and informal contacts and connections made at traditional in-person conferences. The ability to meet and talk in a casual setting (like over a beer) is difficult to recreate in a virtual meeting.

  • Acceptance - As with any novel technology, electronic conferences have been slow to catch on as a normal and accepted method for presenting research. As a result, many scientists have been reluctant to present their research at an e-conference, saving it instead for more traditional publication and presentation avenues.

New Developments in Conferencing

While ECCC-1 was generally regarded as quite successful by the participants, there were some complaints concerning how discussions were held. Discussions were facilitated by an electronic mailing list to which all participants subscribed. A question or comment was sent to the list, which automatically forwarded the message to all participants. We requested that each message carry the paper number in the subject line to enable everyone to filter messages. Many participants, however, felt that receiving every single message, even those relating to papers in which they had no interest, was a burden.
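
The subject-line filtering requested of participants can be sketched in a few lines of Python. This is only an illustration: the exact subject convention used at ECCC-1 is not recorded here, so the pattern "Paper <n>" below, the sample subjects, and the function names are all assumptions.

```python
import re

# Assumed convention: the paper number appears in the subject as "Paper <n>".
PAPER_TAG = re.compile(r'\bpaper\s*#?\s*(\d+)\b', re.IGNORECASE)


def paper_number(subject):
    """Return the paper number named in a subject line, or None if absent."""
    match = PAPER_TAG.search(subject)
    return int(match.group(1)) if match else None


def select_messages(subjects, wanted):
    """Keep only the subjects that refer to one of the wanted paper numbers."""
    return [s for s in subjects if paper_number(s) in wanted]


# Hypothetical subject lines, for illustration only.
subjects = [
    "Paper 12: basis set question",
    "Re: Paper 12: basis set question",
    "paper 7 - solvent model",
    "General announcement",
]
selected = select_messages(subjects, {12})
```

A filter of this kind works only if every sender follows the convention, and each participant still receives every message before it is filtered, which is the burden the participants described.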

To address this concern, at Northern Illinois University we developed a new conferencing interface for ECCC-2. Instead of a mailing list, discussions proceeded through a form-based web interface. Every paper had a discussion section devoted solely to it. Comments and questions were sent via a form. Participants could then select the individual messages they wished to read. Direct response to each message was again enabled through a form. The advantage of this system is that participants could read only the messages they wanted, and never needed to look at messages relating to papers that did not interest them. The system allowed each user to view messages by date, author, or thread, and to mark messages as read and thereby see only new messages. Each user could thus interact with the discussions in an individual manner. We hope to make this conferencing system available to the public in the near future.
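
The kind of per-paper message store described above might be modelled as follows. This is a minimal sketch in Python under stated assumptions, not the ECCC-2 implementation: the class names, field names, and sample messages are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Message:
    msg_id: int
    paper: int                      # paper the message is attached to
    author: str
    sent: date
    text: str
    reply_to: Optional[int] = None  # msg_id of the message being answered


@dataclass
class Discussion:
    messages: list = field(default_factory=list)
    read: dict = field(default_factory=dict)  # user name -> set of msg_ids

    def post(self, message):
        self.messages.append(message)

    def for_paper(self, paper, sort_by="sent"):
        """Messages for one paper, ordered by the 'sent' or 'author' field."""
        chosen = [m for m in self.messages if m.paper == paper]
        return sorted(chosen, key=lambda m: getattr(m, sort_by))

    def thread(self, root_id):
        """The root message together with all direct and indirect replies."""
        ids = {root_id}
        grew = True
        while grew:  # repeat until no further replies are found
            grew = False
            for m in self.messages:
                if m.reply_to in ids and m.msg_id not in ids:
                    ids.add(m.msg_id)
                    grew = True
        return [m for m in self.messages if m.msg_id in ids]

    def mark_read(self, user, msg_id):
        self.read.setdefault(user, set()).add(msg_id)

    def unread(self, user, paper):
        """Messages for one paper that this user has not yet marked as read."""
        seen = self.read.get(user, set())
        return [m for m in self.for_paper(paper) if m.msg_id not in seen]


# Hypothetical messages and dates, for illustration only.
board = Discussion()
board.post(Message(1, 12, "author_b", date(2000, 1, 2), "Which basis set?"))
board.post(Message(2, 12, "author_a", date(2000, 1, 3), "A reply", reply_to=1))
board.post(Message(3, 7, "author_c", date(2000, 1, 3), "Solvent model?"))
board.mark_read("reader", 1)
unread_ids = [m.msg_id for m in board.unread("reader", 12)]
```

Because messages are stored once and selected on request, a participant who never opens the discussion for a given paper never sees its messages, in contrast to the mailing list, where every message was delivered to everyone.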

The ECHET96 conference, organised from Imperial College, is expected to include many of the MIME, VRML and Java features described above, and to extend the concept of organising molecular information into a "hyperglossary" of indexable and searchable molecular content.

Electronic Journals

The ECTOC conferences, of which ECHET96 is the second, are also viewed very much as test-beds for new generations of electronic journals. Thus ECTOC is associated with the CLIC electronic journal project [12] for converting Chemical Communications to electronic form. Most commercial publishers also have experiments underway in electronic publishing, although the majority of these represent attempts to map an existing printed journal into electronic form. Inevitably, this limits the degree of experimentation possible, and it seems likely that the most innovative styles will emerge from electronic-only media. One tantalising hint of what may emerge from such experiments makes use of <FRAME>, another "non-standard" tag introduced by Netscape. This has allowed the introduction of a "chemical navigation strip" based on the chemical structures discussed in the main body of the article.

What is emerging from these pilot projects is that the distinction between what is currently termed a "conference" and the style of future electronic journals is likely to blur significantly. Both forums offer their participants many additional features compared with their conventional antecedents, in effect forming a diverse information environment within which chemistry can be conducted. "Clubs" such as BioMedNet are already available as commercial products, and such facilities for chemistry can also be expected to emerge.

Conclusions

We are still at the very beginning of what Internet technology can offer chemists. Documents such as the one you are viewing now represent a radical departure from conventional paper-bound publishing, and the beginnings of a chemically specific software industry in this area can be seen. Whatever direction this develops in, we make an impassioned plea here for adherence to standards. Gaining widespread acceptance for such standards at a time when technical developments often seem to occur with bewildering speed is going to test us to the limit. Perhaps the greatest challenge is thus not a technological one, but a sociological one: chemists learning how to use these technologies efficiently and, most importantly, to use them as creative tools to enhance the progress of their science.

References and Citations.

 
  1. The Internet: A Guide for Chemists (Ed. S. M. Bachrach), ACS Publications, 1996.
  2. N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft, September 1993.
  3. (a) H. S. Rzepa, B. J. Whitaker and M. J. Winter, J. Chem. Soc., Chem. Commun., 1994, 1907; (b) O. Casher, G. Chandramohan, M. Hargreaves, C. Leach, P. Murray-Rust, R. Sayle, H. S. Rzepa and B. J. Whitaker, J. Chem. Soc., Perkin Trans. 2, 1995, 7. For a review of the history of the development of chemical MIME, see A. Davies, European Spectroscopy News, 1996, in press.
  4. P. Murray-Rust, H. S. Rzepa and B. J. Whitaker, IETF Internet Draft, May-October 1995. See http://www.ch.ic.ac.uk/chemime/
  5. To subscribe to this list, send a message to listserver@ic.ac.uk with the following content: subscribe chemime your name. To view these discussions, connect to http://www.ch.ic.ac.uk/hypermail/chemime/
  6. G. Bell, A. Parisi and M. Pesce, "The Virtual Reality Modeling Language", November 1994. See http://www.eit.com/vrml/vrmlspec.html
  7. O. Casher and H. S. Rzepa, J. Mol. Graphics, 1995, 13, 268; O. Casher and H. S. Rzepa, Proceedings of the 14th Eurographics Meeting, March 28, 1996, London.
  8. (a) O. Casher, G. Suñer and H. S. Rzepa, Electronic Conference on Trends in Organic Chemistry (Ed. H. S. Rzepa, J. M. Goodman and C. Leach), CD-ROM, Royal Society of Chemistry, 1996; (b) B. J. Whitaker and P. Pudiate, unpublished results.
  9. The Java Language: A White Paper, Sun Microsystems, Mountain View, CA, 1995.
  10. R. L. Kinder, H. S. Rzepa and B. J. Whitaker, work in progress.
  11. (a) S. M. Bachrach, J. Chem. Inf. Comp. Sci., 1995, 35, 431; (b) H. S. Rzepa, Trends in Analyt. Chem., 1995, 14, 240.
  12. H. S. Rzepa, Trends in Analyt. Chem., 1995, 14, 464; B. J. Whitaker and H. S. Rzepa, Conference on Chemical Information, Nimes, France, October, 1995; D. James, B. J. Whitaker, C. Hildyard, H. S. Rzepa, O. Casher, J. M. Goodman, D. Riddick, P. Murray-Rust, New Review of Information Networking, December, 1995, in press.

NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation.