World-Wide Web Scientific Publications:
Future Potential and Current Problems
Allen B. Richon, Ph.D.
Molecular Solutions Incorporated
1116 Miller Mtn. Road
Washington, DC 20008 USA
E-mail: mambos@molsol.com
Published April, 1997
The World Wide Web has become a ubiquitous presence in our lives. Consider that the Web, Netscape, Yahoo, Java, and a host of other terms were unknown just five years ago. Yet today it is virtually impossible to avoid the Web. The long shadow of the Web can be found on products ranging from automobiles (http://www.toyota.com) to video/music (http://www.sony.com), current movies (http://www.thelurker.com) to television (http://www.hbo.com), car repair (http://www.midas.com) to beer (http://www.millerlite.com), and almost any other item of commerce. URLs are now a routine part of almost every advertising campaign, whether the ad runs on television, at the movies, in a newspaper or magazine, on a billboard, or on a Web site. While the Web is still in a formative stage, this movement to a new user interface has afforded a unique opportunity to fundamentally change the manner in which scientific information is created and communicated.
In 1995, Network Science Corporation created a unique Web-only publication, NetSci, whose primary objective was to provide an interactive, up-to-date collection of resources dedicated to pharmaceutical and biotech R&D. NetSci was thus designed to be a central repository for primary research papers, industry news, profiles of emerging technologies and companies, literature reviews, new product announcements, financial data on publicly traded companies, software listings, Web resources of interest to scientific professionals, and other features requested by the readers. This paper examines some of the problems that the editors have encountered, reviews Web-based tools of interest to the chemistry community, and highlights several open issues that impact the effective use of the Web for the dissemination of scientific information.
Definition of Terms (and a Little History)
Given current statistics for the growth in the use of the World Wide Web, it should not be surprising that many of the people trying to navigate the Web have not been exposed to the terminology of this environment. Many of the questions that have been sent to the NetSci Web site highlight the lack of understanding that exists about the technology and how to use it effectively. This section will provide an introduction to the language of the Web and set the framework for some of the problems to be discussed in later sections of this paper.
A network is generally defined as a collection of two or more computers which are capable of communicating with each other (simple networks are also referred to as Local Area Networks, or LANs). At the next step up in the computer network food chain, an internet is a collection of two or more interconnected computer networks, also known as Wide Area Networks (WANs). The Internet is the world-wide collection of computers, software, data, and people which are connected by (and communicate by) a common set of protocols. In other words, the Internet is a large network of networks.
The foundation for the Internet was laid during the late 1960s and early 1970s as a part of the Defense Department's efforts to share data efficiently among its laboratories. The original network (which was called ARPANet) connected four research sites. The size of the national network grew significantly when NSF attempted to use ARPANet to connect its five supercomputing centers to the United States' university system. When bureaucracy and slow transmission rates hampered their efforts, NSF created its own network, known as NSFNet. While the network was originally designed to provide access to academic research laboratories, its use mushroomed quickly. By 1995, the network was using T3 lines (rather than the 56 Kbps lines originally specified), was being managed by a dedicated group, and the restrictions against "commercial" traffic had been lifted. While the Internet is now used as a means of commerce, the origins of the network have several implications for the manner in which business is transacted on the Net.
The historical growth of the Internet, as measured by the number of unique internet addresses requested, has been profiled by Matthew Gray as follows:
- 9.5 million hosts on the Internet in January 1996
- 4.9 million in 1995
- 2.2 million in 1994
- 1.3 million in 1993
A more restricted definition of the size of the Internet was derived by measuring the number of answers to electronic queries made to specific addresses on the net. This approach yielded the following estimate:
- 1.7 million hosts on the Internet in January 1996
- 1.0 million in 1995
- 0.6 million in 1994
- less than 0.4 million in 1993
Thus, the popular literature reports estimates ranging from 1.7 million to 10 million machines on the Internet. According to the Network Solutions registration group, the network as of 1996 lists approximately 10 million hosts, and this total is growing at the rate of 80,000/month. These numbers, however, indicate only the number of unique machines that are connected to the Internet. According to Computer Intelligence Infocorp, La Jolla, California, the estimate for the number of people who use the Internet regularly is 15 million. Yet another metric for the Internet has been published by WebCrawler. According to their counters, there are currently 145,166 public Web servers on the Internet. Given this volume of traffic, how are all of these machines connected and how do they communicate?

Figure 1: Growth of the Internet and World-Wide Web (figures in thousands)
Computers attached to the Internet communicate using a set of protocols which are collectively referred to as the Transmission (some texts use the term Transport) Control Protocol/Internet Protocol, or TCP/IP. One of the items defined by the TCP/IP protocols is the electronic address of each computer on the Internet. This address currently is a 32-bit binary number (10000110 00011000 00001000 01000010, for example) which is more generally reported in dotted-decimal notation (134.24.8.66). This figure combines the wide area network number (134 in this example) and sub-network numbers (24 and 8) down to the individual machine (or host) number (66). Since sequences of random numbers are not easily remembered, IP addresses can also be assigned unique names (such as netsci.org, netsci.com, netsci.edu, etc.). At this time, these names, known as domain names, are assigned, registered, and managed by the Internet Network Information Center (InterNIC), a division of Network Solutions Inc., Reston, Virginia.
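For readers who want to see the relationship between the two notations, the short sketch below (written in Python, which is not part of the original article) converts a dotted-decimal address to its 32-bit binary form and back; the address is the example used above.

```python
# Convert between dotted-decimal and 32-bit binary IP notation.
# Illustrative sketch only; the address below is the example from the text.

def dotted_to_binary(address):
    """Return the 32-bit binary string for a dotted-decimal IPv4 address."""
    octets = [int(part) for part in address.split(".")]
    return " ".join(format(octet, "08b") for octet in octets)

def binary_to_dotted(bits):
    """Return the dotted-decimal form of a space-separated 32-bit binary string."""
    groups = bits.split()
    return ".".join(str(int(group, 2)) for group in groups)

print(dotted_to_binary("134.24.8.66"))
# -> 10000110 00011000 00001000 01000010
print(binary_to_dotted("10000110 00011000 00001000 01000010"))
# -> 134.24.8.66
```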
As noted above, domain names are composed of two parts: the top-level domain name and the second-level domain name. In the US, the top-level name is currently limited to .com for commercial organizations, .gov for government organizations, .edu for academic organizations, .org for non-profit groups, .net for network organizations, and .mil for military organizations. In countries outside of the US, the top-level name is a two-letter country code such as .de for Germany. The second-level domain name is free text. Recently, an Internet working group has proposed several revisions to the domain name protocol. Under the guidelines proposed by the Internet Ad Hoc Committee (IAHC), the top-level domain name will be expanded with seven new generic names (.arts - culturally oriented groups, .firm - businesses, .web - Web related, .rec - recreational, .info - information providers, .nom - individuals, and .store - businesses). In addition, the number of firms permitted to register names will be increased from 1 to 28. There are also recommendations which would permit companies to reserve entire blocks of names. For example, AT&T would be able to purchase any domain name containing the second-level domain name att.
While some people find it easy to remember names, computers still communicate via bit streams. Thus, a mechanism is required which equates a domain name to the physical IP address. The maps that keep domain names and their corresponding IP addresses straight are known as Domain Name System (DNS) servers. Each network manager creates and maintains a copy of the DNS database for their site to enable users at the site to locate other computers on the Internet.
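As an illustration of the mapping that DNS performs, the hedged Python sketch below asks the local resolver to translate a domain name into its IP address; the host name is the one used elsewhere in this article, and the lookup will succeed only on a machine with working name service.

```python
# Resolve a domain name to an IP address using the local DNS resolver.
# The host name is the one used in this article; any reachable name will do.
import socket

hostname = "netsci.org"
try:
    ip_address = socket.gethostbyname(hostname)
    print(f"{hostname} -> {ip_address}")
except socket.gaierror as error:
    print(f"Could not resolve {hostname}: {error}")
```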
Computer naming schemes and networking protocols are only the start of a network. In addition, programs are required which facilitate the communication of information. During the late 1980s, scientists at the European Laboratory for Particle Physics (CERN) designed a new communications architecture which utilized the Internet as a delivery mechanism. Rather than relying on the traditional communication protocols which use large amounts of a computer system's resources (e.g., Telnet, File Transfer Protocol [FTP], Simple Mail Transfer Protocol [SMTP], etc.), the new protocol was designed to be a small process which established communication channels only when information was to be requested or transferred. The new protocol also facilitated access to a wide variety of information independent of the computer hardware on which the information resided.
The World-Wide Web (WWW) was thus defined to be a virtual network of documents formed by connections between computers which run a common set of protocols. The connections between the computers were formed by hypertext links which are known as Uniform Resource Locators or URLs (some publications also expand URL as Universal Resource Locator). The main communication protocol used by the Web is the HyperText Transfer Protocol (HTTP).
HTTP is a stateless transport mechanism. That is, it forms a connection between computer systems, transfers information or requests, then drops the connection and retains no information about past connections. With other types of communication protocols, one or more connections between computers are maintained until the user exits the process. As an example, when Telnet is run, it creates and maintains one communication channel during the entire time that a user is logged on. When FTP is run, it opens two channels: one to send requests and a second to receive data. Since computer system architectures limit the number of simultaneous communication channels that can be run, these programs place a heavy load on the machine.
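The stateless behavior described above can be seen in a minimal sketch: the client opens a connection, sends one request, reads the response, and closes the connection, retaining nothing for the next request. The Python code below (not part of the original article) issues a single HTTP/1.0 GET over a raw socket; the host and path are illustrative.

```python
# A single, stateless HTTP/1.0 request: connect, send, read, disconnect.
# The host and path are illustrative; nothing persists after the socket closes.
import socket

host = "www.netsci.org"
path = "/Science/Special/feature12.html"

request = (
    f"GET {path} HTTP/1.0\r\n"
    f"Host: {host}\r\n"
    "\r\n"
)

with socket.create_connection((host, 80)) as connection:
    connection.sendall(request.encode("ascii"))
    response = b""
    while chunk := connection.recv(4096):
        response += chunk

# The connection is now closed; fetching a second page requires a new connection.
print(response.split(b"\r\n", 1)[0].decode("ascii"))  # e.g. "HTTP/1.0 200 OK"
```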
Under the HTTP paradigm, the computer which requests information is known as the client or Web browser. The system which delivers data in response to HTTP requests is the HTTP daemon or server. The naming scheme used to locate data on the Internet is the URL which has the general format:
protocol://hostname/pathname/filename
where the protocol is the communication method (http, ftp, mailto, gopher, usenet, ...) and the hostname is the Internet domain name of the server on which the information resides (e.g., netsci.org). The pathname and filename give the specific disk location for the requested file (the full location for this article is thus http://www.netsci.org/Science/Special/feature12.html).
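A hedged sketch of how a browser (or any other program) decomposes a URL into these parts is shown below, using Python's standard urllib; the URL is this article's own address from the text.

```python
# Split a URL into protocol, hostname, and path components.
from urllib.parse import urlparse

url = "http://www.netsci.org/Science/Special/feature12.html"
parts = urlparse(url)

print("protocol:", parts.scheme)   # http
print("hostname:", parts.netloc)   # www.netsci.org
print("path:    ", parts.path)     # /Science/Special/feature12.html
```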
Recently, companies have started to utilize Internet technology to organize corporate data. These collections of computers, known as Intranets, are usually isolated from the Internet behind a firewall. Intranets combine the communication and display mechanisms found on the Internet and the World-Wide Web with functionality contained in groupware programs such as Lotus Notes.
A few other terms related to the Web and Web traffic include:
- HTML (HyperText Markup Language): A subset of the Standard Generalized Markup Language (SGML), the language used by most desktop publishing programs. HTML defines the structure of a document, which is composed of text, image files, hyperlinks (the URLs above), and sets of instructions (or tags) that define the manner in which each of these elements will be displayed.
- CGI (Common Gateway Interface): The specification that defines how HTTP servers communicate with programs which reside on the Web server. These programs are used to process fill-in forms, perform database searches, and handle other interactive tasks (a minimal sketch appears after this list).
- Hits: When Web sites report the "traffic" they have on their pages, they typically count the number of times that a browser requests information from their site. This is reported as the "hit rate" for the site. This number is generally very large compared to the number of visitors to the site. As an example, consider a site which has one "page" that contains several paragraphs of text and ten image files. When the HTTP server sends this page to the Web browser, it counts as "hits" the initial request to open a connection to the server, the request for the block of text, and the requests for each of the image files. The count for a single person accessing this page is therefore 12.
- MIME (Multipurpose Internet Mail Extensions): An extension of the Internet mail protocol which supports mail containing multimedia data. The MIME standard defines mail headers that inform the receiving program that a message contains several parts, that the parts are separated in a defined manner, and what type of data is contained in each part of the message. On the client side, the MIME standard is used to determine the application program needed to view the data being sent by the server.
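As a minimal sketch of the CGI mechanism described in the list above, the Python script below (not from the original article) reads the environment variables the server sets for it, processes a submitted form, and answers with a MIME Content-type header followed by an HTML body. The field names are illustrative, not those of any actual NetSci form.

```python
#!/usr/bin/env python3
# Minimal CGI sketch: read server-supplied environment variables and form
# data, then reply with a MIME Content-type header and an HTML body.
# Field names are illustrative, not those of any actual NetSci form.
import os
import sys
from urllib.parse import parse_qs

# Form data arrives on standard input for POST, or in QUERY_STRING for GET.
if os.environ.get("REQUEST_METHOD") == "POST":
    length = int(os.environ.get("CONTENT_LENGTH", 0))
    form = parse_qs(sys.stdin.read(length))
else:
    form = parse_qs(os.environ.get("QUERY_STRING", ""))

name = form.get("name", ["anonymous"])[0]
remote = os.environ.get("REMOTE_HOST") or os.environ.get("REMOTE_ADDR", "unknown")

# The blank line separates the MIME headers from the document body.
print("Content-type: text/html")
print()
print(f"<html><body><p>Thank you, {name}, registering from {remote}.</p></body></html>")
```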
Unexpected Events
In addition to database services, one of the areas of highest interest to the scientific community is electronic publishing. Unfortunately, assessing the potential impact of publishing on the Web is complicated by the task of separating hype from substance. NetSci's first issue went on-line in July 1995. During the last two years, events occurred which were expected and managed (e.g., late authors, disappearing authors, articles with unacceptable content, and changing editorial focus) as well as those which were not expected, but which were managed. There were three events, however, which were completely unexpected but which have served to highlight the difficulties of creating a scientific publication on the Web.
Event 1 - Support:
When the possibility of publishing E-zines (electronic magazines) was realized, many organizations anticipated that they would be able to launch new publications and that their efforts would be financed through subscription fees and advertising revenue. After all, the Web offers several advantages when compared to print-based publications:
- low startup costs
- reduced ongoing publication costs
- rapid turn-around for issues
- the ability to deliver multimedia content
- accurate tracking for readership
However, as several large publishers have found [1], traditional approaches and expectations have not translated from print to electronic media.
Subscription fees: As detailed above, the creation of the Internet (and by extension, the WWW) was driven by U.S. academic centers and was funded by the National Science Foundation. Historically, the Internet has been used for the free interchange of ideas and has been provided to users free of charge. This culture has not been changed by the rush of commercial organizations that have joined the Internet.
In the popular press, several publishers including The Wall Street Journal, Time Warner, and Microsoft have announced and/or launched subscription-based products. To date, these efforts have met with little success. Gannett's USA Today electronic version was dropped after only 1,600 people subscribed to the service. The Wall Street Journal stated in a recent story that its on-line edition currently has 50,000 paid subscribers, in contrast to the 600,000 users who registered before the organization began charging for access. As Michael Kinsley (Editor in Chief of Slate, the Microsoft political journal) stated in his humorous announcement that the E-zine would not charge for access, "Right now there are too many people who are too damned cheap ... er, we mean ... too engaged by the novelty of the medium to pay extra for content."
The scientific press has encountered similar difficulties in migrating to the Web. The American Association for the Advancement of Science has created Science Online, an extension to their weekly print publication, Science. The printed journal cites an ABC-audited paid circulation of 162,306. The electronic extension of the journal, Science Online (http://www.sciencemag.org), is available to subscribers for a surcharge of $12/year. Given the difficulty of accurately counting the number of "people" who visit a given Web site, readership figures for the E-journal have not been published. The American Chemical Society also has announced the creation of a new chemical Web site - ChemCenter. This site includes sections for professional services, lists of ACS-sponsored conferences and publications, an education center, a shopping center for ACS merchandise, and a set of chemistry resources. Articles on the site are currently limited to the "Hot Article" of the week from C&E News and the Tables of Contents from selected journals. While much of the Web site is available at no charge, access to the journals is fee-based. Once again, traffic to the "for-fee" section of ChemCenter is not published.
If the experiences of Webmasters who have published statistics for their sites are any indication, only a small percentage of the readership which regularly visits the open section of scientific Web sites will pay to access the fee-based content. Can this failure to attract paying customers be attributed to the "only what is available free" mentality? Perhaps. However, an additional factor also might impact these Web sites; many individuals who routinely surf the Web wish to remain anonymous and are therefore reluctant to leave identifying information.
NetSci's founding company, Molecular Solutions, conducts several surveys of the computational chemistry marketplace and publishes this information in the form of annual reports on the spending and staffing trends for the various groups. In order to accurately gauge the level of interest in this type of information, readers are required to complete a registration form before accessing the reports. Over the past two years, the editors at NetSci have found that less than 50% of the browsers that have encountered the registration form have entered the information and accessed the report - even though there is no charge for the information. While the desire to maintain privacy on the Web is understandable, as will be discussed later in this paper, it is misplaced given the technology that is available to monitor Web traffic.
Another marker of Web traffic and people's reaction to requests for personal information is the Guestbook. Almost every site on the Web has a guestbook where the Webmaster requests a minimal amount of information about people who visit the site. At most sites, only a small fraction of the people who visit will provide information. NetSci's ratio of Guestbook entries to "unique hits" currently is around 1%. Is the privacy mind-set a factor which impacts a commercial site's ability to attract paying members? While it would be interesting to compare notes with other commercial Web sites, most organizations consider traffic patterns at their sites to be confidential information.
Advertising: If it is not possible to support Web-based publications through subscription fees, the next potential source of revenue is advertising. According to Forrester Research, Internet advertising reached $44 million during 1996 [2]. However, most of this amount is split among the four leading online services, Yahoo, Netscape, and other "high volume" sites. Once again, models defined by traditional media have not translated to the Web.
Advertising revenues in the traditional media outlets are determined by demographics. Advertisers want to know
- how many individuals read a given publication?
- what is the background of the average reader?
- what is their spending authority?
- do they read the advertisement?
- do they remember the ad?
- do they buy the product as a result of the ad?
As an example, the Wall Street Journal has an audited daily circulation of 385,000. A full-page, black-and-white ad will cost between $380,000 and $675,000 depending upon placement. The 1995 U.S. Rate Card for Science (162,306 circulation) sets the cost for a single full-run display page ad at $6,190. Prices climb sharply when one examines television rates. During the Super Bowl (an audience of 100,000,000), the cost for advertising was $40,000 per second, or $1,200,000 for each 30-second slot. While rates are directly tied to the number of readers and their ability to purchase the product being advertised, how does the company which placed the ad know how many people actually read it and responded to its message? In short, how can circulation be equated to readership?
[Figure: a mock advertisement - "Reaching millions of scientists weekly, BLAHBLAH has the largest worldwide circulation of any scientific publication" - contrasting "Circulation?" with "Readership?"]
Currently, the tools available to accurately monitor how many readers "see" a specific advertisement and purchase the product as a result of the ad are limited. Many of the industry trade magazines have attempted to quantify the data through the use of "bingo cards" and reader surveys. These surveys generally extrapolate readership from a limited set of responses. In contrast, online publications can provide readership demographics down to the level of the amount of time a specific machine spends on a given "page", which can be tracked, recorded, and reported. Additionally, the geographic location of the browser that accessed the page can be collected and analyzed. Despite the availability of these tools, only a limited number of E-zines have been able to leverage readership into ad revenues. Where print-based publications can translate readership levels of 10,000 into significant income, online publications are required to demonstrate millions of hits on a page to justify comparable ad rates.
In addition to advertising dollars, print publications have traditionally relied upon subscriptions as a major portion of their revenue. In the case of scientific publications, subscriptions are generated by scientific content. Since scientific reputation (and as a result professional reward) is determined in large measure by publication record, content is easy to generate. This has generally not been the case for online publications.
As one of NetSci's Corporate Sponsors, Apple Computer invited Network Science Corporation to share booth space during the Spring 1996 ACS meeting. One of the tasks that NetSci's editors undertook during the meeting was to issue a call for papers from the scientists attending the meeting. In most cases, the response to this call was "What's in it for me? I've got kids going to college and I need to pay their tuition." Based on informal surveys of other online-only journals, this experience is not unique to NetSci. When this is contrasted with the volumes of papers submitted unsolicited to "traditional" publications, the discrepancy becomes even more pronounced.
The impact that the issues above will have on the future of electronic media is easy to predict - without a method to generate the revenues needed to support operations, Web publishers will be unable to continue to deliver high quality material. Many Web sites were started as a "labor of love" with the idea that they would become self-supporting. If this transition does not occur, the quality of the content will decline as Web editors have less time (and fewer resources) to devote to their publication. An obvious first step in resolving the problem of content is for the pool of authors within the scientific community to use online journals as an additional avenue to publish their work.
Event 2 - The Myth of the Anonymous User
If you have spent much time on the Web, you have encountered guest books, registration forms, and other methods that Web sites use to collect information about their readers. In general, groups which range from professional societies to pornography providers will use a variety of approaches to gather some amount of information in order to build sites which reflect their clients' preferences. Occasionally, NetSci has received registration forms such as the one below:
- Date: Friday, 10 May 1996 18:27:02 -0400
- From: mickey@mouse.com
- To: netsci@netsci.org
- Subject: Guestbook Entry
- mickey@mouse.com registered with NetSci
- Mickey works in Chemistry
- They use Dilbert to Access the Web
- They registered to read the Introduction to Drug Development
While the Internet and the Web do grant a certain amount of anonymity, visitors to Web sites appear to be unaware of the very large footprints that they leave at every site they visit. And technology is making it possible to collect even more details about Web traffic.
Hit files and what can (and cannot) be done with them: Each time a browser accesses a Web site, the Web server needs to collect a certain amount of information in order to "serve" the Web pages that have been requested. At a minimum, this includes the name of the page requested and the address of the machine posting the request. A typical entry for a hit file has the form:
- 145.245.81.150 - - [26/aug/1996:02:20:50 -0500] "GET /netsci/Software/moilview.html HTTP/1.0" 25800
- saudade.pnl.gov - - [12/Feb/1997:19:37:12 -0500] "GET /netsci/Software/apex.html HTTP/1.0" 404 -
- 152.163.237.39 - - [12/Feb/1997:22:21:49 -0500] "GET /netsci/Resources/Biotech/arqule/top.html HTTP/1.0" 1354
The Web server (technically, the HTTP daemon) maintains a file of every request made of a Web site, the date and time it was made, and the address of the machine that issued the request. Webmasters use these files to audit the use of the site and to look for problems such as broken references. If the HTTP request is processed by a CGI program, a set of system variables can be used to ascertain the identity of the machine which posted the request. The variable REMOTE_HOST contains the name of the remote host. The REMOTE_ADDR variable contains the remote IP address. The guestbook entry above can thus be converted to:
- Date: Friday, 10 May 1996 18:27:02 -0400
- From: mickey@mouse.com
- To: netsci@netsci.org
- Subject: Guestbook Entry
- mickey@mouse.com registered with NetSci
- Mickey works in Chemistry
- They use Dilbert to Access the Web
- They registered to read the Introduction to Drug Development
- Remote host: stockyard61.onramp.net
- Remote IP address: 199.184.212.224
If the address is sent to the InterNIC database as a query, the name of the company which owns the computer, the system manager, and other information is returned. As a result, Mickey.Mouse can be tracked to the following level of detail:
- On-Ramp Technologies, Inc. (NET-ONRAMP-FTW-NET)
1950 Stemmons Freeway, Suite 5001
Dallas, TX 75207
Netname: ONRAMP-FTW-NET
Netnumber: 199.184.212.0
Coordinator (Note: this has been changed to save the manager grief)
System Manager (system) root@onramp.net
(214) 555-1212
Domain System inverse mapping provided by:
NS.ONRAMP.NET 199.1.11.2
NS2.ONRAMP.NET 199.1.11.15
Record last updated on 10-May-94.
The InterNIC Registration Services Host contains ONLY Internet Information
(Networks, ASN's, Domains, and POC's).
Please use the whois server at nic.ddn.mil for MILNET Information.
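To make the connection between hit files and the lookup above concrete, the Python sketch below (not part of the original article) parses a log entry of the kind shown earlier and asks DNS which name is registered to the requesting address; the log line is one of the examples from the text, and the whois step is omitted since it requires a query to the registry.

```python
# Parse a hit-file entry and reverse-resolve the requesting address.
# The log line is an example from the text; whois lookups are omitted here.
import re
import socket

log_line = ('145.245.81.150 - - [26/aug/1996:02:20:50 -0500] '
            '"GET /netsci/Software/moilview.html HTTP/1.0" 25800')

pattern = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]+)"')
match = pattern.match(log_line)
if match:
    address, timestamp, request = match.groups()
    print("Requesting host:", address)
    print("Time of request:", timestamp)
    print("Request:        ", request)
    try:
        # Ask DNS which name maps back to this address (the lookup may fail).
        name, _, _ = socket.gethostbyaddr(address)
        print("Registered name:", name)
    except (socket.herror, socket.gaierror):
        print("No reverse DNS entry for", address)
```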
The type of log file discussed above is not only collected by Web servers; such logs are also collected by the corporate firewalls which are used to control access to a company's data. Many corporations use these logs as a means of monitoring employee use of the WWW. Visits to Web sites such as Playboy, Penthouse, or assorted chat rooms during business hours can result in a visit from the corporate Internet police force and termination of your employment. Thus, it is wise to check on your company's policy regarding the use of the Internet in order to avoid problems.
Until recently, hit files were the only method available to monitor traffic at Web sites. The invention of Cookies has added a new tool in the drive to collect demographics. In the first implementation of the http/html protocol, the connection between client (the browser) and server (the Web page) was considered to be static; i.e., there was no ongoing connection between the two entities. Recent extensions to this standard have resulted in the adoption of a persistent client state model - the http cookie.
One of the many "Cookie" messages received at NetSci was the following:
- I am using Netscape 3 on a PC. I must have received at least a dozen requests for a Cookie from your software. I declined the first one, so why do you keep bothering me with requests. I will not accept a cookie since I do not know what a Cookie is or what it does.
Cookies are data files (tokens) that are exchanged between client and server. These are short text records (see the example below) which are created by the Web server and are stored on BOTH the server and the client computers. On the client system (the machine that is on your desk), the data is stored in the file MagicCookie (Macintosh) or COOKIE.TXT (PCs/Windows), which is located in the browser's preferences directory. All of the current browsers support cookies, but why are they used?
# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookie_spec.html
# This is a generated file! Do not edit.
.netscape.com TRUE / FALSE 946684799 NETSCAPE_ID 10010408,10d4a276
An Example of the MagicCookie File
In many cases, Web sites would like to create unique user profiles so that visitors are not required to enter the same information each time they visit a site. The service at the 800 number for Peet's Coffee (800-999-2132) is an example of the convenience that this approach to service offers. When customers call Peet's the first time, the operator collects their name, shipping address, credit card number, and their preference for coffee. The next time a call is received, all the customer needs to give is a last name and the coffee order. The operator verifies the address and credit card and takes the order - all in less than one minute.
The problem with attempting this level of customer service on the Web is that, if each connection is stateless, how can a site provide extended service? The easy answer is to collect information about each user and store this information in a location which can be queried. Thus, for each visitor, a Web site can record the IP address, the parts of the site that have been visited, credit card numbers supplied as a part of an order, e-mail name, etc. This data is then used to create a file on the browser's disk which will be examined each time that the browser returns to the site. While the cookie specification limits the amount of data that can be stored (4KB of storage per cookie, a total of 300 cookies, and a limit of 20 cookies per server according to the Netscape specification), this data can provide an in-depth profile of everyone who visits a site. Additionally, the cookie standard states that only sites which create the cookie file may access it. However, cookie files are ASCII data which have a standard location and can therefore be read by anyone with access to the browser's computer.
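Because the cookie file is plain ASCII text with a documented layout, reading it takes only a few lines. The Python sketch below (not part of the original article) parses a Netscape-format cookie file into its fields; the real file is tab-separated, the field layout is the one documented in the Netscape cookie specification cited above, and the file path shown is illustrative since it varies by platform and browser version.

```python
# Read a Netscape-format cookie file (tab-separated ASCII, one cookie per line).
# Field layout follows the Netscape cookie specification cited in the text;
# the file path is illustrative and varies by platform and browser version.
from pathlib import Path

FIELDS = ("domain", "subdomains", "path", "secure", "expires", "name", "value")

def read_cookie_file(filename):
    """Yield one dictionary per cookie entry, skipping comments and blank lines."""
    for line in Path(filename).read_text().splitlines():
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) == len(FIELDS):
            yield dict(zip(FIELDS, parts))

# Example usage (hypothetical path):
for cookie in read_cookie_file("cookies.txt"):
    print(cookie["domain"], cookie["name"], "=", cookie["value"])
```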
For individuals who do not wish this amount of data to be collected, there are two methods to stop the use of cookies. The first is to turn on the notification switch which tells the browser to notify you whenever a cookie request is sent. In Navigator, this switch is found under the Network Preferences menu, which is located in the Options toolbar. The flag can be set by selecting the Protocol cardstack and checking the appropriate box.
After this switch has been set, each time that a cookie request is sent, the browser will "beep" and give the option to accept or reject the cookie. At sites which set multiple cookies, this can become very annoying. If a group's Web site is hosted at a local Internet Service Provider (ISP), the ISP site manager sets the cookie mechanism so that individual Web sites on the provider's computers cannot control cookie requests.
The second method to avoid having cookie data used is to download a cookie killer program. The Macintosh program is Cookie Monster, which is available at any number of software download sites. Similar programs are available for Windows and UNIX browsers. These programs can be run at any time to delete a cookie file that has been placed on a browser's computer system.
... and then came Java: The Web has evolved from a stateless connection to one which uses cookies as tokens to provide a memory. The next step in the evolution of Web communication was the creation of programs which could be shared between client and server. These first appeared in the form of Java, which has since been augmented by ActiveX.
The original specification for Java defined the language as a "simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multithreaded, and dynamic language." Using this paradigm, programs (applets) are downloaded as a part of a Web page and the resulting program is run on the client machine. Examples of Java range from flashing banner advertisements to interactive games.
There have been several "urban legends" spread concerning rogue Java applets, ranging from stories of doomsday viruses to prank programs. Although Java applets are designed to run in a restricted environment, there is concern that a flawed or malicious applet could gain access to files on the client computer. For this reason, many Information Systems managers feel that while Java and ActiveX can be used to add spice to a Web site, they can also function as a Trojan Horse [3]. As a result, many organizations are beginning to limit a browser's ability to download Java-enriched pages from the Web.
Event 3 - The Audience:
The final surprise encountered in publishing a WWW journal was accurately paraphrased by a line from the movie Field of Dreams: "Build it, and they will come." In the Web publishing world, they not only come, but they can have a very interesting set of questions and requests for the Web site administrator.
The WWW is an open system. While most groups that establish Web sites have an intellectual understanding of this fact (and indeed count on it to reach millions of new "customers"), this is a concept that cannot be fully appreciated until one runs a site. Anyone in the world can access your page, view the contents, and send questions to your attention. Frequently, the NetSci site receives requests from high school students who want someone to (ghost)write reports for them. The site also receives requests for more esoteric information.
A recent article in the New York Times concerning the space shuttle Columbia mentions an experiment which involves unreeling 12.5 miles of cable. What are they up to and what is the purpose of the experiment? I would appreciate receiving any information you can furnish. Thank you.
If possible, I would like some assistance in determining the Total Maximum Daily Load (TMDL) to the southern part of the San Francisco Bay, South of the Dumbarton Bridge. Any assistance would be greatly appreciated.
I am in need of a source of papaverine, 30 mg capsules. Can you supply these??
what causes matter to decompose???
As a side note, does anyone know what the minimum amount of ionic copper, and ionic nickel are required for a healthy salt marsh type environment? Additionally, I am curious about what specific proteins or enzymes require copper, or nickel as co-factors. Also, I would like to know if there is any analytical techniques that can differentiate among ionic, chelated, complexed, and insoluble forms of copper and nickel. Is there any existing analytical equipment that can detect these species at the picogram/liter level? Can someone give me some information on laser polarimetry, and its applicability in the above case?
In each case, since the Web is an open forum, the editors at NetSci have attempted to answer each question that pertains to the content of this site or to provide links to sites that may provide information. NetSci and many other Web sites attempt to "talk" with their readers for many reasons. The primary one, however, is that without feedback from the readership, E-zines cannot tailor their content to the level of their readers.
Open Issues
The potential of this medium for the open communication of scientific information is in its infancy. However, there are a number of issues for which answers are currently unavailable. Given the speed with which the Web evolves, solutions will be created either with or without the active input of the scientific community. Discussion points highlighted by the traffic at NetSci are listed below. Readers are invited to comment.
Bandwidth: The rush to the Web is unparalleled in the history of communications. A vehicle which was originally designed to facilitate transfer of large data files between a limited number of high energy physics laboratories is now being used to broadcast radio, news, interactive chat, and more. The World-Wide Web has become known as the World-Wide Wait.
If you do not like the golden oldies playing on your local radio stations, you can use the RealAudio Player to tune in any one of 250 stations through your Web browser and listen to the same music from cities like Las Vegas. If you are interested in using a delayed edition of CNN as a screen saver, PointCast offers Windows users the ability to download and view audio/video in real time.
Many experts have stated that the Internet, as currently designed, will collapse under the load which is placed upon the backbone on a daily basis. Will ATM, Frame Relay, and other emerging standards answer the problem? If so, who will set the standards and policies for the new media? Will more laws such as the Communications Decency Act be enacted to define and control what is placed on the Web? What role will the scientific research establishment play in shaping the future of the Web? And, given the recent decision to provide new communication channels to the current members of the communications industry at no charge, who will pay for the future expansion of the technology?
Content: Web-based documents can include text, pictures, "pickable" 2D diagrams/graphs, interactive 3D graphics, sound, animation, threaded interactive discussions, "live" documents, corporate databases, and more. All of these documents can be dynamically interconnected, placed in relational searching systems, and distributed world-wide instantly. Further, every document on the Web (Internet and intranet) can be interactively searched by keyword, substructure, data element, ...
Coupled with this power to manage information, publication costs are significantly reduced compared to printed media, so the barrier to entry for a publication has nearly been eliminated. Articles can appear in a fraction of the time required for traditional avenues. In addition, feedback to the author can be instantaneous compared to the current peer-review system. And, more importantly, there is finally the possibility of a single interface standard for the spectrum of scientific software and publication systems. There are thus many compelling reasons to embrace the Web. However, a lot of material placed on the Web is junk. How will the scientific community oversee the content of its portion of this medium without forcing it to adopt standards found in print media, thus disabling many of the strengths found on the Web? Why are more expectations and demands, as discussed above, placed upon Web publications than upon printed media?
Peer Review: The quality of the science within all disciplines has been a "self-policed" activity. While there have been many documented failures of this approach, it has served the scientific community well for several decades. However, is there a place in the scientific community for non-peer-reviewed journals? If so, what is the content for this type of publication? How does the publication establish credibility? What is the standard for material presented in such a journal? What is the future for peer-reviewed work? Given the current difficulty of sustaining electronic media from both a financial and a content point of view, what are possible scenarios for the future use of this medium?
Electronic publications facilitate the free exchange of information. However, given the fluid nature of this medium, the question arises: who owns the work presented? And who will gain the financial benefit from its publication (sale)? Currently, peer-reviewed journals require that the rights to original work be signed over to the journal as a prerequisite to publication. In order to secure publication, journals also charge a per-page fee. Further, once the data has been assigned, it can be resold via search services. As an example, the American Chemical Society currently publishes 122,000 pages of scientific information per year [4]. According to figures from the ACS 1995 Annual Report, the revenues generated from Information Services were $197,632,000.
In a recent examination of several aspects of electronic publishing, Stephen Heller touched on the impact of the current economic environment on publications from academic and government laboratories [5]. It can be argued that, rather than impacting academic research, the Web will have a more significant impact on the manner in which scientific research is made available from corporate research groups.
New lead discovery is an expensive undertaking. According to current estimates, the pharmaceutical industry currently invests over $281 million per year in research [6]. Given that:
- research is expensive,
- many organizations have focused attention on making discovery more productive, and
- the publication and dissemination of research in the form of papers and data is a very profitable business
one approach to recouping research costs would be for the larger pharmaceutical companies to establish a Web-based publishing company which would serve as the single source for articles from their scientists. Since the Web offers rapid publication, full searching, world-wide distribution, and low entry costs, a consortium of the larger companies could make a significant impact on the manner in which scientific material is published. Many individual companies have already initiated this approach to publishing by creating Intranets. If the past development of the Internet and the WWW is an indicator, the next step in the deployment of corporate technology is to link these Intranets into a multi-corporate publishing system. Is the scientific community willing to move to new publishing methods? Or is it willing to continue with the current model for the distribution of its work and the fees associated with it? Do either of these models limit the availability of a free and open exchange of ideas?
A secondary aspect of electronic peer review that can be addressed through the use of Web technology is the manner in which young scientists learn the publishing and peer review processes. Electronic publications offer rising professionals an opportunity to prepare and present research information in a unique environment. On-line peer review of articles can augment traditional publishing venues and provide a global classroom for discussion of results and the way in which they are presented. Is this approach to scientific training acceptable to the community?
There are many possibilities for the use of the Web in science. Unfortunately, there are several conflicts which cloud the future of this medium.
- Scientific Web publishing is dominated by groups promoting "value-added" materials for their printed journals.
- Given the current culture of the Web, it is difficult or impossible to find adequate funding mechanisms for Web-based journals.
- As opposed to print-based publications, it is difficult to solicit content for Web-based journals.
- For each journal "reader" (defined as one who reads, comments on, and critiques the journal), there are several "lurkers" (those who only look and wish to remain completely anonymous). Since this is an evolving medium, lurkers make it difficult to develop readership profiles and evolve the content of the publication.
Will the research community be willing to fund the Scientific Web, or will sites continue to rely on authors' and editors' ability to donate their time and money to maintaining them? This section has attempted to raise several questions about the future of the Web. What are your opinions?
Web Tools
There are an increasing number of chemistry-oriented tools available for the Web, which address three general areas. This listing will be updated as new Web sites are brought online.
- Structure drawing
- Structure viewing
- Information searchers
Web-oriented structure drawing programs are considered to be those which provide the ability to output a GIF- or JPEG-formatted file. Current programs which provide this level of functionality include:
- ChemDraw: CambridgeSoft Corporation.
- ChemWeb: SoftShell International
- ISIS Draw: MDL Information Systems
Web-oriented structure viewing programs are those which provide the ability to view 2D and 3D structures on the Web. In general, all of the current programs evolved from RasMol, the molecular visualization program developed by Roger Sayle.
RasMol was developed to visualize and generate publication quality images of proteins, nucleic acids and small molecules. It was developed at the University of Edinburgh's Biocomputing Research Unit and the Biomolecular Structures Group at Glaxo Research and Development, Greenford, UK.
RasMol reads several formats of molecular co-ordinate files and interactively displays the molecule using a variety of color schemes and representations. Molecules may be rotated, translated, zoomed, and z-clipped (slabbed) interactively. Rendered images may be written out in a variety of formats including both raster and vector PostScript, GIF, PPM, BMP, PICT, Sun rasterfile, or as a MolScript input script or Kinemage. RasMol can be installed as a browser helper application so that it is activated each time a molecular data file is accessed.
Other programs which provide this level of functionality include:
- ChemScape Chime (MDL Information Systems) is a browser plugin that implements RasMol directly within the browser window. The product is available free of charge for MacOS, Windows, and UNIX.
- ChemSymphony (Cherwell Scientific) is a platform-independent set of interactive Java applets that allows 3-D molecular structures to be easily incorporated into HTML documents. The system understands most of the common file formats. Structures can be manipulated in real time, rendered in a variety of styles, and then edited by the user. Unlike traditional viewers which run as helper applications under the browser, ChemSymphony is a set of applets which are downloaded when a page is viewed. Thus, the Web site must subscribe to ChemSymphony and embed the applet commands in the HTML of the page being served in order to utilize these features.
- CS ChemOffice Net (CambridgeSoft) is a subset of CS ChemOffice Pro which allows viewing of the chemical structures and molecular models available at the ChemFinder WebServer and at other sites on the Net. Since CS ChemOffice Net is a suite that contains CS ChemDraw Net and CS Chem3D Net, it combines the features of a viewer and a Net-based structure editor.
- The Molecular Inventor (Tripos, Inc.) is a Netscape Navigator plugin which provides interactive 3D molecular visualization for Silicon Graphics workstations. The plugin is distributed free of charge as an unsupported product. Tripos also provides Web-based products which interface to their product line through the Discovery.Net Club - a fee-based service.
Finally, there are several information searchers and data sources which have been given Web interfaces. These include:
- ChemFinder (CambridgeSoft): an interface to chemical structures and data on the Web. Queries are created using CS ChemOffice Net (or Pro).
- ChemWeb (a joint venture between MDL and Elsevier): a fee-based collection of databases, information servers, and software.
- Discovery.Net (Tripos): a fee-based site which offers a collection of downloadable SGI-based CADD tools for Tripos' modeling and database programs.
References
- [1] Clark, Don; "Facing Early Losses, Some Web Publishers Begin to Pull the Plug", The Wall Street Journal, January 14, 1997.
- [2] Reilly, Patrick M.; "More Publishers Charging for Web Services", The Wall Street Journal, May 8, 1996.
- [3] "JavaMan. ActiveXecutables. Are they friend or foe?", Datamation, January 1997, 12.
- [4] Chemical & Engineering News, July 22, 1996, page 50.
- [5] Heller, Stephen R.; "Chemistry on the Internet - The Road to Everywhere and Nowhere", J. Chem. Inf. Comput. Sci., 1996, 36, 205-213.
- [6] DiMasi, J.A.; Hansen, R.W.; Grabowski, H.G.; Lasagna, L.; "The Cost of Innovation in the Pharmaceutical Industry", J. Health Econ., 1991, 10, 107.
NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation. This web site is managed by:
- Network Science Corporation
- 4411 Connecticut Avenue NW, STE 514
- Washington, DC 20008
- Tel: (828) 817-9811
- E-mail: TheEditors@netsci.org