The MDL SCREEN Approach to the Problem of Managing Data From Enterprise-Wide High Throughput Screening (HTS)
B. E. Bauer
Screening Technologies Group
MDL Information Systems, Inc.
14600 Catalina St.
San Leandro, CA 94577 USA
Tel: +1-510-895-1313
http://www.netsci.org/Science/Screening/feature02.html
Summary
A distributed information system strategy is presented for managing the process of screening and the large volumes of data expected from enterprise-wide High Throughput Screening (HTS) programs. Project-level HTS is managed on a dedicated, laboratory-based server (the WORKGROUP), on which all aspects of HTS data management, inventory management, and automation integration are performed. The individual WORKGROUPs are linked together through a central screening data server that integrates summary data from across the organization and coordinates the workflow-based information shared across all WORKGROUPs. This architecture, as implemented within MDL SCREEN, gives the greatest flexibility in location, performance, data access, and growth for HTS.
The Problem
Screening Information Management Systems (SIMS) for High Throughput Screening (HTS) must manage a complex, multiple-component process. Compared with conventional screening, HTS handles a more diverse input of samples, each with its own set of supplied data and tracking issues, and produces test results with new components, such as plate identifiers, that must be correlated (Figure 1).
Figure 1: Sample and Information Flow Within HTS
The SIMS participates in all facets of the HTS workflow (Figure 2). As an active partner to the screening scientist in the laboratory, the SIMS is expected to perform in near real time while managing a wide variety of information types and assay results.
Figure 2: The HTS Workflow
HTS, and with it the data and processes the SIMS manages, works in conjunction with other disciplines within the Agricultural Chemical Research Environment (Figure 3). Data must pass between all components in a timely manner to avoid bottlenecks that slow down the HTS process.
Figure 3: Interrelationship Between Consumers and Producers of Information in the Agricultural Chemical Environment
The problem of coordinating samples and information in a timely way is compounded in a large, international organization (Figure 4). Samples and results must now be tracked between separate facilities, increasingly in multiple countries, and tracking is further complicated by different time zones, sample transit times, and the collection and integration of the resulting screening data.
Figure 4: Organizational Components in a Large, International Organization
From the analyst's perspective, all screening results need to be available no matter where the work was performed. Summary data, usually a single value averaged over replicate experiments but sometimes just the list of "actives," is acceptable as a starting point for data analysis as long as the full body of available data remains accessible.
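To make the notion of summary data concrete, the following Python sketch reduces raw replicate readings to one averaged value per sample and derives an "actives" list from it. The record layout, the sample identifiers, and the 50% inhibition cutoff are assumptions for illustration; the article does not specify how any organization defines its summary data.

```python
from statistics import mean

# Hypothetical raw HTS reading: (sample_id, percent_inhibition), one per replicate well.
RawReading = tuple[str, float]


def summarize(readings: list[RawReading]) -> dict[str, float]:
    """Average replicate measurements into one summary value per sample."""
    by_sample: dict[str, list[float]] = {}
    for sample_id, value in readings:
        by_sample.setdefault(sample_id, []).append(value)
    return {sid: mean(values) for sid, values in by_sample.items()}


def actives(summary: dict[str, float], threshold: float = 50.0) -> list[str]:
    """Samples whose averaged inhibition meets an assumed activity cutoff."""
    return sorted(sid for sid, value in summary.items() if value >= threshold)


if __name__ == "__main__":
    readings = [
        ("S-001", 12.0), ("S-001", 8.0),
        ("S-002", 71.0), ("S-002", 65.0),
        ("S-003", 49.0), ("S-003", 55.0),
    ]
    summary = summarize(readings)
    print(summary)           # {'S-001': 10.0, 'S-002': 68.0, 'S-003': 52.0}
    print(actives(summary))  # ['S-002', 'S-003']
```

Either form, the averaged values or the actives list alone, can serve as the starting point the analyst needs, provided the replicate data stays retrievable.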
Distributed Information Management Strategy
One approach to distributed systems is to organize the SIMS as workgroups coordinated by a central server (Figure 5). Each workgroup is built around its own dedicated server that handles the data and inventory management tasks of the HTS workflow for a specific group of screening scientists, and each additional screening laboratory has its own dedicated server. The groups can therefore operate independently of one another, and each gets the greatest possible computing performance while relying as little as possible on networks.
Figure 5: Screening Workgroups
The central SCREEN server acts as a resource coordinator for the individual workgroups, which may be located in separate facilities in separate countries (Figure 6). Shared resources include experimental information such as assay protocols and plate layouts, as well as inventory data made available to the workgroups when plates are prepared centrally. For operations spanning multiple workgroups, the central server can track and coordinate plates through all the workgroups, improving productivity and preventing samples from being lost.
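Plate tracking by the central server can be pictured as a custody log. The sketch below is a hypothetical illustration, not MDL SCREEN's implementation: `PlateTracker`, its methods, and the "in transit" convention are invented names. It records each hand-off between workgroups and flags plates that have been in transit longer than an assumed limit, which is one way a coordinator could help avoid losing samples.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

IN_TRANSIT = "in transit"  # hypothetical marker for plates between sites


@dataclass
class PlateRecord:
    plate_id: str
    location: str            # workgroup name or IN_TRANSIT
    updated: datetime        # time of the most recent move
    history: list[tuple[datetime, str]] = field(default_factory=list)


class PlateTracker:
    """Hypothetical central-server registry of plate locations."""

    def __init__(self) -> None:
        self._plates: dict[str, PlateRecord] = {}

    def register(self, plate_id: str, workgroup: str, when: datetime) -> None:
        self._plates[plate_id] = PlateRecord(plate_id, workgroup, when, [(when, workgroup)])

    def ship(self, plate_id: str, when: datetime) -> None:
        self._move(plate_id, IN_TRANSIT, when)

    def receive(self, plate_id: str, workgroup: str, when: datetime) -> None:
        self._move(plate_id, workgroup, when)

    def locate(self, plate_id: str) -> str:
        return self._plates[plate_id].location

    def history(self, plate_id: str) -> list[tuple[datetime, str]]:
        return list(self._plates[plate_id].history)

    def overdue(self, now: datetime, max_transit: timedelta) -> list[str]:
        """Plates in transit for longer than max_transit (candidates for 'lost')."""
        return sorted(
            rec.plate_id
            for rec in self._plates.values()
            if rec.location == IN_TRANSIT and now - rec.updated > max_transit
        )

    def _move(self, plate_id: str, location: str, when: datetime) -> None:
        rec = self._plates[plate_id]  # raises KeyError for unregistered plates
        rec.location = location
        rec.updated = when
        rec.history.append((when, location))


if __name__ == "__main__":
    t0 = datetime(2024, 1, 8, 9, 0)
    tracker = PlateTracker()
    tracker.register("PL-0001", "Central", t0)
    tracker.ship("PL-0001", t0 + timedelta(hours=2))
    print(tracker.overdue(t0 + timedelta(days=5), max_transit=timedelta(days=3)))
    tracker.receive("PL-0001", "WG-Europe", t0 + timedelta(days=6))
    print(tracker.locate("PL-0001"))
```

Keeping the log on the central server, rather than on any one workgroup, is what lets every site see where a plate is between facilities.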
This model for a distributed SIMS also scales easily. Additional workgroups can be added without disrupting other elements of the system. The location of a new workgroup is not critical, because the link between a workgroup and the central server does not require maximum possible performance and can run over existing T1-type (telephone) communications links.
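A rough calculation shows why a T1-class link can be enough when mainly summary results cross it. The nominal T1 line rate of 1.544 Mbit/s is standard; the record count and the 100-byte record size below are assumptions chosen for illustration, and protocol overhead is ignored.

```python
T1_BITS_PER_SECOND = 1.544e6  # nominal T1 line rate (protocol overhead ignored)


def transfer_hours(n_records: int, bytes_per_record: int,
                   link_bps: float = T1_BITS_PER_SECOND) -> float:
    """Hours of continuous link time needed to move n_records of a given size."""
    total_bits = n_records * bytes_per_record * 8
    return total_bits / link_bps / 3600


if __name__ == "__main__":
    # Assumption: one workgroup's 20 million yearly results, 100 bytes each.
    hours = transfer_hours(20_000_000, 100)
    print(f"{hours:.1f} hours of T1 time per year")  # 2.9 hours of T1 time per year
```

Under these assumptions, even a full year of one workgroup's results needs only about three hours of link time, so the link needs to be available rather than fast.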
The central SCREEN server also acts as a data accumulator for summary results coming from all the screening workgroups (Figure 7). What counts as summary data is specific to each organization. Because the workgroup and central servers can share information, there is no compelling reason to move all the data from the workgroup server to the central server. Negative results from single-point data, such as the percent inhibition values commonly produced by HTS, are of little value for modeling and data analysis, and properly functioning HTS assays are expected to produce such negatives in high volume. Leaving the detailed data on the workgroups is a viable strategy as long as the data browser knows which samples have been screened and can obtain their results on demand.
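One way to realize this split is sketched below, assuming percent-inhibition summaries and a hypothetical activity cutoff. Every screened sample gets an entry on the central server, so browsers can see that it was tested, but the numeric value is copied only for samples at or above the cutoff; for the others, the entry records which workgroup holds the result so it can be fetched on demand. `CentralEntry` and `build_central_entries` are invented names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CentralEntry:
    sample_id: str
    workgroup: str            # where the full result lives
    assay: str
    value: Optional[float]    # None: result kept only on the workgroup server


def build_central_entries(workgroup: str, assay: str,
                          summary: dict[str, float],
                          threshold: float = 50.0) -> list[CentralEntry]:
    """Turn a workgroup's per-sample summary into entries for the central server.

    Every screened sample is listed, so browsers know it was tested; only values
    at or above the (assumed) activity threshold are copied centrally.
    """
    entries = []
    for sample_id, value in sorted(summary.items()):
        kept = value if value >= threshold else None
        entries.append(CentralEntry(sample_id, workgroup, assay, kept))
    return entries


if __name__ == "__main__":
    for entry in build_central_entries("WG-Europe", "Herbicide-01",
                                       {"S-001": 10.0, "S-002": 68.0}):
        print(entry)
```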
Figure 6: Interrelationship Between Workgroups, Central SCREEN Server and Data Browsers in the Full Distributed SIMS Environment
Figure 7: Sample and Data Flow Between Workgroup and Central Servers
The central server also functions as a buffer between data browsers and workgroups (Figure 8). The demands on computer and network resources are quite different for screening scientists involved in the process of screening and for data browsers looking for information. Keeping the browsers off the workgroups greatly improves data system performance for the screeners.
Figure 8: Buffering Function of the Central Server
The data browsers also benefit from the central server. Initial queries can be made without regard to which workgroup generated the data, and results from multiple workgroups are available together. Queries such as retrieving the screening results for a single sample across all screening programs are simplified, especially when the screening activity is internationally distributed. Answers arrive in a timely manner and with reduced risk of information loss.
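A query for one sample across all screening programs can be answered entirely from the central summary store, without touching any workgroup server. In the sketch below, an in-memory SQLite database stands in for the Oracle database that MDL SCREEN 1.0 is described as using; the `summary` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def make_central_db() -> sqlite3.Connection:
    """Create a stand-in central summary store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE summary ("
        " sample_id TEXT NOT NULL,"
        " workgroup TEXT NOT NULL,"
        " assay TEXT NOT NULL,"
        " value REAL)"  # NULL when only the workgroup holds the result
    )
    return db


def results_for_sample(db: sqlite3.Connection, sample_id: str) -> list[tuple]:
    """All summary results for one sample, regardless of originating workgroup."""
    cur = db.execute(
        "SELECT workgroup, assay, value FROM summary"
        " WHERE sample_id = ? ORDER BY workgroup, assay",
        (sample_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = make_central_db()
    db.executemany(
        "INSERT INTO summary VALUES (?, ?, ?, ?)",
        [("S-002", "WG-Europe", "Herbicide-01", 68.0),
         ("S-002", "WG-Asia", "Fungicide-03", None),
         ("S-007", "WG-Asia", "Fungicide-03", 91.0)],
    )
    for row in results_for_sample(db, "S-002"):
        print(row)
```

Because the browser's query runs only against this central store, its load never reaches the workgroup servers, which is the buffering effect described above.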
MDL SCREEN and Distributed High Throughput Screening
MDL recognized that many large life science research organizations require a distributed SIMS. MDL SCREEN version 1.0 uses an Oracle-based, two-tier, client-server data system and a new technique called Detail-On-Demand that automates navigation between multiple computer systems during data drill-down. The workgroups are self-contained screening data systems coordinated by a central server, which consists of the SCREEN workgroup data model augmented with additional workflow- and process-management capabilities. Each site can tailor its definition of summary data and its resource requirements to optimize both the inventory information distributed to the workgroups and the data moved to the central server.
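The article does not describe how Detail-On-Demand is implemented. As a rough illustration of the routing idea only, the sketch below assumes each summary row carries the name of the workgroup that produced it; drilling down looks up a connection for that workgroup and fetches the detail from there, so the user never has to know which server holds the data. `DetailOnDemand`, `register`, and `drill_down` are hypothetical names, and the fetcher functions stand in for real database connections.

```python
from typing import Callable

# Hypothetical signature: (sample_id, assay) -> full replicate measurements.
DetailFetcher = Callable[[str, str], list[float]]


class DetailOnDemand:
    """Route drill-down requests from a summary row to its workgroup server."""

    def __init__(self) -> None:
        self._servers: dict[str, DetailFetcher] = {}

    def register(self, workgroup: str, fetcher: DetailFetcher) -> None:
        self._servers[workgroup] = fetcher

    def drill_down(self, workgroup: str, sample_id: str, assay: str) -> list[float]:
        try:
            fetch = self._servers[workgroup]
        except KeyError:
            raise LookupError(f"no connection registered for {workgroup!r}") from None
        return fetch(sample_id, assay)


if __name__ == "__main__":
    # Stand-in for a workgroup database holding full replicate data.
    europe_data = {("S-002", "Herbicide-01"): [71.0, 65.0]}
    dod = DetailOnDemand()
    dod.register("WG-Europe", lambda sid, assay: europe_data[(sid, assay)])
    print(dod.drill_down("WG-Europe", "S-002", "Herbicide-01"))  # [71.0, 65.0]
```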
MDL SCREEN has the capacity to handle the expected data volume. The workgroup servers can be configured to manage up to 50 million data points in as little as 16 gigabytes of disk space on a modern, medium-sized server. The central server, running on a high-end, possibly multiprocessor server with RAID disk storage arrays, has a data capacity in excess of 10^9 data points (experimental measurements) per year. This capacity is critical for organizations capable of generating 5-20 million data points per year per workgroup, and it allows the screening scientists to keep 2 or more years of complete results on-line.
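The capacity figures can be cross-checked with simple arithmetic. The sketch reads the central server figure as 10^9 (one billion) data points per year, assumes decimal gigabytes, and treats the quoted numbers as exact; it is a consistency check on the text, not a sizing tool.

```python
WORKGROUP_CAPACITY_POINTS = 50_000_000   # data points per workgroup server
WORKGROUP_DISK_BYTES = 16e9              # 16 GB, assumed decimal gigabytes
CENTRAL_POINTS_PER_YEAR = 1e9            # central server capacity, points/year


def bytes_per_point() -> float:
    """Average disk budget per stored data point on a workgroup server."""
    return WORKGROUP_DISK_BYTES / WORKGROUP_CAPACITY_POINTS


def years_online(points_per_year: float) -> float:
    """Years of complete results a workgroup server can hold at a given rate."""
    return WORKGROUP_CAPACITY_POINTS / points_per_year


def workgroups_supported(points_per_year: float) -> int:
    """Workgroups whose full yearly output fits the central server's capacity."""
    return int(CENTRAL_POINTS_PER_YEAR // points_per_year)


if __name__ == "__main__":
    print(bytes_per_point())  # 320.0
    for rate in (5e6, 20e6):
        print(rate, years_online(rate), workgroups_supported(rate))
```

At the top rate of 20 million points per year, 50 million points is 2.5 years of complete results, consistent with the "2 or more years" claim; at 5 million points per year it is 10 years. Even if every workgroup sent its complete output centrally, 10^9 points per year would accommodate about 50 workgroups at the top rate.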
Overall, MDL SCREEN provides a complete solution to enterprise-wide HTS. The workgroup approach provides the best balance between maximum performance and reliability for the screening scientists on one hand and information availability across the entire organization on the other. The entire architecture is scalable, which facilitates growth of the screening effort with minimum disruption to other components of the SIMS.
NetSci, ISSN 1092-7360, is published by Network Science Corporation. Except where expressly stated, content at this site is copyright (© 1995 - 2010) by Network Science Corporation and is for your personal use only. No redistribution is allowed without written permission from Network Science Corporation. This web site is managed by:
- Network Science Corporation
- 4411 Connecticut Avenue NW, STE 514
- Washington, DC 20008
- Tel: (828) 817-9811
- E-mail: TheEditors@netsci.org