LSST Data Management

The LSST data management system must

The LSST data management system is composed of a series of processing pipelines, a large archive of images, and a number of catalogs containing the detected astronomical sources and resolved astronomical objects. Underneath these are the software middleware and technology infrastructure that permit the visible elements to work securely and reliably. The processing and data are distributed across several sites: computing facilities on the observatory mountaintop, a base facility near the observatory, and multiple archive centers and data centers.

The results of each data release processing run form a data release: a static, self-consistent data set for use in scientific analysis of LSST data and publication of the results.

Periodically, new calibration data products, such as bias frames and flat fields, are created for use by the other processing functions.

All LSST data must be made available through an interface that utilizes, to the maximum possible extent, community-based standards such as those being developed by the Virtual Observatory.

Overview of Data Management

The rapid cadence of the LSST observing program will produce an enormous volume of data, ~30 TB per night, leading to a total over the ten years of operations of 60 PB for the raw data and 30 PB for the catalog database. The total data volume after processing will be several hundred PB, processed using ~150 TFlops of computing power. Processing such a large volume of data, converting the raw images into a faithful representation of the universe, assessing data quality automatically, discovering moving or transient sources automatically, and archiving the results in a form useful to a broad community of users together pose a major challenge.
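As a rough consistency check on these figures, the sketch below computes how many observing nights per year would accumulate 60 PB of raw data over ten years at ~30 TB per night. It assumes decimal units (1 PB = 1000 TB) and that the raw archive grows only through the nightly image stream. The function name is illustrative and not part of any LSST software.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def implied_nights_per_year(raw_total_pb, tb_per_night, years):
    """Observing nights per year needed to accumulate raw_total_pb of raw data."""
    total_tb = raw_total_pb * TB_PER_PB
    return total_tb / (tb_per_night * years)


# Figures quoted in the text: 60 PB raw, ~30 TB per night, ten years.
nights = implied_nights_per_year(raw_total_pb=60, tb_per_night=30, years=10)
print(f"Implied observing nights per year: {nights:.0f}")
```

Under these assumptions the quoted totals correspond to about 200 observing nights per year.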

The data management system is organized in three layers: an infrastructure layer consisting of the computing, storage, and networking hardware and system software; a middleware layer, which handles distributed processing, data access, user interface, and system operations services; and an applications layer, which includes the data pipelines and products and the science data archives. The applications layer is organized around the data products being produced.
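The layering described above can be written down as a small data structure. This is only an illustrative sketch: the class, the constant, and the lookup function are hypothetical names, and the responsibilities listed are taken directly from the paragraph above rather than from any LSST code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    responsibilities: tuple


# Ordered from the bottom (hardware) to the top (science products).
DM_LAYERS = (
    Layer("infrastructure",
          ("computing", "storage", "networking", "system software")),
    Layer("middleware",
          ("distributed processing", "data access",
           "user interface", "system operations")),
    Layer("applications",
          ("data pipelines", "data products", "science data archives")),
)


def layer_of(responsibility):
    """Return the name of the layer that owns a responsibility, or None."""
    for layer in DM_LAYERS:
        if responsibility in layer.responsibilities:
            return layer.name
    return None


print(layer_of("data access"))
```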

The nightly pipelines are based on image subtraction and are designed to rapidly detect interesting transient events in the image stream and send out alerts to the community within 30 seconds of completing the image readout. The data release pipelines, in contrast, are intended to produce the most completely analyzed data products of the survey, in particular those that measure very faint objects and cover long time scales. A new data release run begins each year and processes the entire survey data set available at that time. The data release pipelines consume most of the computing power of the data management system. The calibration products pipeline produces the wide variety of calibration data required by the other pipelines. All of these pipelines are architected to make efficient use of Linux clusters with thousands of nodes.
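The idea behind image subtraction can be sketched in a few lines: subtract a reference (template) image from a new exposure and flag pixels whose excess is well above the noise. The version below is a toy illustration, not the LSST algorithm. It assumes the two images are already aligned and have matching point-spread functions, that the per-pixel noise level is known, and it uses hypothetical function names and made-up pixel values.

```python
def subtract(science, template):
    """Pixel-by-pixel difference of two equally sized 2D images."""
    return [[s - t for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(science, template)]


def detect_transients(science, template, noise_sigma, n_sigma=5.0):
    """Return (row, col) of pixels brighter than the template by more
    than n_sigma times the assumed noise level."""
    diff = subtract(science, template)
    threshold = n_sigma * noise_sigma
    return [(r, c)
            for r, row in enumerate(diff)
            for c, value in enumerate(row)
            if value > threshold]


# Toy data: a flat template and a new exposure with one brightened pixel.
template = [[10, 10, 10],
            [10, 10, 10],
            [10, 10, 10]]
science = [[10, 11, 10],
           [9, 10, 60],
           [10, 10, 10]]

print(detect_transients(science, template, noise_sigma=1.0))
```

Only the pixel at row 1, column 2 exceeds the threshold here. A production pipeline would additionally need to register and PSF-match the images and group flagged pixels into sources, all of which are omitted from this sketch.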

There will be both mountain summit and base computing facilities, as well as a central archive facility and multiple data access centers. The data will be transported over existing high-speed optical fiber links from South America to the U.S. Although the data processing center will have substantial computing power (~100 TFlops, equal to the world's most powerful computer in 2004), the continuation of current trends suggests that it will not even qualify for the TOP500 list by the time of first light in 2014. Hence, while LSST makes novel use of advances in information technology, it does not take the risk of pushing the expected technology to its limit.
