Opening a Window of Discovery on the Dynamic Universe

Data Management

An illustration of the innovative "data mining sphere" developed by the LSST database team.

The speed with which LSST maps the southern sky and the depth to which it can see will produce an enormous volume of data: about 15 terabytes (TB), or 15 trillion bytes, of raw data per night. The total amount of data collected over the ten years of operation will be 60 petabytes (PB), and processing this data will produce a 15 PB catalog database. The total data volume after processing will be several hundred PB, processed using about 150 TFLOPS (trillion floating point operations per second) of computing power for the first Data Release, increasing to 950 TFLOPS by Data Release 11 at the end of the ten-year survey. The major challenge lies in processing such a large volume of data: converting the raw images into a faithful representation of the universe, implementing automated data quality assessment and automated discovery of moving or transient sources, and archiving the results in a form useful to a broad community of users.
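To put the computing figures in perspective, the short sketch below works out the average yearly growth in processing power implied by going from 150 TFLOPS to 950 TFLOPS. It assumes ten equal annual steps between the first Data Release and Data Release 11, which is an illustrative simplification; the function name `implied_annual_growth` is hypothetical and not part of any LSST software.

```python
# Figures quoted in the text above.
FIRST_RELEASE_TFLOPS = 150   # computing power for the first Data Release
FINAL_RELEASE_TFLOPS = 950   # computing power by Data Release 11

# Assumption: ten equal yearly steps separate the two figures.
ANNUAL_STEPS = 10


def implied_annual_growth(start, end, steps):
    """Return the constant fractional growth per step that turns start into end."""
    if start <= 0 or end <= 0 or steps <= 0:
        raise ValueError("start, end, and steps must all be positive")
    return (end / start) ** (1 / steps) - 1


growth = implied_annual_growth(FIRST_RELEASE_TFLOPS, FINAL_RELEASE_TFLOPS, ANNUAL_STEPS)
print(f"Implied growth in computing power: {growth:.1%} per year")
```

Under that assumption the required capacity grows by roughly a fifth each year, a compounding rate rather than a fixed yearly increment.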

The data management system is architected in three layers: an infrastructure layer consisting of the computing, storage, and networking hardware and system software; a middleware layer, which handles distributed processing, data access, the user interface, and system operations services; and an applications layer, which includes the data pipelines and products and the science data archives. The applications layer is organized around the data products being produced.

The nightly pipelines are based on image subtraction, a process that highlights differences between two exposures of the same field, and are designed to rapidly detect interesting transient events in the image stream and send out alerts to the community within 60 seconds of completing the image readout. 
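The idea behind image subtraction can be illustrated with a toy example. The sketch below is not the LSST pipeline: it assumes the two exposures are already aligned and on the same brightness scale, and it uses a fixed threshold where a real system would use a noise model. The function names and the pixel values are made up for illustration.

```python
def subtract_images(science, template):
    """Return the pixel-by-pixel difference science - template."""
    if len(science) != len(template) or any(
        len(s_row) != len(t_row) for s_row, t_row in zip(science, template)
    ):
        raise ValueError("images must have the same shape")
    return [
        [s - t for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(science, template)
    ]


def detect_transients(science, template, threshold):
    """Return (row, column) positions where the two exposures differ strongly."""
    difference = subtract_images(science, template)
    return [
        (r, c)
        for r, row in enumerate(difference)
        for c, value in enumerate(row)
        if abs(value) > threshold
    ]


# Earlier exposure of the field: one steady source in the center.
template = [
    [10, 10, 10],
    [10, 50, 10],
    [10, 10, 10],
]
# New exposure: the steady source is unchanged, a new source appears at (2, 2).
science = [
    [10, 11, 10],
    [10, 50, 10],
    [10, 10, 90],
]

candidates = detect_transients(science, template, threshold=20)
print(candidates)  # [(2, 2)]
```

The steady source at the center cancels out in the difference, so only the pixel that changed between exposures is flagged as a candidate.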

The data release pipelines, in contrast, are intended to produce the most completely analyzed data products of the survey, in particular those that measure very faint objects and cover long time scales. Each year, a new run processes the entire available survey data set, cumulatively increasing the depth and completeness of the available data. The data release pipelines consume most of the computing power of the data management system. 
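A rough calculation suggests why yearly reprocessing dominates the computing budget. The sketch below assumes raw data accumulates at a uniform rate (60 PB over ten years) and that one run takes place at the end of each survey year; both are simplifications for illustration, and the function names are hypothetical.

```python
TOTAL_RAW_PB = 60   # total raw data over the survey, from the text
SURVEY_YEARS = 10   # survey duration, from the text


def raw_available_pb(year):
    """Raw data on hand after `year` years, assuming a uniform collection rate."""
    if not 0 <= year <= SURVEY_YEARS:
        raise ValueError("year must lie within the survey duration")
    return TOTAL_RAW_PB * year / SURVEY_YEARS


def total_reprocessed_pb():
    """Sum of the raw input read by one full reprocessing run per survey year."""
    return sum(raw_available_pb(year) for year in range(1, SURVEY_YEARS + 1))


for year in (1, 5, 10):
    print(f"Run after year {year}: {raw_available_pb(year):.0f} PB of raw input")
print(f"Raw input summed over all yearly runs: {total_reprocessed_pb():.0f} PB")
```

Because every run starts again from the full data set, the input summed over all runs (330 PB under these assumptions) is several times the 60 PB actually collected.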

The calibration products pipeline produces the wide variety of calibration data required by the other pipelines. 

All of these pipelines are architected to run on small and medium-sized platforms as well as to make efficient use of Linux clusters with thousands of nodes.

Although the data management facilities will have substantial computing power (the 150 TFLOPS required for processing the first Data Release equals that of the world's most powerful computer in 2004), if current trends continue, they won't even qualify for the TOP500 list when LSST sees its first light through the telescope. Hence, while LSST is making a novel use of advances in information technology, it is not taking the risk of pushing the expected technology to the limit.

Image Credit: LSST

Financial support for LSST comes from the National Science Foundation (NSF) through Cooperative Agreement No. 1258333, the Department of Energy (DOE) Office of Science under Contract No. DE-AC02-76SF00515, and private funding raised by the LSST Corporation. The NSF-funded LSST Project Office for construction was established as an operating center under management of the Association of Universities for Research in Astronomy (AURA).  The DOE-funded effort to build the LSST camera is managed by the SLAC National Accelerator Laboratory (SLAC). 

