It’s TeraGrid Time
Map of TeraGrid supercomputer clusters. Credit: Courtesy of Indiana University, based on illustration by Nicolle Rager Fuller, National Science Foundation.
We’ve just received word that the Data Management (DM) team has been awarded significant resources on the TeraGrid, the National Science Foundation’s supercomputing infrastructure consisting of large clusters of computers located at eleven centers in the US. This award will allow the team to perform its most ambitious test to date as it practices processing the massive amounts of data LSST will produce.
As it carries out its 10-year survey, LSST will produce over 15 terabytes of raw astronomical data each night (30 terabytes processed), resulting in a database catalog of 22 petabytes and an image archive of 100 petabytes.
How much data?
During the LSST design & development phase, the DM group has been developing a software framework and science codes with the scalability and robustness necessary to process this unprecedented data stream.
To test the scalability of the software, the LSST Data Management team has performed a series of Data Challenges: targeted demonstrations of the processing software, each encompassing tasks of incrementally larger scope and complexity and building toward the final production code that will be used during operations. Data Challenges to date were performed on a fairly modest TeraGrid allocation and on High Performance Computing (HPC) clusters hosted at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, an LSST Institutional Member.
With the current data challenge, DC3b, the DM team will process 10 TB of data from an existing astronomical survey and 47 TB of a simulated LSST data set. DC3b will be run as a series of three incrementally more demanding performance tests, resulting in the production of science data products from an archive’s worth of data, at a scale of 15% of the operational DM System. The goals of these tests are to verify the code for correctness and robustness, to understand the code’s performance, and to create a large dataset that astronomers can use to plan science projects for the LSST.
“Although we use images from previous surveys, our heavy reliance on simulated images drives the need for 1.5 million core hours on the TeraGrid for the next stage to be conducted over the next few months,” comments Tim Axelrod, the LSST DM System Scientist.
In January 2010, the LSST Data Management project submitted a proposal to the TeraGrid program requesting infrastructure for DM design and development. Several lead scientists and engineers on the DM team developed the proposal under the leadership of NCSA, which has a long history of involvement in the TeraGrid. The allocation runs from April 2010 through March 2011. The allocated TeraGrid infrastructure will be provided by systems at NCSA, TACC, LONI, and Purdue.
Mike Freemon, Infrastructure Lead for DM and Project Manager at NCSA, says the team was awarded its full request of TeraGrid resources, both CPU hours and data storage: 1.51M Service Units (CPU-hours), 400 TB of dual-copy mass storage, and 20 TB of spinning disk storage.
NCSA has led the effort to provide infrastructure for DC3b, which in addition to the TeraGrid allocation includes contributions from SLAC, SDSC, IN2P3, Caltech, Purdue, and the REDDnet project/Vanderbilt University. This architecture includes data production and archiving capabilities, database scaling test resources, and, for the first time, resources to replicate and serve the input and output data to scientific users in the LSST Science Collaborations for validation and experimentation.
And if the TeraGrid proposal had not been successful, what were the options? Tim tells us DC3b could be run on a fast PC, but it would take 1.5 million hours, or about 200 years!
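As a rough check on that figure, the short Python sketch below converts 1.5 million core hours into wall-clock time. It assumes the PC contributes a single core running around the clock, which the article does not state, and the function names are illustrative only. Under that assumption the straight division gives roughly 170 years, the same order as the round figure quoted above. The same arithmetic shows that the work would keep about 171 cores busy for the full 12-month allocation.

```python
HOURS_PER_YEAR = 365.25 * 24   # 8766 hours in an average calendar year
ALLOCATION_HOURS = 365 * 24    # April 2010 through March 2011, 12 months


def single_core_years(core_hours: float) -> float:
    """Wall-clock years for one core (an assumption) to finish the work."""
    return core_hours / HOURS_PER_YEAR


def cores_needed(core_hours: float, wall_clock_hours: float) -> float:
    """Average number of cores that must run nonstop to finish in time."""
    return core_hours / wall_clock_hours


request = 1.5e6  # core hours quoted for the next stage of DC3b

years = single_core_years(request)
cores = cores_needed(request, ALLOCATION_HOURS)

print(f"{years:.0f} years on a single core")
print(f"{cores:.0f} cores running continuously for 12 months")
```

A multi-core PC would shorten the single-machine estimate proportionally, so the figure is best read as an order-of-magnitude illustration.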
Suzanne Jacoby, Jeff Kantor, Tim Axelrod and Anna Spitz contributed to this article.
LSST is a public-private partnership. Funding for design and development activity comes from the National Science Foundation, private donations, grants to universities, and in-kind support at Department of Energy laboratories and other LSSTC Institutional Members:
Brookhaven National Laboratory; California Institute of Technology; Carnegie Mellon University; Chile; Cornell University; Drexel University; Google Inc.; Harvard-Smithsonian Center for Astrophysics; Institut de Physique Nucléaire et de Physique des Particules (IN2P3); Johns Hopkins University; Kavli Institute for Particle Astrophysics and Cosmology at Stanford University; Las Cumbres Observatory Global Telescope Network, Inc.; Lawrence Livermore National Laboratory; Los Alamos National Laboratory; National Optical Astronomy Observatory; Princeton University; Purdue University; Research Corporation for Science Advancement; Rutgers University; SLAC National Accelerator Laboratory; Space Telescope Science Institute; The Pennsylvania State University; The University of Arizona; University of California, Davis; University of California, Irvine; University of Illinois at Urbana-Champaign; University of Michigan; University of Pennsylvania; University of Pittsburgh; University of Washington; Vanderbilt University
LSST E-News is a free email publication of the Large Synoptic Survey Telescope Project. It is for informational purposes only, and the information is subject to change without notice.