October 2012  •  Volume 5 Number 2

The LSST Data Avalanche: Astroinformatics Rises to the Challenge

Unlike previous articles in this series, this E-News article is not based on a chapter from the LSST Science Book. LSST formed the Informatics and Statistics Science Collaboration in 2009. Kirk D. Borne is the Chair of the collaboration. The members of the Collaboration are listed at the end of the article.

LSST opens the world of data-intensive astronomy, requiring skills in the area of computational and data sciences in order to maximize the opportunities for knowledge. (Graphic: Emily Acosta, LSST)  

Every night for 10 years, LSST will obtain approximately 2,000 images of the sky with its 3-billion-pixel camera, which corresponds to about 15 terabytes of data per night. As the survey progresses, researchers will have hundreds of petabytes of data to access, analyze, and interpret. Terms such as “flood,” “avalanche,” “fire hose,” and “big data” are used to describe this onslaught of data. One of the major questions facing the LSST scientists and engineers is how to handle the large and complex data collection that LSST will generate. The Informatics and Statistics Science Collaboration is researching the science and engineering of this challenge. To keep up with the flood of data, researchers will need to develop more powerful algorithms, methodologies, and approaches. Rising to the challenge will enable scientists to undertake new modes of discovery, where data-driven, data-rich science goes beyond traditional science.
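As a rough check on these figures, the arithmetic can be sketched in a few lines of Python. The image count, pixel count, nightly volume, and survey length come from the text above. The 2 bytes per pixel and the 365 observing nights per year are assumptions made only for illustration, and the function names are hypothetical.

```python
# Back-of-envelope estimate of the LSST nightly and survey data volumes.
IMAGES_PER_NIGHT = 2_000           # from the article
PIXELS_PER_IMAGE = 3_000_000_000   # from the article (3-billion-pixel camera)
QUOTED_TB_PER_NIGHT = 15           # from the article
SURVEY_YEARS = 10                  # from the article

BYTES_PER_PIXEL = 2                # ASSUMPTION: 16-bit pixel values
NIGHTS_PER_YEAR = 365              # ASSUMPTION: observing every night

TB = 10**12  # bytes in a terabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)


def raw_pixel_tb_per_night():
    """Raw pixel volume per night in terabytes, under the assumptions above."""
    return IMAGES_PER_NIGHT * PIXELS_PER_IMAGE * BYTES_PER_PIXEL / TB


def survey_total_pb(tb_per_night):
    """Accumulated volume over the whole survey in petabytes."""
    return tb_per_night * TB * NIGHTS_PER_YEAR * SURVEY_YEARS / PB


if __name__ == "__main__":
    print("raw pixels per night (TB):", raw_pixel_tb_per_night())
    print("ten-year total at quoted rate (PB):", survey_total_pb(QUOTED_TB_PER_NIGHT))
```

Under these assumptions the raw pixels alone come to 12 terabytes per night, the same order as the quoted 15 terabytes, and ten years at the quoted rate accumulates roughly 55 petabytes. The “hundreds of petabytes” figure would therefore have to count more than the nightly stream, for example processed images and catalogs; that reading is an assumption, not something the article states.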

This new “big data” isn’t limited to large astronomy surveys. The growth of data volumes in nearly all scientific disciplines, business sectors, and government is swamping our ability to gain useful insights and understanding from the data in an efficient or effective way. How are we going to access, retrieve, interpret, analyze, mine, integrate, and visualize massive quantities of data? The answer is the informatics approach: the use of digital data, information, and related services for research and knowledge generation [D.N. Baker, EOS 89 (2008)]. Researchers will use the discipline of informatics, or more specifically, astroinformatics, to organize, explore, visualize, and mine the LSST data for new astronomical discoveries. A data-driven revolution in science is underway.

Astroinformatics encompasses a set of naturally related specialties including data organization, data descriptions, astronomical classification taxonomies, astronomical concept ontologies, data mining, visualization, and statistics. The accompanying cyberinfrastructure includes databases, virtual observatories (distributed data), high-performance computing (clusters and petascale machines), distributed computing (the Grid, the Cloud, and peer-to-peer networks), intelligent search and discovery tools, and innovative visualization environments.

Astroinformatics will allow data integration, data mining, and knowledge discovery across heterogeneous massive data collections. It will allow re-use and re-purposing of archival data for new projects, integration of data within different contexts, literature linkages, classification of objects, quantitative scoring of classifications, discovery of “interesting” objects and new classes of objects, development of an astronomical “genome,” and employment of data in educational settings, among other uses. According to Borne, “We are not just using more data; qualitatively different methods for doing science with big data are required. It’s a revolutionary new way to do science.”

Borne sees a wide variety of data mining and statistics use cases for the LSST data collection. These include:

  • Provide rapid probabilistic classifications for millions of events each night;
  • Find new multivariate correlations and associations in high-dimensional (around 1,000 dimensions) astronomical attribute parameter space;
  • Discover voids in these high-dimensional parameter spaces, for example, period gaps;
  • Discover new and exotic classes and subclasses of objects and astrophysical processes, along with new properties of known classes;
  • Discover new and improved rules for classifying known classes of objects;
  • Identify novel, unexpected behavior in the time domain from time series data;
  • Hypothesis testing – verify existing (or generate new) astronomical hypotheses with strong statistical confidence, using millions of training samples;
  • Serendipity – discover the rare one-in-a-billion type of objects through outlier detection, which Borne calls “Surprise Discovery” algorithms;
  • Quality Assurance – identify data pipeline processing errors through deviation detection.
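To make the outlier and deviation detection items concrete, here is a minimal sketch using a robust z-score built from the median and the median absolute deviation (MAD). This is a generic textbook technique chosen for illustration. It is not Borne's “Surprise Discovery” algorithms, which this article does not describe. The function name, the 3.5 threshold, and the sample brightness values are all hypothetical.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The score is 0.6745 * |x - median| / MAD. The constant 0.6745 makes the
    score comparable to an ordinary z-score for normally distributed data.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half the values equal the median, so the score cannot be
        # scaled. This sketch reports nothing rather than divide by zero.
        return []
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


if __name__ == "__main__":
    # Hypothetical brightness measurements of one object over several nights.
    magnitudes = [15.1, 15.0, 15.2, 14.9, 15.1, 15.0, 19.8, 15.2]
    print(robust_outliers(magnitudes))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely shifts them, so the outlier cannot mask itself. A survey-scale system would need multivariate and time-aware methods; this one-dimensional version only shows the basic idea of flagging a value by its deviation from typical behavior.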

The landscape of astronomical research is changing rapidly. With powerful statistical and informatics methods and the advent of large surveys and massive data collections, astronomers will be able to meet the massive data-to-knowledge challenges of LSST and to discover the unknown unknowns at an unprecedented rate.

For more information:

K.D. Borne (2006) Data-Driven Discovery through e-Science Technologies. 2nd IEEE International Conference on Space Mission Challenges for Information Technology (SMC-IT’06).

K.D. Borne and T. Eastman (2006) Collaborative Knowledge Sharing for E-Science. AAAI Workshop on the Semantic Web for Collaborative Knowledge Acquisition, 104-105.

K.D. Borne (2010) Astroinformatics: Data-Oriented Astronomy Research and Education. Journal of Earth Science Informatics, 3, 5-17.

R. McKercher and S. Jacoby (2011) LSST Key Player in Sea Change of Data Availability. E-News, 4 (2).

Informatics and Statistics Science Collaboration

  • Ethan Anderes
  • Jogesh Babu
  • Jacek Becla
  • Kirk Borne
  • Robert Brunner
  • Tamas Budavari
  • Douglas Burke
  • Nathaniel Butler
  • David Chernoff
  • Jim Cordes
  • George Djorgovski
  • Eric Feigelson
  • Peter Freeman
  • Christopher Genovese
  • Matthew Graham
  • Alexander Gray
  • Carlo Graziani
  • Jon Hakkila
  • Zeljko Ivezic
  • Vinay L. Kashyap
  • Kevin Knuth
  • Simon Krughoff
  • Tom Loredo
  • Ashish Mahabal
  • Bruce McCollum
  • Chris Miller
  • Misha Pesenson
  • Andrew Ptak
  • Joseph Richards
  • Jeffrey Scargle
  • Chad Schafer
  • Sam Schmidt
  • Lior Shamir
  • Aneta Siemiginowska
  • Keivan Stassun
  • John Wallin
  • Martin Weinberg
  • Roy Williams
  • Robert Wolpert
  • Michael Woodroofe

Article written by Anna H. Spitz and Kirk D. Borne


LSST is a public-private partnership. Funding for design and development activity comes from the National Science Foundation, private donations, grants to universities, and in-kind support at Department of Energy laboratories and other LSSTC Institutional Members:

Adler Planetarium; Argonne National Laboratory; Brookhaven National Laboratory (BNL); California Institute of Technology; Carnegie Mellon University; Chile; Cornell University; Drexel University; Fermi National Accelerator Laboratory; George Mason University; Google, Inc.; Harvard-Smithsonian Center for Astrophysics; Institut de Physique Nucléaire et de Physique des Particules (IN2P3); Johns Hopkins University; Kavli Institute for Particle Astrophysics and Cosmology (KIPAC) – Stanford University; Las Cumbres Observatory Global Telescope Network, Inc.; Lawrence Livermore National Laboratory (LLNL); Los Alamos National Laboratory (LANL); National Optical Astronomy Observatory; National Radio Astronomy Observatory; Princeton University; Purdue University; Research Corporation for Science Advancement; Rutgers University; SLAC National Accelerator Laboratory; Space Telescope Science Institute; Texas A & M University; The Pennsylvania State University; The University of Arizona; University of California at Davis; University of California at Irvine; University of Illinois at Urbana-Champaign; University of Michigan; University of Pennsylvania; University of Pittsburgh; University of Washington; Vanderbilt University

LSST E-News Team:

  • Suzanne Jacoby (Editor-in-Chief)
  • Anna Spitz (Writer at Large)
  • Robert McKercher (Staff Writer)
  • Mark Newhouse (Design & Production: Web)
  • Emily Acosta (Design & Production: PDF/Print)
  • Sidney Wolff (Editorial Consultant)
  • Additional contributors as noted

LSST E-News is a free email publication of the Large Synoptic Survey Telescope Project. It is for informational purposes only, and the information is subject to change without notice.

Copyright © 2012 LSST Corp., Tucson, AZ