Rapid Cross Identifications of Large Astronomical Catalogs

J. Ma (JPL/Caltech), J. Good (JPL/Caltech), T. Handley (JPL/Caltech)

Abstract:

The explosive growth in the data volume of astronomical catalogs not only brings an exciting new era to our understanding of the universe, but also poses numerous challenges to computer science in terms of data management and data analysis.

One of the necessities for astronomical catalog generation and scientific research is the cross-identification of sources between catalogs by positional proximity. However, conventional cross-identification algorithms are prohibitively expensive in terms of CPU requirements for large volumes of data.

In this paper, we present a fast O(N) algorithm for the cross-identification of large astronomical catalogs, where N is the number of sources involved.
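
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one common way to reach expected linear-time proximity matching, not the authors' method. It assumes sources are given as (RA, Dec) pairs in degrees, converts them to unit vectors, and hashes them into a 3-D grid whose cell size equals the chord length of the match radius, so each source needs to be compared only against the 27 neighboring cells. The names unit_vector and cross_match are hypothetical, and the linear cost holds only when the number of sources per cell stays bounded.

```python
import math
from collections import defaultdict
from itertools import product


def unit_vector(ra_deg, dec_deg):
    """Convert sky coordinates in degrees to a 3-D unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def cross_match(cat_a, cat_b, radius_arcsec):
    """Return (i, j) pairs where cat_a[i] lies within the radius of cat_b[j].

    Illustrative grid-hash sketch (an assumption, not the paper's algorithm).
    """
    # Straight-line (chord) distance equivalent to the angular match radius.
    chord = 2.0 * math.sin(math.radians(radius_arcsec / 3600.0) / 2.0)

    def cell_of(v):
        # Two points closer than `chord` differ by at most one cell per axis.
        return tuple(math.floor(c / chord) for c in v)

    # Bucket every source of the first catalog by grid cell.
    vecs_a = [unit_vector(ra, dec) for ra, dec in cat_a]
    grid = defaultdict(list)
    for i, v in enumerate(vecs_a):
        grid[cell_of(v)].append(i)

    # Probe only the 27 cells around each source of the second catalog.
    matches = []
    for j, (ra, dec) in enumerate(cat_b):
        v = unit_vector(ra, dec)
        cx, cy, cz = cell_of(v)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for i in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if math.dist(v, vecs_a[i]) <= chord:
                    matches.append((i, j))
    return matches
```

Working with unit vectors rather than raw (RA, Dec) values avoids special handling at the RA = 0/360 wraparound and near the poles, at the cost of a sparse 3-D grid stored as a hash table.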

The algorithm has been successfully deployed within JPL/Caltech's Infrared Science Archive (IRSA) for an initial project, NASA's 2MASS (Two Micron All Sky Survey). It is being used to obtain scientific results, that is, cross-identifications among different data sets, and within the 2MASS project itself for quality assurance and source selection.

The 2MASS data set will be very large, eventually exceeding 6x10^8 point sources, provided through incremental data releases over several years.

The algorithm has the performance and scalability to handle the anticipated one billion sources in the final release of 2MASS.

A version of the fast algorithm has also been implemented on the HP Exemplar parallel computer under the auspices of the NPACI (National Partnership for Advanced Computational Infrastructure) Digital Sky Project.



9/20/1999