Data Mining at the Canadian Astronomy Data Centre

Patrick Dowler (NRC/HIA/CADC), David Schade (NRC/HIA/CADC), Randy Zingle (NRC/HIA/CADC), Daniel Durand (NRC/HIA/CADC), Severin Gaudet (NRC/HIA/CADC)

Abstract:

The CADC has undertaken an innovative and ambitious project to develop a data-mining system that will enable astronomers to search and analyze the vast amount of available scientific data in a structured and efficient manner. The data-mining project involves two new and exciting avenues of research. The first is the development of a science archive that stores both pixel data and scientific results in a highly cross-referenced database supporting ad-hoc querying by users. This archive is primarily an extragalactic object archive based on the results of large collaborations, surveys, and other major astronomical projects. The second is the development of a multi-tier server system to support efficient exploration, querying, and analysis of the science archive content; distributed processing of pixel data to create new scientific results; and (eventually) uniform access to other information services (external sites that provide interesting astronomical data, catalogs, preprints, electronic publications, etc.).

The central concept in the CADC data-mining architecture is that the user specifies the information they require and the data-mining system acquires all the available information that satisfies the requirements. The acquisition of information may involve queries to the science archive, access to external information services, and server-side or client-side processing of raw data or intermediate results. We are developing client software to aid the user's interaction with the data-mining system. The client supports the building of information requests, receives and organizes the requested information, and provides a variety of visualization and analysis tools to aid in interpreting the results. We also intend to support third-party tools through a published interface that uses XML to format both information requests and results. This XML interface consists of two parts: a DTD that specifies a generic scientific query language and a DTD that specifies a generic scientific data structure.
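
As a rough illustration of how a third-party tool might use such an XML interface, the sketch below builds an information request and parses a result document in Python. The two DTDs are not reproduced in this abstract, so every element and attribute name here (informationRequest, target, constraint, resultSet, record, field) is a hypothetical stand-in, and the sample result values are made up purely for illustration.

```python
import xml.etree.ElementTree as ET


def build_request(object_class, constraints):
    """Serialize an information request as XML.

    Element and attribute names are hypothetical; the real query DTD
    is not given in the abstract.
    """
    root = ET.Element("informationRequest")
    ET.SubElement(root, "target", {"class": object_class})
    for name, (op, value) in constraints.items():
        ET.SubElement(
            root,
            "constraint",
            {"property": name, "op": op, "value": str(value)},
        )
    return ET.tostring(root, encoding="unicode")


def parse_result(xml_text):
    """Convert a result document into a list of dicts, one per record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("record"):
        records.append({f.get("name"): f.text for f in rec.findall("field")})
    return records


# Request: galaxies with redshift below 0.5 (illustrative constraint).
request_xml = build_request("galaxy", {"redshift": ("lt", 0.5)})

# Made-up result document in an assumed generic data structure.
sample_result = """<resultSet>
  <record><field name="id">G1</field><field name="redshift">0.31</field></record>
  <record><field name="id">G2</field><field name="redshift">0.47</field></record>
</resultSet>"""
records = parse_result(sample_result)
```

Keeping the request and the result as separate document types mirrors the two-DTD split described above: a tool only needs to understand the query language to ask for information, and only the data structure to consume what comes back.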



9/20/1999