CFHT archive manual - distribution 

Data Archiving and Distribution

DADS means "Data Archiving And Distribution" and refers to a project at CFHT to revise the telescope archiving system and to implement a method for distributing data acquired for investigators in the queue mode of observing. This document is an overview of the distribution process and includes information for investigators receiving data through this channel. Information for CFHT staff and administrators on the distribution systems is available here.

The first step in data distribution is archiving each image taken by the CFHT. Every exposure, even a focus frame, is archived and cataloged by the archive pipeline in two copies: one is kept at CFHT and one is sent to the CADC for public access after the proprietary period. After being archived, the exposure is saved to disk in the cluster of archive fileservers, where it is kept until all data for the semester in which it was taken has been distributed. In the course of each semester the CFHT acquires around two terabytes of data.

When the queue system reports that a program is ready for distribution and all the associated exposures have been validated by elixir, a list of files is generated that includes all the exposures taken for the program and all the relevant standard star exposures. The distribution process is then started.
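The gating described above can be sketched as follows. This is a minimal illustration in Python, not the DADS implementation: the `Exposure` record, its fields, and the `build_file_list` function are hypothetical names introduced here, and the real queue system and elixir interfaces are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exposure:
    # Hypothetical record; the field names are assumptions for illustration.
    filename: str
    validated: bool  # True once elixir has validated the exposure


def build_file_list(program_ready: bool,
                    science: List[Exposure],
                    standards: List[Exposure]) -> Optional[List[str]]:
    """Return the file list for a program, or None if it is not ready yet."""
    if not program_ready:
        return None
    # Every associated exposure must be validated before distribution starts.
    if not all(e.validated for e in science + standards):
        return None
    # Science exposures first, then the relevant standard star exposures.
    return [e.filename for e in science] + [e.filename for e in standards]
```

Returning `None` rather than a partial list keeps a half-validated program from entering the distribution step by accident.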

The distribution tools read in the list of files and first query the elixir database for a list of all the master detrend frames that elixir will use to process the data. Then, for each night on which there are observations, a report of weather conditions is generated from the datalogger; it includes a numeric table of values at ten-minute intervals and a graphical chart of values throughout the night. The exposures taken specifically for the program are then processed in batches with elixir and written to the requested media. Thumbnail images of each exposure are also generated, and a copy of the FITS header data is set aside for later use. The standard stars are processed in the same way, and finally the master detrend data is written to tape. Once the processed data has been written to tape, a manifest file is printed that details the location of each file on each tape. Another set of tools then generates a set of browsable HTML files that can be viewed in a Java-enabled web browser. These files, written to a CDROM in ISO 9660 format, provide an interface to the weather data, thumbnails, QSO and elixir reports, FITS headers, and media manifest.
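As an illustration of the manifest step, the sketch below formats a plain-text listing from a mapping of tape labels to the files written on each. The tape labels, file names, column layout, and the `format_manifest` function are invented for the example; the actual DADS manifest format is not described in this document.

```python
from typing import Dict, List


def format_manifest(tapes: Dict[str, List[str]]) -> str:
    """Build a plain-text manifest: one line per file, grouped by tape."""
    lines = []
    for tape in sorted(tapes):
        # Position is the 1-based order in which the file was written.
        for position, filename in enumerate(tapes[tape], start=1):
            lines.append(f"{tape}  {position:4d}  {filename}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical tape labels and file names.
    example = {
        "TAPE01": ["exp0001.fits", "exp0002.fits"],
        "TAPE02": ["std0001.fits"],
    }
    print(format_manifest(example))
```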

The tapes, printed media manifest, and CDROM are the products of a distribution; they are mailed to the investigator in fulfillment of the observing request made through the QSO system. The internal data from a distribution, excluding the processed science exposures, are archived and kept at CFHT indefinitely. This data permits the DADS team to troubleshoot potential problems with the distributed data and to reproduce the distribution from raw archive data at a later date if necessary.

During the initial stages of requesting an observation via QSO, an investigator can make specific requests about how they would like to receive data. When raw data is specified, both the science data and standard stars are written to tape in unprocessed MEF (Multi-Extension FITS) format. The applicable elixir master detrend files that would have been used can also be included in the distribution. In split format, the data is processed with elixir and written to tape as a directory for each exposure, with a separate FITS file for each CCD. By default the data is distributed as processed MEF files. Currently data is distributed only on DLT tape at DLT7000 (35 GB) density or less. In the future we will allow the selection of alternate media types and retrieval via FTP. Tapes are written in standard tar format using the GNU tar utility, in sets of a few gigabytes each.
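To illustrate how files might be grouped into tar sets of a few gigabytes, here is a minimal greedy sketch in Python. It is an assumption about the grouping strategy, not the documented DADS behavior: the `group_into_sets` function and the size limit passed to it are hypothetical, and the actual writing is done with GNU tar.

```python
from typing import List, Tuple


def group_into_sets(files: List[Tuple[str, int]],
                    max_bytes: int) -> List[List[str]]:
    """Greedily group (name, size) pairs into sets no larger than max_bytes.

    Files are kept in order. A single file larger than max_bytes gets a
    set of its own rather than being split.
    """
    sets: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Start a new set when adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            sets.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        sets.append(current)
    return sets
```

Each resulting set would then be written as one tar archive. With a limit of, say, `2 * 1024**3` bytes, a reader who hits a bad block loses at most one set rather than the whole tape.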


Kanoa

Last modified: Thur Sept 27 15:31:21 HST 2001