CFHT archive manual - overview

Current Status

The Canada-France-Hawaii Telescope employs a system which archives a copy of every telescope exposure. Currently, two copies of each piece of archive media are made: one is kept on site in Hawaii and the other is sent to the
Canadian Astronomy Data Centre (CADC).

Data is copied from the summit to the CFHT headquarters in Waimea, where most of the archiving processes occur. There the data is processed in a number of ways for the purposes of screening and cataloging, then both written to archive media and copied to an online storage system, where it remains for the duration of a semester.

The archiving system also includes the software and resources for distributing data acquired in queue observing mode. This involves bulk handling and organisation of data, Elixir processing, and media handling.

The software system consists of a pipeline of daemons that pass data from one to the next, performing some process at each step. Each daemon employs handlers, which make the various software and hardware interactions generic. The daemons are modular and the pipeline extensible, so new processes and pipelines can be added or removed rapidly without interfering with the existing data path. All the software utilised in the CFHT archiving system has been written in house.
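The daemon-and-handler structure described above can be illustrated with a minimal sketch. This is purely illustrative: the actual CFHT software is written in house and is not shown here, and all class and function names below (Daemon, Handler, TapeHandler, and so on) are assumptions invented for the example. It shows the two ideas at work: each pipeline stage performs one process and hands its result downstream, and handlers hide the specifics of a given device or service.

```python
# Illustrative sketch only -- not CFHT code. All names are invented for
# this example. It models a chain of daemons, each performing one step,
# with pluggable "handlers" that make device interactions generic.

class Handler:
    """Generic interface hiding a specific device or service."""
    def store(self, exposure):
        raise NotImplementedError

class TapeHandler(Handler):
    def store(self, exposure):
        # A real handler would drive a tape drive or autoloader.
        return ("tape", exposure)

class OnlineStorageHandler(Handler):
    def store(self, exposure):
        # A real handler would copy the file to a network fileserver.
        return ("online", exposure)

class Daemon:
    """One pipeline stage: run its process, then hand data downstream."""
    def __init__(self, name, process, downstream=None):
        self.name = name
        self.process = process
        self.downstream = downstream

    def receive(self, exposure):
        result = self.process(exposure)
        if self.downstream is not None:
            return self.downstream.receive(result)
        return result

def archive_step(exposure):
    # The final stage fans out to every configured storage handler.
    handlers = [TapeHandler(), OnlineStorageHandler()]
    return [h.store(exposure) for h in handlers]

# Assemble an extensible pipeline: screen -> catalog -> archive.
# A new stage can be spliced in without touching the existing ones.
archive = Daemon("archive", archive_step)
catalog = Daemon("catalog", lambda e: e, downstream=archive)
screen = Daemon("screen", lambda e: e, downstream=catalog)

print(screen.receive("exposure-0001.fits"))
# -> [('tape', 'exposure-0001.fits'), ('online', 'exposure-0001.fits')]
```

Because each stage only knows about its immediate downstream neighbour, and each handler presents the same interface, swapping a storage device or inserting a new processing step leaves the rest of the pipeline untouched.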

The system currently employs a Sun Ultra 10, a 40-tape DLT autoloader, three 800-gigabyte fileservers, and numerous individual tape drives.

Historical Perspective

In the early history of CFHT, the images taken at the telescope were archived more or less manually on round magnetic tapes. In 1989, it was decided that every single frame taken, regardless of its content, should be automatically archived by a transparent process. To achieve this goal, optical disk drives were purchased from Delta Microsystem. Each optical disk (Maxtor disk) had a capacity of 800 megabytes (400 MB per side). Rick McGonegal and Bob Link from the software group implemented the first archiving software; since then, maintaining and improving what became known as the "CFHT archiving system" has been the work of the French cooperant, under the supervision of Bob Link.

With the spectacular increase in the size of the CCDs, the Maxtor optical disks soon proved unable to cope with the huge quantity of data taken every night (up to 800 MB with 2K×2K CCDs). In the middle of 1992, a new-generation optical disk drive, featuring a capacity of 3 GB per disk side, was purchased from Sony and accommodated in the archive pipeline. It was also decided that the CADC (Canadian Astronomy Data Centre) in Victoria would archive the CFHT data and that the optical disks were to be shipped to them when full. To keep a back-up copy as a permanent record for CFHT, the archive pipeline was also modified to mirror the archived data onto an Exabyte tape. At the same time, some miscellaneous features were added, such as automatic image quality evaluation and an automatic header extraction process.

The supporting software, however, still based on the original implementation after many successive changes, was becoming weak and unstructured. Moreover, the Sony optical disk drive was very expensive hardware, and it was soon decided that a maintenance contract could not be purchased. Concerns then arose about what would happen in the case of a hardware failure, as the software assumed the Sony optical disk as the master archive medium and the whole implementation was very hardware dependent. Similarly, inherent C-shell weaknesses, such as poor interrupt handling, made administration of the whole pipeline delicate at best.

Therefore, in late fall 1993, a project was launched to reorganize the archiving system. After study, it became obvious that a completely new architecture was required to solve these problems. The resulting archive system, revised primarily by Jean-Pierre Veran, was more modular and robust and included inline processes for image quality analysis and automatic header cataloging. It was also designed to utilise magnetic tape exclusively, initially Exabyte and then DLT.

With the introduction of ever larger CCD mosaics and the advent of queue observing, the amount of data generated each night began to overtax the archiving system, and in 2001 yet another revision was undertaken, this time by Kanoa Withington. In addition to a host system upgrade, this revision introduced a DLT library and enabled the system to change media automatically, reducing user intervention and downtime. Since the previous pipeline was designed to be shut down and restarted with each change of media, the new pipeline included facilities for self-monitoring and maintenance so that the daemons could run indefinitely. These revisions also introduced the online storage system, which allows the archive to keep a semester's worth of data on a distributed collection of network filesystems. Simultaneously, development of the data distribution system was initiated; it handles the preparation and processing of data acquired in queue observing mode for distribution to the investigators who requested the observations.

Future Developments

With the introduction of MegaCam, the peak load on the archive system will likely double and will certainly require further modifications. Fortunately, the archive system is largely device independent, and new hardware can be introduced as the technology becomes appropriate and available. Online storage units can be added dynamically as they become available, and there is room for further parallelization in the pipeline should it become necessary.

The CADC has begun a project to keep CFH12K data online in DVD jukeboxes, and so it is planned that the CFHT archive will in time write data directly to DVD for shipment to the CADC.
Kanoa Withington
Last modified: Tue Jul 17 14:37:07 HST 2001