CFHT Archive Manual - Overview

Current Status

The Canada-France-Hawaii Telescope employs a system which archives a copy of every telescope exposure. Currently, two copies of each piece of archive media are made: one is kept on site in Hawaii and the other is sent to the Canadian Astronomy Data Centre (CADC).

Data is copied from the summit to the CFHT headquarters in Waimea, where most of the archiving processes occur. The data is first processed in a number of ways for screening and cataloging, then written to archive media and copied to an online storage system, where it is retained for the duration of a semester.

The archiving system includes the software and resources for distributing data acquired in the queue observing mode. This involves bulk handling and organization of data, Elixir processing, and media handling.

The software system generally consists of a pipeline of daemons which pass data to one another, each performing one processing step. Each daemon employs handlers which make the various software and hardware interactions generic. The daemons are modular and the pipeline extensible, so new processes and pipelines can be added or removed rapidly without interfering with the existing data path. All the software used in the CFHT archiving system has been written in-house.
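As a rough illustration only, the following minimal sketch (in Python, with hypothetical names; the actual in-house implementation is not reproduced here) shows how a pipeline of modular stages with pluggable handlers might be organized:

    # Minimal sketch of a handler-based daemon pipeline. All names here are
    # hypothetical; the real CFHT daemons, handlers and interfaces differ.
    from typing import Callable, List, Optional

    Handler = Callable[[str], str]   # takes an exposure path, returns a path

    class Stage:
        """One daemon in the pipeline: runs its handlers, then passes data on."""
        def __init__(self, name: str, handlers: List[Handler]) -> None:
            self.name = name
            self.handlers = handlers
            self.next: Optional["Stage"] = None

        def process(self, exposure: str) -> None:
            for handler in self.handlers:
                exposure = handler(exposure)   # e.g. screen, catalog, write media
            if self.next is not None:
                self.next.process(exposure)    # hand the data to the next daemon

    # Hypothetical handlers standing in for the real screening/cataloging steps.
    def screen(path: str) -> str:
        print("screening", path)
        return path

    def catalog(path: str) -> str:
        print("extracting header from", path)
        return path

    def write_media(path: str) -> str:
        print("writing", path, "to archive media")
        return path

    # Stages are modular: inserting or removing one does not disturb the others.
    screening = Stage("screening", [screen, catalog])
    archiving = Stage("archiving", [write_media])
    screening.next = archiving

    screening.process("exposure001.fits")

Because each stage knows only its own handlers and its successor, a new process can be spliced into the chain without touching the rest of the data path.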

The system currently employs a Sun Ultra 10, a 40-tape DLT autoloader, four Linux file servers with a combined capacity of 3.1 terabytes, and numerous individual tape drives.

Historical Perspective

In the early history of CFHT, the images taken at the telescope were archived manually on round magnetic tapes. In 1989, it was decided that every single frame taken, regardless of its content, should be automatically archived by a transparent process. To achieve this goal, optical disk drives were purchased from Delta Microsystem. Each optical disk (Maxtor disk) had a capacity of 800 megabytes (400 MB per side). Rick McGonegal and Bob Link from the software group implemented the first archiving software, and from then on the French cooperant, under the supervision of Bob Link, maintained and improved what became known as the "CFHT archiving system."

With the spectacular increase in the size of the CCDs, the Maxtor optical disks soon proved unable to cope with the huge quantity of data taken every night (up to 800 MB with 2K x 2K CCDs). In the middle of 1992, a new-generation optical disk drive featuring a capacity of 3 GB per disk side was purchased from Sony and accommodated in the archive pipeline. It was also decided that the CADC in Victoria would archive the CFHT data, and the optical disks were to be shipped there when full. In order to keep a backup copy as a permanent record for CFHT, the archive pipeline was also modified to mirror the archived data onto Exabyte tape. At the same time, miscellaneous features were added, such as an automatic image quality evaluation and an automatic header extraction process.

The supporting software, however, still based on the original implementation after many successive changes, had become fragile and unstructured. Moreover, the Sony optical disk drive was very expensive to maintain, and it was soon decided that a maintenance contract could not be purchased. Concerns were then raised about what would happen in the event of a hardware failure, as the software assumed the Sony optical disk was the master archive medium and the whole implementation was very hardware dependent. Similarly, inherent weaknesses of the C shell, such as its poor interrupt handling, made administration of the whole pipeline delicate at best.

It was therefore decided in late fall 1993 to launch a project to reorganize the archiving system. After some study, it became obvious that a completely new architecture was required to solve these problems. The resulting archive system, revised primarily by Jean-Pierre Veran, was more modular and robust, and included inline processes for image quality analysis and automatic header cataloging. It was also designed to use magnetic tape exclusively, initially Exabyte and later DLT.

With the introduction of ever larger CCD mosaics and the advent of queue observing, the amount of data generated each night began to overtax the archiving system, and in 2001 yet another revision was undertaken, this time by Kanoa Withington. In addition to a host system upgrade, this revision introduced a DLT library and enabled the system to change media automatically, reducing user intervention and downtime. Whereas the previous pipeline was designed to be shut down and restarted with each change of media, the new pipeline included facilities for self-monitoring and maintenance so that the daemons could run indefinitely. This revision also introduced the online storage system, which allows the archive to keep a semester's worth of data on a distributed collection of network filesystems. At the same time, development began on the data distribution system, which prepares and processes data acquired in the queue observing mode for distribution to the investigators who requested the observations.

Future Developments

With the introduction of MegaCam, the peak load on the archive system will likely double, which will certainly require further modifications. Fortunately, the archive system is largely device independent, and new hardware can be introduced as the technology becomes appropriate and available. Online storage units can be added dynamically as they become available, and there is room for further parallelization in the pipeline should it become necessary.

The CADC has begun a project to keep CFH12K data online in DVD jukeboxes, so it is planned that the CFHT archive will in time write data directly to DVD for shipment to the CADC.


Kanoa

Last modified: Tue Jul 17 14:37:07 HST 2001