CFHTLS Data Flow

Defining, obtaining, archiving, pre-processing, calibrating, processing, extracting sources from, and distributing the CFHTLS data is a highly complex undertaking. Three separate entities are involved in the CFHTLS global data flow that eventually brings the data to the users:

  • CFHT: data acquisition, pre-processing and calibration
  • Terapix: data stacking, fine astrometric and photometric calibration, source catalog generation
  • CADC: data archiving and distribution of all CFHTLS products

The following diagram, though primarily focused on the details of the CFHT pipeline, shows the complete data flow of the CFHTLS: the black ellipses represent points where end users can download the data product on that branch from CADC. Each black ellipse carries a time tag ("T = X t") giving the time needed for that particular data set to reach that point from the moment it was acquired at the telescope. Each step leading to these successive levels of data digestion by the various pipelines is described below:

Definition and acquisition of the data

After the Steering Group defined the fields in consultation with the CFHTLS community, the next phase consisted of splitting the global exposure times per field and per year in order to optimize the observing efficiency (see the MegaPrime site "Exposure Time Calculator" page for more details on such issues). The observing strategy was then entered in the Queued Service Observing (QSO) tool, PH2. See the "Q Service Observing Home" for more on the way service observing operates at CFHT. QSO's PH2 was flexible enough to allow the CFHTLS coordinators to tweak the observing strategy from run to run in order to optimize the use of sky time (bad weather is by far the worst factor affecting the observing efficiency, and tunings are necessary to ensure the completion of the primary science drivers of the survey).

Raw data archiving

Within minutes after an exposure has been obtained at the telescope, the image (a single MEF file about 700 MB in size) is archived at the CFHT headquarters in Waimea (the summit-to-Waimea link is a DS3 line) synchronously on two SDLT tapes (100 GB capacity each). MegaCam generates data at a rate of approximately 100 GB a night. All raw data are immediately transferred through the network to CADC, where they are promptly put online; users typically gain access to the raw data within 1 day, 2 days at most, from the time they were acquired on the sky.
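These numbers put the capacity of the summit-to-Waimea link in perspective. The following is a minimal back-of-the-envelope sketch, assuming a nominal DS3 payload of roughly 45 Mbit/s and no protocol overhead (assumptions for illustration, not figures from the survey documentation):

    # Back-of-the-envelope transfer-time estimates for the summit-to-Waimea link.
    # Assumptions (not from the survey documentation): ~45 Mbit/s usable DS3
    # bandwidth, no protocol overhead, no contention with other traffic.

    DS3_MBIT_PER_S = 45.0      # nominal DS3 line rate (44.736 Mbit/s, rounded)
    EXPOSURE_MB = 700.0        # one MegaCam MEF file, ~700 MB
    NIGHT_MB = 100.0 * 1024    # typical data volume per night, ~100 GB

    def transfer_seconds(megabytes, link_mbit_s=DS3_MBIT_PER_S):
        """Idealized transfer time in seconds over the link."""
        return megabytes * 8.0 / link_mbit_s

    print(f"One exposure: {transfer_seconds(EXPOSURE_MB) / 60:.0f} min")  # ~2 min
    print(f"Full night:   {transfer_seconds(NIGHT_MB) / 3600:.0f} h")     # ~5 h

Under these assumptions, a single exposure occupies the line for about two minutes in the best case, and a full night takes hours to drain, which is why frames can queue up behind other traffic (hence the priority scheme described in the next section).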

Pre-processing for the Real Time Analysis Systems

CFHT hosts three clusters of powerful computers dedicated to real-time data analysis: two for the supernova program (SNLS: one from Canada, one from France) using the Deep component data, and a third for GRBs (from France) using the data from the Very Wide component. The machines are operated by CFHT, and users connect remotely from their home institutions to install and run their software (Linux architecture). The dedicated pipelines on these machines use data pre-processed by Elixir with master detrending frames (bias, flat-field, fringe frames, etc.) from the previous observing run. The quality is high enough that the science is not affected by this compromise, which in return guarantees consistent data quality throughout the run, starting on the very first night. Since spectroscopic follow-up programs wait on supernova candidates, it is important to get the raw frames transferred to the Waimea archive as soon as possible, as Elixir real-time processing operates only on data already archived. For that very reason, a priority scheme has been set up so that the CFHTLS Deep frames are transferred to Waimea as soon as they are acquired (otherwise the lag due to the limited throughput of the DS3 line can be up to 1 hour, versus a few minutes with the priority scheme; see the sketch below). Once the image has arrived in Waimea, it is fully processed by Elixir in about 3 minutes and immediately delivered to the Real Time Analysis Systems. The pipelines then analyze the data (or data sets), and the results are usually published on the teams' sites the next day, visible to the whole world. No pixel data are made available though, in compliance with the CFHTLS data policy.
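As an illustration of the priority scheme, here is a minimal sketch of a priority transfer queue; the class, priority values, and file names are assumptions made for this example, not CFHT's actual transfer software:

    # Minimal sketch of a priority transfer queue as described above. Purely
    # illustrative: priorities and file names are invented for this example.
    import heapq

    class TransferQueue:
        def __init__(self):
            self._heap = []
            self._counter = 0   # tie-breaker: FIFO order within a given priority

        def add(self, filename, priority):
            # Lower number = transferred sooner.
            heapq.heappush(self._heap, (priority, self._counter, filename))
            self._counter += 1

        def next_file(self):
            return heapq.heappop(self._heap)[2] if self._heap else None

    queue = TransferQueue()
    queue.add("pi_program_0001.fits", priority=1)    # regular PI frame
    queue.add("cfhtls_deep_d1_i.fits", priority=0)   # Deep frame jumps the queue
    print(queue.next_file())                         # -> cfhtls_deep_d1_i.fits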

Elixir Processing

As part of the New Observing Process, CFHT is committed to pre-processing the data at the pixel level (removal of the instrumental signature). The Elixir pipeline covers: bad pixel masking, bias and overscan correction, flat-fielding, photometric superflat, and fringe correction for the i' and z' data. Elixir also derives an astrometric solution on a per-CCD basis at the 0.2 arcsec scale, and computes the zero point per filter for the whole run using the collection of photometric standards acquired throughout the observing run. A very detailed description of this process is presented on the MegaPrime and CFHTLS Observing Status pages in the "Data Preprocessing & Calibration" section. Creating the master detrending frames and deriving all the image characteristics for the whole observing run takes no more than a week.
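The pixel-level steps reduce, in essence, to simple frame arithmetic. The sketch below conveys the general idea; the function names and simplified formulas are illustrative assumptions, not the actual Elixir code:

    # Hedged sketch of per-CCD detrending and photometric calibration in the
    # spirit of the steps listed above; a simplification, not Elixir itself.
    import numpy as np

    def detrend(raw, master_bias, master_flat, fringe=None, bad_pixel_mask=None):
        """Remove the instrumental signature from one CCD frame."""
        image = raw.astype(float) - master_bias   # bias/overscan correction
        image /= master_flat                      # flat-field (incl. superflat)
        if fringe is not None:                    # fringe frames: i' and z' only
            image -= fringe
        if bad_pixel_mask is not None:
            image[bad_pixel_mask] = np.nan        # mask unusable pixels
        return image

    def calibrated_mag(flux_adu, exptime_s, zero_point):
        """Magnitude from the run-wide, per-filter photometric zero point."""
        return zero_point - 2.5 * np.log10(flux_adu / exptime_s)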

CFHT data products distribution and archiving

When Elixir has finished calibrating the whole run, the CFHT distribution system (DADS) can then trigger the processing and distribution of the specific subset of CFHTLS images captured during the run (from the global pool of data that also includes the PI frames). The Elixir data come along with the MetaData, a large collection of auxiliary data related to the observing process: weather statistics, observer comments, sky transparency for the night, etc. All of this is transferred through the network to CADC, where it is made available to the community for download. This entire process takes at most 3 weeks in total.
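For illustration only, the MetaData record attached to one exposure might take a shape like the following; every key and value here is a hypothetical stand-in, not the actual DADS/MetaData schema:

    # Hypothetical shape of one exposure's auxiliary MetaData record; the keys
    # and values are invented for illustration, not the real DADS format.
    exposure_metadata = {
        "exposure_id": "000001p",   # hypothetical identifier
        "weather": {"seeing_arcsec": 0.7, "humidity_pct": 20},
        "sky_transparency": "photometric",
        "observer_comments": "thin cirrus toward the end of the sequence",
    }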

Terapix processing

As soon as the Elixir data arrive at CADC, they are copied to Terapix via a high-speed network. The Terapix data center is primarily focused on handling the CFHTLS data and serves the whole CFHTLS community with the production of the weight and flag map images attached to each MegaCam image, the data stacking, the fine astrometric calibration, and the source catalog generation. It takes months to gather all the frames needed to make proper releases of such a data set. Terapix has settled on delivering global CFHTLS releases to the LS community every six months.
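The role of the weight maps in the stacking step can be made concrete with a short sketch: a weighted mean of aligned input images, where a weight of zero masks a pixel out of the stack. This is a deliberately simplified stand-in for, and not, the actual Terapix stacking software:

    # Simplified weighted-mean stacking with per-pixel weight maps; masked
    # pixels carry weight 0. Illustrative only, not the Terapix pipeline code.
    import numpy as np

    def coadd(images, weights):
        """Stack aligned images: each output pixel is sum(w*I) / sum(w)."""
        images = np.asarray(images, dtype=float)
        weights = np.asarray(weights, dtype=float)
        wsum = weights.sum(axis=0)
        stack = (weights * images).sum(axis=0) / np.where(wsum > 0, wsum, 1.0)
        return np.where(wsum > 0, stack, np.nan), wsum   # coadd + its weight map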

Terapix data products archiving and distribution

While Terapix offers many web pages focused on data quality and survey progress, its primary products for the CFHTLS community (images and catalogs) are made available only through CADC. Note that all major releases (named TXXXX, starting at T0001) come with a large suite of quality-control data products, all of which can be freely accessed on the Terapix web site.