Image Registration database

Every image known to the Elixir system, whether obtained by a camera directly connected to Elixir or introduced from outside by a user, is recorded in the registration database, Imreg.db. This database holds information about each image that is useful for organizing and categorizing the images and for identifying trends.

Images are normally introduced to Elixir at the time of acquisition, by a method discussed below. The data stored in Imreg.db fall into three groups: data reported by the telescope (such as RA & DEC), data measured by the telescope support environment (such as various temperatures), and data that can easily be determined from the images themselves (bias, sky).

Imreg.db, like all of the Elixir databases, consists of a binary file with a FITS-like header. The binary part of the file is a sequence of C structures written one after another, with a defined byte order for multi-byte fields. The Elixir system defines APIs that make access to the data straightforward.
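Because multi-byte fields have a defined byte order on disk, a reader has to decode them independently of the host CPU. The sketch below assumes big-endian order (as FITS uses); the order Elixir actually uses is not stated here, and the helper names read_be_u32, read_be_f32 and write_be_u32 are hypothetical rather than part of the Elixir API.

```c
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit unsigned integer stored big-endian (most significant
   byte first), regardless of the host byte order. */
static uint32_t read_be_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode a big-endian IEEE-754 single-precision float: reassemble the
   bits as an integer, then copy them into a float (memcpy avoids
   strict-aliasing problems). Assumes the host float is IEEE-754. */
static float read_be_f32(const unsigned char *p)
{
    uint32_t bits = read_be_u32(p);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The reverse operation, e.g. when writing a record back to disk. */
static void write_be_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```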

The following C structure defines the Imreg.db:

 /* structure for Image Registration Database */
 typedef struct {
   char filename[64];     /* image filename */
   char pathname[128];    /* image pathname */
   char filter[32];       /* filter name */
   char instrument[32];   /* name of camera */
   char ccd;              /* number of ccd if mosaic */
   char mode;             /* format of image file (MEF/SPLIT) */
   char type;             /* imagetype (OBJECT, FLAT, etc) */
   char junk[25];         /* extra space */
 
   float exptime;         /* exposure time, seconds */
   float airmass;         /* airmass */
   float sky;             /* median of data region */
   float bias;            /* median of overscan */
   float fwhm;            /* fwhm of stars in image */
 
   float telfocus;        /* telescope focus setting for image */
   float xprobe, yprobe;  /* x & y position of guide probe */
   float zprobe;          /* z position of camera */
   float dettemp;         /* temperature of CCD, K */
   float teltemp[4];      /* several environment temps, C */
   float rotangle;        /* rotation angle of instrument */
   float ra, dec;         /* TCS-reported ra & dec */
 
   unsigned long int obstime;  /* time of image */
   unsigned long int regtime;  /* time image was registered */
 } RegImage;  /* 360 bytes / image, with 4-byte unsigned long */
 

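The "360 bytes" figure assumes a 4-byte unsigned long, as on 32-bit systems; where unsigned long is 8 bytes (most 64-bit Unix systems) the same declaration occupies 368 bytes. A minimal sketch of a platform-independent variant, assuming C11 and fixed-width 32-bit times; the name RegImage32 is hypothetical, and the compile-time checks confirm the layout instead of relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as RegImage, but with fixed-width integer times so the
   record size does not depend on the platform's sizeof(long). */
typedef struct {
    char filename[64];
    char pathname[128];
    char filter[32];
    char instrument[32];
    char ccd, mode, type;
    char junk[25];          /* pads the character block to 284 bytes */
    float exptime, airmass, sky, bias, fwhm;
    float telfocus, xprobe, yprobe, zprobe, dettemp;
    float teltemp[4];
    float rotangle;
    float ra, dec;
    uint32_t obstime;
    uint32_t regtime;
} RegImage32;

/* 256 + 3 + 25 = 284 bytes of characters, 17 floats = 68 bytes,
   2 x 4-byte times = 8 bytes: 360 in total, with no padding. */
_Static_assert(offsetof(RegImage32, exptime) == 284, "char block is 284 bytes");
_Static_assert(offsetof(RegImage32, obstime) == 352, "floats end at byte 352");
_Static_assert(sizeof(RegImage32) == 360, "record must be 360 bytes");
```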
This structure covers a wide variety of information, some of which may be neither relevant nor automatically available at every site. The database is intended to support diagnostics of the telescope system, so while it is useful to fill as many of these fields as possible, complete coverage is not essential to other parts of the Elixir system. A few entries deserve extra attention.

At CFHT, most of the data in this structure can be determined by examining the image headers. Currently (10/2000), the names of the relevant header keywords are hard-coded in the software. A reasonable and obvious extension would be to provide these keyword names in a lookup table of some sort. Some entries are not available in the image header and must be provided in another way.
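A lookup table of the kind suggested could be as simple as a static array pairing each Imreg.db field with the header keyword that supplies it. In this C sketch, the type KeywordMap, the function keyword_for and the keyword names themselves are hypothetical placeholders, not the keywords actually used at CFHT; a real site would probably read its own table from a configuration file.

```c
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical field-to-keyword map. */
typedef struct {
    const char *field;    /* Imreg.db field name */
    const char *keyword;  /* header keyword that supplies it */
} KeywordMap;

/* Placeholder keywords; a real site would substitute its own. */
static const KeywordMap keyword_map[] = {
    { "filter",   "FILTER"   },
    { "exptime",  "EXPTIME"  },
    { "airmass",  "AIRMASS"  },
    { "ra",       "RA"       },
    { "dec",      "DEC"      },
    { "rotangle", "ROTANGLE" },
};

/* Return the header keyword for a field, or NULL if the field has no
   header source (e.g. the telescope temperatures). */
static const char *keyword_for(const char *field)
{
    size_t n = sizeof keyword_map / sizeof keyword_map[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(keyword_map[i].field, field) == 0)
            return keyword_map[i].keyword;
    return NULL;
}
```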

The telescope temperature data are not provided in the image headers. Currently, they are recorded by the 'data logger', a daemon which runs at the summit and stores a wide variety of data in a proprietary database. These data are relevant not only to images coming off the telescope, but also to images being analysed from the archives. To give Elixir access to them, we provide two mechanisms. First, a small script extracts the data for the current night; it runs as a daemon during a 12k run and places the temperature information in a specified location, where the Elixir programs can search for the temperatures at the time of each image. Second, for an archive analysis, the temperatures for the relevant time range (e.g., Sept 1999) are similarly extracted into a specific file, which the Elixir software then searches for the temperatures of a given image. This two-step process is somewhat cumbersome, but it avoids the otherwise slow access to the complete datalogger database. Ideally, the data which go into Imreg.db should be available from the image itself. This suggests that the TCS should write the temperatures of interest into the header, which would avoid the current cumbersome system.
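Once the datalogger output has been extracted to a file, matching an image to its temperatures reduces to finding the logged sample closest in time to the image's obstime. A minimal C sketch, assuming the extracted samples have already been read into an array sorted by time; the TempSample type, the function nearest_sample, and the nearest-sample rule itself are illustrative assumptions, not the Elixir file format or matching rule.

```c
#include <stddef.h>

/* One hypothetical datalogger sample: a timestamp plus the four
   environment temperatures stored in RegImage.teltemp. */
typedef struct {
    unsigned long time;   /* same time units as RegImage.obstime */
    float temp[4];        /* degrees C */
} TempSample;

/* Binary search for the sample nearest to t in an array sorted by
   ascending time. Returns NULL if the array is empty. */
static const TempSample *nearest_sample(const TempSample *s, size_t n,
                                        unsigned long t)
{
    if (n == 0)
        return NULL;
    size_t lo = 0, hi = n;            /* find first index with time >= t */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s[mid].time < t)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return &s[0];                 /* t is before (or at) the first sample */
    if (lo == n)
        return &s[n - 1];             /* t is after the last sample */
    /* s[lo-1].time < t <= s[lo].time: choose the closer neighbour. */
    return (t - s[lo - 1].time <= s[lo].time - t) ? &s[lo - 1] : &s[lo];
}
```

A production version would probably also reject matches farther than some maximum interval from the image time.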

The image name and pathname are available to the program which registers the image in the database. Since this program must read the image header, it must have access to the image itself. It can be given either a full path or a relative path to the image; in either case, it records the current path to the image in the 'pathname' field. Later elements of Elixir use this information to find and display images as desired. One drawback of the current implementation is that it assumes the images remain on disk in the same location, which is clearly not guaranteed. If the user removes an image, this is expected, and a simple error saying the image is unavailable can be returned. But if the user moves the directories containing the images, it would be helpful to provide a mechanism that lets the database follow the change easily. Possible options include: 1) a program to change the pathname for specific images, or for groups of images following specific rules (perhaps similar to sed's substitution rules); 2) representing the upper part of the path by a variable which the user can easily change at will; 3) implementing move / remove for images with Elixir-specific versions of mv / rm, which would make the update automatically. Each of these strategies has advantages and disadvantages, which should be weighed before implementation.
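Option 1 amounts to a prefix substitution on the stored pathname, in the spirit of sed's s/old/new/ anchored at the start of the string. A minimal C sketch of that rule for a single entry; relocate_path and PATHNAME_LEN are hypothetical names, with the 128-byte limit taken from the pathname field of RegImage.

```c
#include <stdio.h>
#include <string.h>

#define PATHNAME_LEN 128   /* size of RegImage.pathname */

/* If path begins with old_prefix, replace that prefix with new_prefix
   in place. Returns 1 if the path was changed, 0 if the prefix did not
   match, and -1 if the result would not fit in the field (in which case
   path is left unchanged). */
static int relocate_path(char path[PATHNAME_LEN],
                         const char *old_prefix, const char *new_prefix)
{
    size_t old_len = strlen(old_prefix);
    if (strncmp(path, old_prefix, old_len) != 0)
        return 0;
    char buf[PATHNAME_LEN];
    int n = snprintf(buf, sizeof buf, "%s%s", new_prefix, path + old_len);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;
    memcpy(path, buf, (size_t)n + 1);   /* copy including the terminator */
    return 1;
}
```

As written, a prefix such as /data would also match /database; a real rule would probably require the prefix to end at a '/' boundary.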

The remaining data cannot be determined from the image header, but must instead be measured from the image itself. These include the median values of the bias (overscan) and data regions, the latter representing the sky flux, and the average FWHM of objects in the image. Within the Elixir system, these values are determined by a fast analysis pipeline called 'imstats', which is documented in detail elsewhere. They are added to Imreg.db after the images themselves have been registered.
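The sky and bias entries are medians of the data region and of the overscan, respectively. Since imstats itself is documented elsewhere, the following is only a minimal C sketch of a median over a pixel region, assuming the region has already been extracted into a float array; median_of and cmp_float are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for floats (NaN handling omitted). */
static int cmp_float(const void *a, const void *b)
{
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Median of n pixel values; the input is copied so it is not reordered.
   For even n the mean of the two central values is returned.
   Returns 0 on success, -1 if n == 0 or allocation fails. */
static int median_of(const float *pix, size_t n, float *out)
{
    if (n == 0)
        return -1;
    float *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, pix, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_float);
    *out = (n % 2) ? tmp[n / 2] : 0.5f * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return 0;
}
```

A full sort costs O(n log n); for a fast pipeline on large CCDs, a selection algorithm or a histogram over a pixel subsample would likely be preferred.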

One point relates to mosaic data formats. Mosaic images are typically stored in one of two ways: MEF and SPLIT. In MEF, the entire mosaic is stored as a series of extensions in a single FITS file. In SPLIT, each CCD image is stored as a separate FITS file, with related names. Imreg.db allows three types of images: MEF, SPLIT, and SINGLE, the last referring to images from a single-CCD detector. A philosophical choice we have made is to represent every CCD in Imreg.db. This implies an entry for each CCD of a MEF image, even though these entries share the same file name (not to mention RA, DEC, temperatures, etc.). This is necessary because several fields of the database table (sky, bias, ccd) hold CCD-specific information. It also minimizes the differences in handling SPLIT and MEF images.
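Because every CCD gets its own row, registering one MEF file produces one entry per extension, all sharing the filename and header-derived values but differing in ccd. A minimal C sketch using a reduced, hypothetical record type (only a few RegImage fields), an assumed encoding of the mode field, and 0-based CCD numbering; none of these are taken from the Elixir source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding of RegImage.mode. */
enum { MODE_SINGLE = 0, MODE_SPLIT = 1, MODE_MEF = 2 };

/* Reduced record with just the fields this example needs. */
typedef struct {
    char filename[64];
    char ccd;
    char mode;
    float exptime;
} RegEntry;

/* Fill out[0..nccd-1] with one entry per CCD of an MEF file.
   Shared values are copied into every entry; only ccd differs.
   Returns the number of entries written. */
static int expand_mef(const char *filename, float exptime, int nccd,
                      RegEntry *out)
{
    for (int i = 0; i < nccd; i++) {
        memset(&out[i], 0, sizeof out[i]);
        snprintf(out[i].filename, sizeof out[i].filename, "%s", filename);
        out[i].ccd = (char)i;
        out[i].mode = MODE_MEF;
        out[i].exptime = exptime;
    }
    return nccd;
}
```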

Image Registration Database input / output functions

Several routines are related to maintaining Imreg.db. Some introduce images or related data into the database, while others extract data from it as needed.

The basic data entry program is 'imregister', which places the basic information about an image, as determined from its header, in the database. The program is invoked with: imregister (filename) [-split]. The optional flag tells the program to treat the file as one of the individual (SPLIT) frames of a mosaic camera rather than as an isolated single-CCD image (SINGLE). MEF images can be identified from the information in the image header. The imregister program also searches for the temperature information as described above.
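The classification imregister performs can be summarized as a small decision: MEF is recognized from the header, and otherwise the -split flag decides between SPLIT and SINGLE. A minimal C sketch; how MEF is detected in the header is not specified above, so the has_extensions argument (e.g. derived from a keyword such as NEXTEND) is an assumption, as are the ImageMode enum and the classify_image name.

```c
#include <stdbool.h>

typedef enum { IMG_SINGLE, IMG_SPLIT, IMG_MEF } ImageMode;

/* Decide the mode recorded in RegImage.mode.
   has_extensions: true if the header marks the file as multi-extension.
   split_flag:     true if imregister was run with -split. */
static ImageMode classify_image(bool has_extensions, bool split_flag)
{
    if (has_extensions)
        return IMG_MEF;      /* MEF is identifiable from the header alone */
    return split_flag ? IMG_SPLIT : IMG_SINGLE;
}
```

If a file is both multi-extension and flagged -split, this sketch lets the header win; the text above does not say which should take precedence.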

A variant of 'imregister' is 'imsort', which performs the same task but also sends a trigger to the IMSTAT elixir and, if needed, to the PTOLEMY elixir. These triggers tell those elixirs to perform their analyses on the image. In 'imsort' and in the related elixirs, the -split flag affects the naming convention of the derived data products. A SINGLE image /fullpath/filename.fits produces analysis files of the form /newpath/filename.ext, while a SPLIT image, of the form /fullpath/word/wordNN.fits, produces output files of the form /newpath/word/wordNN.ext. That is, SPLIT output keeps the extra directory level that it shares with the input path.
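The naming rule can be expressed as a path transformation: keep the basename (SINGLE) or the last directory plus the basename (SPLIT), replace the .fits extension, and prefix the output directory. A minimal C sketch under those assumptions; product_path is a hypothetical helper, and it assumes input names end in .fits.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build the analysis-product path for an input image.
   SINGLE: /fullpath/filename.fits    -> newpath/filename.ext
   SPLIT:  /fullpath/word/wordNN.fits -> newpath/word/wordNN.ext
   Returns 0 on success, -1 if the input is malformed or out is too small. */
static int product_path(const char *in, bool split, const char *newpath,
                        const char *ext, char *out, size_t outlen)
{
    size_t len = strlen(in);
    if (len < 5 || strcmp(in + len - 5, ".fits") != 0)
        return -1;
    size_t stem_end = len - 5;          /* index where ".fits" begins */

    /* Walk left from the stem end, counting slashes: SINGLE keeps the
       text after the last '/', SPLIT keeps one more directory level. */
    int need = split ? 2 : 1;
    size_t start = stem_end;
    while (start > 0) {
        if (in[start - 1] == '/') {
            need--;
            if (need == 0)
                break;                  /* start is just past this slash */
        }
        start--;
    }
    if (split && need == 2)
        return -1;                      /* SPLIT input needs a directory */

    int n = snprintf(out, outlen, "%s/%.*s.%s", newpath,
                     (int)(stem_end - start), in + start, ext);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```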

As mentioned above, Imreg.db includes statistics determined from the images themselves by the imstats elixir process. These measurements are added to the database with the command 'imstatreg', which finds the database entry for the given image and adds the new statistics as needed.
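Since an MEF file contributes one entry per CCD under the same filename, the lookup must match on the CCD number as well as the name. A minimal C sketch over an in-memory array, using a reduced, hypothetical record (StatEntry) and helper (add_stats); the actual matching rules and file handling of imstatreg are not described here.

```c
#include <string.h>

/* Reduced record: identification fields plus the imstats results. */
typedef struct {
    char filename[64];
    char ccd;
    float sky, bias, fwhm;
} StatEntry;

/* Find the entry for (filename, ccd) and store the new statistics.
   Returns the index updated, or -1 if the image is not registered. */
static int add_stats(StatEntry *db, int n, const char *filename, int ccd,
                     float sky, float bias, float fwhm)
{
    for (int i = 0; i < n; i++) {
        if (db[i].ccd == ccd && strcmp(db[i].filename, filename) == 0) {
            db[i].sky = sky;
            db[i].bias = bias;
            db[i].fwhm = fwhm;
            return i;
        }
    }
    return -1;
}
```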

For data extraction, the program 'imsearch' is an all-purpose searching tool. This program lists all images which match a set of constraints listed on the command line. Without any arguments, it therefore lists all images in the Imreg.db, along with a summary of the interesting information. Flags to the program can restrict the search, including options such as:

  • -ccd N : restrict search to CCD number N
  • -type WORD : restrict by image type (flat, etc)
  • -mode WORD : restrict by MEF, SPLIT, SINGLE
  • -filter NAME : restrict by filter
  • -time date range : restrict by date & time period
  • -trange date1 date2 : restrict to range date1 - date2
  • name?
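Each flag adds one constraint, and an entry is listed only if it satisfies all of them; unset constraints match everything. A minimal C sketch of that logic for a few of the flags, assuming hypothetical SearchEntry and SearchConstraints types (with the image type held as a string for simplicity); string comparisons are case insensitive, in line with how the entries are treated.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    char filter[32];
    char type[16];          /* image type as a string (hypothetical) */
    int ccd;
    unsigned long obstime;
} SearchEntry;

typedef struct {
    int ccd;                /* -ccd N; negative means unset */
    const char *type;       /* -type WORD; NULL means unset */
    const char *filter;     /* -filter NAME; NULL means unset */
    unsigned long t0, t1;   /* -trange, inclusive; t1 == 0 means unset */
} SearchConstraints;

/* Case-insensitive string equality. */
static bool ieq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;
}

/* An entry is listed only if it passes every constraint that is set. */
static bool matches(const SearchEntry *e, const SearchConstraints *c)
{
    if (c->ccd >= 0 && e->ccd != c->ccd)
        return false;
    if (c->type != NULL && !ieq(e->type, c->type))
        return false;
    if (c->filter != NULL && !ieq(e->filter, c->filter))
        return false;
    if (c->t1 != 0 && (e->obstime < c->t0 || e->obstime > c->t1))
        return false;
    return true;
}
```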

These options allow easy searches for specific images. Since the data are written to standard output, more sophisticated searches and analyses can easily be performed by passing the data to other UNIX filters such as sort, grep, and awk. The above entries are generally case insensitive.

The dates may be specified in the format [yy]yy/mm/dd,hh:mm:ss. The separators may be any character except space or period (.), as long as the character is not interpreted by the shell (e.g., ?). Fields of the date may be dropped from the right (the least significant end) as needed, and default to their minimum values. The words TODAY and NOW may also be used. A date may also be given as a Julian day if suffixed with 'j', or as a length of time since Jan 1, 1970 (or is it REFDATE?) if the units are given with the suffixes described below. A time range can be specified in several units using the following suffixes: s (seconds), m (minutes), h (hours), d (days), M (months), y (years). Only the minutes (m) and months (M) suffixes are case sensitive.
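The unit suffixes can be handled by a small parser that turns a string such as 3h or 2M into seconds. A minimal C sketch; parse_duration is a hypothetical name, it implements only the suffix rule described above (with m and M the only case-sensitive letters), treating a missing suffix as an error is an assumption, and the lengths used for a month (30 days) and a year (365 days) are assumptions, since calendar-aware arithmetic is not specified.

```c
#include <ctype.h>
#include <stdlib.h>

/* Parse "<number><unit>" into seconds. Units: s, h, d, y (either case),
   m = minutes, M = months. Returns 0 on success, -1 on a malformed
   string. Month = 30 days and year = 365 days are assumptions. */
static int parse_duration(const char *s, double *seconds)
{
    char *end;
    double v = strtod(s, &end);
    if (end == s || end[0] == '\0' || end[1] != '\0')
        return -1;                    /* need a number and exactly one unit */

    double unit;
    switch (end[0]) {
    case 'm': unit = 60.0;                  break;  /* minutes */
    case 'M': unit = 30.0 * 86400.0;        break;  /* months (assumed) */
    default:
        switch (tolower((unsigned char)end[0])) {
        case 's': unit = 1.0;               break;
        case 'h': unit = 3600.0;            break;
        case 'd': unit = 86400.0;           break;
        case 'y': unit = 365.0 * 86400.0;   break;  /* years (assumed) */
        default:  return -1;
        }
    }
    *seconds = v * unit;
    return 0;
}
```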

Data extraction may also be performed in more detail with the 'status' program. Functions in 'status' allow the extraction of the various fields into vectors (or string vectors??) which may then be manipulated as needed. Unlike imsearch, where only a minimal subset of the available data are reported, status allows access to all entries in the database table.
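Extracting one field from every entry into a vector is a strided copy over the record array. A minimal C sketch that works for any float field by taking its offsetof() within the record; extract_float_field and the reduced VecEntry type are hypothetical, and the actual interface of 'status' is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Reduced, hypothetical record type. */
typedef struct {
    char filename[64];
    float exptime;
    float airmass;
    float sky;
} VecEntry;

/* Copy the float field at byte offset `off` of each of n records
   (each recsize bytes long) into out[0..n-1]. */
static void extract_float_field(const void *records, size_t recsize,
                                size_t n, size_t off, float *out)
{
    const unsigned char *p = records;
    for (size_t i = 0; i < n; i++)
        memcpy(&out[i], p + i * recsize + off, sizeof out[i]);
}
```

For example, extract_float_field(db, sizeof db[0], n, offsetof(VecEntry, sky), v) fills v with the sky values of all n entries.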

Short side note: implementation-specific issues. There are two major aspects of the analysis process which depend to some extent on the assumptions we have made at CFHT. I have tried to minimize the number of places where programs need to know something specific to CFHT that might not be treated exactly the same way elsewhere.

The first of these aspects is the use of SPLIT vs MEF format for the image files. It is necessary to distinguish them and to treat them differently in some cases (particularly when an image is loaded). It is particularly difficult to decide whether a specific CCD in SPLIT format is one of a mosaic or just an isolated (SINGLE) CCD image, since the header keywords that would distinguish these cases are not well defined. Related to this is the number of CCDs in the mosaic: in a limited number of places, it is important to know that there are 12 (or N) CCDs in the mosaic.

The other implementation-specific issue is the naming convention, which is also related to the SPLIT/MEF issue. At CFHT, we use the convention that a MEF image has a name of the form /some/path/NNNNNNx.fits, where NNNNNN is a sequence number and 'x' is a flag associated with the image type (FLAT, OBJECT, etc.). For a SPLIT image, the NN CCD images are placed in files with names of the form /some/path/NNNNNNx/NNNNNNxMM.fits, where MM is a 2-digit number representing the CCD number. At some level, this format for split images should not matter: we could simply accept each file of the form NNNNNNxMM.fits as a separate FITS image. Instead, we have chosen to keep the organizational structure and give the processing result files (typically products produced by 'elixir') names of the form /new/path/NNNNNNx/NNNNNNxMM.ext, not only for the SPLIT products but also for the MEF products. As a result, certain functions need to know this naming convention and apply it as necessary.
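Under the CFHT convention, the per-CCD product name for both MEF and SPLIT input is built from the image root NNNNNNx and the two-digit CCD number. A minimal C sketch of that construction; cfht_product_name is hypothetical, and it assumes the root has already been stripped of its directory and .fits extension.

```c
#include <stdio.h>
#include <string.h>

/* Build outdir/NNNNNNx/NNNNNNxMM.ext from the root "NNNNNNx" and a CCD
   number, as used for both SPLIT and MEF products.
   Returns 0 on success, -1 on bad arguments or if out is too small. */
static int cfht_product_name(const char *outdir, const char *root, int ccd,
                             const char *ext, char *out, size_t outlen)
{
    if (ccd < 0 || ccd > 99)
        return -1;                  /* MM is exactly two digits */
    if (root[0] == '\0' || strchr(root, '/') != NULL)
        return -1;                  /* root must be a bare name like NNNNNNx */
    int n = snprintf(out, outlen, "%s/%s/%s%02d.%s",
                     outdir, root, root, ccd, ext);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```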