CFHT archive manual - dearchiving
The current archiving system pulls new files from the summit to a host in Waimea and writes the files to two identical DLT tapes. One tape is sent to CADC to be archived there and one is kept at CFHT in Waimea. Generally astronomers are encouraged to copy and care for their data at the time of acquisition, but occasionally files need to be retrieved from the archives. The aim of this document is to assist in that process. General archiving questions can be directed to the archive administration group: archive@cfht.hawaii.edu.
The general archiving process works as follows: files are accumulated on the archiving host in Waimea (kapu) until they reach a size or quantity threshold, e.g. 2 gigabytes or 100 files. Those files are then written in sets to DLT tapes. The archive host keeps a record of which set on which tape each file was written to, and when.
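The threshold test described above can be illustrated with a small shell function. This is only a sketch: the archiver's real logic is not shown in this manual, the function name batch_ready is hypothetical, and the default limits are simply the example figures from the text.

```sh
# Sketch only; not the archiver's actual code.
# batch_ready DIR [MAX_KB] [MAX_FILES]
# Succeeds (exit 0) when DIR holds at least MAX_FILES regular files
# or at least MAX_KB kilobytes of data.
batch_ready() {
    dir=$1
    max_kb=${2:-2097152}     # 2 gigabytes expressed in kilobytes
    max_files=${3:-100}
    # Count the regular files under DIR (strip padding some wc versions add).
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Total disk usage of DIR in kilobytes.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    [ "$count" -ge "$max_files" ] || [ "$used_kb" -ge "$max_kb" ]
}
```

A caller would test the exit status, for example: batch_ready /some/spool/dir && echo "ready to write a set" (the spool path here is a placeholder).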
For general information on tapes, tape devices and facilities at CFHT see the tape information page in this manual.
There are several in-house tools that have been built to help you find, extract, and work with files from the archives. A description of those tools, and how to use them, follows.
The general procedure for recovering a file from the archives is:
The tape and set number are retrieved from the records and the tape itself is retrieved from the data vault.
The tape is loaded into an appropriate drive and positioned to the appropriate set.
The file is read off of the tape onto the local host or a remote host.
If the file was read onto the local host the file is transferred, over the network or via another tape, to wherever it needs to go and the temporary copy is erased.
It is very important that at the end of this process the tape is returned to the data vault.
Usually the machine Kapu, which is located next to the archiving host, is appropriate for working with the archives. As a CFHT user you will have access to the tools described below. If you are just looking for a single file you can skip to the section "Dearchiving a single file".
find-archive.tcl
scan the archiving records for information about a file
extract.tcl
dearchive an entire DLT
extract-sets.tcl
dearchive an entire set or range of sets
mta.tcl
select the dearchived files you want and put them on some other media
Find-archive searches the archiving records for information about a file. Note that find-archive exits as soon as it finds a match, so it returns only one entry. Find-archive is flexible, so give it only as much of the file name as you are sure of, and do not give it any wildcards (such as *, ?, or []). Use it like this:
[archive@kapu:~] find-archive.tcl file
for example:
[archive@kapu:~] find-archive.tcl 528777
working...
going back further into the records...
528777 found in document tarexa_list.CADC223 :
528777f.fits on CFHT-CADC223_TAR - tar set 4 - Thu Apr 13 21:28:46 HST 2000 10494400
[archive@kapu:~]
CFHT-CADC223 is the tape the file is on, and the file is in tar set 4 on that tape.
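If you want to pick the tape and set number out of the find-archive output automatically, a sed expression along the following lines may help. It is a sketch that assumes the entry line always has the layout shown in the sample output, and that the _TAR suffix is not part of the tape label; the function name parse_archive_entry is hypothetical.

```sh
# parse_archive_entry
# Reads find-archive output on stdin and prints "FILE TAPE SET" for the
# first line that looks like an archive entry. Assumes the layout
#   FILE on TAPE_TAR - tar set N - DATE ...
# as seen in the sample output; other layouts are silently ignored.
parse_archive_entry() {
    sed -n 's/^\([^ ]*\) on \([^ ]*\)_TAR - tar set \([0-9][0-9]*\) .*$/\1 \2 \3/p' |
        head -n 1
}
```

In practice the function would be fed directly from the tool, for example: find-archive.tcl 528777 | parse_archive_entry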
Extract takes three arguments: the destination directory, the source device and the number of sets on the tape. The number of sets on a tape should be written on the outside of its box. So the construction would look like this:
[archive@kapu:~] extract.tcl -dst destination -src source device -sets no. of sets
for example:
[archive@kapu:~] extract.tcl -dst /local/data/dearchive -src /dev/rmt/0 -sets 33
Extract makes a subdirectory for each set (named 0, 1, 2, 3, etc.) and writes that set's files into it.
Note that each DLT holds about 35 gigabytes of data, so dearchiving an entire DLT takes time and space.
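Because a full DLT needs so much room, it can be worth checking the free space in the destination directory before you start. The function below is a minimal sketch, not part of the CFHT tools; the name has_space is hypothetical and the default of 35 gigabytes is just the approximate tape capacity quoted above.

```sh
# has_space DIR [NEED_KB]
# Succeeds (exit 0) when the filesystem holding DIR has at least NEED_KB
# kilobytes available.
has_space() {
    dir=$1
    need_kb=${2:-36700160}   # 35 gigabytes expressed in kilobytes
    # df -P gives one portable line per filesystem; column 4 is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example: has_space /local/data/dearchive || echo "not enough room for a full DLT"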
Extract-sets works like extract except that it extracts only the sets that you specify. It is also interactive, so rather than giving it a string of arguments you enter information as it asks for it. It needs essentially the same information as extract, except that it asks for a first and last set number rather than the total number of sets on the tape. One other difference is that it puts all of the files into the current directory, which makes it easier to sort them.
Mta works on a directory of dearchived files, spooling them into a temporary directory in groups appropriate to the size of the media you are using, then writing the files to tapes. Mta can use any size of media and can write to any number of tapes. It gets information from you interactively, so you don't send it any arguments. It needs to know the directory you want to work in, the device you want to write to, the size of the media you will be using in kilobytes (a 5 GB Exabyte tape, for example, would be 5000000), the first file in your range and the last file. When entering the file names just enter the number associated with each one; for example, if the first file is 528777f.fits just enter 528777.
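The grouping step can be pictured with the following sketch. It is not mta's actual code, which is not shown in this manual; the function name group_by_media is hypothetical, and the simple "fill each tape in order" rule is an assumption about how such grouping might work.

```sh
# group_by_media MEDIA_KB
# Reads "SIZE_KB NAME" lines on stdin and prints "BATCH NAME" lines.
# A new batch (i.e. a new tape) is started whenever the next file would
# not fit in the space left on the current one. A single file larger than
# the media still gets a batch of its own and would not fit in practice.
group_by_media() {
    awk -v cap="$1" '
        BEGIN { batch = 1; used = 0 }
        {
            if (used > 0 && used + $1 > cap) { batch++; used = 0 }
            used += $1
            print batch, $2
        }'
}
```

With file names that contain no spaces, the input can come straight from du, for example: du -k *.fits | group_by_media 5000000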
As with any dearchive operation the first step is to get information about the archive. To extract a single file first query find-archive for information. Once you have found the tape that the file is on, get an archive administrator to retrieve the tape for you. Take the tape to the machine Kapu next to Ono, or another appropriate tape host in the computer room. Open a terminal window and change to a scratch directory you can work in, such as /local/data/dearchive:
[archive@kapu:~] cd /local/data/dearchive
[archive@kapu:/local/data/dearchive]
Completely insert the DLT into the drive and slowly close the lever. DLT tapes and drives are delicate so please be cautious. Once the tape has been loaded and the lights on the drive have stopped flashing you can enter the command to position the tape.
Working with sets: Files are written to DLT tapes in sets to facilitate retrieval. To access files in a particular set you first position the tape to the beginning of that set with the mt command. It is important when you are positioning a tape that you use the non-rewinding device for the tape drive. In Solaris usage the non-rewinding device usually ends in an "n", so the non-rewinding device of /dev/rmt/0 is /dev/rmt/0n. In Linux the "n" comes before the "st", so /dev/st0 would be /dev/nst0. If you are going to extract files by hand you need to skip forward over one set fewer than the number of the set you want to extract from. For example, if you want to extract a file from set 4 on a particular tape you would first skip forward over the first three:
[archive@kapu:/local/data/dearchive] mt -t /dev/rmt/0n fsf 3
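The device naming and the skip count described above can be sketched as two small shell functions. Both names (norewind_dev and position_cmd) are hypothetical, only the two device naming patterns mentioned in the text are handled, and position_cmd merely prints the mt command rather than running it, assuming the tape starts at its beginning.

```sh
# norewind_dev DEVICE
# Prints the non-rewinding name for a Solaris (/dev/rmt/N) or Linux
# (/dev/stN) tape device. Fails for any other naming scheme.
norewind_dev() {
    case $1 in
        /dev/rmt/*n) printf '%s\n' "$1" ;;               # already non-rewinding
        /dev/rmt/*)  printf '%sn\n' "$1" ;;              # Solaris: append "n"
        /dev/nst*)   printf '%s\n' "$1" ;;               # already non-rewinding
        /dev/st*)    printf '/dev/n%s\n' "${1#/dev/}" ;; # Linux: "n" before "st"
        *)           return 1 ;;
    esac
}

# position_cmd DEVICE SET
# Prints the mt command that skips forward over SET - 1 sets.
# For set 1 the count is 0: the tape only needs to be at its beginning.
position_cmd() {
    dev=$(norewind_dev "$1") || return 1
    [ "$2" -ge 1 ] || return 1
    printf 'mt -t %s fsf %s\n' "$dev" "$(( $2 - 1 ))"
}
```

Printing the command first lets you check it against the set number reported by find-archive before anything moves the tape.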
Once the tape is in position your prompt will return and you may issue the command to extract the file:
[archive@kapu:/local/data/dearchive] tar xvf device file
for example:
[archive@kapu:/local/data/dearchive] tar xvf /dev/rmt/0 528777f.fits
If you don't specify a file the entire set is extracted. Once this command is complete your file will be in the current directory and you may copy it off of the machine. You may also put it on another tape using:
[archive@kapu:/local/data/dearchive] tar cvf device file
for example, to put it on an exabyte tape:
[archive@kapu:/local/data/dearchive] tar cvf /dev/rmt/1 528777f.fits
Please erase the file from Kapu when you are done, and do not leave anything important in this area, as it is "cleaned up" periodically.
Working with a range of files is a little different. Use find-archive to determine the information for the first and last files in the range. Then create a directory in the scratch space to work in so your files will not be confused with any others. In that directory use extract-sets to dearchive the sets that contain the files that you want. Then you can use mta to sort out exactly the files you want and write them to one or more new tapes that you can take with you, or to a tarfile you can send over the network. Alternatively you can use wildcards to sort your files. For use of the tools mentioned see their descriptions above.
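When a wildcard alone cannot express the range you want, a short loop can select the files by number. The following is a sketch under the assumption that every dearchived file is named like 528777f.fits, that is, a number followed by a suffix and .fits; the function name files_in_range is hypothetical.

```sh
# files_in_range FIRST LAST [DIR]
# Prints the names of the .fits files in DIR (default: current directory)
# whose leading number lies between FIRST and LAST inclusive.
files_in_range() {
    first=$1
    last=$2
    dir=${3:-.}
    for path in "$dir"/*.fits; do
        [ -e "$path" ] || continue          # no matching files at all
        name=${path##*/}
        # Keep only the leading digits, e.g. 528777f.fits -> 528777.
        num=$(printf '%s\n' "$name" | sed 's/^\([0-9]*\).*$/\1/')
        [ -n "$num" ] || continue           # name does not start with a number
        if [ "$num" -ge "$first" ] && [ "$num" -le "$last" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

The resulting list can then be handed to tar in the scratch directory, for example: tar cvf /dev/rmt/1 $(files_in_range 528777 528790)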