Re: e-transfer from terapix

From: Séverin Gaudet <severin.gaudet_at_nrc-cnrc.gc.ca>
Date: Tue, 27 Mar 2007 10:00:01 -0700

Hi Kanoa

You are correct in your analysis, but there is a solution. I have
talked with both Rob Seaman (the author of the FITS checksum and
co-author of the tile compression we are using) and Bill Pence (the
author of cfitsio and co-author with Rob on the tile compression).
Here is the solution:
        - the latest version of the cfitsio library (version 3.030) supports
a round trip of the CHECKSUM and DATASUM keywords. That is, the keywords
are renamed to ZCHECKSUM and ZDATASUM, the file is compressed, and new
values for CHECKSUM and DATASUM are computed and stored. Upon
decompression, the original values are copied back to the keywords.
        - the latest versions of the compression and decompression
applications fpack/funpack (version 0.93) support this. I have tested
them.
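
To make the keyword round trip concrete, here is a minimal Python sketch.
It is not the cfitsio or fpack/funpack code. The header is modelled as a
plain dict, zlib stands in for the tile compression, and toy_sum,
compress_hdu and decompress_hdu are hypothetical names in which a CRC-32
stands in for the real FITS checksum.

```python
import zlib


def toy_sum(data: bytes) -> str:
    """Stand-in checksum (CRC-32), NOT the FITS checksum algorithm."""
    return str(zlib.crc32(data))


def compress_hdu(header: dict, data: bytes):
    """Compress the data and move the original sums to Z-prefixed keywords."""
    packed = zlib.compress(data)  # assumption: zlib replaces tile compression
    out = dict(header)
    # Keep the original values under the renamed keywords.
    out["ZCHECKSUM"] = out.pop("CHECKSUM")
    out["ZDATASUM"] = out.pop("DATASUM")
    # Store new values that describe the compressed data.
    # (A real CHECKSUM also covers the header; this toy one does not.)
    out["CHECKSUM"] = toy_sum(packed)
    out["DATASUM"] = toy_sum(packed)
    return out, packed


def decompress_hdu(header: dict, packed: bytes):
    """Decompress the data and copy the original sums back."""
    data = zlib.decompress(packed)
    out = dict(header)
    out["CHECKSUM"] = out.pop("ZCHECKSUM")
    out["DATASUM"] = out.pop("ZDATASUM")
    return out, data
```

With this arrangement each form of the file carries sums that describe
itself, and the decompressed header ends up with the same two keyword
values it started with.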

In addition, the application fitsverify which we use in the etransfer
process currently supports checking the CHECKSUM and DATASUM values
and reports when they are inconsistent, e.g.,
> =================== HDU 2: BINARY Table ====================
> *** Warning: Data checksum is not consistent with the DATASUM keyword
> *** Warning: HDU checksum is not in agreement with CHECKSUM.

So we can implement stand-alone file consistency checking on both
compressed and decompressed versions of a file without having to
carry an external piece of metadata. This will also benefit users,
since getting an external piece of metadata to them for independent
verification is problematic.
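
As an illustration of what a stand-alone check involves, the sketch below
computes a DATASUM-style value in Python: a 32-bit ones' complement sum of
the data read as big-endian 32-bit words, compared with the decimal string
stored in the keyword. It is only a sketch and not a replacement for
fitsverify. The function names are hypothetical, locating the data unit of
an HDU in a file is not shown, and the ASCII-encoded CHECKSUM keyword is
not covered.

```python
import struct


def ones_complement_sum32(data: bytes) -> int:
    """32-bit ones' complement sum over big-endian 32-bit words."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    total = 0
    for (word,) in struct.iter_unpack(">I", data):
        total += word
    # Fold the overflow back into the low 32 bits (end-around carry).
    while total >> 32:
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


def datasum_matches(data: bytes, keyword_value: str) -> bool:
    """Compare the computed sum with the string stored in DATASUM."""
    return ones_complement_sum32(data) == int(keyword_value)
```

For example, datasum_matches(data_unit, header_value) would be called once
per extension, where data_unit and header_value are whatever bytes and
keyword string the caller has already read from the file.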

Séverin



On Mar 21, 2007, at 2:03 PM, kanoa wrote:

> Hi JJ,
>
> We have not been using the FITS standard checksums on science data
> though we do use them for metadata tables.
>
> I'm a little concerned about the 'standard' method in that the
> checksums are inserted into the header, which is fine until you
> compress the file. If you use a tile compression like RICE the
> headers are still readable and they have checksum data but the
> checksums are wrong. Then we would also like to have checksums of
> compressed files, as we do now, because that is the format we
> usually transmit data in.
>
> So what do we do? Update the checksums after compressing? Then they
> will be invalid when we decompress the file and we're making an
> unreasonable assumption about data integrity through the codec.
>
> We could record the original checksum values to compare with the
> decompressed file later, but we would first have to update the
> checksum header values in the decompressed file, since we
> cannot trust that they represent the actual data anymore. Then we
> would need to compare header and data checksums for each extension,
> 72 checksums for each megacam image.
>
> I don't see an easy way out of the mess which is why I've been
> taking md5sums of the uncompressed file and storing them
> externally. Maybe we could store the CRC of the compressed file in
> addition? The CRC would help us track the file in our systems and
> the md5 would help users who have decompressed the file.
>
> -Kanoa
>
> JJ Kavelaars wrote:
>> On 20-Mar-07, at 7:34 PM, kanoa wrote:
>>>
>>> I'll second this, it would be nice to use a widely available
>>> hash format so we (CFHT) and other users can confirm data
>>> integrity. md5 would be ideal since we already track md5
>>> signatures for all CFHT data.
>>>
>>> -Kanoa
>> BTW Kanoa, the cadcCRC binary is on the machines at CFHT also.
>> However, as mentioned earlier, we are moving to MD5 and also
>> looking at complying with the FITS-standard initiative of CRC in
>> the header.
>> Has CFHT been looking at following the FITS standard on checksums?
>> JJ
>>>
>>>
>>> Frederic Magnard wrote:
>>>
>>>>> Alternatively, I can provide you with our 'CRC' of the file
>>>>> and you can compare our 'CRC' to the value you get by running
>>>>> our crc-generator on the file.
>>>>> The binary for our crc generator is on your machine at
>>>>>
>>>>> clix.iap.fr:/home/nis/cadc/bin/cadcCRC
>>>>
>>>> Thanks. Could we please have the source code of this program?
>>>> Do you plan to switch to a checksum like md5? It's almost 2
>>>> times faster, and the resulting hash is 4 times longer. Did you
>>>> ever have collision problems with this CRC?
>>>
>>>
Received on Tue Mar 27 2007 - 07:00:03 HST
