Fwd: compression, floating point and checksum
At our last meeting, there were some questions about using checksums
on floating point data. The answer is that tile compression of
floating point data cannot guarantee that the decompressed file is
_exactly_ the same as the original (pre-compression) file - see
below. This means that if we "fingerprint" a floating point data file
before compression, that fingerprint will not be valid for the
decompressed file. It also means that a checksum computed before
compression has little value for verifying the decompressed data. We
should seriously look at converting floating point to integer with
BSCALE and BZERO, adding checksums and then compressing. That way
both the fingerprint and the checksum will provide a reliable means
of checking data consistency.
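
The proposed order of operations can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
pixel values and the BSCALE/BZERO numbers are made up, SHA-256 stands
in for whatever fingerprint we actually use, and zlib stands in for
any lossless integer compressor. The helper names are hypothetical.

```python
import hashlib
import struct
import zlib

def quantize(values, bscale, bzero):
    """Map physical values to stored integers: round((v - BZERO) / BSCALE)."""
    return [round((v - bzero) / bscale) for v in values]

def pack_int32(stored):
    """Serialize as big-endian 32-bit signed integers (as for BITPIX = 32)."""
    return struct.pack(">%di" % len(stored), *stored)

def fingerprint(data):
    """Stand-in fingerprint; any digest of the bytes would do."""
    return hashlib.sha256(data).hexdigest()

pixels = [0.1, 1.25, 3.14159, 2.71828, 100.0]  # made-up physical values
bscale, bzero = 0.001, 0.0                     # made-up scaling

# Convert to integers first, then fingerprint the integer bytes.
stored = quantize(pixels, bscale, bzero)
raw = pack_int32(stored)
before = fingerprint(raw)

# A lossless compress/decompress cycle returns the same bytes.
restored = zlib.decompress(zlib.compress(raw))
after = fingerprint(restored)
print(before == after)

# Physical values are recovered as BZERO + BSCALE * stored.
physical = [bzero + bscale * s
            for s in struct.unpack(">%di" % len(stored), restored)]
```

The precision loss happens once, at the conversion to integers, and
is bounded by BSCALE/2 per pixel. Everything after that step is
bit-for-bit reproducible.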
Séverin
Begin forwarded message:
> From: William Pence <pence_at_milkyway.gsfc.nasa.gov>
> Date: March 29, 2007 2:41:04 PM PDT (CA)
> To: Séverin Gaudet <severin.gaudet_at_nrc-cnrc.gc.ca>
> Cc: Seaman Rob <seaman_at_noao.edu>
> Subject: Re: compression, floating point and checksum
>
> Séverin Gaudet wrote:
>> Hi Bill, Rob
>> In conversation with Rob earlier today, two issues came up which
>> Rob suggested I pass on to you. I also discovered a new behaviour
>> in funpack which Rob may want to look at.
>> 1. Floating point data and checksum
>> I cannot compress and decompress an MEF of floating point data and
>> get the same file back. This points out two things:
>> - the funpack operation does not restore the CHECKSUM and DATASUM
>> keywords but generates new ones. This is probably not the desired
>> behaviour (Rob). If the ZHECKSUM and ZDATASUM keywords exist, they
>> should be restored to CHECKSUM and DATASUM to allow detection of
>> changed bits.
>> - the decompression does not restore the original data as
>> evidenced by a different DATASUM. Is this known behaviour? If so,
>> can it be made otherwise?
>
> The floating point data in general must be compressed with a lossy
> compression algorithm, so the restored file will not be absolutely
> identical to the original. In this case, funpack appears to be
> working correctly in recomputing the checksum keywords and not just
> restoring the cached ZHECKSUM and ZDATASUM keywords. The
> differences are almost certainly scientifically insignificant, but
> the nbits parameter may be used to control how closely the restored
> file will match the original. Directly compressing floating point
> data is not effective because of the random noise bits in the
> mantissa, so we use the algorithm supplied by Rick White which
> converts the floating point values into scaled integers before
> compressing them.
>
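
A small Python sketch of why the recomputed DATASUM differs. It is
not Rick White's algorithm: a plain uniform quantization with a
made-up step size stands in for it, and the pixel values are invented.
The sum is a 32-bit ones' complement sum over big-endian words, the
kind of sum that DATASUM is based on; the function names are
hypothetical.

```python
import struct

def ones_complement_sum32(data):
    """32-bit ones' complement sum over big-endian 32-bit words."""
    if len(data) % 4:
        data += b"\x00" * (4 - len(data) % 4)  # zero-pad to a whole word
    total = 0
    for (word,) in struct.iter_unpack(">I", data):
        total += word
        total = (total & 0xFFFFFFFF) + (total >> 32)  # end-around carry
    return total

def pack_float32(values):
    """Serialize as big-endian 32-bit floats (as for BITPIX = -32)."""
    return struct.pack(">%df" % len(values), *values)

original = [0.1, 1.25, 3.14159, 2.71828, 100.0]  # made-up pixel values
scale = 0.01                                     # made-up quantization step

# Lossy round trip: float -> scaled integer -> float.
quantized = [round(v / scale) for v in original]
restored = [q * scale for q in quantized]

sum_before = ones_complement_sum32(pack_float32(original))
sum_after = ones_complement_sum32(pack_float32(restored))
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(sum_before == sum_after)  # the sums differ for these values
print(max_error <= scale / 2)   # yet the error stays within half a step
```

The restored values are close to the originals but not bit-identical,
so any sum over the bytes changes even though the per-pixel error is
bounded by the quantization step.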
>> 2. Data in four dimensions
>> One of the data products of the CGPS is a 4-dimensional data cube
>> (see below). The JCMT will soon be producing large 4-dimensional
>> data cubes as well.
>> gaudet amrod ~/checksum_test [79] ./cfitsio3030/listhead CGPS_MEV1_HI_line_image.fits | grep NAXIS
>> NAXIS   =                    4 / number of data axes
>> NAXIS1  =                 1024 / length of data axis 1
>> NAXIS2  =                 1024 / length of data axis 2
>> NAXIS3  =                  272 / length of data axis 3
>> NAXIS4  =                    1 / length of data axis 4
>> Currently, imcopy generates a message when NAXIS is > 3.
>> gaudet amrod ~/checksum_test [78] ./cfitsio3030/imcopy CGPS_MEV1_HI_line_image.fits 'CGPS_MEV1_HI_line_image.fits.fz[compress]'
>> FITSIO status = 413: error compressing image
>> only 1D, 2D, or 3D images are currently supported
>
> There is no intrinsic limit on the number of dimensions that can be
> supported. It just requires more sets of nested loops in the
> software for each dimension. I can probably add support for 4D
> data to imcopy without too much trouble.
>
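
For illustration only, tiles of an image with any number of axes can
be walked without writing one loop per axis. The Python sketch below
is not how imcopy or CFITSIO does it; the function name is
hypothetical and the row-by-row tile shape is an assumption. The
image shape is taken from the NAXIS values listed above.

```python
from itertools import product

def tile_slices(shape, tile_shape):
    """Yield one tuple of slices per tile, for any number of axes."""
    ranges = [range(0, n, t) for n, t in zip(shape, tile_shape)]
    for origin in product(*ranges):
        # Clip each tile at the image edge along every axis.
        yield tuple(slice(o, min(o + t, n))
                    for o, t, n in zip(origin, tile_shape, shape))

shape = (1, 272, 1024, 1024)   # NAXIS4, NAXIS3, NAXIS2, NAXIS1
tile_shape = (1, 1, 1, 1024)   # one image row per tile (assumed)

tiles = list(tile_slices(shape, tile_shape))
print(len(tiles))              # 1 * 272 * 1024 * 1 tiles
```

The same function handles 2D, 3D or 4D shapes, since the number of
axes is just the length of the shape tuple.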
> Bill
> --
> ____________________________________________________________________
> Dr. William Pence pence_at_milkyway.gsfc.nasa.gov
> NASA/GSFC Code 662 HEASARC +1-301-286-4599 (voice)
> Greenbelt MD 20771 +1-301-286-1684 (fax)
>
>
Received on Thu Mar 29 2007 - 13:49:55 HST