Thanks for the response,
We will install the e-transfer system on clix in the next few days.
>> We also request that you 'batch' the placement of the sym-links so
>> that we don't have a process that is trying to get more than about
>> 100 Gbytes of data in one step, just to be sure we aren't making
>> mistakes.
>
> Couldn't this check be done at the e-transfer level? This would
> avoid a layer on our side.
When you want to send us a file you will put a sym-link into the
transfer directory on the transfer machine. We will then transfer the
file. You can place as many files as you would like in that area at
one time. I would like the processing batched to a few hundred GB at a
time, but you can push more across at once if you choose.
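For concreteness, a batching script on your side could look roughly
like this; the function name, the list format, and the exact limit are
only illustrative, not part of e-transfer itself:

```shell
# Illustrative batching sketch (place_batch and BATCH_LIMIT_BYTES are
# made-up names, not part of e-transfer). Reads file paths from a list,
# symlinks them into the transfer directory until the batch would
# exceed the limit, and prints the leftover paths for the next batch.
BATCH_LIMIT_BYTES=$((100 * 1024 * 1024 * 1024))   # ~100 GB per batch

place_batch() {
    list=$1; dir=$2
    total=0; placing=1
    mkdir -p "$dir"
    while IFS= read -r f; do
        size=$(wc -c < "$f")
        if [ "$placing" -eq 1 ] && [ $((total + size)) -le "$BATCH_LIMIT_BYTES" ]; then
            ln -sf "$f" "$dir/$(basename "$f")"   # queue for e-transfer
            total=$((total + size))
        else
            placing=0
            printf '%s\n' "$f"                    # leftover for a later batch
        fi
    done < "$list"
}
```

You would rerun it on the leftover list once the transfer area has
drained.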
>
> The step1 has 15850 RICE-compressed weightmaps, which is 4.14 TiB of
> data. Some of those weightmaps are the same as in T0003, with the
> same name, e.g. 716303p_weight.fits.fz.
>
>> A workaround for you is to place all sym-links into the new area.
>> If we already have a copy with that name we will put the sym-link
>> into an area like rejected/not-new. Then you can move it to replace.
>> We will check the file in the replace area and if we have one with
>> the same CRC then we'll put the sym-link into the directory
>> rejected/not-replace.
>
> OK, so the CRC is done only at this stage, right? This saves
> computer resources and bandwidth.
Yes, we save as much of each as possible. We do the CRC at your end
before we take the file, and only if the file is in `replace'.
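In script form, the collision handling on your side could look like
this; the directory names follow the scheme quoted above, but the base
path and function name are hypothetical:

```shell
# Sketch of the name-collision flow: anything e-transfer parks in
# rejected/not-new/ (name already held at CADC) gets moved to replace/
# so the CRC check runs there. promote_rejected and the base path are
# illustrative names only.
promote_rejected() {
    base=$1
    mkdir -p "$base/replace"
    for link in "$base"/rejected/not-new/*; do
        # skip the unmatched-glob literal; sym-links may dangle, so -L too
        [ -e "$link" ] || [ -L "$link" ] || continue
        mv "$link" "$base/replace/"
    done
}
# Files CADC then moves to rejected/not-replace/ had an identical CRC,
# so they did not need to be sent at all.
```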
>
>> Alternatively, I can provide you with our 'CRC' of the file and you
>> can compare our 'CRC' to the value you get by running our
>> crc-generator on the file. The binary for our CRC generator is on
>> your machine at
>>
>> clix.iap.fr:/home/nis/cadc/bin/cadcCRC
>
> Thanks. Could we please have the source code of this program?
> Do you plan to switch to a checksum like md5? It's almost 2 times
> faster, and the resulting hash is 4 times longer. Have you ever had
> collision problems with this CRC?
Currently the source code of our CRC is not public [sorry]. We have
not had collisions with this CRC, since what we check for uniqueness
is CRC+file_name, not the CRC alone.
We are moving to md5; the migration should start in ~6 months but
depends on other pressures at the archive. We are also looking at the
standard expressed in the MissFITS document.
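For reference, once the md5 migration happens the checksum side can be
done with the standard md5sum tool; this is generic usage, not our
eventual interface:

```shell
# Generic md5 usage, not the archive's future interface. md5sum prints
# a 128-bit digest (32 hex digits), 4x the length of a 32-bit CRC.
file=${1:-716303p_weight.fits.fz}    # example name; pass any path
md5sum "$file" | cut -d' ' -f1
```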
>
>> I can add the CRC values we produce to the megaprime_proxy table.
>
> Yes, that would be great. I can then move to new only the needed
> files. Could you please also explain again how to use it (and its
> output format), as I guess it might have changed a bit?
>
> The list of weight images is ready, just waiting to be filtered to
> eliminate already transferred files. This time, the links will point
> to NFS-mounted filesystems. That's why I asked, for the near future,
> to have one e-transfer daemon per machine hosting the data.
>
> The proxy http://cadcwww.hia.nrc.ca/cadcbin/cfhtInfo seems to still
> be active; could you please explain how to use it, if it can be
> useful? It looks like it can return the CRC too.
If you use the cfhtInfo proxy then you can get all the info you need
and I don't need to bother with the megaprime_proxy table changes. To
get the CRC for a file [such as file 789054p.fits] use the command:

curl 'http://cadcwww.hia.nrc.ca/cadcbin/cfhtInfo?file=789054p&options=-fileCrc'

This will return the CRC of the file. [You just drop the .fits from
the filename in the URL.] So you can use cfhtInfo to check the CRC.
As another example, I checked the weightmap image you mention above:

csh> curl 'http://cadcwww.hia.nrc.ca/cadcbin/cfhtInfo?file=716303p_weight&options=-fileCrc'
0xb6adf2a4
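To filter your list down to only the needed files, the proxy call can
be wrapped like this. The helper names are made up, and the assumption
that cadcCRC prints a bare hex value like 0xb6adf2a4 should be checked
against the real binary:

```shell
# Sketch: only queue files whose CRC is absent at CADC or differs from
# the local one. remote_crc/local_crc/needs_transfer are illustrative
# names; the cadcCRC output format (bare hex) is an assumption.
remote_crc() {
    # cfhtInfo wants the filename without its .fits / .fits.fz suffix
    id=$(basename "$1" | sed -e 's/\.fits\.fz$//' -e 's/\.fits$//')
    curl -s "http://cadcwww.hia.nrc.ca/cadcbin/cfhtInfo?file=${id}&options=-fileCrc"
}

local_crc() {
    /home/nis/cadc/bin/cadcCRC "$1"
}

needs_transfer() {
    # success (0) when CADC has no CRC for the file or the CRCs differ
    r=$(remote_crc "$1")
    l=$(local_crc "$1")
    [ -z "$r" ] || [ "$r" != "$l" ]
}
```

Then something like `needs_transfer 716303p_weight.fits.fz && ln -s
/path/to/716303p_weight.fits.fz new/` keeps already-transferred files
out of the batch.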
>
> The csv dump of image_name, grade, comment is in
> /data/clix/fc6/cadc/new/grade_comments_step1_T0004.csv
Thanks,
JJ
Received on Tue Mar 20 2007 - 19:06:14 HST