Hi Kanoa -- going through the files that have been rejected due to
fitsverify errors, the errors look rather benign: the files seem to be
failing because of a problem with the AUTHOR keyword.
I'm loath to just turn off FITS verification, and would rather that
we determine why some files have these errors and others do not.
I'm not sure whether we can choose to ignore certain FITS errors while
still catching fatal errors, but I will enquire with the developers here.
J.
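One way to see why only some files fail is to compare the raw AUTHOR card in a rejected file with the one in a file that passed. Below is a minimal, stdlib-only Python sketch of that comparison. It assumes the problem is in the primary header and relies only on the FITS layout rules: a header is a sequence of 80-character cards stored in 2880-byte blocks and terminated by an END card, and header cards may only contain printable ASCII (codes 32-126). Flagging characters outside that range is one possible cause to check, not a diagnosis of the actual fitsverify error. The function names and the `author_check.py` script name are hypothetical.

```python
import sys

CARD_LEN = 80     # each FITS header card is exactly 80 characters
BLOCK_LEN = 2880  # headers are stored in 2880-byte blocks (36 cards each)


def parse_header_cards(data):
    """Split raw header bytes into 80-character cards, stopping after END."""
    cards = []
    for offset in range(0, len(data) - CARD_LEN + 1, CARD_LEN):
        # latin-1 maps every byte to a character, so illegal bytes survive
        card = data[offset:offset + CARD_LEN].decode("latin-1")
        cards.append(card)
        if card[:8].rstrip() == "END":
            break
    return cards


def read_primary_header(path):
    """Read 2880-byte blocks until the block that contains the END card."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_LEN)
            if not block:
                break
            chunks.append(block)
            if any(block[i:i + 8] == b"END     "
                   for i in range(0, len(block), CARD_LEN)):
                break
    return parse_header_cards(b"".join(chunks))


def find_card(cards, keyword):
    """Return the raw card whose keyword field (columns 1-8) matches, or None."""
    for card in cards:
        if card[:8].rstrip() == keyword:
            return card
    return None


def bad_char_columns(card):
    """0-based columns holding characters outside printable ASCII (32-126),
    which the FITS standard does not allow in header cards."""
    return [i for i, ch in enumerate(card) if not 32 <= ord(ch) <= 126]


if __name__ == "__main__":
    # Usage: python author_check.py rejected.fits accepted.fits ...
    for path in sys.argv[1:]:
        card = find_card(read_primary_header(path), "AUTHOR")
        if card is None:
            print(f"{path}: no AUTHOR card in primary header")
        else:
            print(f"{path}: {card.rstrip()!r} bad columns={bad_char_columns(card)}")
```

Running it on a handful of rejected and accepted files side by side should show whether the AUTHOR values differ in content, quoting, or character set.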
Kanoa Withington wrote:
> Hi John, Hi Fred,
>
> Four streams sounds very low and 60 minutes per stream sounds too high.
> Maybe we should consider increasing the number of streams in addition to
> checking the TCP optimization?
>
> Unless fitsverify is actually catching real errors, maybe it should be
> turned off for time-sensitive releases like this. Presumably Terapix has
> done their own quality control and is happy with the data. Losing 7% of
> the files to potentially harmless details sounds expensive.
>
> Aloha,
>
> -Kanoa
>
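Kanoa's point about the number of streams can be made concrete with the figures John gives further down the thread (85 files still waiting, almost an hour per file). The Python sketch below estimates how long the queue takes to drain when each stream moves one file at a time. It assumes every file takes about the same time and that adding streams does not slow the existing ones, which only holds if each connection is limited by its own TCP window rather than by the shared path; `drain_hours` is a hypothetical helper.

```python
import math


def drain_hours(files_waiting, streams, hours_per_file):
    """Estimated wall-clock hours to clear a transfer queue when each
    stream moves one file at a time and every file takes about the
    same time (assumes per-stream speed is unaffected by stream count)."""
    rounds = math.ceil(files_waiting / streams)  # batches of parallel files
    return rounds * hours_per_file


if __name__ == "__main__":
    # 85 files and ~1 hour per file are the figures quoted in this thread.
    for streams in (4, 8, 16):
        print(f"{streams} streams: ~{drain_hours(85, streams, 1.0):.0f} h")
```

With four streams this gives about 22 hours; doubling the streams roughly halves it, but only while the network link itself still has spare capacity.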
> John Ouellette wrote, On 09/14/2009 11:25 AM:
> > Hi Fred -- There were a couple of parameters on etrans1 that were not the
> > same as on your host, so I have tweaked these to see if they make a
> > difference.
> >
> > It might be worth running a few more detailed performance tests (after
> > the T0006 files have finished transferring, which should be early
> > tomorrow). Would you be willing to run a few tests? We have a
> > locally developed tool for this purpose, which we are using to test the
> > network performance between our users and the CADC.
> >
> > Also, I don't know if you are aware of our eTransfer state tool: you can
> > monitor the state of all Terapix files in the etransfer system, all the
> > way from being dropped in the pick-up directory to when they leave the
> > system after being ingested into the archive. It also shows when files
> > get 'stuck' in the various error states.
> >
> >
> > http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/cadcbin/etransferState?source=terapix&orderby=datetime
> >
> >
> > J.
> >
> > Frédéric Magnard wrote:
> >> Hi John,
> >>
> >> On 09/14/09 17:08, John Ouellette wrote:
> >> > Hi Yannick -- Several of the last batch of files are still awaiting
> >> > transfer from France (85, as of a few moments ago), and there are 561
> >> > files in various rejected states (most having failed FITS
> >> > verification). It takes almost an hour to transfer each file, so that
> >> > has definitely proven to be a bottleneck.
> >>
> >> A few years ago, we set up some TCP stack optimization in the Linux
> >> kernel with Gerald Justice to speed up transfers between Terapix and
> >> CADC. I still have this setup on the Terapix side. Could you please
> >> check whether it is on the etrans1.cadc.dao.nrc.ca machine?
> >>
> >> Here is what is in our /etc/sysctl.conf:
> >> # TCP tuning cf. http://www-didc.lbl.gov/TCP-tuning/linux.html
> >> # increase TCP max buffer size
> >> net.core.rmem_max = 16777216
> >> net.core.wmem_max = 16777216
> >> # increase Linux autotuning TCP buffer limits
> >> # min, default, and max number of bytes to use
> >> net.ipv4.tcp_rmem = 4096 87380 16777216
> >> net.ipv4.tcp_wmem = 4096 65536 16777216
> >>
> >> # don't cache ssthresh from previous connection (default: 0)
> >> net.ipv4.tcp_no_metrics_save = 1
> >> # recommended to increase this for 1000 BT or higher (default: 300)
> >> net.core.netdev_max_backlog = 2500
> >>
> >> Cheers,
> >> Fred.
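To answer Fred's question about etrans1, the sysctl.conf settings he lists above can be compared against the live kernel values, which Linux exposes under /proc/sys (net.core.rmem_max is /proc/sys/net/core/rmem_max, and so on). The Python sketch below does that comparison; it is roughly what running `sysctl` on each key by hand would show. The helper names are hypothetical, and it only reads values, it does not change them.

```python
import os

# Settings quoted from the Terapix /etc/sysctl.conf in Fred's message.
RECOMMENDED = """
# increase TCP max buffer size
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 2500
"""


def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and # comments;
    whitespace inside values is normalised to single spaces."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key):
    """Map a dotted sysctl key to its /proc/sys file."""
    return os.path.join("/proc/sys", *key.split("."))


def current_value(key):
    """Read the live value, or None if it is missing or unreadable."""
    try:
        with open(proc_path(key)) as f:
            return " ".join(f.read().split())  # /proc uses tabs between fields
    except OSError:
        return None


def mismatches(expected, lookup=current_value):
    """Return {key: (recommended, current)} for every differing setting."""
    diff = {}
    for key, want in expected.items():
        have = lookup(key)
        if have != want:
            diff[key] = (want, have)
    return diff


if __name__ == "__main__":
    for key, (want, have) in sorted(mismatches(parse_sysctl(RECOMMENDED)).items()):
        print(f"{key}: recommended {want!r}, current {have!r}")
```

An empty report means the host already matches the Terapix tuning.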
> >>
> >> PS: I checked that packets are going through the GEANT network, which
> >> should provide us with better bandwidth than the 1 MB/s we are currently
> >> getting:
> >>
> >> # tracepath etrans1.cadc.dao.nrc.ca
> >> 1: fcix3.iap.fr (194.57.221.23) 0.163ms pmtu 1500
> >> 1: 194.57.221.1 (194.57.221.1) 0.437ms
> >> 2: no reply
> >> 3: interco-A17-jussieu.rap.prd.fr (195.221.127.213) asymm 4 0.679ms
> >> 4: vl165-gi4-0-0-jussieu.noc.renater.fr (193.51.181.102) 0.949ms
> >> 5: vl171-te1-2-paris1-rtr-021.noc.renater.fr (193.51.179.93) asymm 8 1.467ms
> >> 6: te0-0-0-3-paris1-rtr-001.noc.renater.fr (193.51.189.37) 1.887ms
> >> 7: renater.rt1.par.fr.geant2.net (62.40.124.69) asymm 8 1.570ms
> >> 8: so-3-0-0.rt1.lon.uk.geant2.net (62.40.112.106) asymm 9 8.967ms
> >> 9: so-2-0-0.rt1.ams.nl.geant2.net (62.40.112.137) asymm 10 17.103ms
> >> 10: canarie-gw.rt1.ams.nl.geant2.net (62.40.124.222) asymm 11 113.209ms
> >> 11: no reply
> >> 12: hia-gi-1-10.nrnet2.nrc.ca (132.246.4.130) asymm 14 172.975ms
> >> 13: 132.246.4.190 (132.246.4.190) asymm 15 173.609ms
> >> 14: 132.246.192.2 (132.246.192.2) asymm 16 172.948ms
> >> 15: no reply
> >> 16: no reply
> >> 17: no reply
> >> 18: no reply
> >> 19: no reply
> >> 20: no reply
> >> 21: no reply
> >> 22: no reply
> >> 23: no reply
> >> 24: no reply
> >> 25: no reply
> >> 26: no reply
> >> 27: no reply
> >> 28: no reply
> >> 29: no reply
> >> 30: no reply
> >> 31: no reply
> >> Too many hops: pmtu 1500
> >> Resume: pmtu 1500
> >>
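The tracepath output above shows a round-trip time of roughly 173 ms by the time packets reach the NRC network. On a path that long, the TCP window matters a great deal: a single connection cannot have more than one window of unacknowledged data in flight per round trip, so its throughput is at most window / RTT, and the window needed for a target rate is the bandwidth-delay product, rate × RTT. The Python sketch below applies that to the tcp_rmem values in Fred's sysctl.conf. The 0.173 s RTT is read off the last responding hops, not measured at etrans1 itself, and the results are upper bounds only: Linux reserves part of each buffer for overhead, and receive-buffer autotuning decides how far the window actually grows between the default and maximum values.

```python
def window_limited_rate(window_bytes, rtt_s):
    """Upper bound on one TCP connection's throughput in bytes/s:
    at most one window of data can be in flight per round trip."""
    return window_bytes / rtt_s


def window_needed(target_bytes_per_s, rtt_s):
    """Bandwidth-delay product: window required to sustain a target rate."""
    return target_bytes_per_s * rtt_s


RTT_S = 0.173  # approx. round-trip time from the tracepath output (assumption)

if __name__ == "__main__":
    for label, window in (("tcp_rmem default, 87380 B", 87380),
                          ("tcp_rmem max, 16777216 B", 16777216)):
        rate = window_limited_rate(window, RTT_S) / 1e6
        print(f"{label}: at most ~{rate:.2f} MB/s per stream")
    print(f"window for 10 MB/s: ~{window_needed(10e6, RTT_S) / 1e6:.2f} MB")
```

With an 87380-byte window the ceiling is about 0.5 MB/s per connection, while a 16 MiB window allows roughly 97 MB/s, which is why the buffer limits on both ends of the link are worth checking.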
> >
>
--
Dr. John Ouellette
Operations Manager
Canadian Astronomy Data Centre
Herzberg Institute of Astrophysics
National Research Council Canada
5071 West Saanich Road, Victoria BC V9E 2E7 Canada
Phone: 250-363-3037
Received on Mon Sep 14 2009 - 12:53:50 HST