Hi Fred -- There were a couple of parameters on etrans1 that were not
the same as on your host, so I have tweaked them to see if they make a
difference.
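For reference, here is a rough sketch (Python, purely illustrative --
not something we actually run here) that dumps the relevant kernel
settings via /proc so the two ends can be compared directly; the
parameter list just mirrors the sysctl.conf entries quoted in your
message below:

    #!/usr/bin/env python
    # Illustrative only: read the TCP tuning parameters discussed below
    # from /proc so the values on etrans1 and terapix can be compared.
    PARAMS = [
        "net.core.rmem_max",
        "net.core.wmem_max",
        "net.ipv4.tcp_rmem",
        "net.ipv4.tcp_wmem",
        "net.ipv4.tcp_no_metrics_save",
        "net.core.netdev_max_backlog",
    ]

    for name in PARAMS:
        path = "/proc/sys/" + name.replace(".", "/")
        try:
            value = open(path).read().strip()
        except IOError:
            value = "<not readable>"
        print("%-32s %s" % (name, value))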
It might be worth running a few more detailed performance tests (after
the T0006 files have finished transferring, which looks like it should
be early tomorrow). Would you be willing to run a few tests? We have a
locally-developed tool for this purpose, which we are using to test the
network performance between our users and the CADC.
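In the meantime, if you want a rough sanity check of the raw TCP
throughput before the real tests, something along these lines works --
to be clear, this is only a minimal stand-in sketch, not our tool, and
the port number and script name are arbitrary placeholders:

    # Minimal memory-to-memory throughput probe (not the CADC tool).
    # Save as, say, probe.py; run "python probe.py server" on one end
    # and "python probe.py client <host>" on the other.
    import socket, sys, time

    PORT = 5001                # arbitrary placeholder port
    CHUNK = 64 * 1024          # 64 kB per write
    TOTAL = 256 * 1024 * 1024  # send 256 MB per run

    def server():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("", PORT))
        s.listen(1)
        conn, addr = s.accept()
        received = 0
        start = time.time()
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            received += len(data)
        elapsed = time.time() - start
        print("%.2f MB/s received from %s"
              % (received / 1e6 / elapsed, addr[0]))

    def client(host):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, PORT))
        buf = b"x" * CHUNK
        sent = 0
        start = time.time()
        while sent < TOTAL:
            s.sendall(buf)
            sent += CHUNK
        s.close()
        print("%.2f MB/s sent" % (sent / 1e6 / (time.time() - start)))

    if __name__ == "__main__":
        if sys.argv[1] == "server":
            server()
        else:
            client(sys.argv[2])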
Also, I don't know if you are aware of our eTransfer state tool: you can
monitor the state of all Terapix files in the etransfer system, all the
way from the moment they are dropped in the pick-up directory until they
leave the system after being ingested into the archive. It also shows
when files get 'stuck' in the various error states:
http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/cadcbin/etransferState?source=terapix&orderby=datetime
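Since that page is just a plain HTTP query, you can also poll it from a
script if that is more convenient; a minimal sketch follows (the output
is assumed to be a simple HTML page, so any parsing is left to you):

    # Rough sketch: fetch the eTransfer state page from a script, e.g.
    # to grep for files sitting in error states.
    try:
        from urllib.request import urlopen   # Python 3
    except ImportError:
        from urllib2 import urlopen          # Python 2

    URL = ("http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/cadcbin/"
           "etransferState?source=terapix&orderby=datetime")

    print(urlopen(URL).read().decode("utf-8", "replace"))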
J.
Frédéric Magnard wrote:
> Hi John,
>
> On 09/14/09 17:08, John Ouellette wrote:
> > Hi Yannick -- Several of the last batch of files are still awaiting
> > transfer from France (85, as of a few moments ago), and there are 561
> > files in various rejected states (most having failed FITS
> > verification). It takes almost an hour to transfer each file, so that
> > has definitely proven to be a bottleneck.
>
> A few years ago, we set up some TCP stack optimization in the Linux
> kernel with Gerald Justice to speed up transfers between terapix and
> cadc. I still have this setup on the terapix side. Could you please
> check whether it is also in place on the etrans1.cadc.dao.nrc.ca
> machine?
>
> Here is what is in our /etc/sysctl.conf:
> # TCP tuning cf. http://www-didc.lbl.gov/TCP-tuning/linux.html
> # increase TCP max buffer size
> net.core.rmem_max = 16777216
> net.core.wmem_max = 16777216
> # increase Linux autotuning TCP buffer limits
> # min, default, and max number of bytes to use
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
>
> # don't cache ssthresh from previous connection (default: 0)
> net.ipv4.tcp_no_metrics_save = 1
> # recommended to increase this for 1000 BT or higher (default: 300)
> net.core.netdev_max_backlog = 2500
>
> Cheers,
> Fred.
>
> PS: I checked that packets are going through the GEANT network, which
> should give us better bandwidth than the 1 MB/s we are currently
> getting:
>
> # tracepath etrans1.cadc.dao.nrc.ca
> 1: fcix3.iap.fr (194.57.221.23) 0.163ms pmtu 1500
> 1: 194.57.221.1 (194.57.221.1) 0.437ms
> 2: no reply
> 3: interco-A17-jussieu.rap.prd.fr (195.221.127.213) asymm 4 0.679ms
> 4: vl165-gi4-0-0-jussieu.noc.renater.fr (193.51.181.102) 0.949ms
> 5: vl171-te1-2-paris1-rtr-021.noc.renater.fr (193.51.179.93) asymm 8 1.467ms
> 6: te0-0-0-3-paris1-rtr-001.noc.renater.fr (193.51.189.37) 1.887ms
> 7: renater.rt1.par.fr.geant2.net (62.40.124.69) asymm 8 1.570ms
> 8: so-3-0-0.rt1.lon.uk.geant2.net (62.40.112.106) asymm 9 8.967ms
> 9: so-2-0-0.rt1.ams.nl.geant2.net (62.40.112.137) asymm 10 17.103ms
> 10: canarie-gw.rt1.ams.nl.geant2.net (62.40.124.222) asymm 11 113.209ms
> 11: no reply
> 12: hia-gi-1-10.nrnet2.nrc.ca (132.246.4.130) asymm 14 172.975ms
> 13: 132.246.4.190 (132.246.4.190) asymm 15 173.609ms
> 14: 132.246.192.2 (132.246.192.2) asymm 16 172.948ms
> 15: no reply
> 16: no reply
> 17: no reply
> 18: no reply
> 19: no reply
> 20: no reply
> 21: no reply
> 22: no reply
> 23: no reply
> 24: no reply
> 25: no reply
> 26: no reply
> 27: no reply
> 28: no reply
> 29: no reply
> 30: no reply
> 31: no reply
> Too many hops: pmtu 1500
> Resume: pmtu 1500
>
--
Dr. John Ouellette
Operations Manager
Canadian Astronomy Data Centre
Herzberg Institute of Astrophysics
National Research Council Canada
5071 West Saanich Road, Victoria BC V9E 2E7 Canada
Phone: 250-363-3037
Received on Mon Sep 14 2009 - 11:25:46 HST