# Monitoring iperf using web100

So let's recap: transfers (both LAN and WAN) using iperf 1.2 (April 2001) seem to give really poor performance after about 60 seconds. Explain...

## Setup

Scripts used: do_time.pl, graph.pl, logvars (1.0.2). Note that do_time.pl was modified to handle logvars monitoring.

Interesting results will be the evolution of cwnd, the number of slow starts, and, as a general overview, the number of packets and data bytes out of the system. All tests, unless otherwise stated, are between pc55 and pc56, with an 8 MB receive buffer and a 25 KB send buffer.

## Run 4

Transferring with iperf for 180 seconds between pc55 and pc56 via GigE generates a web100 log file that is just a little too big - 12 MB worth! My little graphing program can't handle that many points, and nor can Excel, so I'm going to lower the logvars resolution from 10 ms to 100 ms. That should create only about a 1 MB file... Anyway, for those who are interested, here's the log file!

## Run 5

Same again, but with a 100 ms frequency of web100 traps (hopefully I'll be able to plot it!).
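As a rough check that 100 ms traps keep the log manageable, a quick back-of-the-envelope estimate (my own arithmetic; it simply assumes the file size scales linearly with the number of samples, using Run 4's 12 MB at 10 ms as the baseline):

```python
# Rough estimate of web100 log size vs. trap interval, assuming the size scales
# linearly with the number of samples (baseline: Run 4's ~12 MB for 180 s at 10 ms).
def estimated_log_mb(duration_s, trap_interval_ms,
                     baseline_mb=12.0, baseline_samples=180 * 100):
    samples = duration_s * 1000.0 / trap_interval_ms
    return baseline_mb * samples / baseline_samples

print(estimated_log_mb(180, 10))    # ~12 MB   (Run 4)
print(estimated_log_mb(180, 100))   # ~1.2 MB  (Run 5)
```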
The initially high value of ssthresh is always like this - it's built into the kernel. The change just before 100 seconds is due to congestion events halving ssthresh to half of cwnd each time. Right, these show the transfer against time. The dip just before 100 seconds implies something happened. As ssthresh fell, it is possible that this was due to a congestion event. However, DataBytesOut and the packet counts don't really change much after it. This is mostly because the current ssthresh value determines when to go back into congestion avoidance. Let's look at timeouts.
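For reference, a minimal sketch of the standard halving rule that produces this ssthresh staircase (the function and the MSS value are mine, purely for illustration; this is generic TCP behaviour, not code taken from the kernel or from web100):

```python
MSS = 1460  # assumed segment size in bytes, for illustration only

def on_congestion_event(cwnd, ssthresh):
    """Generic NewReno-style response to loss detected by duplicate ACKs:
    ssthresh drops to half of the current window (never below two segments),
    and the sender leaves fast recovery with cwnd set back to ssthresh."""
    ssthresh = max(cwnd // 2, 2 * MSS)
    cwnd = ssthresh
    return cwnd, ssthresh

# Starting from the kernel's huge initial ssthresh, each event halves it
# relative to the window at that moment - hence the step just before 100 s.
cwnd, ssthresh = 25 * 1024, 2 ** 31 - 1   # e.g. a window limited by the 25 KB send buffer
for _ in range(3):
    cwnd, ssthresh = on_congestion_event(cwnd, ssthresh)
    print(cwnd, ssthresh)
```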
We can see that the timeout was caused by the complete lack of ACK packets coming in. This forces TCP to retransmit the 13 packets. These retransmissions also have the effect of changing the SACK status:
As we can see, the receiver is sending ACKs back to us saying that it hasn't received a certain packet. However, the question still remains as to the cause of this timeout. It's loss, of course, but from where? Just to see whether this was a one-off anomaly, we do the test again.
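For contrast with the fast-retransmit halving above, a sketch of what the retransmission timeout itself costs - again generic TCP behaviour in my own words, not anything read out of the web100 log:

```python
def on_retransmission_timeout(flight_size, mss=1460):
    """When no ACKs arrive at all (as seen here) the RTO fires: ssthresh
    falls to half the data in flight and cwnd collapses to a single segment,
    so the connection drops back into slow start."""
    ssthresh = max(flight_size // 2, 2 * mss)
    cwnd = mss
    return cwnd, ssthresh

print(on_retransmission_timeout(25 * 1024))   # (1460, 12800)
```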
## Run 6

So here's the same thing again:
Yep, same thing again - although this time it happened a bit earlier... :( Maybe it's the amount of data transferred?
Same again, just earlier. Okay, let's see how much data is transferred. Well, according to DataBytesOut_(Value), at the point just before the timeout the total sent was 6103.2 MB - or 5.96 GB of data. For Run 5 it was 6492.87 MB, or 6.34 GB. How about the number of packets? Run 5: 4,801,201 packets out, of which 4,801,199 were data packets; number of ACKs in: 1,614,444. Hmm... can't really say anything here :( However, it is a way of indicating loss: it's approximately 13 in 4,500,000, or about 3e-6, which is actually quite high! Although one can seriously question why we get them all at once!
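A quick arithmetic check on those figures (my own calculation from the numbers quoted above):

```python
# Totals quoted above, converted to GB (1 GB = 1024 MB here, matching the text).
print(6103.2 / 1024)    # ~5.96 GB sent before the Run 6 timeout
print(6492.87 / 1024)   # ~6.34 GB for Run 5

# Loss estimate: 13 retransmitted packets out of ~4.8 million data packets.
retransmitted = 13
data_pkts = 4801199
print(retransmitted / data_pkts)   # ~2.7e-06, i.e. roughly 3 per million
```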
## Run 7

We should see whether this is periodic or not. As such, Run 7 is done with a 1000-second test and a web100 trap rate of 250 ms.
The drops are less pronounced, mainly because of the coarser trap rate (one sample every 250 ms instead of every 100 ms). But there are still periods in which the numbers drop. Looking at cwnd etc.:
There's no periodicity here... Let's look back at the iperf transfer logs. Web100 says that 65284.93754 MB were transferred, so this is 65284.9 MB / 1000 sec = 65.28 MB per second average!! That's 522.27 Mbit/sec, which is actually okay :)

Iperf, on the other hand, states that it has transferred 4031684608 bytes: × 8 / 1,000,000 = 32253.47 Mbit, and over 1000 seconds that is 32.25 Mbit/sec... something is fishy here... which is also the final rate figure iperf prints, 32.25 Mbit/sec. I thought this was a bit strange, as DataBytesOut didn't change... I should have calculated that! Oh well. So web100/TCP says that we are actually sending out the right amount of stuff... BUT iperf seems to be lying - probably because it's using a float instead of a double, for example.

I suppose the next thing to do as a result of this is to find out what the maximum amount of data transferred is before the counter goes wrong. I should also investigate the 1.6 version to see whether it has the same bug. From the graphs of the previous LAN tests, one can see that with an average of about 550 Mbit/sec it's okay up to 60 seconds, so it starts messing up at about 33,000 Mbit, i.e. 4.1 GB. Also, for small windows, 240 Mbit/sec × 120 sec = 28,800 Mbit = 3.5 GB.

Using the iperf -n option to transfer a set number of bytes, I'll choose {1.5 GB, 2 GB, 2.5 GB, 3 GB, 3.5 GB, 4 GB, 4.5 GB, 5 GB, 5.5 GB}, again with the 25 KB window. These should take about (assuming 550 Mbit/sec = 68 MB/sec) {23, 30, 38, 45, 53, 60, 68, 75, 83} seconds.

## Run 8

In order to do this, we must write a new script - so I'm going to hack do_time.pl into do_bytes.pl. Whilst I wait for the results: if a variable is a 32-bit quantity (say an unsigned int), then it can only hold 4,294,967,296 distinct values. Counting bytes, that is only 4 GB, i.e. 4096 MB. So the chances are that once we send more than 4 GB of data, the counter wraps around, making a mess of the values.
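If that 32-bit counter suspicion is right, the Run 7 numbers fall out rather neatly. A sketch of the wrap (my own model of the symptom, assuming web100's "MBytes" figure is in units of 2^20 bytes; this is not iperf's actual source code):

```python
# Model of a 32-bit unsigned byte counter wrapping at 4 GiB, applied to Run 7.
WRAP = 2 ** 32                              # 4,294,967,296 bytes

web100_mb = 65284.93754                     # MB transferred according to web100
total_bytes = int(web100_mb * 2 ** 20)      # ~68.5e9 bytes over the 1000 s run
wrapped = total_bytes % WRAP                # what a 32-bit counter is left holding

print(web100_mb / 1000 * 8)                 # ~522 Mbit/s - web100's average, as above
print(wrapped)                              # ~4,031,709,000 - within ~25 KB of iperf's 4,031,684,608
print(wrapped * 8 / 1e6 / 1000)             # ~32.25 Mbit/s - iperf's bogus average
```

The wrapped value lands within a few tens of kilobytes of the 4,031,684,608 bytes iperf actually reported, which makes a wrap at the 4 GB boundary look very plausible, whatever the exact cause in iperf's code.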
As you can see, the 4 GB transfer has a duration of 0! The link_utilisation is very small (probably as a consequence), and the time expected for each 0.5 GB increment is used as the duration instead. Also, the bytes_transferred values for the sizes >4 GB are wrong; instead it is the amount greater than 4 GB that is recorded. Strangely though, the link_utilisations for these are correct (or at least look okay), and they are not simply bytes_transferred/duration, as that would give {561.1, 496.5, 529.7} Mbit/sec. So the chances are these values are the real averages of the entire transfer. Hmm... okay, so why the strange fall when transfers take longer than 60 seconds? I would guess that it's related to this 4 GB thing. Due to a problem with my logging of web100, I had to redo the tests of 4096 MB and greater.
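The same wrap reproduces that pattern. A sketch (again my own reconstruction, treating iperf's -n sizes as 2^30-byte GB and the counter as 32-bit unsigned):

```python
# What a 2^32-byte wrap does to the -n sizes used in Run 8.
WRAP = 2 ** 32
for gb in (1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5):
    requested = int(gb * 2 ** 30)
    recorded = requested % WRAP
    print("%.1f GB requested -> %8.1f MB recorded" % (gb, recorded / 2 ** 20))
# 4.0 GB comes back as 0 MB (hence the zero duration), and everything above
# 4 GB shows only the amount beyond 4 GB - matching the wrong bytes_transferred
# values described above.
```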
This table shows the differences between what TCP is seeing and what iperf is reporting. As you can see, the rates are different (yep, I've checked the cooked files), with iperf overestimating what web100 states. Checking the maths: for the first one, 12288 Mbit / 23.4 sec = 525.13 Mbit/sec - which is much closer to that reported by web100 than to iperf's figure. Similarly for the rest of them! Iperf can't do maths!!

For the other entries, i.e. transfers larger than 4 GB: according to web100, it doesn't actually transfer more than 4 GB. This is shown in the web100 traces of the >=4 GB runs, which show that only the difference is transferred (haven't I already said that!?), and it is confirmed by the durations of the tests. So to summarise: iperf wraps transfers at multiples of 4 GB and only transfers the remainder of the requested value. However, I'm still not sure of the >60 secs problem, although at the moment I would guess that it has something to do with this 4 GB problem (I know Windows used to have problems with this limit, but I thought it was only Windows!). Looking at the plots on the previous page: 4 GB = 32,768 Mbit, and given that the transfer rate from man to cern is about 550 Mbit/sec for large buffers, it would take about 60 seconds to transfer 4 GB...!! The strange thing is that SLAC transferred about 11 GB with iperf (see here) and they didn't see this problem... I'm just unlucky, I suppose :( Is there a kernel parameter I need to fiddle with?

One thing to consider is that with this 4 GB limit, if we could get 1000 Mbit/sec out of our card, that is 125 MB/sec... so we can only transfer for about 33 seconds and still get sensible results from iperf.

Other things to check: whether this applies to the 1.6 release of iperf. Note that these tests were conducted with the -fb option to get the bytes and bits rather than the k/m values. Doing a quick test without the -fb option gives:

[ytl@pc55:~/iperf]$ iperf -c 192.168.0.56 -w 25k -t 180

So it seems as though iperf doesn't work for transfers longer than 4 GB in size... :( Let's see: 550 Mbit/sec for 180 sec is 12,375 MB = 12.08 GB. Iperf reports that it only transferred 3.4 GB - which we know is a lie - 3.4 GB = 27,853 Mbit, and 27,853/180 = 154.7 Mbit/sec... lower than the value given, but near it. So much for using it for background traffic then... hmmm...
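The 180-second run is at least consistent with the same wrap, if you assume the true total was around 11.4 GB (an assumption on my part, picked to match the ~545 Mbit/sec this LAN path normally gives; nothing below is a measurement):

```python
# Consistency check: an assumed true total of ~11.4 GiB over 180 s would wrap
# twice around 4 GiB and leave roughly the 3.4 GB figure iperf reported.
WRAP = 2 ** 32
true_total = int(11.4 * 2 ** 30)          # assumed actual bytes sent in 180 s
print(true_total * 8 / 1e6 / 180)         # ~544 Mbit/s - plausible for this link
print((true_total % WRAP) / 2 ** 30)      # ~3.4 GB - what the wrapped counter shows
```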
## Iperf 1.6.1

Let's try out iperf v1.6 then... It seems 1.6.1 is out here; installed on both pc55 and pc56... Initial tests by setting -n 4096m FAILED!

[ytl@pc55:~/iperf-1.6.1]$ ./iperf -c 192.168.0.56 -w 25k -t 180

Oh well... maybe I should report this... then again, it is a security feature, I suppose... but 33 seconds... gotta check with SLAC.

Right, /proc settings - on pc55:

[ytl@pc55:/proc/sys/net/ipv4]$ cat tcp_rmem

and pc56:

[ytl@pc56:/proc/sys/net/ipv4]$ cat tcp_rmem

Mine are slightly different from the ones on the SLAC site - most noticeably the tcp_{r|w}mem values. They refer to the "min default max" of the autotuned buffers. Hmmm... am I limiting the transfer rate by having the value set in rmem_max et al.? I suppose with iperf it's actually giving out the correct values, so it's okay... LANL suggest:

echo 6553500 > /proc/sys/net/core/wmem_max

But I think 64 MB for the max is a bit extreme, so I've toned it down to 8 MB on both machines (there's a quick script for dumping these settings below). Running iperf 1.6.1 again on this new configuration gives:

[ytl@pc55:~/iperf-1.6.1]$ ./iperf -c 192.168.0.56 -w 25k -t 180

i.e. the same... so much for that.
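For convenience, a little helper for dumping the relevant buffer settings in one go (standard Linux /proc paths; just a sketch to run on each of pc55/pc56, not part of the test setup above):

```python
# Print the socket-buffer sysctls discussed above, on whichever host this runs on.
paths = [
    "/proc/sys/net/core/rmem_max",
    "/proc/sys/net/core/wmem_max",
    "/proc/sys/net/ipv4/tcp_rmem",   # "min default max" of the autotuned receive buffer
    "/proc/sys/net/ipv4/tcp_wmem",   # likewise for the send buffer
]
for path in paths:
    try:
        with open(path) as f:
            print(path, "=", f.read().strip())
    except OSError as err:
        print(path, "unreadable:", err)
```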