Network Analysis of CERN <-> RAL Link
September 2002

Goal

To characterise the link between CERN and RAL and to suggest improvements for higher-throughput transport of large files, using existing software technologies and available solutions.
Method

Examine network measurements between CERN and RAL and try to understand why throughput is low: compare transport results against ICMP pings and traceroutes, and understand the differences between the transport protocols and the software that utilises them. This will not be easy.... :(
Overview

This shows the UDP and iperf TCP throughputs from RAL to CERN for the last two weeks. Notice that the UDP throughput is much higher than the TCP iperf throughput, even though we are using relatively large socket buffer sizes for the TCP tests. The duration of each TCP stream is 30 seconds; a previous study showed that this was the optimal value for conducting iperf tests between the Manchester and CERN sites. See here. There was a period between the 29th August and the 2nd September in which no tests were performed; this shows up as the straight line segments between those regions. The tests were deliberately turned off for maintenance checks.

This figure shows the same link in the opposing direction, i.e. from CERN to RAL. Notice that here too the TCP throughput is much lower than the UDP throughput with the same configuration. The results for this direction appear smoother, with less deviation; it is even possible to see the cyclic variation of the iperf transfers in September. Questions:
Looking at the configurations of the PCs:

Looking at this table, the machines are quite different: not only are they running different CPUs altogether, they also have different OS versions installed.
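The iperf runs described below can be driven with command lines along the following pattern. This is a sketch only: the host name, buffer size and the exact flag spelling are illustrative assumptions about how the tests were invoked, not a record of the actual test parameters.

```python
def build_iperf_cmd(server, window_kb, duration_s=30, client=True):
    """Build an iperf (v1/v2-style) command line for a single TCP test.

    `server` and `window_kb` are illustrative; the real hosts and buffer
    sizes used in this study varied across the measurement period.
    """
    if client:
        # -c <host>: run as client, -w: socket buffer size, -t: test duration
        return ["iperf", "-c", server, "-w", f"{window_kb}K", "-t", str(duration_s)]
    # -s: run as server, -w: receive socket buffer size
    return ["iperf", "-s", "-w", f"{window_kb}K"]

print(" ".join(build_iperf_cmd("cern-host", 1024)))
```

Sweeping `window_kb` over powers of two then gives the throughput-vs-buffer-size curves discussed in the next section.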
Iperf

During the period we altered the socket buffer sizes in order to see the variation in throughput as a function of socket buffer size. From this graph, we see that there was a large change in the TCP throughput for large buffer sizes after August (buffer sizes shown in kbytes). We also see that beyond about 1024 kB there does not seem to be any great benefit in increasing the socket buffer size during the start of September. This shows similar results to the previous graph, but not quite as extreme.

Socket Buffer Sizes
As the files for the 2.2 kernel TCP settings are also available in the 2.4 kernel, they are shown here as well.
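On 2.4 kernels the per-protocol settings live in files such as /proc/sys/net/ipv4/tcp_rmem, which hold three whitespace-separated byte counts (minimum, default, maximum). A minimal sketch of reading that format — the sample values below are typical 2.4-era defaults used for illustration, not readings from the test machines:

```python
def parse_tcp_mem(text):
    """Parse the contents of /proc/sys/net/ipv4/tcp_rmem or tcp_wmem.

    The file holds three whitespace-separated byte counts:
    minimum, default and maximum socket buffer size.
    """
    minimum, default, maximum = (int(v) for v in text.split())
    return {"min": minimum, "default": default, "max": maximum}

# Illustrative 2.4-kernel default values:
print(parse_tcp_mem("4096 87380 174760"))
```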
Theory suggests that past a linear region at low socket buffer sizes, we should see a plateau where increasing the socket buffer size does not affect the throughput at all. The start of this plateau is the optimal window size for a long-term TCP file transfer on this link. Let's plot the difference between the readings in September and August.

Edit: throughput is actually in Mbit/s - NOT kbit as labelled.
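The window size at which the plateau should begin is the bandwidth-delay product of the path. A quick sketch of that calculation — the 100 Mbit/s capacity and 20 ms RTT below are illustrative numbers, not measured values for the CERN-RAL link:

```python
def optimal_window_bytes(bandwidth_mbit_s, rtt_ms):
    """Bandwidth-delay product: the socket buffer size at which the
    throughput-vs-buffer-size curve should start to plateau."""
    # bits per second * round-trip time in seconds, converted to bytes
    return bandwidth_mbit_s * 1e6 / 8 * (rtt_ms / 1e3)

# e.g. a 100 Mbit/s path with a 20 ms round-trip time:
print(optimal_window_bytes(100, 20) / 1024, "kB")
```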
The file for these graphs can be found here. (Excel)

There is a possible answer to TCP2: it was found that the maximum window size settable by the iperf server under Linux is just 2 MB; this would suggest that for client socket buffer sizes over 2 MB we are actually requesting a larger buffer than the server can advertise. However, TCP states that the maximum sending window is min(cwnd, advertised receive window) - this means that we should get the same throughput for all values over 2 MB (as we're limited by the window..)...!!! hmmm....

Anyway, theory suggests that at low socket buffer sizes the throughput is wholly limited by what is available to send. Hence, even though the TCP protocol could allow higher throughput, there is no data to send and thus the throughput is reduced. This should give a linear relation between the two variables. We should also see a good one-to-one relationship between the maximum throughput and the statistical average and/or median values as a result of this relation.
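The min(cwnd, advertised receive window) argument can be made concrete: in steady state, throughput is bounded by window / RTT, so once the server's 2 MB cap is the smaller of the two buffers, raising the client buffer further changes nothing. A sketch (the 20 ms RTT is an illustrative assumption, and cwnd is ignored):

```python
def window_limited_throughput_mbit(client_sndbuf_bytes, server_rcvbuf_bytes, rtt_ms):
    """Model TCP's steady-state bound: throughput <= window / RTT, where
    the effective window is the smaller of the two socket buffers.
    (Congestion window effects are deliberately ignored here.)"""
    window = min(client_sndbuf_bytes, server_rcvbuf_bytes)
    return window * 8 / (rtt_ms / 1e3) / 1e6  # Mbit/s

# With the iperf server capped at a 2 MB receive window, a 4 MB client
# buffer should give exactly the same bound as a 2 MB one:
print(window_limited_throughput_mbit(4 * 2**20, 2 * 2**20, 20))
print(window_limited_throughput_mbit(2 * 2**20, 2 * 2**20, 20))
```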
We can see that for the September transfers the low socket buffer sizes match the theory quite well, with very little variation in the throughput values for anything below half a meg. As we increase the socket buffer size from 128k, we see a gradual deviation of the maximum throughput from the median and average values. By contrast, the August values have relatively high throughputs for low buffer sizes: almost equal to those of the 256k socket buffer size on both links.
A LAN test of the effects of socket buffer size is shown here. The results of the experiment back the theory that throughput should plateau - however, this plateau is more likely due to performance restrictions of the NIC rather than the network (a wire in this case). The results also show that the variation in throughput at 'low' socket buffer sizes is actually quite large - most likely a consequence of the high throughputs involved.
Let's take a step back and look at the pings on the link.
ICMP echo/replies

Again, there are no results for the period between the 29th and the 2nd - which explains the straight lines in the graph. The main thing to notice is that the minimum ping values are very constant. A useful statistic would be the standard deviation of the sample set; this needs to be implemented in the program. It may also be useful to find the 90th percentile etc. as well. However, even though the minimum rtt is quite constant, the average rtt can fluctuate quite markedly - with values almost 3 times the minimum rtt.
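The standard deviation and percentile summary proposed above could be sketched along these lines; the RTT samples below are invented for illustration (a steady ~15 ms base with occasional spikes, mimicking the constant-minimum/fluctuating-average pattern described):

```python
import statistics

def rtt_summary(rtts_ms):
    """Summarise a set of ping round-trip times: the minimum estimates
    the base path latency, while the standard deviation and a high
    percentile capture the fluctuation of the average."""
    ordered = sorted(rtts_ms)
    p90 = ordered[int(0.9 * (len(ordered) - 1))]  # nearest-rank 90th percentile
    return {
        "min": ordered[0],
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),
        "p90": p90,
    }

print(rtt_summary([15.2, 15.3, 15.2, 40.1, 15.4, 22.0, 15.2, 45.3, 15.3, 15.2]))
```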
The same applies to the other link. One interesting feature of this graph is that the minimum seems slightly lower in September than in August. Looking more closely:
Yep, the difference does seem to exist (although we're only comparing half the number of samples). For the smaller packet size there appears to be about a 0.2 ms difference, whilst with the larger packets the difference is less pronounced. Was a new policy installed during this time? Software changes?
UDP Throughput

This graph shows the frequency distribution of the UDP data for the graph shown above. It is not a very smooth distribution - which suggests that more data points are needed for a good analysis. But the things we can say about this are:
We should investigate working out a percentile from which we can extrapolate the expected throughput. For example, one possibility is to quote the 10th to 90th percentile throughputs. We can look at the differences between August and September for the same graph:
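The 10th-90th percentile band could be computed as follows; the throughput samples below are invented for illustration (a cluster of good runs with a couple of low outliers):

```python
def percentile(samples, p):
    """Nearest-rank percentile of a sample set (0 <= p <= 100)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[k]

def throughput_range(samples_mbit):
    """Report the 10th-90th percentile band as the 'expected' throughput,
    discarding outliers at both ends."""
    return percentile(samples_mbit, 10), percentile(samples_mbit, 90)

print(throughput_range([310, 450, 455, 460, 462, 458, 300, 465, 470, 452, 461]))
```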
Notice that the August values have a better percentile value than the September ones. This is attributed to poor UDP performance on the 3rd September.

Looking at the reverse route: we see that unlike the RAL to CERN path, there is not a high frequency of results at the highest throughputs, which suggests that something is causing an unusual distribution of throughputs. Possible suggestions include:
© 2001-2003, Yee-Ting Li, email: ytl@hep.ucl.ac.uk, Tel: +44 (0) 20 7679 1376, Fax: +44 (0) 20 7679 7145, Room D14, High Energy Particle Physics, Dept. of Physics & Astronomy, UCL, Gower St, London, WC1E 6BT