
Network Analysis of CERN <-> RAL Link

September 2002

Goal

To characterise the link between CERN and RAL, and to suggest improvements for higher-throughput transport using existing software technologies and available solutions for large file transfers.

 

Method

To look at a lot of graphs of network measurements between CERN and RAL and try to understand why throughput is low; to compare transport results against ICMP pings and traceroutes; and to understand the differences between the transport protocols and the software that utilises them.

This will not be easy.... :(

 

Overview

This shows the UDP and TCP iperf throughputs from RAL to CERN for the last two weeks. Notice that the UDP throughput is much higher than the TCP iperf throughput, even though we are using relatively high socket buffer sizes for the TCP tests. The duration of each TCP stream is 30 seconds; a previous study showed that this was the optimal value for conducting iperf tests between the Manchester and CERN sites. See here.
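For reference, the tests are roughly of the following form. This is a sketch only: the real harness and host names are not shown in these notes (HOST is a hypothetical endpoint), but iperf's -w flag sets the socket buffer size, -t the duration, and -u/-b select UDP at a target rate.

    import subprocess

    HOST = "ral-test-host"   # hypothetical endpoint name

    def run_tcp_test(sock_buf_kb=1024, duration_s=30):
        """One 30 s TCP iperf stream with an explicit socket buffer size."""
        cmd = ["iperf", "-c", HOST, "-t", str(duration_s), "-w", f"{sock_buf_kb}K"]
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    def run_udp_test(target_mbps=600, duration_s=30):
        """One UDP iperf stream at a given target rate (Mbit/s)."""
        cmd = ["iperf", "-c", HOST, "-u", "-b", f"{target_mbps}M",
               "-t", str(duration_s)]
        return subprocess.run(cmd, capture_output=True, text=True).stdout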

There was a period between the 29th August and the 2nd September in which no tests were performed; this shows up as the straight line segments across that region. The tests were turned off on purpose for maintenance checks.

This figure shows the same link in the opposing direction, i.e. from CERN to RAL. Notice that here too the TCP throughput is much lower than the UDP throughput with the same configuration. The results in this direction seem smoother and show less deviation; it is even possible to see the cyclic variation of the iperf transfers in September.

Questions:

  • O1: Why did we see a change in pattern from August to September? I.e. why can't we see the cyclic variations prior to September?
  • O2: Why do we seem to get a higher throughput now than we did in September?
  • O3: Why do we get a much higher throughput for UDP than for TCP?

Looking at the configurations of the PCs:

                                  CERN                                RAL
  Number of CPUs                  1                                   1
  CPU Vendor                      GenuineIntel                        AuthenticAMD
  Family / Model                  15 / 0                              6 / 1
  CPUID Level                     2                                   1
  CPU Name                        Intel(R) Pentium(R) 4 CPU 1500MHz   AMD-K7(tm) Processor
  Cache (kB)                      256                                 512
  Stepping                        7                                   2
  Speed (MHz)                     1495.506                            604.252
  BogoMIPS                        2981.88                             1205.86
  FPU / WP / FPU Ex               1 / 1 / 1                           1 / 1 / 1
  Coma / F00f / Hlt / Fdiv bugs   0 / 0 / 0 / 0                       0 / 0 / 0 / 0
  OS Release                      2.4.16-web100                       2.2.16-3

Looking at this table, the machines are quite different: not only are they running different CPUs altogether, but they also have different OS versions installed.

  • CPU: Can the RAL machine actually cope with gigabit transfers? It's only 600 MHz!
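The table above is essentially /proc/cpuinfo plus the kernel release; a minimal sketch for collecting the same fields on each host:

    import os

    def cpu_info():
        """Parse /proc/cpuinfo into a dict (both hosts have a single CPU)."""
        info = {}
        with open("/proc/cpuinfo") as f:
            for line in f:
                if ":" in line:
                    key, _, value = line.partition(":")
                    info[key.strip()] = value.strip()
        return info

    info = cpu_info()
    print(info.get("model name"), info.get("cpu MHz"), info.get("bogomips"))
    print("OS release:", os.uname().release)   # e.g. 2.4.16-web100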

 

Iperf

During the period we altered the socket buffer sizes in order to see the variation in throughput as a function of socket buffer size.

From this graph we see that there was a large change in the TCP throughput for large buffer sizes after August (buffer sizes shown in kbytes). We also see that, at the start of September, there does not seem to be any huge benefit in increasing the socket buffer size beyond about 1024 kB.

This shows similar results to the previous graph, but not quite as extreme.

Socket Buffer Sizes
 
          2.2 Kernel Parameters (values in kB)
        rmem_default   rmem_max   wmem_default   wmem_max
  CERN  63.999         8191.999   63.999         8191.999
  RAL   64             8192       64             8192

          2.4 Kernel Parameters (values in kB)
        rmem_min   rmem_default   rmem_max   wmem_min   wmem_default   wmem_max
  CERN  4          85.332         8181.999   4          63.994         8191.999
  RAL   -          -              -          -          -              -

As the files for the 2.2 kernel TCP settings are also available in the 2.4 kernel, they are shown here too; a quick way to inspect both sets of files is sketched after the questions below.

  • TCP1: When the 2.2 tcp files are edited in a 2.4 setup, what configuration takes precedence?
  • TCP8: What is the effect of the 2.4 kernel tuning?
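A quick way to dump both sets of files on a host (these are the standard Linux /proc paths; on the RAL 2.2 kernel the tcp_rmem and tcp_wmem files simply won't exist):

    def read_proc(path):
        """Return the contents of a /proc file, or None if it is absent."""
        try:
            with open(path) as f:
                return f.read().strip()
        except FileNotFoundError:
            return None

    # 2.2-style per-socket defaults/limits (also present on 2.4 kernels):
    for name in ("rmem_default", "rmem_max", "wmem_default", "wmem_max"):
        print(name, read_proc(f"/proc/sys/net/core/{name}"))

    # 2.4-only TCP buffer triples: min / default / max, in bytes:
    for name in ("tcp_rmem", "tcp_wmem"):
        print(name, read_proc(f"/proc/sys/net/ipv4/{name}"))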

Theory suggests that past a linear region at low socket buffer sizes, we should see a plateau where increasing the socket buffer size does not affect the throughput at all. The start of this plateau is the optimal window size for a long-lived TCP transfer over the link.
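As a sanity check on where that plateau should start, the optimal window is roughly the bandwidth-delay product. A rough calculation, taking the ~18 ms minimum RTT from the ICMP section below and the observed 480-600 Mbit/s UDP rates as the available bandwidth:

    rtt_s = 0.018                      # ~18 ms minimum RTT (see ICMP section)
    for rate_mbps in (480, 600):
        bdp_bytes = rate_mbps * 1e6 / 8 * rtt_s
        print(f"{rate_mbps} Mbit/s -> BDP = {bdp_bytes / 1024:.0f} kB")
    # 480 Mbit/s -> BDP = 1055 kB
    # 600 Mbit/s -> BDP = 1318 kB

This is consistent with the plateau beginning at around 1024 kB in the September data.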

Let's plot the difference between the readings in September and August:

Edit: throughput is actually in Mbit/s - NOT kbit as originally labelled.

[Graphs: RAL to CERN throughput vs socket buffer size, end of August and start of September]

As expected, the results look messy! The interesting thing here is that the median value of the transfers seems to be less than 100 Mbit/s irrespective of the socket buffer size. The maximum, however, goes way beyond the average and median values.

  • TCP5: How reliable are the iperf results? I.e. does iperf actually report results accurately, and if so, to what accuracy?

A pretty normal graph - except for the fact that the maximum transfer rate seems to actually decrease as we get beyond the 2 MB threshold. Also, the minimum transfer rate is actually quite flat at around 25 Mbit/s.

  • TCP2: What is causing the maximum throughput to decrease as we increase the socket buffer size?
  • TCP3: Why is the minimum throughput always around the same value regardless of the socket buffer size?
CERN to RAL

This graph shows serious problems, with a huge peak at about a 2 MB socket buffer.

  • TCP6: Why is there a noticeable peak at a 2 MB socket buffer?

 

This graph is quite strange: even though the median and average plots are exactly as expected, the min and max curves are quite unpredictable. At least with the RAL to CERN graph, the minimum was always at the same value - possibly indicating major congestion along the link, and that we're hitting it quite frequently. Then again, this variation in the min and max throughputs could just mean that the congestion on the opposing link is quite random. Given enough samples (we're only looking at about 50 for each socket buffer size), we should get roughly the same min and max values; the quick simulation after the question below shows how much the extremes of 50 samples can wander.

  • TCP4: Investigate the variations of the min and max throughputs.
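To get a feel for how much the min and max of ~50 samples can wander even when nothing on the link changes, a quick simulation (normally distributed noise is an assumption for illustration, not a model of the link):

    import random
    import statistics

    random.seed(1)
    for trial in range(3):
        # ~50 throughput samples per socket buffer size, as in the data above
        samples = [random.gauss(90, 30) for _ in range(50)]
        print(f"trial {trial}: min={min(samples):5.1f}  max={max(samples):5.1f}  "
              f"median={statistics.median(samples):5.1f}")

The medians barely move between trials, but the extremes easily shift by tens of Mbit/s.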

The file for these graphs can be found here. (Excel)

There is a possible answer to TCP2: it was found that the maximum window size settable by the iperf server under Linux is just 2 MB; this would suggest that for sizes over 2 MB the client is actually using a larger socket buffer than the server's window. However, TCP states that the maximum sending window is min(cwnd, advertised receive window) - this means that we should get the same throughput for all values over 2 MB (as we're limited by the window..)...!!! hmmm....
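The constraint in question, restated (this is the standard TCP sending rule, nothing iperf-specific):

    def effective_window(sndbuf_mb, advertised_rwnd_mb, cwnd_mb):
        """TCP never has more unacknowledged data in flight than this."""
        return min(sndbuf_mb, advertised_rwnd_mb, cwnd_mb)

    # With the iperf server's receive window capped at 2 MB, any client
    # socket buffer above 2 MB should make no difference:
    for sndbuf_mb in (1, 2, 4, 8):
        w = effective_window(sndbuf_mb, 2, float("inf"))  # ignore cwnd here
        print(f"client buffer {sndbuf_mb} MB -> effective window {w} MB")

So the observed decrease above 2 MB is not explained by the window limit alone.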

Anyway,

Theory suggests that at low socket buffer sizes the throughput is wholly limited by what is available to send. Hence, even though the TCP protocol could allow for higher throughput, there is no data to send and thus the throughput is reduced. This should give a linear relation between the two variables.

As a result of this relation, we should also see a good one-to-one relationship between the maximum throughput and the statistical average and/or median values.
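In that send-limited region, the expected ceiling is simply window / RTT. With the ~18 ms RTT measured below, the ceilings work out as follows (a sketch that ignores slow start and loss):

    rtt_s = 0.018                              # ~18 ms RTT
    for window_kb in (64, 128, 256, 512, 1024):
        rate_mbps = window_kb * 1024 * 8 / rtt_s / 1e6
        print(f"{window_kb:5d} kB window -> ~{rate_mbps:3.0f} Mbit/s")
    # 64 kB -> ~29 Mbit/s, doubling with each doubling of the window,
    # up to ~466 Mbit/s at 1024 kB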

[Graphs: RAL to CERN and CERN to RAL throughput vs socket buffer size, end of August and start of September]

 

We can see that for the September transfers the low socket buffer sizes match the theory quite well, with very little variation in the throughput values for anything below half a meg. As we increase the socket buffer size from 128k, we see a gradual deviation of the maximum throughput away from the median and average values.

On the contrary, the August values have relatively high throughputs for low buffer sizes: almost equal to that of the 256k socket buffer size on both links.

  • TCP7: Why are the August maximum values for low socket buffer sizes not linear?

A LAN test of the effects of socket buffer size is shown here. The results of the experiment back the theory that the throughput should plateau - however, this plateau is more likely due to performance restrictions of the NIC rather than of the network (a wire in this case). The results also show that the variation in throughput at 'low' socket buffer sizes is actually quite large - most likely a consequence of the high throughputs involved.

  • TCP9: Formal investigation into the TCP LAN environment required.

Let's take a step back and look at the pings on the link.

 

ICMP echo/replies

Again, there are no results for the period between the 29th and the 2nd - which explains the straight lines in the graph. The main thing to notice is that the minimum ping values are very constant. A useful statistical value would be the standard deviation of the sample set; this needs implementing in the analysis program. It may also be useful to find out the 90th percentile etc. as well.
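Both statistics are straightforward to add to the program; a minimal sketch (sample standard deviation and a simple nearest-rank percentile):

    import math

    def std_dev(xs):
        """Sample standard deviation of a list of RTTs (ms)."""
        mean = sum(xs) / len(xs)
        return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

    def percentile(xs, p):
        """p-th percentile by nearest rank (adequate for ~100+ pings)."""
        xs = sorted(xs)
        k = max(0, min(len(xs) - 1, round(p / 100 * len(xs)) - 1))
        return xs[k]

    rtts = [17.9, 17.8, 18.1, 17.7, 19.2, 18.0]   # illustrative RTTs, ms
    print(std_dev(rtts), percentile(rtts, 90))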

However, even though the minimum RTT is quite constant, the average RTT can fluctuate quite markedly - with values almost 3 times the minimum RTT.

 

The same applies to the other link. One interesting feature of this graph is that the minimum seems slightly lower in September than in August. Looking more closely:

             Packet Size   Singletons   Min (ms)   Max (ms)   Ave (ms)   Median (ms)
  August     108           241          17.684     18.054     17.914     17.91
  September  108           112          17.472     18.315     17.637     17.615

  August     1480          241          18.684     19.08      18.952     18.949
  September  1480          101          18.524     19.249     18.683     17.615

Yep, the difference does seem to exist (although we're only comparing half the number of samples). For the smaller packet size there appears to be about a 0.2 ms difference, whilst with the larger packets the difference is less pronounced. Was there a new policy installed during this time? Software changes?

 

UDP Throughput

This graph shows the frequency distribution of the UDP data for the graph shown above. It's not a very smooth graph - which suggests that I need more data points to do a good analysis. But the things we can say about it are:

  1. We get a relatively good throughput distribution, as the higher throughputs have high frequencies.
  2. Low throughputs do occur, but most throughputs can be expected to be above 480 Mbit/s.

We should investigate working out a percentile range from which we can quote an expected throughput. For example, a possibility is to give the 10th to 90th percentile throughputs, as in the sketch below.
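Using the same percentile helper as in the ICMP section, the report could then be a single band (load_udp_throughputs() is a hypothetical loader for the measurement files):

    udp = load_udp_throughputs()                  # hypothetical data loader
    print(f"expected throughput: {percentile(udp, 10):.0f}"
          f"-{percentile(udp, 90):.0f} Mbit/s (10th-90th percentile)")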

We can look at the differences in August and September for the same graph:

Notice that the August values have a better percentile profile than the September ones. This is attributed to poor UDP performance on the 3rd September.

Looking at the reverse route:

We see that, unlike the RAL to CERN path, there is not a high frequency of results at the highest throughputs, which suggests that something is causing an unusual distribution of throughputs. Possible suggestions include:

  1. Congestion on the network that takes up the remaining amount of bandwidth - this is plausible, as more people download from CERN than upload to CERN, so there is bound to be some congestion on this direction of the path.
  2. Errr...???

© 2001-2003, Yee-Ting Li, email: ytl@hep.ucl.ac.uk, Tel: +44 (0) 20 7679 1376, Fax: +44 (0) 20 7679 7145
Room D14, High Energy Particle Physics, Dept. of Physics & Astronomy, UCL, Gower St, London, WC1E 6BT