
Network Analysis of CERN <-> RAL Link

September 2002

Goal

To characterise the link between CERN and RAL, and to suggest improvements for higher-throughput transport using existing software technologies and available solutions for large file transfers.

 

Method

To look at a lot of graphs of network measurements between CERN and RAL and try to understand why throughput is low; to compare transport results against ICMP pings and traceroutes; and to understand the differences between the transport protocols and the software that utilises them.

This will not be easy.... :(

 

Overview

This shows the UDP and TCP iperf throughputs from RAL to CERN for the last two weeks. Notice that the UDP throughput is much higher than the TCP iperf throughput, even though we are using relatively high socket buffer sizes for the TCP tests. The duration of each TCP stream is 30 seconds; a previous study showed that this was the optimal value for conducting iperf tests between the Manchester and CERN sites. See here.
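For reference, the tests are roughly of the following form. This is a sketch only: the real harness and host names are not shown in these notes (HOST is a hypothetical endpoint), but iperf's -w flag sets the socket buffer size, -t the duration, and -u/-b select UDP at a target rate.

    import subprocess

    HOST = "ral-test-host"   # hypothetical endpoint name

    def run_tcp_test(sock_buf_kb=1024, duration_s=30):
        """One 30 s TCP iperf stream with an explicit socket buffer size."""
        cmd = ["iperf", "-c", HOST, "-t", str(duration_s), "-w", f"{sock_buf_kb}K"]
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    def run_udp_test(target_mbps=600, duration_s=30):
        """One UDP iperf stream at a given target rate (Mbit/s)."""
        cmd = ["iperf", "-c", HOST, "-u", "-b", f"{target_mbps}M",
               "-t", str(duration_s)]
        return subprocess.run(cmd, capture_output=True, text=True).stdout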

There was a period between the 29th August and the 2nd September in which no tests were performed; this shows up as the straight line segments across that region. The tests were turned off on purpose for maintenance checks.

This figure shows the same link in the opposing direction, i.e. from CERN to RAL. Notice that here too the TCP throughput is much lower than the UDP throughput with the same configuration. The results in this direction seem smoother and show less deviation; it is even possible to see the cyclic variation of the iperf transfers in September.

Questions:

  • O1: Why did we see a change in pattern from August to September? I.e. why can't we see the cyclic variations prior to September?
  • O2: Why do we seem to get a higher throughput now than we did in September?
  • O3: Why do we get a much higher throughput for UDP than for TCP?

Looking at the configurations of the PCs:

                                  CERN                                RAL
  Number of CPUs                  1                                   1
  CPU Vendor                      GenuineIntel                        AuthenticAMD
  Family / Model                  15 / 0                              6 / 1
  CPUID Level                     2                                   1
  CPU Name                        Intel(R) Pentium(R) 4 CPU 1500MHz   AMD-K7(tm) Processor
  Cache (kB)                      256                                 512
  Stepping                        7                                   2
  Speed (MHz)                     1495.506                            604.252
  BogoMIPS                        2981.88                             1205.86
  FPU / WP / FPU Ex               1 / 1 / 1                           1 / 1 / 1
  Coma / F00f / Hlt / Fdiv bugs   0 / 0 / 0 / 0                       0 / 0 / 0 / 0
  OS Release                      2.4.16-web100                       2.2.16-3

Looking at this table, the machines are quite different: not only are they running different CPUs altogether, but they also have different OS versions installed.

  • CPU: Can the RAL machine actually cope with gigabit transfers? It's only 600 MHz!
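The table above is essentially /proc/cpuinfo plus the kernel release; a minimal sketch for collecting the same fields on each host:

    import os

    def cpu_info():
        """Parse /proc/cpuinfo into a dict (both hosts have a single CPU)."""
        info = {}
        with open("/proc/cpuinfo") as f:
            for line in f:
                if ":" in line:
                    key, _, value = line.partition(":")
                    info[key.strip()] = value.strip()
        return info

    info = cpu_info()
    print(info.get("model name"), info.get("cpu MHz"), info.get("bogomips"))
    print("OS release:", os.uname().release)   # e.g. 2.4.16-web100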

 

Iperf

During the period we altered the socket buffer sizes in order to see the variation in throughput as a function of socket buffer size.

From this graph we see that there was a large change in the TCP throughput for large buffer sizes after August (buffer sizes shown in kbytes). We also see that, at the start of September, there does not seem to be any huge benefit in increasing the socket buffer size beyond about 1024 kB.

This shows similar results to the previous graph, but not quite as extreme.

Socket Buffer Sizes
 
          2.2 Kernel Parameters (values in kB)
        rmem_default   rmem_max   wmem_default   wmem_max
  CERN  63.999         8191.999   63.999         8191.999
  RAL   64             8192       64             8192

          2.4 Kernel Parameters (values in kB)
        rmem_min   rmem_default   rmem_max   wmem_min   wmem_default   wmem_max
  CERN  4          85.332         8181.999   4          63.994         8191.999
  RAL   -          -              -          -          -              -

As the files for the 2.2 kernel TCP settings are also available in the 2.4 kernel, they are shown here too; a quick way to inspect both sets of files is sketched after the questions below.

  • TCP1: When the 2.2 tcp files are edited in a 2.4 setup, what configuration takes precedence?
  • TCP8: What is the effect of the 2.4 kernel tuning?
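A quick way to dump both sets of files on a host (these are the standard Linux /proc paths; on the RAL 2.2 kernel the tcp_rmem and tcp_wmem files simply won't exist):

    def read_proc(path):
        """Return the contents of a /proc file, or None if it is absent."""
        try:
            with open(path) as f:
                return f.read().strip()
        except FileNotFoundError:
            return None

    # 2.2-style per-socket defaults/limits (also present on 2.4 kernels):
    for name in ("rmem_default", "rmem_max", "wmem_default", "wmem_max"):
        print(name, read_proc(f"/proc/sys/net/core/{name}"))

    # 2.4-only TCP buffer triples: min / default / max, in bytes:
    for name in ("tcp_rmem", "tcp_wmem"):
        print(name, read_proc(f"/proc/sys/net/ipv4/{name}"))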

Theory suggests that past a linear region at low socket buffer sizes, we should see a plateau where increasing the socket buffer size does not affect the throughput at all. The start of this plateau is the optimal window size for a long-lived TCP transfer over the link.
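As a sanity check on where that plateau should start, the optimal window is roughly the bandwidth-delay product. A rough calculation, taking the ~18 ms minimum RTT from the ICMP section below and the observed 480-600 Mbit/s UDP rates as the available bandwidth:

    rtt_s = 0.018                      # ~18 ms minimum RTT (see ICMP section)
    for rate_mbps in (480, 600):
        bdp_bytes = rate_mbps * 1e6 / 8 * rtt_s
        print(f"{rate_mbps} Mbit/s -> BDP = {bdp_bytes / 1024:.0f} kB")
    # 480 Mbit/s -> BDP = 1055 kB
    # 600 Mbit/s -> BDP = 1318 kB

This is consistent with the plateau beginning at around 1024 kB in the September data.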

Let's plot the difference between the readings in September and August:

Edit: throughput is actually in Mbit/s - NOT kbit as originally labelled.

[Graphs: RAL to CERN throughput vs socket buffer size, end of August and start of September]

As expected, the results look messy! The interesting thing here is that the median value of the transfers seems to be less than 100 Mbit/s irrespective of the socket buffer size. The maximum, however, goes way beyond the average and median values.

  • TCP5: How reliable are the iperf results? I.e. does iperf actually report results accurately, and if so, to what accuracy?

A pretty normal graph - except for the fact that the maximum transfer rate seems to actually decrease as we get beyond the 2 MB threshold. Also, the minimum transfer rate is actually quite flat at around 25 Mbit/s.

  • TCP2: What is causing the maximum throughput to decrease as we increase the socket buffer size?
  • TCP3: Why is the minimum throughput always around the same value regardless of the socket buffer size?
CERN to RAL

This graph shows serious problems, with a huge peak at about a 2 MB socket buffer.

  • TCP6: Why is there a noticeable peak at a 2 MB socket buffer?

 

This graph is quite strange: even though the median and average plots are exactly as expected, the min and max curves are quite unpredictable. At least with the RAL to CERN graph, the minimum was always at the same value - possibly indicating major congestion along the link, and that we're hitting it quite frequently. Then again, this variation in the min and max throughputs could just mean that the congestion on the opposing link is quite random. Given enough samples (we're only looking at about 50 for each socket buffer size), we should get roughly the same min and max values; the quick simulation after the question below shows how much the extremes of 50 samples can wander.

  • TCP4: Investigate the variations of the min and max throughputs.
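To get a feel for how much the min and max of ~50 samples can wander even when nothing on the link changes, a quick simulation (normally distributed noise is an assumption for illustration, not a model of the link):

    import random
    import statistics

    random.seed(1)
    for trial in range(3):
        # ~50 throughput samples per socket buffer size, as in the data above
        samples = [random.gauss(90, 30) for _ in range(50)]
        print(f"trial {trial}: min={min(samples):5.1f}  max={max(samples):5.1f}  "
              f"median={statistics.median(samples):5.1f}")

The medians barely move between trials, but the extremes easily shift by tens of Mbit/s.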

The file for these graphs can be found here. (Excel)

There is a possible answer to TCP2: it was found that the maximum window size settable by the iperf server under Linux is just 2 MB; this would suggest that for sizes over 2 MB the client is actually using a larger socket buffer than the server's window. However, TCP states that the maximum sending window is min(cwnd, advertised receive window) - this means that we should get the same throughput for all values over 2 MB (as we're limited by the window..)...!!! hmmm....
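The constraint in question, restated (this is the standard TCP sending rule, nothing iperf-specific):

    def effective_window(sndbuf_mb, advertised_rwnd_mb, cwnd_mb):
        """TCP never has more unacknowledged data in flight than this."""
        return min(sndbuf_mb, advertised_rwnd_mb, cwnd_mb)

    # With the iperf server's receive window capped at 2 MB, any client
    # socket buffer above 2 MB should make no difference:
    for sndbuf_mb in (1, 2, 4, 8):
        w = effective_window(sndbuf_mb, 2, float("inf"))  # ignore cwnd here
        print(f"client buffer {sndbuf_mb} MB -> effective window {w} MB")

So the observed decrease above 2 MB is not explained by the window limit alone.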

Anyway,

Theory suggests that at low socket buffer sizes the throughput is wholly limited by what is available to send. Hence, even though the TCP protocol could allow for higher throughput, there is no data to send and thus the throughput is reduced. This should give a linear relation between the two variables.

As a result of this relation, we should also see a good one-to-one relationship between the maximum throughput and the statistical average and/or median values.
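In that send-limited region, the expected ceiling is simply window / RTT. With the ~18 ms RTT measured below, the ceilings work out as follows (a sketch that ignores slow start and loss):

    rtt_s = 0.018                              # ~18 ms RTT
    for window_kb in (64, 128, 256, 512, 1024):
        rate_mbps = window_kb * 1024 * 8 / rtt_s / 1e6
        print(f"{window_kb:5d} kB window -> ~{rate_mbps:3.0f} Mbit/s")
    # 64 kB -> ~29 Mbit/s, doubling with each doubling of the window,
    # up to ~466 Mbit/s at 1024 kB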

[Graphs: RAL to CERN and CERN to RAL throughput vs socket buffer size, end of August and start of September]

 

We can see that for the September transfers the low socket buffer sizes match the theory quite well, with very little variation in the throughput values for anything below half a meg. As we increase the socket buffer size from 128k, we see a gradual deviation of the maximum throughput away from the median and average values.

On the contrary, the August values have relatively high throughputs for low buffer sizes: almost equal to that of the 256k socket buffer size on both links.

  • TCP7: Why are the August maximum values for low socket buffer sizes not linear?

A LAN test of the effects of socket buffer size is shown here. The results of the experiment back the theory that the throughput should plateau - however, this plateau is more likely due to performance restrictions of the NIC rather than of the network (a wire in this case). The results also show that the variation in throughput at 'low' socket buffer sizes is actually quite large - most likely a consequence of the high throughputs involved.

  • TCP9: Formal investigation into the TCP LAN environment required.

Let's take a step back and look at the pings on the link.

 

ICMP echo/replies

Again, there are no results for the period between the 29th and the 2nd - which explains the straight lines in the graph. The main thing to notice is that the minimum ping values are very constant. A useful statistical value would be the standard deviation of the sample set; this needs implementing in the analysis program. It may also be useful to find out the 90th percentile etc. as well.
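Both statistics are straightforward to add to the program; a minimal sketch (sample standard deviation and a simple nearest-rank percentile):

    import math

    def std_dev(xs):
        """Sample standard deviation of a list of RTTs (ms)."""
        mean = sum(xs) / len(xs)
        return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

    def percentile(xs, p):
        """p-th percentile by nearest rank (adequate for ~100+ pings)."""
        xs = sorted(xs)
        k = max(0, min(len(xs) - 1, round(p / 100 * len(xs)) - 1))
        return xs[k]

    rtts = [17.9, 17.8, 18.1, 17.7, 19.2, 18.0]   # illustrative RTTs, ms
    print(std_dev(rtts), percentile(rtts, 90))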

However, even though the minimum RTT is quite constant, the average RTT can fluctuate quite markedly - with values almost 3 times the minimum RTT.

 

The same applies to the other link. One interesting feature of this graph is that the minimum seems slightly lower in September than in August. Looking more closely:

             Packet Size   Singletons   Min (ms)   Max (ms)   Ave (ms)   Median (ms)
  August     108           241          17.684     18.054     17.914     17.91
  September  108           112          17.472     18.315     17.637     17.615

  August     1480          241          18.684     19.08      18.952     18.949
  September  1480          101          18.524     19.249     18.683     17.615

Yep, the difference does seem to exist (although we're only comparing half the number of samples). For the smaller packet size there appears to be about a 0.2 ms difference, whilst with the larger packets the difference is less pronounced. Was there a new policy installed during this time? Software changes?

 

UDP Throughput

This graph shows the frequency distribution of the UDP data for the graph shown above. It's not a very smooth graph - which suggests that I need more data points to do a good analysis. But the things we can say about it are:

  1. We get a relatively good throughput distribution, as the higher throughputs have high frequencies.
  2. Low throughputs do occur, but most throughputs can be expected to be above 480 Mbit/s.

We should investigate working out a percentile range from which we can quote an expected throughput. For example, a possibility is to give the 10th to 90th percentile throughputs, as in the sketch below.
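Using the same percentile helper as in the ICMP section, the report could then be a single band (load_udp_throughputs() is a hypothetical loader for the measurement files):

    udp = load_udp_throughputs()                  # hypothetical data loader
    print(f"expected throughput: {percentile(udp, 10):.0f}"
          f"-{percentile(udp, 90):.0f} Mbit/s (10th-90th percentile)")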

We can look at the differences in August and September for the same graph:

Notice that the August values have a better percentile profile than the September ones. This is attributed to poor UDP performance on the 3rd September.

Looking at the reverse route:

We see that, unlike the RAL to CERN path, there is not a high frequency of results at the highest throughputs, which suggests that something is causing an unusual distribution of throughputs. Possible suggestions include:

  1. Congestion on the network that takes up the remaining amount of bandwidth - this is plausible, as more people download from CERN than upload to CERN, so there is bound to be some congestion on this direction of the path.
  2. Errr...???

© 2001-2003, Yee-Ting Li, email: ytl@hep.ucl.ac.uk, Tel: +44 (0) 20 7679 1376, Fax: +44 (0) 20 7679 7145
Room D14, High Energy Particle Physics, Dept. of Physics & Astronomy, UCL, Gower St, London, WC1E 6BT