Effect of TxqueueLen on High Bandwidth Delay Product Network (DataTAG)
This investigation extends the previous tests to a long-distance network: specifically, a 120ms link between CERN and STARLIGHT. The machines are of slightly higher specification, with dual 2.8GHz hyperthreaded Xeons, and are connected via Syskonnect cards instead of e1000s.
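To give a sense of scale for this path, the bandwidth delay product can be estimated as in the sketch below. The 1Gbit/sec line rate and the 1500 byte packet size are assumptions that are not stated above, the 120ms figure is assumed to be the round-trip time, and the function name is purely illustrative.

```python
def bandwidth_delay_product(rate_bps: float, rtt_s: float) -> float:
    """Return the bandwidth delay product in bytes."""
    return rate_bps * rtt_s / 8.0


LINE_RATE_BPS = 1e9   # assumption: gigabit line rate (not stated in the text)
RTT_S = 0.120         # assumption: the 120ms link figure is the round-trip time
PACKET_BYTES = 1500   # assumption: standard Ethernet-sized packets

bdp_bytes = bandwidth_delay_product(LINE_RATE_BPS, RTT_S)
bdp_packets = bdp_bytes / PACKET_BYTES

print(f"BDP: {bdp_bytes / 1e6:.1f} MB, about {bdp_packets:.0f} packets")
```

Under these assumptions the pipe holds roughly 15MB of data in flight; different line rates or packet sizes scale the result linearly.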
Each test was run for 240sec, as initial tests showed that throughput suffered over longer durations.
This graph vividly shows the effect of txqueuelen on throughput. From these results, a txqueuelen of less than about 50,000 does not come anywhere near line rate, and with the standard txqueuelen of 100 we do not even reach 100Mbit/sec.
Looking more closely, we can see that the cwnd does not open, which restricts the rate at which the sender can output packets. Surprisingly, for txqueuelens of 10,000 and above, we get an average of about 1 timeout over the duration of each test (4 minutes).
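The link between cwnd and output rate can be illustrated with the usual window-limited bound: a sender can have at most one window of data in flight per round trip. The sketch below assumes that the 120ms figure is the round-trip time and that packets are 1500 bytes (neither is stated in these results); the function names are illustrative.

```python
def window_limited_throughput_bps(cwnd_packets: float,
                                  packet_bytes: float,
                                  rtt_s: float) -> float:
    """Upper bound on throughput when at most cwnd packets are in flight per RTT."""
    return cwnd_packets * packet_bytes * 8.0 / rtt_s


def cwnd_needed_packets(target_bps: float,
                        packet_bytes: float,
                        rtt_s: float) -> float:
    """Congestion window (in packets) needed to sustain target_bps."""
    return target_bps * rtt_s / (packet_bytes * 8.0)


RTT_S = 0.120         # assumption: 120ms round-trip time
PACKET_BYTES = 1500   # assumption: standard Ethernet-sized packets

# Window needed to sustain 100Mbit/sec on this path
needed = cwnd_needed_packets(100e6, PACKET_BYTES, RTT_S)
print(f"cwnd needed for 100Mbit/sec: about {needed:.0f} packets")

# Throughput ceiling for a window of that size
ceiling = window_limited_throughput_bps(needed, PACKET_BYTES, RTT_S)
print(f"throughput ceiling: {ceiling / 1e6:.0f} Mbit/sec")
```

Under these assumptions, a flow that stays below 100Mbit/sec is holding its window under roughly a thousand packets.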
Looking at the number of TCP packets reported as leaving and entering the sender, we see that we send over 1.5e7 packets. With a line loss rate of 1e-7, we would be likely to physically lose a packet. However, if this were to happen, the FastRetrans mechanism should recover from it.
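The arithmetic behind this estimate can be worked through directly. The sketch below assumes each packet is lost independently with probability 1e-7, which is a simplification of real line behaviour, and uses 1.5e7 as the packet count even though the tests sent slightly more; the function names are illustrative.

```python
import math


def expected_losses(packets: float, loss_rate: float) -> float:
    """Expected number of lost packets for independent losses."""
    return packets * loss_rate


def prob_at_least_one_loss(packets: float, loss_rate: float) -> float:
    """1 - (1 - loss_rate)**packets, computed stably via log1p/expm1."""
    return -math.expm1(packets * math.log1p(-loss_rate))


PACKETS_SENT = 1.5e7  # from the text: "over 1.5e7 packets"
LOSS_RATE = 1e-7      # hypothetical line loss rate used in the text

mean_lost = expected_losses(PACKETS_SENT, LOSS_RATE)
p_any = prob_at_least_one_loss(PACKETS_SENT, LOSS_RATE)

print(f"expected losses per test: {mean_lost:.2f}")
print(f"probability of at least one loss: {p_any:.2f}")
```

With these figures the expected number of losses per test is about 1.5, and the chance of at least one loss is a little under 80%.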
Looking at the device counters:
There appears to be no packet loss on the receive side.
However, a consistent number of packets (ACKs) are being errored on the rx queue of the sender. The relative numbers are low (considering we are sending millions of packets), but what is strange is that the count is independent of the txqueuelen, and therefore also of the throughput and the number of packets sent out.
It is possible that these errored packets are caused during slow start, although further investigation is required.
The effect of all these lost packets is shown above. We can see an increase in both the number of fast retransmits and the number of packets being fast retransmitted. It is difficult to see from these results, but the number of fast retransmits appears to plateau after a txqueuelen of 10,000.
This suggests that the optimal value of txqueuelen for this particular network is over 10,000.
Looking at the number of sendstalls experienced, this optimal value appears to be correct, as we get no sendstalls after 10,000.
Whilst the number of Congestion Signals remains low, we can see that the number of OtherReductions does indeed increase with txqueuelen.
The number of congestionavoids shows a similar trend to that of the back-to-back tests, although in this case we see an increase in the number of congavoids just before the drop.
Similarly, we see an increase in the number of slow starts for the larger txqueuelens, but we do not see an initially high number of slow starts for small txqueuelens.
Looking at the calculated RTTs, we also see a sharp increase in the monitored RTTs as txqueuelen increases. The question is whether this RTT increase is caused by the increased number of dupacks, the timeouts, or a combination of both.
© 2001-2003, Yee-Ting Li, email: firstname.lastname@example.org,
Tel: +44 (0) 20 7679 1376, Fax: +44 (0) 20 7679 7145
Room D14, High Energy Particle Physics, Dept. of Physics & Astronomy, UCL, Gower St, London, WC1E 6BT