Can TCP use the bandwidth available on long-latency, high-bandwidth paths?
 

20020124 Bologna
Antony Antony <Antony@nikhef.nl>





How can end hosts utilize (all) the available bandwidth fairly?

Paths with round-trip time ~100 msec or more and bandwidth > 100 Mbit/s.

Current approaches: larger socket buffers, parallel streams (a buffer-tuning sketch follows).
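A minimal sketch of the socket-buffer approach, assuming the sender knows the
path's bandwidth-delay product in advance; the 4 MByte value is illustrative,
not a measured optimum:

   /* Enlarge socket buffers before connecting, so TCP can keep a full
    * bandwidth-delay product of data in flight. Error handling is minimal. */
   #include <stdio.h>
   #include <sys/types.h>
   #include <sys/socket.h>

   int make_tuned_socket(void)
   {
       int s = socket(AF_INET, SOCK_STREAM, 0);
       int bufsize = 4 * 1024 * 1024;   /* 4 MBytes, assumed path BDP */

       if (s < 0)
           return -1;
       /* Must be set before connect() so a large window can be used. */
       if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
           perror("setsockopt SO_SNDBUF");
       if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
           perror("setsockopt SO_RCVBUF");
       return s;
   }

On Linux the kernel caps these values at /proc/sys/net/core/wmem_max and
rmem_max, so those limits have to be raised first.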




Available capacity on the bottleneck link: 110 Mbit/s (MRTG 5-min avg)
http://m-s-wh-mrtg.fnal.gov/r-x-esnet/r-x-esnet.port5-0-0.html
 

NIKHEF -> FNAL: one stream reaches 4-12 Mbit/s using iperf with 2-4 MByte socket buffers.
  Throughput increases linearly with the number of streams, up to 70-80 Mbit/s.

A tuned stack gets about 4-8 Mbit/s per flow using iperf (example commands below).
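For reference, measurements of this kind can be reproduced with standard
iperf options (host name and values are examples, not the exact runs):

   iperf -c d0test.fnal.gov -w 4M -t 30          # one stream, 4 MByte window
   iperf -c d0test.fnal.gov -w 4M -P 8 -t 30     # 8 parallel streams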



|-------------------------- 110 msec, 110 Mbit/s --------------------------|
NIKHEF ------------------+-- SURFnet -----+-- ESnet ------+------ FNAL
keeshond.nikhef.nl --1G--+----- 622M -----+---- 155M -----+-- d0test.fnal.gov
 
keeshond.nikhef.nl:
  2 CPU, 2 GB RAM
  SK943 GigE NIC, 64 bit/66 MHz
  Linux 2.4.9 with Web100 and locally modified TCP

d0test.fnal.gov:
  SGI, 16 CPU, 8 GB RAM
  GigE card
  IRIX 6.5, socket size can be modified


Current Work @NIKHEF/UvA:

Optimizing the sender-side TCP stack on a Linux 2.4.9 Web100-enabled kernel
(a sketch of the motivating arithmetic follows).

Started toying with these ideas in Oct 2001.
Initial code was taken from Sylvain Ravot.
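The motivation for touching the sender side follows from standard additive
increase: at one extra segment per RTT, refilling a 110 msec, 110 Mbit/s pipe
takes roughly two minutes. A user-space sketch of that arithmetic (the
modified kernel's actual increment rule is not reproduced here):

   /* RTTs needed for standard congestion avoidance (cwnd += 1 MSS per RTT)
    * to grow from 1 MSS to the bandwidth-delay product of this path. */
   #include <stdio.h>

   int main(void)
   {
       double bw  = 110e6 / 8.0;   /* 110 Mbit/s in bytes/s */
       double rtt = 0.110;         /* 110 msec */
       double mss = 1460.0;        /* typical Ethernet MSS */
       double bdp = bw * rtt;      /* bytes in flight to fill the pipe */
       double cwnd = mss;
       double increment = mss;     /* standard: +1 MSS per RTT; a modified
                                    * sender would use a larger value */
       long rtts = 0;

       while (cwnd < bdp) {
           cwnd += increment;
           rtts++;
       }
       printf("BDP: %.0f bytes (~%.0f segments)\n", bdp, bdp / mss);
       printf("RTTs to fill the pipe: %ld (~%.0f sec)\n", rtts, rtts * rtt);
       return 0;
   }

Raising the increment shortens the recovery time proportionally; that is the
kind of knob the modified sender-side stack explores.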



 
 
 
 
 

Advantage:
   More effective utilization of the available bandwidth.

Disadvantage:

  Need to estimate the available bandwidth before starting the session
  (see the worked figure below).

  Conventional wisdom: may be unfriendly to other TCP flows?
  We do not see this in our experiments: two hosts, one with the modified
  kernel and one unmodified, share the bandwidth fairly.
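The size of the required estimate follows from the bandwidth-delay product;
for the NIKHEF-FNAL path (a back-of-the-envelope figure):

   window >= bandwidth x RTT = 110 Mbit/s x 0.110 s = 12.1 Mbit ~= 1.5 MBytes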



UvA -> Alaska: 160 Mbit/s max (limited by 4 MByte socket sizes), 198 msec RTT
     Linux (2.2.x)     -> Linux (2.2.x)    ~40 Mbit/s, drops after congestion avoidance starts
     SUN Solaris       -> Linux (2.2.x)    ~160 Mbit/s, drops to 40 Mbit/s after congestion
     keeshond (2.4.9)  -> Linux (2.2.x)    ~160 Mbit/s, consistent
 

Alaska -> UvA
     Linux (2.2.x)     -> SUN Solaris      ~160 Mbit/s
                  Duration 20 sec
                  Slow start
                  Drops to ~40 Mbit/s after the first packet loss.
                  Out-of-order packets:    501/114702  (0.44%)
                  Retransmitted packets:   103/114702  (0.08%)

     Linux (2.2.x)     -> Linux (2.{2,4})   40-60 Mbit/s
                  Duration 96 sec
                  No slow start.
                  Rapid increase of OWin (estimated CWnd) brings the flow into congestion avoidance.
                  Out-of-order packets:    296/120977  (0.24%)
                  Retransmitted packets:   530/120977  (0.44%)

     Observation: the receiver OS appears to make a difference.



Future Work:

Make current modifications into a kernel module.

Use the modified TCP algorithms only for specific destinations
         (initially to be implemented via a proc interface; a hypothetical sketch follows).
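A possible shape for that proc interface, purely hypothetical at this point
(the path, file name, and format are invented for illustration and do not
exist in any kernel):

   # hypothetical: apply the modified algorithm only to this destination
   echo "192.0.2.0/24" > /proc/sys/net/ipv4/tcp_mod_dest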

Need more test sites with ~100 msec RTT and 200-900 Mbit/s of bandwidth.

Monitor behaviour in the presence of an artificially injected large number (100s) of small (1 Mb) flows.

Volunteers to run iperf/gridftp and the experimental kernel code?

Other solutions: ECN!  Linux has experimental support,
but routers do not support it yet (may need hardware and software upgrades for routers handling high packet-per-second rates).
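For reference, the experimental Linux ECN support is toggled through the
standard proc entry present in 2.4 kernels:

   echo 1 > /proc/sys/net/ipv4/tcp_ecn    # enable ECN on new connections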
 



 
 
 
 
 
 
 

Conclusion:

It is observed that TCP cannot use the high bandwidth available on long-latency, high-bandwidth paths with background traffic,
due to artifacts of its congestion avoidance methods.

Congestion avoidance algorithms should be considerate to high-throughput flows.