Yee's Homepage | TCP/IP

TCP Variants

The evolution of the TCP transport protocols has changed slowly but progressively since its original incarnation in 1988. This page describes the major implementations, describing the algorithms and differences between each incarnation.

TCP Tahoe

Tahoe by Jacobson [Jac88a] assumed that congestion signals are represented by lost segments. It was assumed by Jacobson that losses due to packet corruption are much less probable than losses due to buffer overflows on the network. Therefore, on a loss, the sender should lower its share of the bandwidth. This is done by reducing its cwnd to half of the size at which the loss was found.

The reasoning behind this value of a half is that the decrease in throughput should be equal to the multiplicative increase of queue length in the network upon congestion.

The implementation of this multiplicative decrease is through the use of a tcp variable called ssthresh. Upon a loss, half of the value of cwnd just before the loss is recorded in ssthresh. The connection then resorts back to slow start by setting cwnd to 2 segments. Slow start grows the cwnd exponentially until it reaches ssthresh from which it will do congavoid until the same thing happens again until the connection is terminated.

In order to determine that a packet is lost, we must time the delay of the packet; from the sender putting into the network and the time at which we receive the ack for that packet. This value is known as the round trip time (RTT). From this value (and the aggregation of timed pairs), we can use a Retransmission Time-Out (RTO). Jacobson devised a more accurate way of estimating this value.

If an ack is not received before this RTO, then the sender should be confident that the packet is lost and should therefore resend the segment to enable reliable delivery and movement of the window. The method used in the Jacobson paper is based on the variation of the average RTT.

Another way of detecting a loss in TCP Tahoe is through the use of duplicate acknowledgments (dupacks). Dupacks are similar to normal acks in the sense they both acknowledge the packets sent out by the sender has reached the receiver (and has therefore left the network). Both types of acks also acknowledges all the previous packets. As such, the ack indicates to the sender the next expected packet that the receiver is expecting to receive

An example of this that if we send packets 1 and packets 2 out in that order; and we get an ack for packet 2, then it automatically acknowledges packet 1. More specifically, dupacks are acknowledgements for the same sequence number

The difference between a dupack and a normal ack is that while a normal ack acknowledges one or more previously unacknowledged packets, a dupack re-acknowledges the same packet as the previous acknowledgement. Dupacks are generated in response to packets arriving at the sink out of order.

An example of this is if a packet is lost, but all packets after the lost packets were received; given that packet n was lost, but due to the way the window grow means that we could send out many more packets. Therefore, we also send out packet number n+1, n+2 and n+3. Because we have lost packet n, there is no way of the receiver sending us an ack for that packet. As such, when packet n+1 arrives at the sender, it tells the sender that it is still expecting packet number n (by sending an ack with that number). The receiver then receives packet n+2, but it still hasn't received packet n. So it sends another ack saying that it is still expecting packet n. This acknowledgement is known as a dupack as it's a duplicate acknowledgement in the sense that it doesn't tell the sender any new information (except that the receiver still hasn't got packet n).

Conversely, if packet n wasn't lost but just badly reordered in the network, then as soon as packet n arrives at the receiver, it will send an ack to the sender indicating the next expected packet. This number does not have to be n+1 as the receiver may already have it in its buffer.

Therefore, if the sender assumes that the receipt of a number of dupacks actually indicate loss and not heavy reordering then we can quickly retransmit the (derived) lost segment without having to wait for the RTO to expire before resending the packet. This is important as often the RTO is relatively quite long and can prevent the window from movement and thus stall the tcp transfer. The segment that should be retransmitted is the segment number as specified in the dupacks.

Typically, Tahoe waits for 3 dupacks before inferring this loss and hence immediately resends the packet. This value is actually the tcprexmthresh variable.

This mechanism is called Fast Retransmit which is followed by the normal actions that occur should we have received a congestion signal. In the case of Tahoe, ssthresh is set to half of the cwnd value and then cwnd is set to 1/2 segments. This forces TCP to enter slow start again.

TCP Tahoe does not deal well with multiple packet drops within a single window of data.

TCP Reno

TCP Reno introduced major improvements over Tahoe by changing the way in which it reacts to detecting a loss through duplicate acknowledgements. The idea is that the only way for a loss to be detected via a timeout and not via the receipt of a dupack is when the flow of packets and acks has completely stopped - This would be an indication of heavy congestion.

Should the sender still be receiving acks from the receiver, then the sender should not fall back into slow start. This is the case when a loss is signaled by the receipt of x dupacks rather than a timeout. The congestion experienced is not heavy, so the sender can keep on sending since the flow still exists (as the equilibrium has not been completely destroyed). But, the sender should be sending with less vigour to utilise less resources.

As such, Reno introduces a mechanism called Fast Recovery [RFC 2001] which should be activated after a Fast Retransmit. Fast Retransmit will cause the sender to retransmit packet y+1 after receiving n dupacks. However, under Reno, it will not fall back to Slow Start, but instead, it should take advantage of the fact that the flow that currently exists should keep on sending, but using less resources.

This is implemented by first halving the size of its cwnd and storing this value in ssthresh. We then set cwnd to ssthresh plus 3*MSS (where 3 is the number of dupacks required to trigger fast retransmit). This will include only up to half of the number of segments just before the Fast Recovery. The size of the cwnd at this time is W. But the shrinking means that we have actually already sent out all the packets that are contained within the window. As such, there is no way of sending out new packets unless the window is forced to include the new ones. As the receipt of dupacks under Tahoe does not move this window, then the only way to shift the window is to inflate the size of the cwnd upon each dupack.

Therefore, upon the receipt of each dupack, the cwnd is inflated by one segment as each dupack signals the fact that another packet has left the network. After receiving W/2 dupacks, the window will be ready to include new packets.

When all the dupacks have been used up, the next ack should be the ack caused by the retransmission and therefore should ack all packets. Upon the receipt of this normal ack, the sender should exit Fast Recovery and set its cwnd to ssthresh (W/2). This value of W is still the size of the cwnd just before Fast Recovery, and the value W/2 is stored within the ssthresh variable. This means that the window contains segment number y+W to ( y+W + W/2 + 1 ). All of the packets, except for the last one has already been sent out due to the receipt of the dupacks. Therefore, only on packet needs to be sent.

By using Fast Recovery, the sender uses a cwnd that is half the size of the cwnd just before the 'loss'. As such, this forces the Reno TCP to send less packets out until it knows that it's okay to send more. Therefore, it has indeed reduced it utilisation of the network.

Although better than Tahoe TCP at dealing with single packet loss, Reno TCP is not much better when multiple packets are lost within a window of data.

[Jac88b, Ste94].

Fast recovery ensures that pipe does not become empty

Therefore, we only slow start when a packet is timed out. This is implemented by setting ssthresh to half the current cwnd size and then setting cwnd to 1 segment. This will cause the tcp connection to slow start up to ssthresh and then congavoid - as with the normal Tahoe implementation.

Motivation for improving Reno

In the Internet, packets are often transmitted in bursts (bursty nature of tcp etc) [REF?]. As a result, losses also often happen in bursts. This is primarily due to FIFO (drop tail) queues in routers.

The fundamental problem is that Fast Retransmit assumes that only one segment was lost. This can result in loss of ack clocking and timeouts if more than one segment is lost.

Reno encounters several problems with multiple packet losses in a window of data (usually in the order of half a window). This usually happens when invoking Fast Retransmit and Fast Recovery:

It invokes them several times in succession leading to multiplicative decreases of cwnd and ssthresh - impacting the throughput of the connection.

ACK starvation: this happens due to the ambiguity of duplicate acks and due to the dynamics of cwnd [jhoe]. The sender reduces cwnd when it enters Fast Retransmit, it receives dupacks that inflate cwnd so that it sends new packets until it fills its sending window. It receives a non dupack and exits Fast Recovery.

However, due to multiple losses in the past, this ack will be followed by 3 dupacks signaling that another segment was lost so Fast Retransmit is entered again after another reducing in ssthresh and cwnd. As this happens several times in succession, the left edge of the sending window advances only after each successive Fast Retransmit and the amount of data in flight (sent but not yet acked) eventually becomes more than the congestion window (halved by the latest invocation of Fast Retransmit). As there are no more acks to receive then the sender stalls and recovers from this dead-lock only through a timeout - which causes a slow start.

Two solutions are available: SACK and New Reno.

TCP New Reno

New Reno modifies the Fast Retransmit and Fast Recovery. These modifications are intended to fix the Reno problems above and are wholly implemented in the sender side.

A modification of Reno lead to New-Reno TCP which shows that Reno can be improved without the addition of SACKs but still suffers without it. Here, the wait for a retransmit timer is eliminated when multiple packets are lost from a window.

New Reno is the same as Reno but with more intelligence during fast recovery. It utilises the idea of partial acks: when there are multiple packet drops, the acks for the retransmitted packet will acknowledge some, but not all the segments send before the Fast Retransmit.

In TCP Reno, the first partial ACK will bring the sender out of the fast recovery phase. This will result in the requirement of timeouts when there are multiple losses in a window, and thus stalling the tcp connection.
In New Reno, a partial ack is taken as an indication of another lost packet and as such the sender retransmits the first unacknowledged packet. Unlike Reno, partial acks don't take New Reno out of Fast Recovery. This way, it retransmits one packet per RTT until all the lost packets are retransmitted and avoids requiring multiple fast retransmits from a single window of data.

The downside of this is that it may take many RTT's to recover from a loss episode, and you must have enough new data around to keep the ack clock running.

This is implemented as follows:

Multiple packet loss

A fix for Fast Recovery to prevent starting Fast Retransmit and Fast Recovery in succession when multiple segments are dropped in the same window.

When entering Fast Retransmit (from 3 dupacks), ave the highest sequence number sent so far. Perform retransmission and the Fast Recovery algorithm as usual (set ssthresh, inflating cwnd on dupacks). When a new ack arrives, perform the addition check if the ack covers the highest sequence number sent when Fast Retransmit was invoked. If not, this ack is a partial ack and signals that another segment was lost from the same window of data. As such, retransmit the segment reported as expected by the partial ack, reset the retransmission timer but do not exit Fast Recovery. On the other hand, if the new ack covers the highest sequence number sent and then exit Fast Recovery but setting cwnd to ssthresh and performing congavoid.

False Fast Retransmits

Record the highest sequence number ever transmitted after a retransmission timeout (normally set to 0). When ever we get 3 dupacks, we performa a test to see if we should enter Fast Retransmit. If these acks cover the sequence number saved at the previous timeout, then this is a new invocation of the Fast Retransmit. In this case, enter Fast REtransmit and perform the related actions. If they do not cover the sequence numbers (ie they ack segments sent previous to the timeout) then just acknowledge the receipt of already queued segments at the receiver. In this case, do not enter Fast Retransmit.

Sender comes out of fast recovery only after all outstanding packets (at the time of first loss) are acked.

Wed, 23 July, 2003 13:07

© 2001-2003, Yee-Ting Li, email: ytl@hep.ucl.ac.uk, Tel: +44 (0) 20 7679 1376, Fax: +44 (0) 20 7679 7145
Room D14, High Energy Particle Physics, Dept. of Physics & Astronomy, UCL, Gower St, London, WC1E 6BT