Personal Miscellaneous TCP/IP GRID Quality of ServiceMulti-Cast  
Background high-speed tcp Transfer Tests Web 100TCP Tuning  

Implementation of HSTCP

The implementation of HSTCP on a 2.4.16 kernel is relatively straight forward. The first thing was to identify the relevant files. The entire implementation appeared to be contained within a single file net/ipv4/tcp_input.c. The two functions that incorporated TCP slow start and congestion avoidance were found to be tcp_cong_avoid and tcp_cwnd_down.

What we are interested in are:

For a)

1755:  static __inline__ void tcp_cong_avoid( struct tcp_opt *tp )
1756:  {
1757:  	if (tp->snd_cwnd <= tp->snd_ssthresh) {
1758:  		/* In "safe" area, increase. */
1759:  		WEB100_VAR_INC(tp, SlowStart);
1760:  		if (tp->snd_cwnd < tp->snd_cwnd_clamp)
1761:  			tp->snd_cwnd++;
1762:  	} else {
1763:  		/* In dangerous area, increase slowly.
1764:  		 * In theory this is tp->snd_cwnd += 1 / tp->snd_cwnd
1765:  		 */
1766:  		 WEB100_VAR_INC(tp, CongAvoid);
1767:  		 if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
1768: 			if (tp->snd_cwnd < tp->snd_cwnd_clamp)
1769:  					tp->snd_cwnd++;
1770:  		 	tp->snd_cwnd_cnt=0;
1771:    		} else
1772: 			tp->snd_cwnd_cnt++;
1773:  	}
1774:  	tp->snd_cwnd_stamp = tcp_time_stamp;
1775:  
1776:  }			
      

In this case, we are only interested in lines 1763 and below as lines 1757 to 1762 are related to slow start increments. This function, tcp_cong_avoid is called when there is an incoming ACK and if there is at least as many packets in flight as the size of the congestion window.

The congestion window gives an indication of the number of packets sent out onto the network, however, at any given time, we can't have more packets send out than what is in the congestion window. But the only way we can know how much information is in the network is by ACKS from the reciever, which updates the size of the congestion window. This implies that for any given instance, we may actually have more data in the network that what our congestion window currently allows.

So when this happens, we keep a variable, snd_cwnd_cnt to determine the total number of ACKed packets in the congestion window since the last update. If snd_cwnd_cnt is 0, it means that we have updated the congestion window and there has been no ACKs since the update. When we get an ACK, we increment snd_cwnd_cnt.

Therefore, lines 1767 to 1770 does a count comparing the snd_cwnd_cnt so that it takes congestion window many packets to increment the congestion window (in effect, each ACK packet increases the snd_cwnd by 1/snd_cwnd).

If the condition on line 1767 is not met, we simply increase snd_cwnd_cnt for the next round.

snd_cwnd_cnt could be equal to or greater than snd_cwnd is there are any duplicate ACKd (due to loss), which upon receive of the duplicate ACKs would mean that the snd_cwnd is halved, leaving snd_cwnd_cnt to be greater than snd_cwnd.

snd_cwnd_clamp: defines the maximum cwnd_size as set by tp->snd_cwnd_clamp = ~0 which is 16bits 65535 bytes. - which must be before the window scaling.

In order to add Sally Floyd's increase parameter such that w <- w + a/w, where a->a(w), we can do two possible things:

  1. Change the loop on line 1767 such that we add in a factor of a to the comparison, or
  2. Change the increment on line 1772 such that we add in a factor of a rather than the constant

Our initial investigation will do the 1st possibility. By multiplying the value of tcp_cwnd_cnt by the value of a for the current cwnd, then we in effect increase the freqeuncy in which the update occurs.

The alternative possibility would be to actually increase the tp->snd_cwnd by the a(cwnd) value in line 1769 while maintaining the same if statement.

A study into the effectiveness of each proceedure will be investigated

for b)

1340:  static void tcp_cwnd_down(struct tcp_opt *tp)
1341:  {
1342:  	
1343:  	int decr = tp->snd_cwnd_cnt + 1;
1344:  	
1345:  	tp->snd_cwnd_cnt = decr&1;
1346:  	decr >>= 1;
1347:  	
1348:  	if (decr && tp->snd_cwnd > tp->snd_ssthresh/2)
1349:  		tp->snd_cwnd -= decr;
1350:  		
1351:  	tp->snd_cwnd = min( tp->snd_cwnd, tcp_packets_in_flight(tp)+1);
1352:  	tp->snd_cwnd_stamp = tcp_time_stamp;
1353:  		
1354:  }

 

Line 1343 does a

Line 1345 is conducting a bitwise AND operator. The following function uses the bitwise AND operator to determine whether the number is odd or even. It returns a true value if decimalNumber is even, and a false value if it is odd:

function checkEven(decimalNumber) {
	return (decimalNumber & 1 == 0)
}

Hence, we are determine whether snd_cwnd_cnt is odd or even.

Line 1346 shows a bitshift to halve the value of decr. This value is then used in line 1348 and 1349 to set the snd_cwnd by halve. Note that we only do this if decr!=0, and if ( snd_cwnd > ssthresh/2 ).

Given that we have had to decrease the cwnd, line 1351 sets the cwnd size to be the min of (snd_cwnd, tcp_packets_in_flight(tp)+1)

 

Changes

Requirements

For the implemention into the code, we introduced a function to return the values of a(w) and b(w) given a value of cwnd. The kernel stores the value of cwnd in segments, so we do not need to alter the values of cwnd from the original Sally Floyd table.

In order to save memory space, we decided to store the b value as an 8bit integer. Therefore, we need to do a bit shift of 8 to divide by 256.

We define the table relating cwnd, a and b. A script to create these values can be found here (note the precision of the values matters in the creation of the values).

I also created a script to return suitable values of a(cwnd) and b(cwnd) given the intial parameters defining the windows, losses and the B value. here.

struct hstcp_entry hstcp_table[] = {
{1, 1, 128},
{38, 1, 128},
{118, 2, 112},
{221, 3, 104},
{347, 4, 98},
{495, 5, 93},
{663, 6, 89},
{851, 7, 86},
{1058, 8, 83},
{1284, 9, 81},
{1529, 10, 78},
{1793, 11, 76},
{2076, 12, 74},
{2378, 13, 72},
{2699, 14, 71},
{3039, 15, 69},
{3399, 16, 68},
{3778, 17, 66},
{4177, 18, 65},
{4596, 19, 64},
{5036, 20, 62},
{5497, 21, 61},
{5979, 22, 60},
{6483, 23, 59},
{7009, 24, 58},
{7558, 25, 57},
{8130, 26, 56},
{8726, 27, 55},
{9346, 28, 54},
{9991, 29, 53},
{10661, 30, 52},
{11358, 31, 52},
{12082, 32, 51},
{12834, 33, 50},
{13614, 34, 49},
{14424, 35, 48},
{15265, 36, 48},
{16137, 37, 47},
{17042, 38, 46},
{17981, 39, 45},
{18955, 40, 45},
{19965, 41, 44},
{21013, 42, 43},
{22101, 43, 43},
{23230, 44, 42},
{24402, 45, 41},
{25618, 46, 41},
{26881, 47, 40},
{28193, 48, 39},
{29557, 49, 39},
{30975, 50, 38},
{32450, 51, 38},
{33986, 52, 37},
{35586, 53, 36},
{37253, 54, 36},
{38992, 55, 35},
{40808, 56, 35},
{42707, 57, 34},
{44694, 58, 33},
{46776, 59, 33},
{48961, 60, 32},
{51258, 61, 32},
{53677, 62, 31},
{56230, 63, 30},
{58932, 64, 30},
{61799, 65, 29},
{64851, 66, 28},
{68113, 67, 28},
{71617, 68, 27},
{75401, 69, 26},
{79517, 70, 26},
{84035, 71, 25},
{89053, 72, 24},
{94717, 73, 23},
};
static short hstcp_max_entry=73;
	  

Of course, this requires that the header file needs to also be altered. The file in question is include/net/tcp.h. We simply add:

 struct hstcp_entry {
	__u32			cwnd;
 	__u8			a_val;
 	__u8			b_val;
 };
	  

 

We also need to define the method that actually returns the values of a and b given a cwnd value. We decided to operate a boolean search for the cwnd value to make the lookup faster. We decided not to create an index by a (even though it's linearly increasing) as the parameter space of HSTCP may not necessary create such increments in a.

The benefit of doing this as a function is that we can quite easily change the function to, for example calculate the a and b values on the fly, or more likely to grab results from either directly from the /proc, or alternatively create a lookup table on the fly upon changes in the /proc file system for the 5 HSTCP variables.

A script to demonstrate the this binary search given a file here containing the list of cwnd, a and b values here.


 struct hstcp_entry get_hstcp_val(__u32 this_cwnd) {
 	short left,right,mid;
 	left = 0;
 	right = hstcp_max_entry;
 	while (right-left) {
 		mid = (left + right)>>1;
 		if (hstcp_table[mid].cwnd < this_cwnd) {
 			left = mid + 1;
 		} else {
 			right = mid;
 		}
 	}
 	return hstcp_table[right];
 }
 	  

for a)

For this instance, we want to change the frequency of updates from 1 to a. We can do this by:

if ((tp->snd_cwnd_cnt * get_hstcp_val(tp->snd_cwnd).a_val) >= tp->snd_cwnd) {

Instead of using...

if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {

 

for b)

All we want to do is to change the decrement value to equal the relevant lookup for the cwnd:

decr = (int)((decr * get_hstcp_val(tp->snd_cwnd).b_val)>>8);

instead of using

  	decr >>= 1;
	  

Err.. that's it! We also added a configuration to enable or disable HSTCP.

Files

We created a patch file for stock 2.4.16 kernels. here.

As we need indepth analysis of now the cwnd and functions, we also required that we used web100. So we also created a patch file for web100 2.4.16 files. here.

 

Thu, 7 August, 2003 3:23 Previous PageNext Page

 
 
    email me!
2001-2003, Yee-Ting Li, email: ytl@hep.ucl.ac.uk, Tel: +44 (0) 20 7679 1376, Fax: +44 (0) 20 7679 7145
Room D14, High Energy Particle Physics, Dept. of Physics & Astronomy, UCL, Gower St, London, WC1E 6BT