[p2p-hackers] UDP file transfer link speed identification

David Barrett dbarrett at quinthar.com
Thu Jul 21 23:19:30 UTC 2005


Wes Felter wrote:
> Have you read about DCCP, TFRC, and TCP-like Congestion Control?

These are great references, thank you very much.  Not only are they good 
in isolation, but they helped fill in the cracks in my understanding of 
TCP.  I'll record what I learned here for others who are similarly 
confused.  As I do so, can you clarify I'm understanding it correctly?


In essence, TCP maintains a "congestion window" (cwnd) that dictates how 
many unacknowledged packets can be "in flight".  So for a window size of 
1, the server sends one packet, and waits for an explicit ACK before 
sending the next.  Assuming a 100ms round-trip latency, this means at 
most ten packets will be sent a second.  As the server increases the 
in-flight window size (or as the latency decreases), the server sends at 
a faster pace.

The server uses an algorithm named "Additive Increase, Multiplicative 
Decrease (AIMD)" to regulate cwnd.  Basically, each time an ACK is 
received, cwnd is incremented by some amount.  However each time cwnd is 
to be decreased, it backs off by some set fraction (more on this in a 
bit).  This means it increases cautiously, and backs off aggressively.

Because the server increases cwnd in response to each ACK, and because 
each packet (typically) generates its own ACK, this results in initial 
exponential increase in send rate -- even though cwnd is only 
incrementing linearly.  It took me a bit to wrap my head around why, but 
this quote from RFC 2001 cleared me up:

> The sender starts by transmitting one segment and waiting for its
> ACK.  When that ACK is received, the congestion window is incremented
> from one to two, and two segments can be sent.  When each of those
> two segments is acknowledged, the congestion window is increased to
> four.  This provides an exponential growth, although it is not
> exactly exponential because the receiver may delay its ACKs,
> typically sending one ACK for every two segments that it receives.

Somewhat counterintuitively, this exponential rate increase mode of TCP 
is called "slow start" -- named so because it starts transmitting very 
slowly, though increases quite quickly.

Regardless, at some point it'll start sending at a faster rate than can 
be delivered.  Routers will discard packets they can't forward, and thus 
the sender discovers it's sending too fast when it fails to receive an 
ACK for a given packet.  In other words, packet loss is what notifies 
the sender that it needs to slow down.

So once the sender detects a packet has been dropped, it backs off its 
cwnd by some fraction (TCP suggests that it back off to 50%, but [1] 
recommends only to 85%) before starting to increase again.

Now, after the slow-start approach hit the ceiling and backed off some 
fraction, it could just restart with exponential growth again.  This 
would again hit the ceiling and back off indefinitely, creating the 
up-and-down graph shown in [1], reminiscent of Sisyphus and his boulder.

[1] http://www-lce.eng.cam.ac.uk/~ctk21/scalable/

However, it would be preferable to "level off" at the top send rate and 
just stick with that.  For this reason after the initial slow-start 
approach hits the ceiling, it uses a different algorithm called 
"congestion avoidance".  Thus rather than quickly shooting up to the 
ceiling with exponential growth, it just creeps up using linear growth 
until it experiences packet loss again.  At that point it just keeps 
transmitting at that rate, as graphed in [2].

[2] http://www.csm.ornl.gov/~dunigan/netperf/kelly.html

Now, all this is really simplified, and I recommend the above two 
sources, as well as RFC 2581 for more details.  Furthermore, there are 
variations on how to tweak these algorithms for better performance ([1] 
describes one in particular).  But this should get you started.


Now, with all that out of the way, here's what I don't understand:

(a) After "congestion avoidance" has been started (ie, cwnd has exceeded 
ssthresh), how does it determine when to stop incrementing the cwnd?

I see that [2] clearly shows cwnd (and hence, the send rate) level off 
at some ceiling amount.  And RFC 2581 clearly states "Congestion 
avoidance continues until congestion is detected." (ie, a packet loss 
occured).  But this is easier said than done.

Assuming TCP is a very simplified finite state machine (FSM) that starts 
in "slow start" state and transitions to "congestion avoidance" state 
when a packet loss is detected, does it likewise transition from 
"congestion avoidance" to "hold steady" once a second packet loss is 
detected?  If so, under what circumstances does it transition from "hold 
steady" back to "collision avoidance" or "slow start"?  Furthermore, 
does the cwnd get adjusted at all in "hold steady"?


Thank again Wes for the references, and I hope my summary above is 
reasonably accurate and can help others better understand the topic.

-david



More information about the P2p-hackers mailing list