[p2p-hackers] MTU in the real world

Serguei Osokine Serguei.Osokine at efi.com
Tue May 31 21:57:50 UTC 2005


On Tuesday, May 31, 2005 David Barrett wrote:
> Do you know of any similar attempts of using big MTUs over a standard
> consumer internet connection?

	Sorry, no. People typically try to avoid the complications and
inefficiency of packet fragmentation and reassembly. I'd even venture
a guess that, more often than not, people who do use big packets over
the standard Internet have no idea that they are doing that (otherwise
they'd try to avoid doing so) and thus cannot tell any tales about
it :-)

	Best wishes -
	S.Osokine.
	31 May 2005.


-----Original Message-----
From: p2p-hackers-bounces at zgp.org [mailto:p2p-hackers-bounces at zgp.org]On
Behalf Of David Barrett
Sent: Tuesday, May 31, 2005 2:46 PM
To: Peer-to-peer development.
Subject: Re: [p2p-hackers] MTU in the real world


Ah, thanks -- this is precisely the kind of story I'm looking to hear.  
But I agree its conditions are a bit unusual.  Do you know of any 
similar attempts of using big MTUs over a standard consumer internet 
connection?

-david

On Tue, 31 May 2005 10:40 am, Serguei Osokine wrote:
> On Tuesday, May 31, 2005 David Barrett wrote:
>>  With this in mind, have you tried using an MTU bigger than 1500 bytes
>>  and been bitten by it?
>
> 	Yes. That was not your typical everyday situation, but I think
> some on this list might find it entertaining anyway:
>
> 	We tried to use UDP to transfer stuff over a gigabit LAN inside
> the cluster. Pretty soon we discovered that with small (~1500 byte)
> packets the CPU was the bottleneck, because you can send only so many
> packets per second, and the resulting throughput was nowhere close to
> a gigabit. (You have to send almost 100K such packets a second to
> achieve a gigabit throughput, and we were doing several times less
> on our 2-CPU 2.4GHz Win XP boxes.)
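
A rough sketch of the packet-rate arithmetic above, in Python. It ignores
Ethernet, IP and UDP header overhead. The function name and the 65,507-byte
figure (the largest UDP payload that fits in one IPv4 datagram, standing in
for "64 KB") are assumptions made for illustration, not anything measured on
the cluster.

```python
def packets_per_second(link_bits_per_s, payload_bytes):
    """Packets per second needed to fill the link, ignoring header overhead."""
    return link_bits_per_s / (payload_bytes * 8)


GIGABIT = 1_000_000_000  # bits per second

# Hypothetical payload sizes: one full Ethernet frame vs. a maximal UDP datagram.
small = packets_per_second(GIGABIT, 1500)
large = packets_per_second(GIGABIT, 65_507)

print(f"1500-byte packets:    {small:,.0f} per second")
print(f"65507-byte datagrams: {large:,.0f} per second")
```

The per-packet CPU cost is what matters here: the large-datagram case needs
well under a tenth of the send calls for the same throughput.
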
>
> 	So then we tried to increase the UDP datagram size. The gigabit
> switch did not support jumbo frames, by the way, so we were fragmenting
> as soon as we exceeded 1500. The throughput went up, and was pretty
> decent with 64-KB datagrams (don't remember the exact numbers, but it
> was close to a gigabit and generally everything was peachy).
>
> 	Which is when the funny things started to happen. In the middle
> of a test, the communication channel would just shut down and nothing
> would be delivered over it for a minute or two (though both the sender
> and the receiver kept looking fine and no errors were returned by the
> socket calls - sender was sending data, but the receiver's recvfrom()
> call was not getting it); after that pause the channel would wake up
> as if nothing happened (except for several gigabytes of lost data),
> work normally for a few minutes, after which this shutdown would be
> repeated, and so on.
>
> 	Took us a while to figure out what was going on, but here is the
> scoop: the gigabit LAN had a fairly small, but nonetheless non-zero
> packet loss rate. When one 1500-byte frame from a 64-KB datagram is
> lost, the rest of the datagram frames (all 62 KB) have to be buffered
> somewhere in case the missing frame arrives and the datagram can be
> fully reassembled. This arrival will never happen, but the socket
> layer does not know that, so it has to keep the partial datagram for
> a while, discarding all its frames if the missing frame does not arrive
> before some timeout (RFC 1122 recommends this timeout value to be
> between 60 and 120 seconds, and this seems to be in line with what
> we saw).
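
To put numbers on how fragmentation amplifies loss, here is a minimal Python
sketch. It assumes IPv4 with a 20-byte header and no options, an 8-byte UDP
header, a 1500-byte MTU, and independent frame losses. The function names
and the 0.01% loss rate are illustrative assumptions, not measurements.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8   # bytes


def fragment_count(udp_payload, mtu=1500):
    """Number of IP fragments needed to carry one UDP datagram."""
    # Fragment data (except in the last fragment) must be a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)


def datagram_loss_rate(frame_loss_rate, fragments):
    """Chance that at least one fragment is lost, assuming independent losses."""
    return 1.0 - (1.0 - frame_loss_rate) ** fragments


n = fragment_count(65_507)           # a maximal ("64 KB") UDP datagram
p = datagram_loss_rate(0.0001, n)    # 0.01% frame loss, as an example

print(f"{n} fragments per datagram")
print(f"datagram loss rate: {p:.4%}")
```

Every one of those lost datagrams leaves all its surviving fragments sitting
in the reassembly buffer until the timeout expires.
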
>
> 	Now, the gigabit link sends quite a lot of data - 100MB+ per
> second, to be precise. Even with 0.01% loss rate, you're losing about
> 10,000 bytes per second. This is no big deal, but every 1500 bytes lost
> causes you to store 62 KB of partial datagrams, so with the loss rate
> above you have to store 400 KB of new data every second. If this data
> expires in 120 seconds, you need about 50 MB for the partial datagram
> storage in the socket layer - and proportionally more if your data loss
> rate is higher than 0.01%. And this amount of memory is something that
> the socket layer in Win XP simply does not have. So as soon as it runs
> out of memory for the partially assembled datagrams, it stops the data
> delivery and waits for the memory to be released. Apparently after it
> gets enough free memory, it switches the data delivery back on again.
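
The estimate above can be reproduced with a few lines of Python. This is only
a back-of-the-envelope sketch: it assumes every lost frame lands in a
different datagram, that fragments are held for the full timeout, and that
"64 KB" means 65,536 bytes. The function and parameter names are invented
for the example.

```python
def reassembly_backlog_bytes(link_bytes_per_s, loss_rate, frame_bytes,
                             datagram_bytes, timeout_s):
    """Steady-state bytes of partial datagrams awaiting a lost fragment."""
    lost_frames_per_s = link_bytes_per_s * loss_rate / frame_bytes
    # Each lost frame strands the remaining fragments of its datagram.
    stranded_per_datagram = datagram_bytes - frame_bytes
    return lost_frames_per_s * stranded_per_datagram * timeout_s


backlog = reassembly_backlog_bytes(
    link_bytes_per_s=100e6,   # roughly a saturated gigabit link
    loss_rate=0.0001,         # 0.01%
    frame_bytes=1500,
    datagram_bytes=64 * 1024,
    timeout_s=120,            # upper end of the RFC 1122 range
)

print(f"partial-datagram backlog: {backlog / 1e6:.1f} MB")
```

The backlog scales linearly with both the loss rate and the timeout, so a
0.1% loss rate would need about ten times as much memory.
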
>
> 	This approach does seem funny, and I don't see any compelling
> reason for the socket layer to handle that situation in this "trigger"
> fashion - either it works normally, or shuts down the data delivery
> completely. Might have handled this a bit more gracefully, I'd think.
> But this was Windows, and there was no arguing with it. (We were stuck
> with Windows for unrelated reasons.)
>
> 	So the bottom line was, we had to go with TCP, because there was
> no way we could build a UDP transport that would be both fast enough
> and work on our hardware/OS combination. And the part about "would
> work" was definitely related to an attempt to send datagrams that
> would exceed the MTU. (Datagrams smaller than the MTU sucked
> performance-wise when compared to TCP, but that is another story -
> gigabit cards tend to offload plenty of TCP functionality from the
> CPU, so it was not that UDP was particularly bad, but rather that TCP
> performance was very good.)
>
> 	Best wishes -
> 	S.Osokine.
> 	31 May 2005.
>
> -----Original Message-----
> From: p2p-hackers-bounces at zgp.org [mailto:p2p-hackers-bounces at zgp.org]On
> Behalf Of David Barrett
> Sent: Tuesday, May 31, 2005 3:11 AM
> To: Peer-to-peer development.
> Subject: [p2p-hackers] MTU in the real world
>
>
> I've read in multiple places that it's best to have a UDP MTU of under
> 1500 bytes.  However, it sounds like most of this is based on
> theoretical analysis, and not on real-world experience.
>
> With this in mind, have you tried using an MTU bigger than 1500 bytes
> and been bitten by it?  Basically, do you know of any empirical
> analysis (of any level of formality) of a real-world UDP application
> that supports or refutes the 1500 byte rule of thumb?
>
> Furthermore, I've read that if you "connect" your UDP socket to the
> remote side and then start sending large packets and backing off
> slowly, the socket layer will compute the "real" MTU between two
> endpoints, and you can obtain it through "getsockopt".  Do you know of
> anyone who's tried this, and the results?
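
For what it is worth, on Linux the kernel's current path MTU estimate for a
connected UDP socket can be read roughly as in the Python sketch below. This
is an illustration only: the experience described in this thread was on
Windows, the numeric option values are the Linux ones from <linux/in.h>
(hard-coded here on the assumption that Python's socket module may not
export them), and the function name is made up.

```python
import socket
import sys

# Linux-specific option values from <linux/in.h> (assumption: Linux only).
IP_MTU_DISCOVER = 10
IP_PMTUDISC_DO = 2    # always set the Don't Fragment bit
IP_MTU = 14           # read the path MTU known for a connected socket


def cached_path_mtu(host, port):
    """Return the kernel's current path MTU estimate towards host:port."""
    if not sys.platform.startswith("linux"):
        raise OSError("this sketch relies on Linux-specific socket options")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Ask the kernel not to fragment, so oversized sends fail locally.
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        # connect() on a UDP socket sends nothing; it only fixes the peer.
        sock.connect((host, port))
        return sock.getsockopt(socket.IPPROTO_IP, IP_MTU)
```

Right after connect() the value generally just reflects the outgoing
interface. It is only refined once oversized sends fail with EMSGSIZE or
ICMP "fragmentation needed" messages come back, which is the "send large
packets and back off" part of the question.
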
>
> -david
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers at zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers
> _______________________________________________
> Here is a web page listing P2P Conferences:
> http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences


