MSS

One of the truly fascinating things about networking is how much of it ‘just works’.  There are so many low-level pieces of a network stack that you don’t really have to know (although you should) to be an expert at something like OSPF, BGP, or any other higher-level networking protocol.  One of the areas that often gets overlooked is MTU (Maximum Transmission Unit), MSS (Maximum Segment Size), and all of the fun stuff that comes along with them.  So let’s start with the basics…

image
Here’s your average looking IP packet encapsulated in an Ethernet header.  For the sake of conversation, I’ll assume going forward that we are referring to TCP only, but I did put the UDP header length in there just for reference.  So a standard full-size IP packet is 1500 bytes long.  There are 20 bytes for the IP header and 20 bytes for the TCP header, leaving 1460 bytes for the data payload.  This does not include the 18 bytes of Ethernet header and FCS that surround the IP packet.
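To make the arithmetic concrete, here is a minimal Python sketch of the numbers above.  The function names are mine, and it assumes plain 20 byte IP and TCP headers with no options.  The 18 bytes of Ethernet overhead are split into a 14 byte header and a 4 byte FCS.

```python
# Header sizes in bytes, taken from the figures quoted above.
IP_HEADER = 20
TCP_HEADER = 20
ETHERNET_HEADER = 14   # destination MAC + source MAC + EtherType
ETHERNET_FCS = 4       # frame check sequence trailer


def tcp_payload(ip_packet_size: int) -> int:
    """Bytes left for TCP data once the IP and TCP headers are removed."""
    return ip_packet_size - IP_HEADER - TCP_HEADER


def frame_on_wire(ip_packet_size: int, include_fcs: bool = True) -> int:
    """Size of the Ethernet frame that carries the IP packet."""
    size = ip_packet_size + ETHERNET_HEADER
    if include_fcs:
        size += ETHERNET_FCS
    return size


print(tcp_payload(1500))                       # 1460
print(frame_on_wire(1500))                     # 1518
print(frame_on_wire(1500, include_fcs=False))  # 1514
```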

When we look at this frame layout, we can further categorize components of the frame by MTU and MSS…

image
The MTU is defined as the maximum length of data that can be transmitted by a protocol in one instance.  What makes this slightly confusing is that the MTU does NOT include the Ethernet headers required to transmit the packet on Ethernet; it only includes the IP packet information.  That is, with regard to MTU, we always assume the Ethernet interface is already taking into account the 18 bytes of Ethernet headers (sometimes we don’t even see the FCS, so it’s 1514 not 1518).  So this is pretty straightforward, nothing too exciting here.  The majority of networks assume an MTU of 1500 bytes and everything works as expected.

MSS is slightly different because it is determined by each end of the TCP connection.  During session setup each device can advertise an MSS in its SYN (or SYN/ACK) packet, which tells the peer the largest segment it is willing to receive.  However, the devices technically do not need to agree on an MSS and each direction can use a different value.  The only requirement is that the sending device must limit the size of the data it sends to the MSS it receives from its peer.  MSS is generally 40 bytes less than the MTU to accommodate a 20 byte IP header and a 20 byte TCP header.
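Here is a rough Python sketch of that ‘each direction is independent’ rule.  The `segment` helper and the two MSS values are made up for illustration, and a real TCP stack also factors in its own MTU and any TCP options, which this ignores.

```python
from typing import List


def segment(data: bytes, peer_mss: int) -> List[bytes]:
    """Split application data into chunks no larger than the peer's advertised MSS."""
    if peer_mss <= 0:
        raise ValueError("peer_mss must be positive")
    return [data[i:i + peer_mss] for i in range(0, len(data), peer_mss)]


# Hypothetical values: A advertised 1460 in its SYN, B advertised 960.
mss_advertised_by_a = 1460
mss_advertised_by_b = 960

payload = bytes(3000)

# B sends to A, so B is limited by the MSS that A advertised.
b_to_a = segment(payload, mss_advertised_by_a)
# A sends to B, so A is limited by the MSS that B advertised.
a_to_b = segment(payload, mss_advertised_by_b)

print([len(s) for s in b_to_a])  # [1460, 1460, 80]
print([len(s) for s in a_to_b])  # [960, 960, 960, 120]
```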

Note: If you’re using Cisco routers as your end points like I am in my lab, you should be aware that in some code versions TCP sessions generated or terminated on a router use different MSS defaults.  For local connections (same subnet) the default MSS is 1460.  For routed connections, the default MSS is 536.  To make things more ‘normal’ I set the router MSS to 1500 with ‘ip tcp mss 1500’ in global config mode.

So this all makes good sense, right?  The devices should be smart enough to know, based on their MTU, what the largest TCP segment they can send should be.  Let’s look at a simple lab to prove this works…

image
So let’s generate a telnet session from Device A to Device B and see what the packets that hit the wire look like…

image
If we look at the TCP header in one of these packets, we should see the MSS is set to 1460…

image
Now if I set the MTU of the interface to 1000, we should see an MSS of 960…

image
Right, so things are working as we expect.  So let’s take a moment to talk about the difference between MTU and ‘IP MTU’ on a Cisco router.  You might have noticed that under interface configuration you can set either.  The difference is simple but sometimes not as easy to understand.  MTU sets the physical interface MTU, that is, the max packet size supported by an interface.  IP MTU sets the max size of an IP packet.  So there, clear enough for you?  The problem here is that we’re all used to working with just IP packets; however, that’s not always the case.  So the real difference here is between setting the max MTU for any protocol on the interface, and setting the max IP MTU.  I like to think of MTU as being the hardware MTU and the IP MTU as being the IP packet MTU.  By default these values are the same so IP MTU never shows up in the config.  Also, this should be obvious, but the IP MTU has to be equal to or less than the MTU.

So let’s make sure we’re on the same page.  Let’s lower the MTU on the interface to 1200 and try setting the IP MTU to a higher value…

image
It complains, which is expected and makes total sense.  Now let’s set the IP MTU to 1100…

image
We can see that the running config now shows the MTU and IP MTU since they are different values.  A ‘show int fa0/0’ will show the interface (hardware) MTU of 1200 and a ‘show ip interface fa0/0’ will show the IP MTU of 1100 in this case.

So now let’s talk about why any of this matters.  If you consider a network where each link has an IP MTU of 1500, you only ever need to worry about this if you’re doing tunneling (or any other encap).

Note: MPLS is a whole different animal, I’m not covering that in this post but I’d argue that it’s a tunnel all the same. 

The most common type of tunnel we see is a GRE tunnel.  Adding a GRE header on a packet makes the frame format look like this…

image
So now we have an additional 24 bytes of headers.  That’s 20 for the outer IP header (tunnel source and destination) and 4 for the GRE header itself.  So what does this mean?  Our MTU can’t increase since we’re using the max IP MTU of 1500 to match our hardware MTU.  What happens when we try to send traffic now?  Let’s modify the lab slightly so we can use a GRE tunnel more effectively…

image
So now we’re going to have a user traversing a couple of network segments to download a file via HTTP on my home’s super computer (luckily for me Visio had an exact stencil of what my super computer looks like).  The catch here is that we’re going to have a GRE tunnel that goes from device A to device B which the traffic will ride within.  Pretty straightforward.  However, how will this all work considering what we just talked about above?  Adding in a GRE header as well as the outer (GRE) IP header is going to add an additional 24 bytes to the frame that the client and server won’t know about.  Should this work?  Let’s check it out and see…

image
You might have to blow that up to see it better, but to make a long story short, it does work.  But how or why?  Let’s walk through a couple of the packets to see what’s happening…

image
The above packet shows the initial TCP SYN.  We can see it comes from the client and is destined to the server.  We also see that it has some TCP options set which include the TCP MSS which the client is setting to 1460.  Recall that TCP MSS is generally dictated by taking the MTU and subtracting 20 bytes for the IP header and 20 bytes for the TCP header.  So this looks right so far.  Let’s look at the SYN/ACK from the server…

image
So same deal here: the server thinks that its MSS should be 1460.  Once the TCP session is established, the client can issue its HTTP GET which we see in packet 336.  Then we see a bunch of ‘TCP Previous segment not captured’ frames followed by one really interesting one (frame 342)…

image
In packet 342, the router kicks back an ICMP unreachable message telling the server that ‘Fragmentation needed’.  If we look at that packet in more detail, we see some more interesting information…

image

Ah ha!  The router is telling the server that the max MTU it can use is 1476 (1500 minus 24 for GRE/IP).  The server sees this data and caches it.  We can see this on the server (Linux in this case) by issuing the following command…

image
So now the server knows to use an MTU of 1476 when talking to this client.  With the MTU now being 1476 our math now sorts out like this…

image
So using a new MTU of 1476 forces the TCP payload (MSS) down to 1436 to fit all of the headers in, making the total frame size 1500 (excluding Ethernet framing)…

image
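For the curious, here is a small Python sketch of what the server is effectively doing with that ICMP message.  It assumes the standard ‘fragmentation needed’ layout (type 3, code 4, with the next-hop MTU in the last two bytes of the 8 byte ICMP header).  The sample message is hand-built for illustration, not taken from my capture, and `next_hop_mtu` is just a name I picked.

```python
import struct
from typing import Optional

ICMP_DEST_UNREACHABLE = 3   # ICMP type
ICMP_FRAG_NEEDED = 4        # ICMP code: fragmentation needed and DF set
IP_HEADER = 20
TCP_HEADER = 20


def next_hop_mtu(icmp_message: bytes) -> Optional[int]:
    """Return the next-hop MTU from a 'fragmentation needed' message, else None."""
    if len(icmp_message) < 8:
        return None
    # type, code, checksum, unused, next-hop MTU
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(
        "!BBHHH", icmp_message[:8]
    )
    if icmp_type != ICMP_DEST_UNREACHABLE or code != ICMP_FRAG_NEEDED:
        return None
    return mtu


# Hand-built ICMP header for illustration; the checksum is left at zero and
# the copy of the offending IP header that normally follows is omitted.
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1476)

path_mtu = next_hop_mtu(sample)
print(path_mtu)                           # 1476
print(path_mtu - IP_HEADER - TCP_HEADER)  # 1436
```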

So in our example so far, the end devices have been the ones sorting out the issue caused by the GRE tunnel.  However, it’s not always best practice to let the endpoints do this because PMTU discovery relies on ICMP.  For instance, I can very easily break this connection by disabling ICMP unreachable messages on the interface facing the server.

Note: If you’re testing this out yourself make sure you clear out the server’s route cache.  As mentioned before, it will cache the lower MTU for a period of time so it doesn’t need to do PMTU discovery constantly.  Command: ‘ip route flush cache’.

So let’s disable ICMP unreachables on the interface…

image
And now try to download the file again…

image
It fails.  Without the router telling the server there’s an issue, we can’t pass traffic.  There are a couple of other ways to take care of this.  One would be to let the routers fragment the traffic by clearing the DF bit on the traffic.  I think that’s a horrible idea so I won’t even take the time to discuss it.  The second option is to use what’s referred to as ‘TCP clamping’.

TCP clamping involves having the router rewrite the TCP MSS option in the SYN and SYN/ACK to another value.  So in our case, we can tell the router to adjust the MSS down to 1436 to accommodate the tunnel.  Let’s configure it on the same router interface where we disabled ‘ip unreachables’…

image
Now let’s try the traffic again and see what happens…

image
It’s working!  Now let’s look at what hits the wire…

So the initial packet coming off the client has an MSS of 1460 set…

image
However, if we look at the packet again after it’s traversed the router and is in the GRE tunnel, we can see the MSS is now what we set it to…

image
I flagged the IP ID as well so you can see we’re looking at the same packet.  We’re just analyzing the same packet at different points in the network.
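If you want a feel for what the router is doing to that SYN, here is a simplified Python sketch of MSS clamping.  It walks the TCP options, finds the MSS option (kind 2, length 4) and lowers the value if it’s above the clamp.  This is my own illustration and not how IOS implements it.  A real device also has to fix up the TCP checksum after the rewrite, which this skips, and the sample options are made up.

```python
import struct

TCP_OPT_END = 0   # end of option list
TCP_OPT_NOP = 1   # single-byte padding
TCP_OPT_MSS = 2   # maximum segment size: kind, length 4, 16-bit value


def clamp_mss(options: bytes, max_mss: int) -> bytes:
    """Return a copy of the TCP options with the MSS lowered to max_mss if needed."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCP_OPT_END:
            break
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option, stop parsing
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed option, stop parsing
        if kind == TCP_OPT_MSS and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)


# Options from a hypothetical SYN: MSS 1460, NOP, NOP, SACK permitted.
syn_options = struct.pack("!BBH", 2, 4, 1460) + bytes([1, 1, 4, 2])

clamped = clamp_mss(syn_options, 1436)
print(struct.unpack_from("!H", clamped, 2)[0])  # 1436
```

An MSS that is already below the clamp is left alone, so a host that advertises something smaller keeps its own value.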

As you can see, the ‘middle’ of the network has very little to do with MTU and MSS.  Like most network things, you need to ensure that you’ve accounted for any additional headers at the edge so devices in the middle don’t need to respond or do crazy things.

So to solidify this point, let’s look at one last example to make sure we’re all on the same page.  Take this lab topology as an example…

image
Here we’re running the traffic over GRE tunnels as the traffic traverses the segment between cloud-1 and cloud-2.  If we take a look at a packet capture taken in between the two cloud routers we should see a lot of headers…

image
Yep – So we’re doing GRE in GRE there.  So now let’s try our HTTP download again and see what happens…

image
Yeah, so we’re fragmenting.  The interesting thing here is that most hosts set the DF (do not fragment) bit in their IP packets.  However, this does NOT carry over to the additional IP headers generated by the router.  We can see that here…

image
So here’s the breakdown of what’s happening now…

1 – Client initiates TCP session with Super Computer, sets MSS to 1460 by default.  Device A changes MSS to 1436 before the traffic enters the tunnel.
2 – Server replies in its SYN/ACK with an MSS of 1460.  Device B changes the MSS to 1436 before the traffic enters the tunnel.
3 – The TCP session is set up, and each side thinks the other side has an MSS of 1436.
4 – When data starts flowing the packet size is exactly 1500 bytes when it reaches the Cloud-2 router.  Cloud-2 knows that it has to put an additional 24 bytes of headers on the packet which puts it over the MTU for its interface facing Cloud-1.  Its only option at this point is to fragment the traffic.
5 – Cloud-1 receives two packets which happen to be fragmented but it doesn’t much care (or know) and happily forwards them along.
6 – Device A receives the fragmented packets and reassembles them into a single packet.

The math all works out to support this.  We can see a 1514 byte packet and an 82 byte packet being sent.  The 1514 byte packet is the max that Cloud-2 can send so it sends the rest of the data in a second packet.  The second packet consists of an IP header (20 bytes), a GRE header (4 bytes), an inner IP header (20 bytes), and 24 bytes of payload (our overflow from packet 1).  Added all together that gives you 68, and you can add in 14 bytes for the Ethernet header, giving you 82 bytes total.
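Here is a quick Python sketch that reproduces those two sizes.  It assumes Cloud-2 fragments the 1500 byte packet down to its tunnel IP MTU of 1476 first and then wraps each fragment in its own outer IP and GRE headers, which is what the sizes in the capture suggest.  The helper name is mine.

```python
from typing import List

IP_HEADER = 20
GRE_HEADER = 4
ETHERNET_HEADER = 14


def fragment_sizes(packet_size: int, mtu: int) -> List[int]:
    """IP packet sizes after fragmenting a packet_size byte packet to fit mtu."""
    if packet_size <= mtu:
        return [packet_size]
    # Every fragment except the last carries a multiple of 8 data bytes.
    max_data = (mtu - IP_HEADER) // 8 * 8
    if max_data <= 0:
        raise ValueError("mtu too small to carry any data")
    remaining = packet_size - IP_HEADER
    sizes = []
    while remaining > 0:
        chunk = min(max_data, remaining)
        sizes.append(IP_HEADER + chunk)
        remaining -= chunk
    return sizes


link_mtu = 1500
tunnel_mtu = link_mtu - IP_HEADER - GRE_HEADER  # 1476

# The 1500 byte packet arriving at Cloud-2 is split to fit the tunnel.
fragments = fragment_sizes(1500, tunnel_mtu)

# Each fragment then gets an outer IP header, a GRE header and an Ethernet header.
on_wire = [size + IP_HEADER + GRE_HEADER + ETHERNET_HEADER for size in fragments]

print(fragments)  # [1476, 44]
print(on_wire)    # [1514, 82]
```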

So how do we fix this?  Same thing we did before: just lower the edge MSS by another 24 bytes (to 1412) to accommodate the additional headers that cloud-1 and cloud-2 are using for their GRE tunnel.
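As a back-of-the-napkin check, the edge MSS for any number of stacked GRE tunnels can be sketched like this.  It assumes a 1500 byte link MTU, 40 bytes of IP and TCP headers, and 24 bytes per GRE layer as above.  The function name is just for illustration.

```python
IP_TCP_HEADERS = 40   # 20 byte IP header + 20 byte TCP header
GRE_OVERHEAD = 24     # 20 byte outer IP header + 4 byte GRE header


def edge_mss(link_mtu: int = 1500, tunnel_layers: int = 0) -> int:
    """Largest MSS that avoids fragmentation through the given number of GRE layers."""
    return link_mtu - IP_TCP_HEADERS - GRE_OVERHEAD * tunnel_layers


print(edge_mss(tunnel_layers=0))  # 1460
print(edge_mss(tunnel_layers=1))  # 1436
print(edge_mss(tunnel_layers=2))  # 1412
```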

So there you have it. That was a nice refresher for me. Hope you enjoyed as well!
