MTU Troubleshooting on Cisco IOS

Thanks for your reply. Actually, I know all of this information, so let me ask my questions in another way. Could you please answer each one independently?

During the handshake, suppose for example that both sides agreed on an MSS of 8960 (headers = 40 bytes).

Q1: How did they agree on 8960? What criteria did they use? Is it just the interface MTU along the path minus the headers, or is there something else to consider? I know that MSS = MTU - headers.

Q2: Is it mandatory that the packet be 9000 bytes in our case (if the headers = 40)? Or may it be smaller, depending on the payload size? I mean, if the payload is small, will the segment be small as well? Or must it always be the agreed (fixed) size?

Q3: When my payload is just 1460 bytes and the MSS is 8960, will padding necessarily take place here?

Hello Ali

Sure, no problem, I can respond to each question specifically.

The MSS is determined primarily based on the MTUs of the NICs of the sender and receiver. This process is further described in this NetworkLessons note. However, as mentioned in the note, the MSS can be influenced by using the TCP adjust-mss feature, where the MSS value in a SYN packet can be replaced with another value by an intervening router. So if a TCP sender and receiver agree upon 8960 bytes as the MSS, that means that their NICs support an MTU size of 9000. But this does not guarantee that the intervening network infrastructure supports this. The network designer must ensure that this is the case.
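As a quick sanity check of that arithmetic, here is a minimal sketch (assuming IPv4 and TCP headers of 20 bytes each, with no options):

```python
def tcp_mss(mtu, ip_header=20, tcp_header=20):
    """MSS advertised in the SYN: the NIC's MTU minus the IP and TCP headers."""
    return mtu - ip_header - tcp_header

print(tcp_mss(9000))  # 8960 for a jumbo-frame NIC
print(tcp_mss(1500))  # 1460 for standard Ethernet
```

TCP options such as timestamps further reduce the room for data in each segment, but they do not change the MSS value advertised in the SYN.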

No, it is not mandatory. The MSS defines only the maximum; the actual segment size can be anything up to the MSS value. You don't even need padding: you can have segments of varying sizes without any problem.

No, padding is not necessary. The segment size of that particular transmission will be limited to the payload you have. Again, the MSS only defines the maximum size, not the size of every segment. Does that make sense?

I hope this has been helpful!

Laz

Clear, thanks. But recently I faced an issue that was strange to me; I thought maybe I had misunderstood something. Simply put, I have a network with a 9000-byte MTU along the whole path (spine, leaf, and FW), while the servers use 1500. My edge FW was a Mikrotik, and every server could contact the other servers normally. When I shifted the service to the new FW, a Fortigate (I have experience in security), SSH and some other services froze between servers, although they could still reach the internet normally. Again, the whole path is 9000; just the servers are 1500. A freeze issue is usually MTU related, but as far as I know there should be no issue as long as the hosts negotiate the MSS. However, when we changed the MTU on the server side to 9000, everything passed and cleared with no issue. What do you think the issue is? I couldn't perform a packet capture due to time constraints, because production services were running during the operation.

1500-byte packets with the DF bit set pass from the server to the FW, and through the old FW (Mikrotik) as well.

Hello Ali

The first question that comes to mind is: are you sure that the problem is MTU related? You mention that the freeze issue is mostly related to MTU, so the first thing to do is determine conclusively that this is the case. One way to do this is to run a ping sweep of various packet sizes from one server to the other, to see if you get drops above a certain size.

This is the most obvious quick and dirty test you can do, but you may not be able to if your firewall drops ICMP (ping) packets. If this is the case, and you have control over the FW, during a maintenance window, you can temporarily allow ICMP to traverse the firewall to do that test.
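As a rough model of what such a sweep checks (a sketch only; it assumes a Linux-style ping where the `-s` payload has 20 bytes of IP header and 8 bytes of ICMP header added on top):

```python
def sweep_passes(payload_size, path_mtu, overhead=28):
    """With DF set, an echo request fits only if payload plus the
    20-byte IP and 8-byte ICMP headers stay within the path MTU."""
    return payload_size + overhead <= path_mtu

# Largest payload that crosses a standard 1500-byte Ethernet path:
print(max(s for s in range(1400, 1600) if sweep_passes(s, 1500)))  # 1472
```

On a real Linux host this corresponds to something like `ping -M do -s 1472 <server>`; on Cisco IOS you would use extended ping with a sweep range and the DF bit set.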

Another thing that comes to mind with your description is the Path MTU Discovery (PMTUD) feature. This is a mechanism used by endpoints in a network to determine the largest MTU size that can be used for data transmission between them. If the network path between two endpoints supports an MTU size of 9000, but one endpoint is set to an MTU size of 1500, the smaller MTU size will be used for data transmission.

The problem arises when PMTUD fails, which can happen if a network device in the path between the endpoints (like your Fortigate firewall) is blocking ICMP messages. PMTUD relies on ICMP to perform its function, so if ICMP is blocked, PMTUD fails, resulting in intermittent connectivity. Since in your case the problem only appears with SSH (as far as you can see), it may be that SSH traffic is triggering PMTUD and the FW is blocking the ICMP packets, with the results you see.
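To illustrate the failure mode, here is a toy model of PMTUD (my own sketch, not actual device behavior): the sender lowers its packet size each time a hop returns an ICMP Type 3 Code 4 "Fragmentation Needed" message; if a firewall drops those messages, the sender never learns the smaller MTU and the connection black-holes:

```python
def pmtud(link_mtus, start_mtu, icmp_blocked=False):
    """Return the MTU the sender settles on, or None for a PMTUD black hole."""
    mtu = start_mtu
    for link in link_mtus:
        if mtu > link:
            if icmp_blocked:
                return None  # "Fragmentation Needed" never arrives
            mtu = link       # ICMP Type 3 Code 4 reports the next-hop MTU
    return mtu

print(pmtud([9000, 1500, 9000], 9000))                     # 1500
print(pmtud([9000, 1500, 9000], 9000, icmp_blocked=True))  # None: traffic freezes
```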

These are just some thoughts that may help you out in your troubleshooting. Doing a packet capture will give you even more information, so that will be helpful when you get a chance to do it.

Let us know how you get along, and how you ultimately got to the bottom of the problem and solved it.

I hope this has been helpful!

Laz

Hello Rene/Laz,

In this lesson, if we ping R1 without setting the DF bit, we will get replies for ping packets up to 1505 bytes. This is because R2's egress interface will fragment the packet, so the maximum packet leaving R2 will have an Ethernet payload of 1500 bytes. Since R1 has an MTU of 1500 bytes, everything works: R1 sees at most a 1500-byte Ethernet payload.

Reply to request 0 (1 ms) (size 1495)
Reply to request 1 (4 ms) (size 1496)
Reply to request 2 (1 ms) (size 1497)
Reply to request 3 (4 ms) (size 1498)
Reply to request 4 (1 ms) (size 1499)
Reply to request 5 (4 ms) (size 1500)
Request 6 timed out (size 1501)
Request 7 timed out (size 1502)
Request 8 timed out (size 1503)
Request 9 timed out (size 1504)
Request 10 timed out (size 1505)

With the DF bit set, packets above 1500 bytes will not leave R2, because its own interface MTU is 1500 bytes and DF is set. This understanding is based on a test I ran between two Windows machines. For example, if I change one Windows host's MTU to 1000 bytes and ping the other host with a size of 2000 bytes and the DF bit set, the packet will not leave the NIC, and the prompt shows a "fragmentation needed" message. When sent without the DF bit set, the PC fragments the packet, and the Ethernet frame payload that leaves the NIC will be less than or equal to the MTU configured on the NIC (1000 bytes in this example).

Let's consider the scenario below, on the same topology you used in the lesson, to understand how the receiving router R1 will behave.

If we change the MTU on R2 to, say, 2000 bytes and keep 1500 bytes on R1:

Will a ping from R2 to R1 with a size of 1800 bytes, without the DF bit set, succeed?
Will a ping from R2 to R1 with a size of 1800 bytes, with the DF bit set, succeed?

I am guessing that without the DF bit set, R2 still will not need to fragment the packet, because its MTU is 2000 bytes. With the DF bit set, R2 cannot fragment anyway. So in both cases, R1 will see an Ethernet frame with a payload of 1800 bytes.

How would R1, with an MTU of 1500 bytes, react in the two scenarios?

R2 sends a ping of 1800 bytes with the DF bit set >>> will the ping succeed? I think not.
R2 sends a ping of 1800 bytes without the DF bit set >>> will the ping succeed? I guess yes.

In both cases, R1 sees an Ethernet frame from R2 with a payload exceeding R1's MTU, but how it responds will differ, in my opinion.

Kindly share your thoughts on whether my understanding is correct.

Hello Muhammad

This is an interesting question. I went into the lab and tried a few scenarios and packet captures using CML. I found the following:

Both scenarios seem to fail. I thought the reason was the IP MTU setting, which has a maximum size of 1500 on the CML device I'm using. So I changed your scenario a bit.

I set the MTU on Fa0/0 of R1 to 1000 and the MTU of Fa0/0 of R2 to 1300. I kept the IP MTU at 1500 on both.

I tried pinging from R2 to R1 using a size of 1100 without the DF bit set, and it still failed. I was only able to ping up to a size of 1010. From 1011 and higher it failed. I did a packet capture and found that as long as the DF bit was not set, I could send out echo requests of any size, and for sizes over 1000 fragmentation took place. I tried 1000, 1200, 1500, even 1700 which is above the IP MTU, and all of them successfully sent out an echo request. However, the receiving router seems to fail to respond for any size larger than 1010 (with these settings).

However, with the DF bit set, I am only able to ping up to a size of 1000. For a size of 1001 I am unable to. The packet captures show that the echo requests are sent but the replies are not received.

I believe this has to do with how ping interprets the size and the DF bit setting for the echo reply. I believed the size and the DF bit settings should be the same for both request and reply, but this may not be true. It could also have to do with CML as an emulator. It would be interesting to see this on real devices as well…

I will ask Rene to take a look as well and see what he can interpret from these results.

I hope this has been helpful!

Laz

Hello Laz,
Can you please try the original scenario on a real router?

I performed the test in Packet Tracer. In Packet Tracer there is no option to set the DF bit, so without it I was able to successfully ping with sizes of 1300/1500 from a router with an MTU of 1500 to a destination router with an MTU of 1000. Furthermore, I was able to ping a neighboring router connected to the destination router on the other end (R3 in the diagram below). That means that without the DF bit set, the router with the lower MTU was accepting the packets, regardless of whether they were destined for its own interface or for the remote router, and it passed the ping packets successfully.

R1 (1500 MTU) >>>>>>>>>> R2 (1000 MTU) >>>>>>>>>>>>>>> R3 (1500 MTU)

I also tested the MTU concept between two windows devices and the results were interesting.
I reduced MTU on destination host to 1000 and kept 1500 on source host.

In both cases, DF bit set and unset, I was getting replies for ping packets that exceeded the destination host's MTU, for example 1200, 1300, 1400, and 1500.
It is important to note that in the ping response, the destination host will not put anything on the wire that is greater than its own MTU. So the Ethernet frame payload from the destination will be less than or equal to 1000 bytes, and it will fragment. The destination is not dropping the ping packets.

I wanted to see how a real router would behave, since the ping succeeds in Packet Tracer without the DF bit set, whereas in your CML test it failed even without the DF bit set.

You do not need to change the IP MTU: once you reduce the interface MTU, the IP MTU changes automatically, and in the real world the IP MTU should be equal to or less than the Layer 2 MTU.

Thanks

Hello Muhammad

Thanks for the additional information, all of it is quite useful. The results you got from packet tracer are as expected as you suggest, since regardless of the MTU settings, if the DF bit is not set, fragmentation takes place and the ping is successful.

My CML results are somewhat strange, so I wouldn't put too much emphasis on them. After chatting with Rene, a couple of ISR 4331 routers were set up for some tests. For Cisco IOS, the following has been confirmed:

  1. When you set the ping size, the size is the same for both echo request and echo reply.
  2. When you set the DF bit, both the echo request and the echo reply have the DF bit set.

So knowing this, the original tests you suggested were performed, and we found the following results:

  1. When pinging with the DF bit set, the ping is successfully sent because it fits in the L2 MTU of the egress interface. However, R1 will drop the frame upon ingress because its L2 MTU is 1500 bytes.
  2. When pinging without the DF bit set, the echo request is sent as a whole frame just like in the first case, but again, R1 will still drop the packet upon ingress because the L2 MTU is too large.

The idea here is that even if the DF bit is not set, a router can't fragment an incoming frame; it can only fragment in the outgoing direction. Does that make sense?
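That rule can be sketched as a toy model (my own illustration of the behavior confirmed in the tests above, ignoring per-fragment header overhead):

```python
def receive(frame_size, ingress_mtu):
    """A frame larger than the ingress L2 MTU is dropped on receipt,
    regardless of the DF bit; ingress fragmentation never happens."""
    return frame_size <= ingress_mtu

def transmit(packet_size, egress_mtu, df_set):
    """On egress the router may fragment, unless DF forbids it."""
    if packet_size <= egress_mtu:
        return [packet_size]  # sent whole
    if df_set:
        return []             # dropped: "fragmentation needed"
    return [min(egress_mtu, packet_size - i)
            for i in range(0, packet_size, egress_mtu)]

# R2 (MTU 2000) sends an 1800-byte ping toward R1 (MTU 1500):
print(transmit(1800, 2000, df_set=True))  # [1800]: leaves R2 whole
print(receive(1800, 1500))                # False: R1 drops it on ingress
```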

It is possible that other operating systems like Windows or Linux behave differently, but the Cisco IOS 4331 routers drop it. You can find more information in this NetworkLessons note, which details the test.

I hope this has been helpful!

Laz

When I perform a ping sweep on a GRE over IPsec tunnel with the DF bit set to prevent fragmentation, the sweep appears to only exercise the GRE tunnel interface. This results in an MTU calculation (1476) that reflects only the GRE tunnel MTU and not the combined, nested tunnel MTU.

Is there some additional configuration that will enable the ping sweep to see the combination MTU? Or is this something we simply have to live with because of the ESP encapsulation?

That’s an interesting experiment, and it touches upon the nuances of how GRE and IPsec interact. Indeed, the effective MTU for GRE over IPsec is usually in the range of 1432–1440 bytes, depending on the algorithms in use. So why do we see a “real” or “observed” MTU of 1476 on a ping sweep in such a scenario?

When GRE is combined with IPsec, either transport mode or tunnel mode can be used. Assuming ESP is used:

  • In transport mode, the ESP header is inserted between the existing IP header and the GRE header (not the TCP header as normal, since the payload is GRE).
  • In tunnel mode, IPsec adds a completely new outer IP header.

More info on how this works can be found in this section of the IPsec lesson.

For transport mode, when ESP is applied, the DF bit on the outer IP header is typically cleared by default on IOS/IOS-XE. For tunnel mode, the newly added outer IP header doesn’t inherit DF from the inner header. In both cases, this allows fragmentation to occur without violating the DF setting of the original packet.

It’s also important to note that GRE encapsulation happens first, and only then is IPsec applied. This means your ping sweep is testing the GRE MTU, not the GRE-over-IPsec MTU. Since IPsec (ESP) clears DF after GRE encapsulation, fragmentation can occur after encryption, so no ICMP “fragmentation needed” is sent back to GRE.
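To see where those numbers come from, here is a rough back-of-the-envelope calculation (the GRE overhead is the standard 24 bytes; the ESP figure is an assumed example, since it varies with mode, cipher, and padding):

```python
ETH_MTU = 1500
GRE_OVERHEAD = 24               # new outer IPv4 header (20) + basic GRE header (4)

gre_mtu = ETH_MTU - GRE_OVERHEAD
print(gre_mtu)                  # 1476: what a DF-set sweep of the GRE tunnel reports

# ESP adds more overhead (SPI, sequence number, IV, padding, ICV, and in
# tunnel mode another outer IP header), typically a few dozen bytes.
esp_overhead = 42               # assumed example value for one transport-mode combo
print(gre_mtu - esp_overhead)   # 1434: within the usual 1432-1440 range
```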

Put another way: with GRE over IPsec (usually in transport mode), ESP is added after GRE. If IPsec clears DF (as is the default in many IOS/IOS-XE builds), ESP packets can be fragmented post-encryption. That’s why your sweep only reflects the GRE MTU and doesn’t show the additional IPsec overhead. Does that make sense?

I hope this has been helpful!

Laz


Lazarus - thank you for that excellent explanation. I went on a wild goose chase trying to find a description of this behavior on the web - perhaps the AI didn’t understand my formulation of the problem…

How on Earth do you come across these details? I will go back to CML to perform the packet captures again on the GRE IPSec tunnel to see if I can observe that DF bit reset you described.

Sandro

Hello Sandro

I’m glad the explanation was helpful! When you get deep into the behavior of these technologies, and especially how they interact with each other, you can’t always find documentation that describes it thoroughly. Much of it comes from experience, which you have just gained by digging deeper and by labbing it up! I’d be interested to hear about the results of your experimentation on CML!

I hope this has been helpful!

Laz


This came to my attention over the weekend: there is an interface mode command that can be placed under the GRE Tu 0 interface to solve the DF bit "hand-off" problem.

The command is as follows: tunnel path-mtu-discovery

The snapshot shows a ping sweep performed on a GRE over IPsec tunnel configured in transport mode. The reported MTU is now 1434, as opposed to 1476 without this command.

Hello Sandro

Yes, that's exactly it! One of the results of that command is to ensure that the DF bit is not cleared in the IP header of the transport mode IPsec transmission. The interesting thing here is that the actual purpose of the command is not simply to preserve the DF bit; the DF bit behavior is actually a side effect of the command.

This command allows a GRE over IPsec tunnel to use the PMTUD feature. For more info about PMTUD, take a look at this NetworkLessons note. The PMTUD feature requires that the DF bit be set, otherwise it will not function!

So although the command does result in the DF bit not being cleared, that is a consequence of enabling the PMTUD feature as a whole. Does that make sense?

I hope this has been helpful!

Laz


Yes absolutely. Thanks for that explanation and the resource in the note.


This is a silly question, but I'm having some difficulty understanding this:
If I have a packet of 15,000 bytes, then fragmentation will occur because the max MTU is 1500, right? With a 1500-byte MTU I will have a TCP MSS of 1460.
With these numbers, I have to fragment my packet about 10 times.
So the question is: if I reduce my TCP MSS, am I not increasing fragmentation?

I think I am confusing fragmentation with dividing the packet, or something similar.

Hello Daniel

This is not a silly question, but an opportunity! I have found that I still sometimes need to get my head around these MTU concepts to clarify them, so this is just another chance to do that!

I think the key confusion here is mixing up TCP segmentation (which is good) with IP fragmentation (which we want to avoid). Let’s break it down.

The MTU = 1500 bytes on standard Ethernet. This is the maximum size of the entire IP packet (headers + data) that is encapsulated within the frame. This is used during encapsulation from L3 into L2.

The TCP MSS = MTU - IP header - TCP header which for standard Ethernet is 1500 - 20 - 20 = 1460 bytes. This is the maximum amount of application data in one TCP segment. This is used at L4 for the process of segmentation.

Now, for your 15,000-byte example, when you send 15,000 bytes of data with an MSS of 1460, TCP divides or segments 15,000 bytes into approximately 11 segments (15,000 ÷ 1,460 ≈ 10.27). Each segment is encapsulated into one IP packet with 1460 data + 20 TCP header + 20 IP header = 1500 bytes. The result is 11 separate packets, but zero fragmentation!
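That calculation can be sketched in a couple of lines:

```python
import math

def tcp_segments(data_bytes, mss):
    """Number of TCP segments needed to carry the data with a given MSS."""
    return math.ceil(data_bytes / mss)

print(tcp_segments(15_000, 1460))  # 11 segments, each fitting a 1500-byte MTU
```

Note that the final segment carries only 15,000 - 10 * 1,460 = 400 bytes, which is perfectly fine: as discussed earlier in this thread, the MSS is a maximum, not a fixed size.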

Now let’s look at your core question: “If I reduce my TCP-MSS I’m not increasing fragmentation?”

That is correct! Reducing the TCP MSS actually prevents fragmentation rather than increasing it. TCP segmentation at L4 is a good thing. It happens at the source host before sending: TCP intelligently divides application data into MSS-sized chunks, creating multiple complete, independent packets.

IP Fragmentation at L3 is bad. It occurs at intermediate routers when an IP packet exceeds the link MTU. A router must break one packet into multiple fragments, which causes performance problems.
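For contrast, here is a sketch of how a router would have to fragment an oversized packet (simplified: real fragmentation repeats the 20-byte IP header in each fragment and aligns fragment offsets to 8-byte multiples):

```python
import math

def ip_fragments(packet_size, link_mtu, ip_header=20):
    """How many fragments a router must create for one IP packet."""
    if packet_size <= link_mtu:
        return 1
    payload_per_frag = link_mtu - ip_header
    payload_per_frag -= payload_per_frag % 8  # offsets are 8-byte aligned
    return math.ceil((packet_size - ip_header) / payload_per_frag)

print(ip_fragments(1500, 1400))  # 2: a clamped MSS would have avoided this
print(ip_fragments(1400, 1400))  # 1: no fragmentation needed
```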

When you reduce MSS, it makes the underlying IP packets smaller, thus reducing the chance of fragmentation. Indeed, TCP already respects the MSS before creating packets, so fragmentation never happens in normal TCP communication unless something changes the path MTU. Make sense?

I hope this has been helpful!

Laz


Thank you so much as always
