Hi everyone,
I'd just like to ask if anyone knows which tools I can use to diagnose slowness in my network. Thanks in advance for your answers.
Hello Daoud
Troubleshooting a slow network can be tricky. There is no single process to follow since a slow network can be due to many different things. A slow network can be due to:
There are several strategies you can use to diagnose these including:
Having a network monitoring system is critical in such situations. Beyond very basic diagnostic tasks, it’s not easy to troubleshoot such problems using the CLI alone. A monitoring system increases visibility and will warn you whenever a threshold you have set, such as an upper limit on allowed latency, is exceeded.
If you’d like us to go over something more specific, please let us know.
I hope this has been helpful!
Laz
Thank you for your feedback, lagapides.
Let me explain the problem I am having.
I have 8 ESX servers connected to a Nexus 5000 switch via FEX modules, and a Nagios supervisor that monitors the ESX servers. Nagios returns “ERROR CRC” errors for all 8 ESX servers. But when I check the interfaces of the Nexus switch to which the 8 ESX servers are connected, I don’t see any anomaly: no CRC errors on the interfaces. So I don’t know where these errors come from. Sending data to the ESX servers is also slow. Do you have any idea how to solve the CRC errors, or how to identify their cause? Thanks in advance for your feedback.
Hello Daoud
CRC errors can definitely cause a slow or sluggish network, so seeing them in Nagios does give a plausible explanation for the slowness. However, what is Nagios actually reporting? Is it monitoring the servers, the switch, or both? And where are the CRC errors occurring, on the switch or on the ESX servers?
If all eight servers are returning the same errors, you should take a look and see what commonalities there are between them. Could it be a common setting on their NICs? Dig a little deeper by looking at the stats on the ESX NICs as well.
In addition, you could take a look at the way the FEX is handling traffic. There are some cases where CRC errors are logged but are not counted on the expected interface. This has to do with the Nexus 5K starting to forward frames before the CRC has been checked (cut-through switching), so the actual CRC errors may be recorded on another interface.
Take a look at this Cisco community thread which may shed some more light on your issue.
Keep us posted on how your troubleshooting is going and how you’re getting along.
I hope this has been helpful!
Laz
Once again, thank you for your feedback. I was busy with other challenges, otherwise I would have replied sooner. Here are the answers to your questions:
1- Nagios displays CRC errors
2- Nagios only monitors the ESX server interfaces
3- On the NEXUS switch level, when I check, I don’t see any error, everything looks fine
4- On the other hand, at the ESX level, the check does show errors. Here is the result of the check on one of the interfaces:
[root:~] esxcli network nic stats get -n vmnic1
NIC statistics for vmnic1
Packets received: 810661
Packets sent: 50367
Bytes received: 37641693004
Bytes sent: 165401052
Receive packets dropped: 0
Transmit packets dropped: 5
Multicast packets received: 189214093
Broadcast packets received: 214728114
Multicast packets sent: 711173
Broadcast packets sent: 537
Total receive errors: 242025
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 242025
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
So I think that the problem may be on the side of the ESX interfaces and not the FEX modules.
Do you think that, for example, updating the IOS of the Nexus could improve anything?
Thanks in advance for your feedback
Sincerely
Hello Daoud
Thanks for the additional information. It looks like the issue is on the hardware NIC of the ESX server, as you suggested. Based on the output you shared, 810661 packets were received and 242025 CRC errors were logged, which is roughly 30%. This is significant. Since you see similar behavior on all 8 ESX server NICs, it is unlikely to be a cabling issue. And since you don’t see any errors on the Nexus devices, it is unlikely that corrupted packets are leaving the Nexus devices.
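To compare all 8 servers quickly, the ratio above can be computed from the esxcli output with a short script. This is only a minimal sketch: the function names are hypothetical, and it assumes the output keeps the "Name: value" layout shown in your post.

```python
import re

def parse_nic_stats(text):
    """Collect 'Name: value' lines from esxcli NIC stats output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats

def crc_error_ratio(stats):
    """Return CRC errors divided by packets received (0.0 if nothing was received)."""
    received = stats["Packets received"]
    crc_errors = stats["Receive CRC errors"]
    return crc_errors / received if received else 0.0

# Two counters copied from the vmnic1 output posted above.
sample = """\
NIC statistics for vmnic1
Packets received: 810661
Receive CRC errors: 242025
"""

stats = parse_nic_stats(sample)
print(f"CRC error ratio: {crc_error_ratio(stats):.1%}")  # prints "CRC error ratio: 29.9%"
```

Running the same calculation on each vmnic makes it easy to see whether every NIC is affected to a similar degree.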
Based on a post at this site, I suggest you do the following:
rx_crc_errors are caused either by faults at layer 1 or by issues with jumbo frames on the network. If a packet is larger than the MTU configured on the interface, it will be cut off at that MTU, so the server receives a malformed packet, which throws a CRC error.
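A common way to test for an MTU mismatch is to send pings with the "don't fragment" bit set and a payload sized to exactly fill the MTU. The arithmetic is sketched below, assuming IPv4 without options (20-byte header) and an 8-byte ICMP header; the function name is hypothetical.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header

def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu < overhead:
        raise ValueError(f"MTU {mtu} is smaller than the {overhead}-byte header overhead")
    return mtu - overhead

for mtu in (1500, 9000):
    print(f"MTU {mtu}: ping payload {max_icmp_payload(mtu)} bytes")
# MTU 1500: ping payload 1472 bytes
# MTU 9000: ping payload 8972 bytes
```

On ESXi, this kind of test is usually run with vmkping, using -d to set the don't fragment bit and -s to set the payload size. If the 8972-byte ping fails while the 1472-byte one succeeds, some device in the path is probably not configured for jumbo frames.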
For more info about MTUs in networking, take a look at this lesson:
I hope this has been helpful!
Laz