I’ve been in the networking field for two years now, and I always struggle when someone complains about general network performance issues.
I have an example for you:
A branch has complained that they experience pretty bad performance when copying medium to large files from a local PC to a fileshare.
I checked and saw that all clients were connected using FastEthernet, so we decided to test our new product (Cisco 9400) at this site, even though it is obviously oversized.
I thought this should solve all the problems but they are still complaining about performance issues. I saw that some clients still negotiate to 100Mbit Full and I saw some CRC and Input Errors but the largest number is Output Drops.
Someone who has negotiated to 1Gbit Full also has complaints, but I think this is a client problem.
Every time I go through such a case I feel like I don’t have a good roadmap to follow … I almost think I’m not structured in my troubleshooting process.
Can someone help me to create a roadmap to follow in such cases?
Yes, this is usually one of the most difficult problems to diagnose and resolve. It’s not always a matter of sheer speed. Here are a few general principles to keep in mind when troubleshooting such issues:
- Users are subjective. First of all, make sure that the problem being described is actually a problem. Users can be somewhat ambiguous when describing performance. Compare experiences of multiple users at a site, and if possible, take a look for yourself and see first hand what is being experienced.
- Once you’ve verified that there is indeed a problem with network performance, check, as you did, that computers are connecting at the best possible speeds (1 Gig whenever possible) and that speed and duplex settings are correct. Check for CRC errors and packet drops, once again, as you did. If you have errors on the interface, check cables, faulty QoS mechanisms, as well as the possibility of a faulty port or NIC on the PC.
- If you have routers in the mix, do some ping and traceroute tests with various sizes in order to see response times as well as to check for optimal routing paths.
- With your ping tests, check to see what kind of MTU sizes are being allowed between source and destination. If there is a smaller MTU somewhere and packets are being fragmented, this could slow down your network. If packets are set not to be fragmented, then this could result in packet loss.
- Specifically for your issue, if you are using SMB for file sharing, try setting up an FTP server on the file server and do a test transfer of a large file from a client computer to the server. If the transfer goes smoothly, then the network may not be at fault, and the cause may instead be some configuration of the file sharing software on the server.
- If users are connecting at 100 Mbps, it could be that this speed is just not enough. Consider upgrading to gigabit ethernet NICs. Try upgrading one and test the results to see if this affects the overall performance.
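To make the MTU and link-speed checks above a bit more concrete, here is a rough Python sketch. It assumes IPv4 with a 20-byte IP header (no options) and an 8-byte ICMP header. The transfer-time estimate ignores protocol overhead, disk speed, and contention, so treat it as a best-case floor rather than what SMB will actually achieve. The function names are made up for illustration.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu_bytes: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu_bytes - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def best_case_transfer_seconds(file_bytes: int, link_bits_per_second: float) -> float:
    """Lower bound on transfer time: raw link speed only, no overhead."""
    return file_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    # Payload size to test a standard 1500-byte Ethernet MTU path
    print(max_ping_payload(1500))  # 1472

    one_gigabyte = 1_000_000_000
    print(best_case_transfer_seconds(one_gigabyte, 100e6))  # 80.0 (FastEthernet)
    print(best_case_transfer_seconds(one_gigabyte, 1e9))    # 8.0 (gigabit)
```

With a 1500-byte MTU end to end, a ping with the don't-fragment bit set and a 1472-byte payload should get through (`ping -f -l 1472 <host>` on Windows, `ping -M do -s 1472 <host>` on Linux). If it fails but smaller sizes succeed, something on the path has a smaller MTU. Likewise, if a 1 GB copy over a 100 Mbps port takes much longer than about 80 seconds, the link speed alone does not explain the slowness.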
These are just some thoughts that will hopefully get your creative juices flowing. Troubleshooting such issues all comes down to experience. If you’ve seen something like it before, you have a deeper understanding and your mind approaches the solution faster. It does take time though. Problems that result in a complete disconnection of the network might sound more devastating, but are definitely easier to diagnose and solve than issues like the one you describe. But not to worry, with time, a lot of reading, and personal experience, it does get easier and more intuitive.
Some lessons that might help you in your quest include the following:
I hope this has been helpful!
Thank you so much for spending your time to give me some advice.
Seems like you have a lot of experience in troubleshooting those issues.
This is exactly what I needed. I can’t wait to receive the next call about performance issues.
I will definitely look at those lessons!
Is the file share that the branch users are copying to located across a WAN circuit or is it local? If it’s across a circuit, then you may need to traffic shape for the circuit speed. Also, even if you do traffic shape, understand that your bottleneck would be the WAN and so long as your copying is via TCP, then the end programs will eventually adjust to whatever max throughput they can get away with, even without traffic shaping.
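As a rough illustration of why a WAN path caps what a TCP copy can achieve, here is a small Python sketch of the usual window/RTT ceiling. A single TCP flow cannot have more than one window of data in flight per round trip. The window size and RTT below are made-up example values, not measurements from this thread, and the function names are hypothetical.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP flow: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bits_per_second: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return link_bits_per_second * rtt_seconds / 8


if __name__ == "__main__":
    # Assumed values: 64 KB window (no window scaling), 50 ms round trip
    ceiling = max_tcp_throughput_bps(65535, 0.050)
    print(f"{ceiling / 1e6:.1f} Mbit/s")  # about 10.5 Mbit/s

    # Window needed to fill an assumed 100 Mbit/s circuit at the same RTT
    needed = bandwidth_delay_product_bytes(100e6, 0.050)
    print(f"{needed / 1000:.0f} kB")  # about 625 kB
```

Under these assumptions, a single flow tops out well below the circuit speed no matter how fast the access ports are. This is one reason packet captures, which show the actual window sizes and round-trip times, are often more telling than interface counters.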
In the end, when I’ve had to troubleshoot this kind of issue, I’ve had to take packet captures to see what was causing congestion, then set up QoS to prioritize traffic to the branches - it took weeks to get it right.
The fileshare is located on the SAN, and this building is connected to our datacenter with 10Gbit/s single-mode fiber directly to the backbone, but you made a good point about troubleshooting branches that are connected over a WAN circuit. Capturing data to see what causes congestion is a good idea; this will lead me to learn more about QoS and its features.