Layer 2 and Layer 3 real world problem scenario and solution


(Krunal P) #1

I am new to networking, and I am curious: what are some common and complex real-world Layer 2 and Layer 3 problem scenarios, and what approach did you take to troubleshoot them?
If any network engineering pros would like to share some stories from their experience, it would be highly appreciated.


(Kevin W) #2

Hello Krunal,

Welcome to the wonderful world of networking! As far as methods of troubleshooting go I use the bottom-up approach. I start at the bottom (physical layer) of the OSI model and work my way up.

Here are some of the things I look for at each layer:
Layer 1

  1. Do all devices involved have power and are turned on?
  2. Are all the cables connected properly?
  3. Are the proper cables being used? (Rarely an issue now that we have Auto-MDIX)
  4. Are the cables good or bad? (Use a cable tester)

Layer 2

  1. Verify ARP tables
  2. Verify I can ping a device on the same subnet
  3. Verify the device is in proper VLAN/subnet
  4. Check CAM table to see if the proper entries exist or to see what interface a host connects to
  5. Verify ACLs (if any exist)
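
The "proper VLAN/subnet" check above can be sketched in a few lines of Python. This is only an illustration using the standard `ipaddress` module; the addresses and the helper name `same_subnet` are made up for the example and do not come from any real network.

```python
import ipaddress

def same_subnet(host_a: str, host_b: str) -> bool:
    """Return True when two interface addresses (CIDR notation) share a network."""
    net_a = ipaddress.ip_interface(host_a).network
    net_b = ipaddress.ip_interface(host_b).network
    # Comparing networks also catches mismatched masks, e.g. /24 vs /25.
    return net_a == net_b

# Hypothetical addresses, for illustration only.
print(same_subnet("192.168.10.20/24", "192.168.10.77/24"))  # same subnet
print(same_subnet("192.168.10.20/24", "192.168.20.5/24"))   # different subnets
```

If two hosts that should talk at layer 2 land in different networks here, a direct ping between them will not work without a router, no matter how healthy the switch is.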

Layer 3

  1. Can I ping the device in question?
    (this device would live on a different subnet since we are working at layer 3)
  2. Verify dynamic routing protocols
  3. Is there a route in the routing table for the network in question?
  4. If so, are there return routes back to the source?
  5. Verify ACLs
  6. Check for sub-optimal routing
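
To make the "route there and route back" checks concrete, here is a rough Python sketch of a longest-prefix-match lookup using the standard `ipaddress` module. The routing tables, next hops, and the `lookup` helper are all hypothetical; on real gear you would read the routing table from the device itself.

```python
import ipaddress

def lookup(routing_table: dict, destination: str):
    """Return (prefix, next_hop) for the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routing_table.items():
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, next_hop))
    if not matches:
        return None
    # Longest prefix wins, as it would on a router.
    network, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), next_hop

# Hypothetical tables: router A can reach 10.2.0.0/16, but router B
# has no route back to 10.1.0.0/16.
router_a = {"10.1.0.0/16": "connected", "10.2.0.0/16": "192.0.2.2", "10.2.5.0/24": "192.0.2.6"}
router_b = {"10.2.0.0/16": "connected"}

print(lookup(router_a, "10.2.5.9"))  # forward path: most specific route
print(lookup(router_b, "10.1.5.9"))  # return path: None means no route back
```

A ping that fails even though the forward route exists often comes down to that second lookup: the reply has nowhere to go.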

This is not an exhaustive list, and there is probably a ton of stuff I missed, but this is the majority of what I look at when I am troubleshooting.

As far as the most common problems go, I would say they are layer 1 problems rather than layer 2 or 3. I can’t count how many problems I have solved by simply turning on the device in question. Funnily enough, a common issue at layer 2 can be simple or complex. Take spanning tree, for example: at its base this protocol is simple and easy to configure, but large spanning tree topologies can get very complex and difficult to troubleshoot.

From my experience, NEVER EVER EVER skip the simple stuff. Take your time and make sure you hit every possibility at each layer. I could have used my own advice recently, so seriously, if you take anything away from this post, it should be two things: work bottom-up through the OSI model to solve issues, and look at all possible issues at each layer, starting with the simple ones and moving to the more complex ones. Also, I did not go above layer 3 because you did not really ask about layers 4-7.
I hope this helps,
Scott


(Lazaros Agapides) #3

Hello Krunal

@wellerk.scott 's step-by-step, bottom-up troubleshooting process is indeed an excellent example of how to approach issues and problems you may face on a network. His advice is also very sound. Keep it in mind.

One layer 2 problem that frustrated me for a while had to do with some Cisco IP phones at a municipality. These phones were set up in a remote location, while the Call Manager (the IP PBX) was set up in the central building of the municipality. The connection between the buildings used the city’s municipal area network (MAN), built on fiber optic infrastructure. There were two models of phones in the remote building. All the phones of one model connected and registered successfully to the Call Manager. The phones of the other model did not register. No other network problems were reported.

It was a strange problem because it affected only specific models of phones, so something was happening differently on those models than on the others. After a couple of days of troubleshooting, we found out that the voice VLAN carried over the MAN had an MTU of 1500. The MAN was using QinQ, which is essentially double tagging of VLANs over trunks. The extra tag adds 4 bytes to the header, which reduced the largest frame payload the MAN would carry to 1496 bytes. One model of IP phone used frames smaller than this to perform registration, while the other used frames larger than 1496 bytes, which caused its registration to fail. When we increased the MTU on the MAN to accommodate these larger frames, the phones registered immediately.
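
The arithmetic behind this is simple enough to sketch in Python. The 1500-byte MTU and the 4 bytes per extra tag come from the story above; the two registration frame sizes are made-up values for illustration, since I do not have the exact sizes each phone model used.

```python
VLAN_TAG_BYTES = 4  # size of one 802.1Q tag

def effective_mtu(link_mtu: int, extra_tags: int = 1) -> int:
    """MTU left for customer frames after the provider adds extra VLAN tags."""
    return link_mtu - VLAN_TAG_BYTES * extra_tags

def fits(frame_bytes: int, link_mtu: int, extra_tags: int = 1) -> bool:
    """True if a frame of this size still fits once the extra tags are added."""
    return frame_bytes <= effective_mtu(link_mtu, extra_tags)

print(effective_mtu(1500))  # 1496 with one extra QinQ tag

# Hypothetical registration frame sizes for the two phone models.
print(fits(1400, 1500))  # smaller frames pass
print(fits(1500, 1500))  # full-size frames do not
```

Raising the link MTU by at least the tag overhead, for example to 1504, would make the second check pass as well.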

It is these kinds of strange issues that really test the troubleshooting skills and procedures of network professionals. Although reading about the experience of others is indeed helpful, the ultimate way to gain this skillset is by dealing with such issues day after day. The more problems you face, the better you get at solving them…

I hope this has been helpful!

Laz