Hi all,
we have a medium-sized network with about 150 switches (many VLANs) and we experienced a major outage yesterday. Apparently due to some failed network device, we got MAC flapping on one VLAN, and within a few seconds almost the whole network went down because many of the switches' uplinks (configured as trunks) had been err-disabled. Now my question is: there are many err-disable recovery options in IOS, but I cannot find out whether any of them addresses the issue we experienced. To me it seems none of them does. Can anyone confirm this or correct me?
Thanks
Daniel
Hi Daniel,
A big flat L2 network is a big risk. When something happens, it can bring the entire network down like you witnessed.
Which exact error message put your interfaces in errdisable mode?
The “%SW_MATM-4-MACFLAP_NOTIF” message is reported through syslog, but on its own it doesn’t put interfaces into err-disable mode.
Rene
Hi Rene,
thanks for your reply. That’s what I thought too, but the only message in the switches' logs is the MAC flap notification, i.e. “Host f0de.f181.119c in vlan 1 is flapping between port Te1/0/1 and port Gi2/0/16”, and the uplinks still got err-disabled. I had to go to each switch and manually do a shutdown and no shutdown, which took me several hours. The question is not so much why it happened but whether it is possible to recover from it automatically. But of course I understand that when the cause cannot be determined, you cannot tell how to recover from it.
Daniel
Hi Daniel,
I wasn’t sure, so I did a little experiment last night.
One switch, two hosts with the same MAC/IP address, both sending pings to some IP address. This produces the “%SW_MATM-4-MACFLAP_NOTIF” message through syslog non-stop, but it doesn’t cause an interface to go into err-disable mode. I left it running for hours.
Do you use an external syslog server? The local log buffer of your switches (show logging) probably got swamped with MAC flapping messages, which would explain why you don’t see the log line that says why the interfaces went into err-disable mode.
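If a switch still has ports down, a couple of show commands can usually pull the reason out from under the flood. This is only a sketch: it assumes a Catalyst IOS switch, and that the %PM-4-ERR_DISABLE line has not already been overwritten in the local buffer.

```
! list err-disabled ports together with the reason
Switch#show interfaces status err-disabled
! filter the local log down to the err-disable events
Switch#show logging | include ERR_DISABLE
```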
You can enable autorecovery, but you will need to know the exact reason why the interfaces went into err-disable mode. It has to be one of the causes in this list:
Switch#show errdisable recovery
ErrDisable Reason            Timer Status
-----------------            --------------
arp-inspection               Disabled
bpduguard                    Disabled
channel-misconfig (STP)      Disabled
dhcp-rate-limit              Disabled
dtp-flap                     Disabled
gbic-invalid                 Disabled
inline-power                 Disabled
l2ptguard                    Disabled
link-flap                    Disabled
mac-limit                    Disabled
link-monitor-failure         Disabled
loopback                     Disabled
oam-remote-failure           Disabled
pagp-flap                    Disabled
port-mode-failure            Disabled
pppoe-ia-rate-limit          Disabled
psecure-violation            Disabled
security-violation           Disabled
sfp-config-mismatch          Disabled
storm-control                Disabled
udld                         Disabled
unicast-flood                Disabled
vmps                         Disabled
psp                          Disabled
dual-active-recovery         Disabled
evc-lite input mapping fa    Disabled
Recovery command: "clear     Disabled
If you don’t use external logging, set up an external syslog server so you can see the reason next time.
Or enable some of these autorecovery features beforehand.
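As a rough sketch of both suggestions, assuming a Catalyst IOS switch: 192.0.2.10 is a placeholder syslog server address, and bpduguard and udld are only example causes picked from the list above, not the confirmed cause of this outage. The interval is in seconds.

```
Switch#configure terminal
! send syslog messages to an external server (placeholder address)
Switch(config)#logging host 192.0.2.10
! auto-recover ports that were err-disabled for these example causes
Switch(config)#errdisable recovery cause bpduguard
Switch(config)#errdisable recovery cause udld
! try to re-enable affected ports after 300 seconds (the IOS default)
Switch(config)#errdisable recovery interval 300
Switch(config)#end
```

Afterwards, show errdisable recovery should list those causes as Enabled. Keep in mind that if the underlying fault is still present, a recovered port will simply be err-disabled again when the fault is detected.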
Rene