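Our network environment runs entirely on Cisco switches such as the 2960, 4500, and 3850. The virtual environment uses Citrix Xen and VMware products.

A few months ago, after the Xen environment was upgraded to 7.2, we started running into switch ports going err-disabled.

Several ports on different Cisco 2960S and 2960X switches kept dropping into err-disable mode, causing server outages. It happened with and without NIC bonding, on bonded ports as well as plain access ports.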

SW-MGMT1#sh int g0/12
GigabitEthernet0/12 is down, line protocol is down (err-disabled) 
  Hardware is Gigabit Ethernet, address is 189c.5d6b.3c0c (bia 189c.5d6b.3c0c)
  Description: l-TEST1-2 10.9.12.27
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Auto-duplex, Auto-speed, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported 
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 3w3d, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     1901183 packets input, 485681442 bytes, 0 no buffer
     Received 23 broadcasts (6 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 6 multicast, 0 pause input
     0 input packets with dribble condition detected
     4523326 packets output, 1258124208 bytes, 0 underruns
     0 output errors, 0 collisions, 4 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out


SW-MGMT1#sh ver
Cisco IOS Software, C2960S Software (C2960S-UNIVERSALK9-M), Version 12.2(55)SE7, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Mon 28-Jan-13 10:28 by prod_rel_team
Image text-base: 0x00003000, data-base: 0x01B00000

ROM: Bootstrap program is Alpha board boot loader
BOOTLDR: C2960S Boot Loader (C2960S-HBOOT-M) Version 12.2(55r)SE, RELEASE SOFTWARE (fc1)

SW-TEST1-FWMGMT1 uptime is 13 weeks, 2 days, 21 hours, 31 minutes
System returned to ROM by power-on
System restarted at 15:18:10 EDT Fri Nov 3 2017
System image file is "flash:/c2960s-universalk9-mz.122-55.SE7/c2960s-universalk9-mz.122-55.SE7.bin"


This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
[email protected]

cisco WS-C2960S-24TS-S (PowerPC) processor (revision J0) with 131072K bytes of memory.
Processor board ID FOC1647V1LN
Last reset from power-on
1 Virtual Ethernet interface
1 FastEthernet interface
26 Gigabit Ethernet interfaces
The password-recovery mechanism is enabled.

512K bytes of flash-simulated non-volatile configuration memory.
Base ethernet MAC Address       : 18:9C:5D:6B:3C:00
Motherboard assembly number     : 73-12423-09
Power supply part number        : 341-0328-03
Motherboard serial number       : FOC1647101J
Power supply serial number      : DCA1644M7WA
Model revision number           : J0
Motherboard revision number     : A0
Model number                    : WS-C2960S-24TS-S
Daughterboard assembly number   : 73-11933-04
Daughterboard serial number     : FOC16467T1G
System serial number            : FOC1647V1LN
Top Assembly Part Number        : 800-32448-04
Top Assembly Revision Number    : B0
Version ID                      : V04
CLEI Code Number                : COMGJ00ARD
Daughterboard revision number   : A0
Hardware Board Revision Number  : 0x01


Switch Ports Model              SW Version            SW Image                 
------ ----- -----              ----------            ----------               
*    1 26    WS-C2960S-24TS-S   12.2(55)SE7           C2960S-UNIVERSALK9-M     

On this SW-MGMT 2960S, two switch ports connect to Citrix Xen servers. One of them seems fine; it is always g0/12 that ends up in the err-disable state.

According to the Citrix post: //discussions.citrix.com/topic/391523-after-upgrade-70-to-72-load-cisco-switch-ports-shutdown-due-to-err-disabled/

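"A loopback error occurs when the keepalive packet is looped back to the port that sent the keepalive. The switch sends keepalives out all interfaces by default.
A device can loop the packets back to the source interface, which usually occurs because there is a logical loop in the network that spanning tree has not blocked.
The source interface receives the keepalive packet that it sent out, and the switch disables the interface (errdisable).
This message occurs because the keepalive packet is looped back to the port that sent the keepalive:
%PM-4-ERR_DISABLE: loopback error detected on Gi4/1, putting Gi4/1 in err-disable state
Keepalives are sent on all interfaces by default."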

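If the software (IOS or CatOS) detects an error condition on a port, the switch port can end up in the err-disabled state. The port is then effectively shut down until it is re-enabled, either manually or automatically if a recovery timer has been configured for that error condition.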

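One such error condition is caused by a loopback on the port. The switch sends keepalive packets out all interfaces. If a keepalive packet is received on the same interface it was sent from, there is a loop that the Spanning Tree Protocol has not blocked.

When this happens, the log shows that a loopback error was detected: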


000305: Nov  5 21:23:59.177 EST: %ETHCNTR-3-LOOP_BACK_DETECTED: Loop-back detected on GigabitEthernet0/12.

000306: Nov  5 21:23:59.177 EST: %PM-4-ERR_DISABLE: loopback error detected on Gi0/12, putting Gi0/12 in err-disable state

000307: Nov  5 21:24:00.179 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/12, changed state to down

000308: Nov  5 21:24:01.186 EST: %LINK-3-UPDOWN: Interface GigabitEthernet0/12, changed state to down
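
To confirm which ports are currently err-disabled and why, a few standard IOS show commands can be run on the switch. This is a minimal sketch that assumes the same SW-MGMT1 prompt used above; output is omitted because it varies by platform and software release.

```
! List every port that is currently err-disabled, together with the reason
SW-MGMT1#show interfaces status err-disabled
! Show which error causes the switch is configured to detect
SW-MGMT1#show errdisable detect
! Pull the loopback and err-disable messages out of the log buffer
SW-MGMT1#show logging | include LOOP_BACK|ERR_DISABLE
```

If the reason column reports loopback for the affected port, the situation matches the keepalive behavior described above.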

Solution:

1. Quick/temporary fix
The quick fix is easy: shut the interface down, then bring it back up with no shutdown.


SW-MGMT(config)#int g0/12
SW-MGMT(config-if)#shu
SW-MGMT(config-if)#
SW-MGMT(config-if)#no shu
SW-MGMT(config-if)#


005014: Feb  5 11:55:13.089 EST: %PARSER-5-CFGLOG_LOGGEDCMD: User:admin  logged command:interface GigabitEthernet0/12 
005015: Feb  5 11:55:18.594 EST: %PARSER-5-CFGLOG_LOGGEDCMD: User:admin  logged command:shutdown 
005016: Feb  5 11:55:19.863 EST: %PARSER-5-CFGLOG_LOGGEDCMD: User:admin  logged command:no shutdown 
005018: Feb  5 11:55:24.980 EST: %LINK-3-UPDOWN: Interface GigabitEthernet0/12, changed state to up
005019: Feb  5 11:55:25.982 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/12, changed state to up


SW-MGMT#sh int g0/12
GigabitEthernet0/12 is up, line protocol is up (connected) 
  Hardware is Gigabit Ethernet, address is 189c.5d6b.3c0c (bia 189c.5d6b.3c0c)
  Description: l-TEST1-2 10.9.12.27
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported 
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:00, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 1000 bits/sec, 1 packets/sec
  5 minute output rate 1000 bits/sec, 1 packets/sec
     1901190 packets input, 485682616 bytes, 0 no buffer
     Received 24 broadcasts (6 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 6 multicast, 0 pause input
     0 input packets with dribble condition detected
     4523338 packets output, 1258126740 bytes, 0 underruns
     0 output errors, 0 collisions, 5 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out

2. Permanent fix
So far there is no official patch or fix from Citrix. On the Cisco side, there are two different workarounds:

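2.1 Disable keepalive on specific interfaces

Issue the no keepalive interface command to disable keepalive packets on those interfaces. Disabling keepalive prevents the interface from being err-disabled, but it does not remove the loop. The drawback is that you first have to collect the list of ports where the problem has occurred.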
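
As a minimal sketch, this is how the command could be applied to the port from this post. The prompt and the choice of GigabitEthernet0/12 are taken from the output above; any other affected port would need the same treatment.

```
SW-MGMT#configure terminal
SW-MGMT(config)#interface GigabitEthernet0/12
! Stop sending keepalives on this port so a looped-back frame cannot trigger err-disable
SW-MGMT(config-if)#no keepalive
SW-MGMT(config-if)#end
! Check the result: the line should no longer read "Keepalive set (10 sec)"
SW-MGMT#show interfaces GigabitEthernet0/12 | include Keepalive
```

Several ports can be handled in one pass with the interface range command in place of the single interface line.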

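2.2 Enable automatic recovery for this kind of err-disable condition
This way you do not need to know which port has the problem. These are global commands on the switch and affect all ports.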

errdisable recovery interval 600
errdisable recovery cause link-flap
errdisable recovery cause udld
errdisable recovery cause bpduguard
errdisable recovery cause loopback
errdisable recovery cause psecure-violation
errdisable recovery cause dcbx-error
errdisable recovery cause pause-rate-limit
errdisable recovery cause inline-power
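
The interval is given in seconds, so 600 means a port is brought back automatically ten minutes after it was err-disabled. If only the loopback case from this post matters, a narrower variant is possible. The following is a sketch under that assumption, not a recommendation for every environment; the prompt is the one used earlier.

```
SW-MGMT#configure terminal
! Recover only ports that were err-disabled because of a detected loopback
SW-MGMT(config)#errdisable recovery cause loopback
! Retry after 600 seconds (10 minutes)
SW-MGMT(config)#errdisable recovery interval 600
SW-MGMT(config)#end
! Verify which causes have recovery enabled and which ports are waiting to be recovered
SW-MGMT#show errdisable recovery
```

Automatic recovery does not remove the underlying loop either; if the keepalive is looped back again, the port is err-disabled again and recovered after the next interval.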

By John
